Zero Day Initiative - Blog

CVE-2022-31696: An Analysis of a VMware ESXi TCP Socket Keepalive Type Confusion LPE

22 June 2023 at 16:00

Last year we published our patch gap analysis of ESXi’s TCP/IP stack, which is forked from FreeBSD 8.2. While our focus was mainly on missing FreeBSD patches in ESXi, we also came across a type confusion bug in code introduced by VMware. This blog post details a vulnerability I discovered in ESXi’s implementation of the setsockopt system call that could lead to a sandbox escape. The vulnerability was assigned CVE-2022-31696 and disclosed as part of the advisory VMSA-2022-003. I also explore ESXi’s kernel heap allocator and weaknesses in existing kernel mitigations.

For information regarding the initial analysis of the TCP/IP kernel module, VMkernel debug symbols, and porting type information from FreeBSD to ESXi, we recommend reading our earlier analysis.

Comparing setsockopt in FreeBSD vs ESXi

First, let’s take a look at how ESXi 6.7 build 19195723’s setsockopt implementation differs from that of FreeBSD. Of particular note are differences in the handling of the SO_KEEPALIVE socket option. This option enables keep-alive messages on connection-oriented sockets.

Figure 1 - FreeBSD 8.2 code (left) vs ESXi 6.7 19195723 IDA Pro decompiled code (right)

In BSD systems, the TCP timer functions are registered and executed through the callout facility. ESXi added code here to check if there is an active callout for the keep-alive, by calling tcp_timer_active. If so, it resets the TCP keepidle to a newer value using tcp_timer_activate. The keepidle value determines how long TCP should wait before sending out the first keep-alive probe.

Type confusion vulnerability in SO_KEEPALIVE handling

What’s the issue with this newly added code? To understand this better, let’s take another look at the decompiled code with type information added.

Figure 2 - Protocol PCB type cast without validation

The Internet PCB structure inpcb has a pointer, inp_ppcb, that points to either a TCP PCB (tcpcb) or a UDP PCB (udpcb) structure depending on the protocol. The vulnerable code shown here always casts this pointer to tcpcb, irrespective of the socket type. If the SO_KEEPALIVE option is set on a UDP socket, inp_ppcb points to a udpcb structure, but because there is no validation it is treated as a tcpcb. When the code then reads the t_timers field (a pointer to a tcp_timer structure) at offset 0x20, the access is out of bounds, because the udpcb structure is only 0x10 bytes in size.

Figure 3 - The Kernel Data Structures for TCP and UDP Protocol Control Blocks as seen in FreeBSD
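To make the size mismatch concrete, here is a minimal C sketch. The struct names (udpcb_model, tcpcb_model, inpcb_model) and both helpers are hypothetical models, not ESXi's definitions. They encode only the two facts from the analysis: udpcb is 0x10 bytes, and t_timers sits at offset 0x20 in tcpcb. On a 64-bit build, reading the 8-byte t_timers pointer through a udpcb touches 0x18 bytes past the end of the 16-byte object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified models of the PCB structures. Only the two facts
 * stated in the analysis are reproduced: udpcb is 0x10 bytes and
 * tcpcb.t_timers sits at offset 0x20. The opaque arrays are placeholders. */
struct udpcb_model {
    uint8_t opaque[0x10];
};

struct tcpcb_model {
    uint8_t opaque[0x20];   /* fields preceding t_timers (not modeled) */
    void   *t_timers;       /* struct tcp_timer * */
};

struct inpcb_model {
    void *inp_ppcb;         /* tcpcb for TCP sockets, udpcb for UDP sockets */
};

static_assert(sizeof(struct udpcb_model) == 0x10, "udpcb is 0x10 bytes");
static_assert(offsetof(struct tcpcb_model, t_timers) == 0x20, "t_timers at +0x20");

/* Unchecked cast in the style of FreeBSD's intotcpcb() macro: nothing here
 * verifies that inp_ppcb really points to a tcpcb. */
struct tcpcb_model *intotcpcb_model(struct inpcb_model *inp) {
    return (struct tcpcb_model *)inp->inp_ppcb;
}

/* Number of bytes a read of t_timers touches past the end of an object of
 * size pcb_size (0 if the read stays in bounds). */
size_t t_timers_overread(size_t pcb_size) {
    size_t end = offsetof(struct tcpcb_model, t_timers) + sizeof(void *);
    return end > pcb_size ? end - pcb_size : 0;
}
```

The cast itself is harmless. The damage happens only when a field beyond the real object, such as t_timers, is dereferenced.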

Triggering the Vulnerable Code and PSOD

In order to trigger the vulnerable code path, we need to create a UDP socket and then set the SO_KEEPALIVE option on it using the setsockopt system call. Since ESXi does not package any build tools, we must compile the PoC statically on a Linux machine and then transfer the binary to ESXi for execution. Running the PoC immediately triggers the Purple Screen of Death (PSOD). To trigger the bug from a sandboxed process, an attacker must be able to invoke the setsockopt system call on an existing UDP socket descriptor or create a new one for that purpose. Below is the PoC to trigger the bug:
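A minimal sketch of such a trigger follows. It is not the original PoC, and the function name udp_keepalive_trigger is hypothetical. It does exactly what the paragraph describes: create a UDP socket and enable SO_KEEPALIVE on it. It would be called from a small main and built statically (for example with `gcc -static`). On Linux, or on a patched ESXi build, the call is expected to simply succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket and enable SO_KEEPALIVE on it. On an affected ESXi
 * build this sends a udpcb down the keep-alive timer path; elsewhere the
 * call is expected to be harmless. Returns setsockopt()'s result, or -1 if
 * the socket could not be created. */
int udp_keepalive_trigger(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    int one = 1;
    int rc = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &one, sizeof(one));
    if (rc != 0)
        perror("setsockopt(SO_KEEPALIVE)");
    close(fd);
    return rc;
}
```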

The resulting PSOD:

Figure 4 - ESXi PSOD on TCP Timers code when running the PoC

Kernel Debug Setup for ESXi

While ESXi supports a local VMkernel debugger, VMKDBG, which can be used to inspect the PSOD, it is not as flexible as GDB. The GDB setup detailed in Attacking VMware NSX (Slides 34 – 37 in the PDF) is an excellent reference for getting started with ESXi kernel debugging. In summary, we used the GDB stubs feature provided by VMware to debug ESXi running as a guest VM on Fusion. We also disabled kASLR for ease of debugging. Since the ESXi kernel modules have symbols, it is possible to use GDB’s add-symbol-file command to load symbol information given an executable file and its base address in memory. The module base address and the path information required for add-symbol-file can be fetched using the esxcfg-info command as seen below:

While the file path to the tcpip module can be seen in the output, there is no file path entry for the VMkernel module. The VMkernel module with symbol information is found as a gzip-compressed file k.b00 within the bootbank directory of ESXi. Alternatively, to obtain the VMkernel executable with not only symbols but also type information, one can download it from the VMware WorkBench. However, in this case, the VMware WorkBench does not have debug information for the version of ESXi currently under analysis.

Once the kernel modules are available and their base addresses are known, connect the debugger and run the PoC to trigger the crash. GDB may not catch the exception. In that case, ESXi continues running and executes the handler for Interrupt 13 - General Protection Fault (GP), which is responsible for collecting fault information and core dumps. Should this occur, wait for the PSOD and then press “Control + C” (SIGINT) to break into GDB. In the debug session shown below, you can see the symbolized stack trace obtained using GDB’s add-symbol-file command. tcp_timer_active was the last function to execute before the interrupt handler was called. Therefore, select the relevant frame (12 in this case) and inspect the program state. Register RAX held a garbage value, leading to an invalid memory access during execution of the mov eax,DWORD PTR [rax+0x38] instruction.

Analyzing the Exploitability of the Type Confusion Bug

Since the debug setup with symbols is now ready, let’s take another look at the crash by setting breakpoints and stepping through the code. The tcpcb structure can be inspected during the call to tcp_timer_active function, which takes it as the first argument. However, the type information is still missing within GDB. As a workaround, it is possible to use the type information from the FreeBSD kernel for debugging ESXi’s tcpip kernel module. Though some of the structure definitions vary somewhat between the FreeBSD and ESXi TCP/IP stacks, they have substantial similarities. Once again, GDB’s add-symbol-file command comes in handy. To import all structure definitions, use the add-symbol-file command but with address set to 0. Similarly, type information for VMkernel can be imported from an older version of ESXi vmkernel-visor (6.7-14320388) available through the VMware WorkBench.

Unlike the previous debug session, where the crash happened when accessing a garbage pointer, this time the t_timers variable is NULL, which results in a NULL pointer dereference. To better understand this behavior, it is necessary to examine the heap allocator used by ESXi. Analysis of the vmkernel-visor executable showed that ESXi’s kernel heap allocator is based on Doug Lea's Malloc:

In dlmalloc, the malloc chunk header structure (malloc_chunk) is 32 bytes in size on a 64-bit system. The structure definition is as follows:

Figure 5 - Chunk header of Doug Lea's Malloc

The prev_foot field holds the size of the previous chunk if that chunk is free, whereas the head field holds the size of the current chunk. In addition to the size, the head field also holds two flag bits: PINUSE_BIT and CINUSE_BIT. PINUSE_BIT (the lowest-order bit) indicates whether the previous chunk is in use. CINUSE_BIT (the second-lowest bit) indicates whether the current chunk is in use. The forward (fd) and backward (bk) pointer fields are used only when the chunk is free; otherwise, the chunk data starts immediately after the head field. Looking back at the memory pointed to by RDI, it can be inferred that it is the data region of a 32-byte dlmalloc chunk, which can hold 16 bytes of data (the udpcb structure).
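The header fields and flag bits can be decoded as in this short C sketch. The struct and bit definitions follow upstream dlmalloc 2.8.x; whether ESXi's allocator keeps them unchanged is an assumption. The helper names chunk_size, prev_inuse, cur_inuse and mem2chunk mirror dlmalloc's macros but are written here as plain functions. For example, a head value of 0x23 decodes to a 0x20-byte chunk with both PINUSE_BIT and CINUSE_BIT set.

```c
#include <assert.h>
#include <stddef.h>

/* Chunk header as defined in Doug Lea's malloc (dlmalloc 2.8.x). That
 * ESXi's allocator keeps these exact definitions is an assumption. */
struct malloc_chunk {
    size_t prev_foot;         /* size of previous chunk (if it is free) */
    size_t head;              /* size of this chunk | flag bits */
    struct malloc_chunk *fd;  /* forward link, only used when free */
    struct malloc_chunk *bk;  /* backward link, only used when free */
};

#define PINUSE_BIT ((size_t)1)  /* previous chunk is in use */
#define CINUSE_BIT ((size_t)2)  /* this chunk is in use */
#define FLAG4_BIT  ((size_t)4)  /* spare bit, masked out of the size */
#define FLAG_BITS  (PINUSE_BIT | CINUSE_BIT | FLAG4_BIT)

/* User data starts right after prev_foot and head (16 bytes on 64-bit). */
#define CHUNK_MEM_OFFSET (2 * sizeof(size_t))

size_t chunk_size(size_t head) { return head & ~FLAG_BITS; }
int prev_inuse(size_t head)    { return (head & PINUSE_BIT) != 0; }
int cur_inuse(size_t head)     { return (head & CINUSE_BIT) != 0; }

/* Recover the chunk header from a pointer handed out by malloc (e.g. RDI). */
const struct malloc_chunk *mem2chunk(const void *mem) {
    assert(mem != NULL);
    return (const struct malloc_chunk *)((const char *)mem - CHUNK_MEM_OFFSET);
}
```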

As explained above, fetching the t_timers pointer from offset +0x20 reads data from the adjacent chunk, because the allocated udpcb structure is smaller than the offset of t_timers in the tcpcb structure. Since the adjacent chunk may hold unrelated data, its contents are unpredictable (unless care is first taken to groom the heap). That is why the PoC crash sometimes manifests as a NULL pointer dereference and sometimes as a different kind of invalid access. Here is what the access of t_timers looks like:

Figure 6 - State of heap memory during type confusion
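The address arithmetic behind this can be checked with a small C sketch. It assumes a 64-bit dlmalloc-style heap in which user data starts 0x10 bytes into a chunk, and assumes the 16-byte udpcb occupies a 0x20-byte chunk. The helper names are hypothetical. Under these assumptions, udpcb + 0x20 is exactly the start of the next chunk's user data. If that neighboring chunk is free, the confused read returns its fd pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: a 64-bit dlmalloc-style heap where user data
 * begins 0x10 bytes into a chunk, and the 16-byte udpcb lives in a 0x20-byte
 * chunk. TCPCB_T_TIMERS_OFF is the offset quoted in the analysis. */
#define CHUNK_MEM_OFFSET   0x10u
#define UDPCB_CHUNK_SIZE   0x20u
#define TCPCB_T_TIMERS_OFF 0x20u

static_assert(CHUNK_MEM_OFFSET + 0x10u <= UDPCB_CHUNK_SIZE,
              "a 16-byte udpcb fits in a 32-byte chunk");

/* Where the type-confused code reads t_timers from, given the udpcb pointer. */
uintptr_t t_timers_slot(uintptr_t udpcb) {
    return udpcb + TCPCB_T_TIMERS_OFF;
}

/* User-data address of the chunk that physically follows the udpcb's chunk. */
uintptr_t next_chunk_mem(uintptr_t udpcb) {
    uintptr_t chunk = udpcb - CHUNK_MEM_OFFSET;  /* mem2chunk */
    uintptr_t next  = chunk + UDPCB_CHUNK_SIZE;  /* next physical chunk */
    return next + CHUNK_MEM_OFFSET;              /* chunk2mem */
}
```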

Assuming control of the t_timers pointer, it is possible to corrupt arbitrary memory during the write operations within the callout_stop or callout_reset functions. Alternatively, if there is control over the memory pointed to by the t_timers pointer, it is possible to control the subsequent access of the tcp_timer structure. Specifically, tcp_timer contains a callout substructure scheduled for execution by tcp_timer_activate. By targeting the c_func function pointer we can gain control of the instruction pointer. Since ESXi does not support Supervisor Mode Access Prevention (SMAP), t_timers could in fact point to user space memory instead of controlled memory in kernel space.

Figure 7 - TCP timers and Callout data structures
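Why control over the tcp_timer memory leads to control of the instruction pointer is easiest to see in a toy model. The types below (callout_model, tcp_timer_model) are hypothetical simplifications of FreeBSD's struct callout and struct tcp_timer, and ESXi's real layouts and offsets differ. run_callout_if_due stands in for the kernel's callout dispatch. The point is that whatever c_func and c_arg hold is what gets called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified models of FreeBSD's struct callout and
 * struct tcp_timer; ESXi's real layouts (and offsets) differ. */
struct callout_model {
    int    c_time;            /* tick at which the callout fires */
    void  *c_arg;             /* argument handed to c_func */
    void (*c_func)(void *);   /* handler invoked when the callout fires */
};

struct tcp_timer_model {
    struct callout_model tt_rexmt;
    struct callout_model tt_persist;
    struct callout_model tt_keep;    /* the keep-alive timer */
    struct callout_model tt_2msl;
    struct callout_model tt_delack;
};

/* Model of the dispatch step: once a callout is due, the kernel calls the
 * function pointer stored in it with the stored argument. Returns 1 if the
 * handler ran. */
int run_callout_if_due(const struct callout_model *c, int now) {
    assert(c != NULL);
    if (c->c_func == NULL || now < c->c_time)
        return 0;
    c->c_func(c->c_arg);
    return 1;
}

/* Example handler used in the usage check: sets *(int *)arg to 1. */
void mark_fired(void *arg) {
    *(int *)arg = 1;
}
```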

Note that structures such as tcpcb, tcp_timer and callout in ESXi are slightly different from the corresponding structures in FreeBSD. By comparing the decompiled ESXi code against FreeBSD 8.2, I identified new structure elements and adjusted the offsets of existing fields. For example, some global variables in FreeBSD such as tcp_keepidle, tcp_keepintvl and tcp_keepcnt were turned into fields of the tcp_timer structure in ESXi. This can be recognized by analyzing the tcp_timer_keep callout function.

In addition to the lack of SMAP support, kASLR for kernel modules was also found to be weak. While the text segment base address showed significant randomization, the data segment base address did not, with as little as 1 bit of entropy in some cases. Here are the load addresses of the tcpip kernel module across multiple reboots:
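One rough way to estimate entropy from a handful of observed base addresses is to count the bit positions that ever differ across the samples. The sketch below does that. The function name varying_bits is hypothetical, and the addresses in the usage check are made-up values, not the measurements from this research. With few samples this can miss randomized bits, so treat the result as an indication rather than a precise measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bit positions that differ from the first observation in any of
 * the samples. A rough indicator of randomized bits, not a precise entropy
 * measurement. */
unsigned varying_bits(const uint64_t *bases, size_t n) {
    assert(n > 0);
    uint64_t diff = 0;
    for (size_t i = 1; i < n; i++)
        diff |= bases[i] ^ bases[0];
    unsigned bits = 0;
    for (; diff != 0; diff &= diff - 1)   /* clear the lowest set bit */
        bits++;
    return bits;
}
```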

Patch Analysis

To understand the fix for the type confusion bug, a patch diff was performed against ESXi 6.7 Build 20497097 (now at end-of-life). Instead of setting up the newer version of ESXi, you can just download the relevant VIB (vSphere Installation Bundle) from the ESXi Patch Tracker. In the case of tcpip, the kernel module is found within the ESXi base system esx-base VIB. This information can be queried using the esxcli command:

The diff between the tcpip kernel modules from builds 19195723 and 20497097 revealed an additional check in the sosetopt function: the code now verifies that the socket protocol is IPPROTO_TCP before proceeding to the TCP timer code. There is no explicit check preventing a raw socket from entering this code path, but inp_ppcb is initialized only for the SOCK_STREAM and SOCK_DGRAM socket types, not for SOCK_RAW. Therefore, the timer code is reachable only when the socket type is SOCK_STREAM and the protocol is IPPROTO_TCP.

Figure 8 - Vulnerable code (left) vs Fixed code (right)
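The reachability argument above can be written as a small predicate. This is a hypothetical model, not the decompiled fix. has_protocol_pcb encodes the observation that only stream and datagram sockets get an inp_ppcb. The assumption that the path also bails out on a NULL protocol PCB is implied by the analysis rather than shown in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>   /* SOCK_STREAM, SOCK_DGRAM, SOCK_RAW */
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* Hypothetical model of the patched SO_KEEPALIVE path in sosetopt().
 * Per the analysis, inp_ppcb is only initialized for stream and datagram
 * sockets; the NULL-pcb check below is an assumption implied by that. */
bool has_protocol_pcb(int so_type) {
    return so_type == SOCK_STREAM || so_type == SOCK_DGRAM;
}

/* Returns true if the TCP timer code would be reached. */
bool reaches_tcp_timers_patched(int so_type, int protocol) {
    if (protocol != IPPROTO_TCP)        /* check added in build 20497097 */
        return false;
    return has_protocol_pcb(so_type);   /* raw sockets have no inp_ppcb */
}
```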

Interestingly, in 2012, the Linux kernel fixed a very similar issue in the handling of RAW sockets - CVE-2012-6657 Kernel: net: guard tcp_set_keepalive against crash:

Figure 9 - Linux patch for CVE-2012-6657

Conclusion

Historically, kernel privilege escalation vulnerabilities in ESXi have not been frequently seen. ESXi has no login shell for low-privileged users, so that entry point is eliminated. On the other hand, user-mode daemons such as SLPD run with the highest privileges (i.e., superDom), so in the case of compromise of a daemon, there is no need for further escalation. For these reasons, ESXi kernel bugs have not been a popular topic of discussion, at least not publicly. However, the situation is changing. SLP is no longer enabled by default, and ESXi is now sandboxing more and more user-mode processes. This makes us believe ESXi kernel bugs will become important in the coming years. For anyone interested, I hope this blog post will give some ideas to get started on the topic, and I’ll continue blogging about any significant findings in the future. Until then, you can follow me @renorobertr and follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.

Bash Privileged-Mode Vulnerabilities in Parallels Desktop and CDPATH Handling in MacOS

6 April 2023 at 16:08

In the last few years, we have seen multiple vulnerabilities in Parallels Desktop leading to virtual machine escapes. Interested readers can check our previous blog posts about vulnerabilities across interfaces such as RDPMC hypercalls, the Parallels ToolGate, and the VGA virtual device. This post explores another set of issues we received last year - local privilege escalations through setuid root binaries.

Parallels Desktop has a couple of setuid binaries: prl_update_helper and Parallels Service. Both binaries run with root privileges and both invoke bash scripts to run commands with the privileges of root. For such use cases, bash specifically provides a privileged mode using the “-p” flag. Parallels Desktop prior to version 18.1.0 does not take advantage of bash privileged mode, nor does it filter untrusted environment variables. This leads to local privilege escalation.

In the case of Parallels Desktop, the setuid binaries use the setuid() system call to set the real user identifier to the effective user identifier. The problem with this implementation is that bash then processes sensitive environment variables such as BASH_ENV, ENV, SHELLOPTS, BASHOPTS, CDPATH, and GLOBIGNORE, as well as exported shell functions. This is because bash is not aware of the setuid or setgid execution and trusts its environment. A local unprivileged user with control over environment variables can exploit this bug to execute code with root privileges.

The Bash Privileged Mode

Bash shell drops privileges when started with the effective user identifier not equal to that of the real user identifier. The effective user identifier is reset by setting it to the value of the real user identifier. The same is also applicable for group identifiers. In privileged mode, bash does not drop the effective privileges and ignores sensitive variables and shell functions from the environment. Here’s the relevant source code in bash that can be found in shell.c file:

The functions of interest here are uidget and disable_priv_mode. The uidget function sets running_setuid if bash is launched from a setuid/setgid process. Later in the code, if privileged mode is not specified, the setuid and setgid calls are used to drop privileges to that of the real identifiers:
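The decision described above can be modeled in a few lines of C. This is a simplified model of upstream shell.c, not bash's source (Apple's build differs slightly). It does not call the real setuid()/setgid(). The struct bash_creds and both function names are hypothetical. The usage check mirrors the Parallels Service case discussed later: after setuid(0) the real and effective uids are both 0, but the gids still differ. Bash therefore drops only the group privilege and keeps running as root.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical model of the credential check in bash's shell.c: uidget()
 * followed by "if (running_setuid && privileged_mode == 0)
 * disable_priv_mode ()". No real set*id() calls are made. */
struct bash_creds {
    uid_t uid, euid;
    gid_t gid, egid;
};

/* uidget(): bash considers itself setuid/setgid if either pair differs. */
bool running_setuid(const struct bash_creds *c) {
    return c->uid != c->euid || c->gid != c->egid;
}

/* Without -p, a setuid/setgid start resets the effective IDs to the real
 * ones (disable_priv_mode); with -p they are kept. */
struct bash_creds effective_after_startup(struct bash_creds c, bool privileged_mode) {
    if (running_setuid(&c) && !privileged_mode) {
        c.euid = c.uid;
        c.egid = c.gid;
    }
    return c;
}
```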

Note that, since the Bourne shell sh is linked to bash in macOS, the Apple bash code for invoking disable_priv_mode is slightly different from that of the upstream version. Interested readers can search for the __APPLE__ macro to narrow down changes made to the upstream version of bash by Apple.

The other functions of interest in the bash startup code are run_startup_files and shell_initialize, as they handle information passed through untrusted environment variables. When privileged mode is not specified, these functions provide at least two generic ways to exploit the vulnerability. First, BASH_ENV is an environment variable specifying the path to a shell script that bash executes during a non-interactive startup. An attacker can therefore have an arbitrary startup script executed by bash running without privileged mode. Shown below is the code snippet of run_startup_files in shell.c:
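As a simplified C model of the BASH_ENV condition (not the bash source itself), the sketch below captures the main rule. A non-interactive shell sources $BASH_ENV unless it is in POSIX mode, was invoked as sh, or runs in privileged mode. The struct shell_state and bash_env_file are hypothetical. The model ignores the su/login-shell special case and the word expansion bash applies to the value. The path used in the usage check is an arbitrary example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the BASH_ENV check in run_startup_files(). */
struct shell_state {
    bool interactive;
    bool posixly_correct;   /* POSIX mode */
    bool act_like_sh;       /* invoked as "sh" */
    bool privileged_mode;   /* -p */
};

/* Returns the file bash would source at startup, or NULL if none. */
const char *bash_env_file(const struct shell_state *s) {
    if (s->interactive || s->posixly_correct || s->act_like_sh || s->privileged_mode)
        return NULL;
    const char *f = getenv("BASH_ENV");
    return (f != NULL && *f != '\0') ? f : NULL;
}
```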

A second approach uses bash shell functions. When commands are executed in bash without an absolute path, they can be hijacked by exporting shell functions with the same name as the command being executed. This works even when the PATH environment variable is set to trusted paths. The corresponding source code can be found across the shell.c and variables.c files:

Knowing this, let’s take a look at some of the privileged mode bugs in Parallels Desktop and their exploitation.

CVE-2023-27322 - Local Privilege Escalation Through Parallels Service

This bug was submitted by Grisha Levit and is also identified as ZDI-23-216. Parallels Service forks a child process and executes an embedded script using a non-interactive bash shell invoked as /bin/bash -s. The parent process writes the embedded script through a pipe to the child process running the bash shell. Before invoking the bash shell, Parallels Service calls setuid(0) to set the real user identifier to the effective user identifier (root). Here is the relevant code snippet from the executable in Parallels Desktop version 17.1.4:

The execv function is a wrapper around execve, which fetches the environment using _NSGetEnviron() and passes it to execve. Therefore, the bash shell spawned as a child process has access to all the environment variables set by the user who launched Parallels Service, who may be an unprivileged user. Interestingly, the execution of the embedded shell script turned out not to be immediately vulnerable. This is because Parallels Service also has the setgid bit set and there is no corresponding setgid(getegid()) call as there was for the uid. Because of this, the real group identifier is not equal to the effective group identifier when bash is invoked. In such cases, bash identifies this as setgid execution, drops group privileges, and does not trust the environment. However, any further subshell launched from this bash shell inherits all the environment variables as well as the privileges of the parent shell, which is running as root with the group privileges set by the call to disable_priv_mode. Considering this, the next interesting target is the watchdog script invoked from the embedded script as seen below:

The watchdog script uses /bin/bash as shebang and does not use privileged mode:

In this instance, bash trusts the environment. Because of this, the watchdog script can be exploited to gain root either by using the BASH_ENV environment variable or by exporting shell functions. Here is an example of exploitation using BASH_ENV:

To exploit using shell functions, we must identify a command to hijack. The watchdog script uses the echo command for printing some debug messages:

A shell function with the same name can be exported such that the malicious function is executed instead of the expected echo command. Note that exporting functions is a feature of bash. We must therefore use the bash shell to export the target function instead of using the default zsh shell in macOS.

This issue was fixed in Parallels Desktop 18.1.0 by adding the “-p” flag, indicating privileged mode, to the shebang interpreter directive:

CVE-2023-27324 and CVE-2023-27325 - Local Privilege Escalation Through Parallels Updater

The next two bugs were found in the Parallels Updater prl_update_helper binary. These bugs were submitted by the researcher known as kn32 and are also identified as ZDI-23-218 and ZDI-23-219. In the case of CVE-2023-27324, the prl_update_helper binary invokes a bash script named inittool without setting privileged mode:

Before invoking the inittool script, the real user identifier is set to that of the effective user identifier, which is root. This means bash will run as root and will trust its execution environment, which can lead to local privilege escalation.

This vulnerability can be exploited by using the BASH_ENV environment variable or by exporting the shell function for the dirname command.

The next bug (CVE-2023-27325) in the Parallels Updater affects the inittool2 executable invoked from the inittool script. Like the Parallels Service, inittool2 forks a child process and executes an embedded script using a non-interactive bash shell invoked as /bin/bash -s. Exploitation is similar to that of CVE-2023-27324. In this case, the rm command can be hijacked to execute arbitrary code as root. Below is the embedded script from Parallels Desktop version 17.1.4:

Both CVE-2023-27324 and CVE-2023-27325 were fixed in Parallels Desktop 18.1.0 by clearing the environment during the call to posix_spawn. Instead of passing the environ array to the child process, the envp argument is now provided with a pointer to a NULL array during the call to posix_spawn. Below is the patch diff between 17.1.4 and 18.1.0:

Figure 1 - Patch diff of prl_update_helper executable
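The environment-clearing fix can be sketched in C with posix_spawn and an empty, NULL-terminated envp array, as described above. spawn_clean_env and demo_bash_env_not_inherited are hypothetical names, and the BASH_ENV path in the demo is an arbitrary example. The demo sets BASH_ENV in the parent and checks that the child shell sees it empty.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Spawn a helper with an empty environment: envp points to an array that
 * contains only the terminating NULL, so nothing from the caller's
 * environ reaches the child. Returns the exit status, or -1 on failure. */
int spawn_clean_env(const char *path, char *const argv[]) {
    char *const empty_env[] = { NULL };
    pid_t pid;
    int err = posix_spawn(&pid, path, NULL, NULL, argv, empty_env);
    if (err != 0) {
        fprintf(stderr, "posix_spawn failed: %d\n", err);
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Usage example: BASH_ENV set in the (untrusted) parent environment is not
 * visible to the child. Returns 0 when the child sees an empty $BASH_ENV. */
int demo_bash_env_not_inherited(void) {
    setenv("BASH_ENV", "/tmp/attacker.sh", 1);   /* hypothetical path */
    char *argv[] = { "/bin/sh", "-c", "test -z \"$BASH_ENV\"", NULL };
    return spawn_clean_env("/bin/sh", argv);
}
```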

Additionally, the privileged mode flag “-p” is also added to the shebang interpreter directive of the inittool script as well as the embedded script within inittool2. Note that the shebang of the embedded script is ignored since it is explicitly run using the bash interpreter.

CDPATH Handling in MacOS

During the analysis of these submissions, we also observed some differences in the way Apple bash handles “privileged mode” as compared to the upstream bash. Apple’s bash in macOS 13.0.1 is based on GNU Bash 3.2:

The upstream bash in privileged mode ignores many variables such as SHELLOPTS, BASHOPTS, CDPATH, and GLOBIGNORE as mentioned below:

Based on the CHANGES file, here is a timeline of various changes related to privileged mode. SHELLOPTS has been ignored in privileged mode since bash-2.02-alpha1 and is therefore ignored in version 3.2 too.

BASHOPTS was introduced at a later stage in bash-4.1-alpha and therefore not applicable to version 3.2.

The CDPATH and GLOBIGNORE variables were ignored only since bash-4.0-beta and therefore still get processed in Apple’s bash, which is based on version 3.2.

The CDPATH environment variable can be set to a colon-separated list of directories, which the built-in “cd” command then uses as directory roots instead of the current working directory (CWD). In the case of Apple’s bash, if a bash script executed through a setuid wrapper uses “cd [Absolute Path to Trusted Directory]” to change the CWD and then uses “cd subdirectory”, the later cd command with the relative path can be hijacked to a location controlled by an attacker by setting the CDPATH variable. Consider the sample code below:

Here is the outcome as tested in Ubuntu:

The CDPATH environment variable is ignored in bash privileged mode (-p). When the same test is repeated on macOS, however, CDPATH is honored.

This can become problematic when a script is written assuming bash privileged mode behavior in macOS to be the same as that of the upstream version. You may note the duplicated /tmp/secure line. This is not a typo. It comes from the POSIX standard. If CDPATH is used for a directory change, the new directory path is echoed to stdout, which is the first line /tmp/secure. The second line comes from the pwd command.
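A simplified C model of the CDPATH lookup ties these observations together. It is not the code in builtins/cd.def: symlink and physical-mode handling are omitted. The dir_exists callback stands in for a successful chdir(), and resolve_cd, demo_dir_exists, and the honor_in_privmode flag are hypothetical. The flag models Apple's bash 3.2, which lacks the privileged-mode check. Paths that begin with "/", "." or ".." bypass CDPATH, an empty CDPATH entry means the current directory, and a non-empty entry that matches is the case where the new path gets echoed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef bool (*dir_exists_fn)(const char *path);

/* Names starting with "/", ".", "./", "..", or "../" bypass CDPATH. */
static bool bypasses_cdpath(const char *d) {
    if (d[0] == '/') return true;
    if (d[0] == '.' && (d[1] == '\0' || d[1] == '/')) return true;
    return d[0] == '.' && d[1] == '.' && (d[2] == '\0' || d[2] == '/');
}

/* Writes the directory cd would enter into out. Returns true when a
 * non-empty CDPATH entry was used, i.e. when the new path is echoed. */
bool resolve_cd(const char *dir, const char *cdpath, bool privileged_mode,
                bool honor_in_privmode, dir_exists_fn dir_exists,
                char *out, size_t outlen) {
    bool use_cdpath = cdpath != NULL && !bypasses_cdpath(dir) &&
                      (!privileged_mode || honor_in_privmode);
    for (const char *p = cdpath; use_cdpath; ) {
        size_t len = strcspn(p, ":");
        if (len == 0)                       /* empty entry: current directory */
            snprintf(out, outlen, "%s", dir);
        else
            snprintf(out, outlen, "%.*s/%s", (int)len, p, dir);
        if (dir_exists(out))
            return len != 0;
        if (p[len] == '\0')
            break;
        p += len + 1;
    }
    snprintf(out, outlen, "%s", dir);       /* fall back to CWD-relative */
    return false;
}

/* Example predicate: both the trusted "secure" and "/tmp/secure" exist. */
bool demo_dir_exists(const char *path) {
    return strcmp(path, "/tmp/secure") == 0 || strcmp(path, "secure") == 0;
}
```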

Here is the comparison of builtins/cd.def which handles CDPATH in Apple bash versus the upstream version:

Figure 2 - Missing privileged mode check when handling CDPATH

Similarly, differences in GLOBIGNORE handling can be seen by diffing variables.c source file:

Figure 3 - Missing privileged mode check when handling GLOBIGNORE

Since macOS Catalina, zsh has been the default shell. The official announcement can be found here. Bash is deprecated on macOS and likely remains only for backward compatibility. For any bash scripts executed through a setuid wrapper, make sure privileged mode “-p” is enabled. In addition, beware of the differences in privileged mode between Apple bash and the upstream version, most notably in the handling of the CDPATH and GLOBIGNORE environment variables.

Conclusion

Parallels Desktop is a popular target for researchers. We’ve already published seven advisories in the product in 2023 to go along with the 10 we published in 2022. With Parallels Desktop being one of the major virtualization solutions used in macOS, it’s understandable why it can be an enticing target for threat actors. My research into Parallels continues, and I’ll blog about any significant findings in the future. Of course, if you find similar vulnerabilities, we’d be interested in seeing those as well.

Until then, you can follow me @renorobertr and follow the team on Twitter, Mastodon, LinkedIn, or Instagram for the latest in exploit techniques and security patches.
