Analysis of a Parallels Desktop Stack Clash Vulnerability and Variant Hunting using Binary Ninja

9 September 2021 at 14:59

Parallels Desktop uses a paravirtual PCI device called the “Parallels ToolGate” for communication between guest and host OS. This device is identified by Vendor ID 0x1AB8 and Device ID 0x4000 in a Parallels guest.

The guest driver provided as part of Parallels Tools and the host virtual device communicate using a ToolGate messaging protocol. In short, the guest driver prepares a message and writes its physical address to TG_PORT_SUBMIT [PORT IO ADDRESS+0x8]. The host then maps the guest-provided physical address, parses the message, and transfers control to the respective request handlers for further processing. Many of the Parallels bugs received by the ZDI program are in these request handlers.

ToolGate Interface and Protocol Format

The ToolGate request format is a variable-size structure that can span multiple pages. The guest sends data to the host as inline bytes or as pointers to paged buffers by writing the physical address of a TG_PAGED_REQUEST structure to the IO port of the virtual device.

Figure 1 - Variable size TG_PAGED_REQUEST structure in guest memory

Figure 2 - Variable size TG_PAGED_BUFFER structure in TG_PAGED_REQUEST
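
For reference, below is a hypothetical reconstruction of these two structures as Python ctypes definitions. Only ByteCount and the arrays of guest physical page addresses are attested by the figures and the discussion that follows; every other field name and width is an assumption made for illustration.

import ctypes

class TG_PAGED_BUFFER(ctypes.Structure):
    # Hypothetical layout; field names and widths are assumptions.
    _fields_ = [
        ("Va",        ctypes.c_uint64),  # guest virtual address (assumed)
        ("ByteCount", ctypes.c_uint32),  # untrusted size, central to the bug below
        ("Writable",  ctypes.c_uint32),  # buffer direction flag (assumed)
    ]
    # A variable-length array of guest physical page addresses follows.

class TG_PAGED_REQUEST(ctypes.Structure):
    # Hypothetical layout; field names and widths are assumptions.
    _fields_ = [
        ("RequestSize", ctypes.c_uint32),  # total size of the request (assumed)
        ("Request",     ctypes.c_uint32),  # request code, e.g. TG_REQUEST_INVSHARING
        ("Status",      ctypes.c_uint32),  # status written back by the host (assumed)
        ("BufferCount", ctypes.c_uint32),  # number of TG_PAGED_BUFFER descriptors (assumed)
    ]
    # Inline bytes and the TG_PAGED_BUFFER descriptors follow.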

The host then maps the page pointed to by the physical address, prepares a host buffer based on the information provided by the guest, and then invokes the request handler. Request handlers may use inline bytes, paged buffers, or both for reading and writing data. These data buffers are accessed using a set of functions, roughly defined below:

TG_DataBuffer *TG_GetBuffer(TG_ReqBuffer *buf, uint32_t index, uint32_t writable)
uint32_t TG_ReadBuffer(TG_DataBuffer *buf, uint64_t offset, void *dst, size_t size)
uint32_t TG_WriteBuffer(TG_DataBuffer *buf, uint64_t offset, void *src, size_t size)
void *TG_GetInlineBytes(TG_DataBuffer *buf)

Return of the Stack Clash Vulnerability

The STARLabs submission (ZDI-21-937) for Pwn2Own 2021 is an uncontrolled memory allocation vulnerability where a guest-provided size value is used for a stack allocation. If the size provided by the guest is larger than the total size of the stack, it is possible to shift the stack pointer (RSP) into other regions of process memory (e.g., the stack region of another thread).

Figure 3 - Normal stack operation (left) vs stack jumping due to large allocation (right)

When handling TG_REQUEST_INVSHARING (0x8420), Parallels Desktop does not validate the ByteCount value provided by the guest as part of TG_PAGED_BUFFER. When this untrusted size is used for a stack allocation, it can shift the stack top of the thread processing the ToolGate request into another victim thread's stack, overwriting its contents. There is a 4KB guard page below the stack, but a sufficiently large allocation can jump over it without ever accessing it. Qualys published a detailed paper on this bug class, The Stack Clash, back in 2017, which also led to various compiler mitigations designed to prevent such guard-page jumps.

Below is the vulnerable section of code from Parallels Desktop 16.1.3. During the call to TG_ReadBuffer(), the stack memory of another thread can be overwritten with guest-controlled values.

Figure 4 - Vulnerability in TG_REQUEST_INVSHARING handling

Backward Compatibility and Compiler Mitigations

The most interesting question here is: what happened to the stack clash compiler mitigation in Apple Clang? When a variable-size stack allocation is made, such as alloca(value) or char buffer[value], Apple Clang instruments the allocation with ___chkstk_darwin() to validate the size of the allocation request. ___chkstk_darwin() extends the stack one PAGE_SIZE at a time and, for each page allocated, probes the new stack top to check that it is accessible. If the guard page is reached, the probe faults, leading to a safe crash. It is therefore no longer possible to shift the stack pointer to an arbitrary location.

Figure 5 - Sample code with stack clash mitigation

It is clear that Parallels Desktop did not have this mitigation enabled, because there is no call to ___chkstk_darwin() during the variable-size stack allocation. At this point, there are a couple of open questions:

        -- Did Parallels disable the mitigation using the -fno-stack-check compiler flag?
        -- Did they use an old build configuration which disabled the mitigation?

macOS's otool can be used to get a fair amount of information regarding the build environment. Specifically, LC_VERSION_MIN_MACOSX can provide information regarding the supported macOS versions. Here is the output of otool -l prl_vm_app:

Figure 6 - LC_VERSION_MIN_MACOSX information of prl_vm_app

The SDK used is 10.15 (Catalina), which is quite new. Moreover, Parallels has set the minimum supported macOS version to 10.13, which makes it compatible with High Sierra. This aligns with the compatibility information provided in their KB article. Still, does backward compatibility with 10.13 disable the compiler mitigation? Here is a comparison of sample code compiled with and without -mmacosx-version-min=10.13:

Figure 7 - Backward compatibility with 10.13 disables ___chkstk_darwin() (right)

It is unclear whether Parallels explicitly disabled ___chkstk_darwin() using -fno-stack-check, but setting -mmacosx-version-min=10.13 has the same effect and silently drops the mitigation. The same behavior is observed with -mmacosx-version-min=10.14 (Mojave). Interestingly, GCC inlines the stack probe check instead of depending on an external library function. That external dependency is the likely culprit here, since compiling in Apple Clang with the backward-compatibility flags (macosx-version-min, target, etc.) ended up removing the mitigation.

That said, Mojave (10.14.6) did not give any symbol errors when running executables with calls to ___chkstk_darwin(), while many such issues can be found on High Sierra (10.13.6). Note that with older compilers (e.g., Apple LLVM version 10.0.1 on Mojave), the stack clash mitigation is not enabled by default unless the -fstack-check flag is explicitly provided. Recent compilers, therefore, seem to drop the mitigation entirely when compiling with macosx-version-min for any version below 10.15. This can be worked around by providing both the -fstack-check and -mmacosx-version-min flags together, although High Sierra compatibility remains questionable. The highlight is that using macosx-version-min alone can make the bug exploitable even on the latest versions of macOS.

Figure 8 - Apple Clang calls ___chkstk_darwin (left) vs GCC mitigation inlined (right)

Variant Hunting using Binary Ninja

The next question is: are there other similar bugs in Parallels Desktop, and can we automate the search? The size of a stack frame is generally known at compile time. Moreover, any operations that shift the stack pointer can be tracked. Binary Ninja has a static data flow capability that keeps track of the stack frame offset at any point in a function. However, when there is a variable-size stack allocation, the stack offsets cannot be determined statically. This exact property can be used to look for variants of the bug.

Figure 9 - Known stack offset (left) vs undetermined value after alloca() (right)

Consider index 88 in the above Low Level IL, where the RSP register is loaded.

Here, the new stack top is calculated using a guest-provided size and loaded into RSP. Binary Ninja provides the get_possible_reg_values() and get_possible_reg_values_after() Python APIs to fetch statically determined register values, and these values carry type information (RegisterValueType).
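
Below is a minimal interactive sketch showing the stack frame offset value in RSP before and after the load; the concrete offset is illustrative, not taken from the original session, and `func` is assumed to be the function from Figure 9.

# Binary Ninja Python console (illustrative values):
>>> il = func.llil[88]                        # LLIL index 88: RSP is loaded
>>> il.get_possible_reg_values('rsp')
<stack frame offset -0x148>
>>> il.get_possible_reg_values_after('rsp')
<undetermined>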

RSP is always associated with the StackFrameOffset RegisterValueType. However, when the RSP value is not known, it is marked as UndeterminedValue. Using this value type information, we can search for all references to TG_ReadBuffer() where the RSP value is undetermined: if RSP is undetermined before the call to TG_ReadBuffer(), it can be deduced that a variable-size stack allocation was made prior to the call. A reconstruction of such a query is shown below.
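
This is a hedged sketch using the Binary Ninja Python API rather than the original script; it assumes TG_ReadBuffer() has already been identified and named in the database (bv), and API details may vary between versions.

from binaryninja import RegisterValueType

# Locate the named TG_ReadBuffer function in the open BinaryView.
tg_read_buffer = next(f for f in bv.functions if f.name == 'TG_ReadBuffer')

for ref in bv.get_code_refs(tg_read_buffer.start):
    caller = ref.function
    rsp = caller.get_reg_value_at(ref.address, 'rsp')
    # A fixed-size frame shows up as StackFrameOffset; after a variable-size
    # stack allocation the value degrades to UndeterminedValue.
    if rsp.type == RegisterValueType.UndeterminedValue:
        print(hex(ref.address), caller.name)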

The above query yielded three results: one was the Pwn2Own submission, and the other two were previously unknown vulnerabilities.

      0x1001c6e35 - ZDI-21-1056 - TG_REQUEST_GL_CREATE_CONTEXT
      0x10025591a - ZDI-21-1055 - TG_REQUEST_DIRECT3D_CREATE_CONTEXT
      0x10080bcd9 - ZDI-21-937 - (Pwn2Own) - TG_REQUEST_INVSHARING

Figure 10 - Pwn2Own bug and its variants found using Binary Ninja

Conclusion

Our research has shown that, in some cases, compiling for backward compatibility can silently drop mitigations, making an entire bug class exploitable. Perhaps vendors should consider releasing separate binaries in such cases?

We also took a look at how Binary Ninja's static data flow capability and Python APIs can be useful in automating bug-finding tasks. If you find any such vulnerabilities, consider submitting them to our program. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

Parallels Desktop RDPMC Hypercall Interface and Vulnerabilities

29 April 2021 at 16:02

Parallels Desktop implements a hypercall interface using the RDPMC instruction ("Read Performance-Monitoring Counter") for communication between guest and host. More interestingly, this interface is accessible even to an unprivileged guest user. Though the HYPER-CUBE: High-Dimensional Hypervisor Fuzzing [PDF] paper by Ruhr-University Bochum briefly mentions this interface, we have not seen many details made public. This blog post gives a brief description of the interface and discusses a couple of vulnerabilities (CVE-2021-31424/ZDI-21-434 and CVE-2021-31427/ZDI-21-435) I found in UEFI variable services.

Parallels Desktop has support for two Virtual Machine Monitors (VMM): Apple’s built-in hypervisor and the Parallels proprietary hypervisor. Prior to macOS Big Sur, the Parallels proprietary hypervisor is used by default. With this hypervisor there is a considerable amount of guest-to-host kernel attack surface, making it an interesting target. The details in this blog correspond to Parallels Desktop 15.1.5 running on a macOS Catalina 10.15.7 host.

Dumping the VMM

The proprietary VMM is a Mach-O executable that is compressed and embedded within the user-space worker process prl_vm_app. The worker process injects the VMM blob into the kernel using an IOCTL_LOAD_MONITOR request to the prl_hypervisor kernel extension. The address and size of the zlib-compressed VMM are maintained in a global structure, which can be used to dump the blobs for analysis. The VMM Mach-O binary has function names, making it easier to locate the hypercall handler for RDPMC.

Figure 1 - Compressed VMM Mach-O executable

When the guest executes an RDPMC instruction, the VMM calls Em_RDPMC_func()->HandleOpenToolsGateRequest() to process the request. The arguments to the hypercall are passed through the general-purpose registers RAX, RBX, RCX, RDX, RDI, and RSI. The status of the request is returned through register RAX. The VMM also has an alternate code path, PortToolsGateOutPortFunc()->HandleOpenToolsGateRequest(), reachable by writing to I/O port 0xE4.

HandleOpenToolsGateRequest() dispatches the request based on the value of register RAX and sub-commands in other registers. The code path of interest for this writeup is Em_RDPMC_func()->HandleOpenToolsGateRequest()->OTGHandleGenericCommand(), which can be reached by setting RAX = 0x7B6AF8E and RBX = 7. OTGHandleGenericCommand() further supports multiple guest operations based on the value set in register RDX. The debug messages quickly reveal that RDX = 9 handles UEFI service requests for reading and writing UEFI variables.

The UEFI runtime variable services in Parallels Desktop include three components: UEFI firmware, a hypercall interface in the VMM, and an API through which the VMM makes requests to the host user space prl_vm_app worker process. The VMM and the worker process communicate using shared memory.

Analyzing the Firmware

The UEFI firmware that ships with Parallels Desktop (efi64d.bin and efi64.bin) is based on EDK2. Just like the VMM Mach-O binary, it is a zlib-compressed binary starting with a 12-byte magic header. To analyze the firmware, decompress the file, skipping the first 12 bytes, and load it using the efiXplorer IDA Pro plugin. This may take a while, but it does work well. Once the analysis is over, search the firmware for the hypercall number for invoking OTGHandleGenericCommand (0x7B6AF8E).
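
A minimal unpacking sketch in Python is shown below; the same approach works for the compressed VMM blob mentioned earlier.

import zlib

# Skip the 12-byte magic header and inflate the zlib stream.
with open('efi64.bin', 'rb') as f:
    blob = f.read()

with open('efi64.dec', 'wb') as f:
    f.write(zlib.decompress(blob[12:]))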

Figure 2 - Search for OTGHandleGenericCommand

The search returned multiple results, but the most interesting ones for the UEFI runtime variable services hypercall are part of VariableRuntime.Dxe. Note that the firmware relies on I/O port 0xE4 for the hypercall instead of RDPMC, as illustrated below.

Figure 3 - UEFI firmware invoking hypercall

By cross-referencing the hypercall, the UEFI variable driver entry points can be located in the firmware. Then, by comparing the decompiled code with VariableServiceInitialize() in EDK2, the handlers for UEFI runtime variable services can be easily identified. This can be done using the efiXplorer IDA plugin, which imports all the type information.

Consider the callback for GetVariable(). The firmware sets up a 48-byte request structure with an operation type (0x10) and other required fields. OTG_Hypercall() loads the address of the request structure into register RSI and triggers the hypercall, as seen in Figure 3. Similarly, each variable service has an operation type associated with it. By analyzing the callbacks for SetVariable(), GetNextVariableName(), and QueryVariableInfo(), the operation-type-to-service mapping as well as the structure of the VMM service request can be recovered.

Figure 4 - UEFI GetVariable() service

Figure 5 - VMM request structure

Table 1 - Mapping Variable services to VMM operations

Hypercall Vulnerabilities

We will be examining some vulnerabilities in OTGHandleGenericCommand(). A simplified view of the decompiled code is shown below. Note that the Parallels VMM uses the functions ReadLinear() and WriteLinear() for reading from and writing to guest memory, respectively. MonRetToHostSwitch() transfers control from the VMM to the user-space worker process (still on the host) to handle a specific API request; the parameter value 0xD7 corresponds to API_EFI_VAR_REQUEST.

CVE-2021-31424/ZDI-CAN-12848 – Heap Overflow

The first bug is a heap overflow. The size of the UEFI variable name provided by the guest is not validated. Therefore, the copy operation using ReadLinear() overflows the host kernel heap by the guest-provided size * 2 bytes (the variable name is UTF-16 encoded).

CVE-2021-31427/ZDI-CAN-13082 - Time-Of-Check Time-Of-Use Information Disclosure

The second interesting observation I made during my analysis was that the data size in the UEFI service request is written to shared memory before validation. After writing, the VMM validates the data size, but only when handling SetVariable(). For read requests, such as GetVariable(), GetNextVariableName(), or QueryVariableInfo(), the validation is delegated to the user-mode process using the MonRetToHostSwitch(API_EFI_VAR_REQUEST) call.

After MonRetToHostSwitch(API_EFI_VAR_REQUEST) returns, the VMM checks the status set by user space. If the status is 0, WriteLinear() fetches the data size from the shared memory again for writing back to the guest. This is where things get interesting: there is a race window between the call to user space via MonRetToHostSwitch() and the WriteLinear() in the VMM. If the data size can be updated to some untrusted value while the status is set to 0, it is possible to trigger an out-of-bounds read during WriteLinear(). To trigger the race, it is necessary to understand how the status is updated in the shared memory by the worker process. In prl_vm_app, the handler for API_EFI_VAR_REQUEST is at address 0x1000DEDF0.

EFIVar.datasize is updated or validated in user space, and the status is set to 0 only when a request is successful. Otherwise, EFIVar.datasize is set to 0 and the status is set to a non-zero error code. The simplest request type turned out to be QueryVariableInfo(), which returns the maximum storage size, the remaining storage size, and the maximum size of a single UEFI variable. It sets the status to 0 as long as the expected data size equals 24. As there are no state-changing operations, QueryVariableInfo() is ideal for triggering the bug. Consider the following scenario:

Thread A – Keep sending SetVariable() requests with an arbitrary data size > 0x1000 bytes. Each request updates SharedMem->EFIVar.datasize but always returns without entering the worker process because of the request.datasize > 0x1000 validation.

Thread B – Keep sending QueryVariableInfo() requests, which set the status to 0. If Thread A updates SharedMem->EFIVar.datasize after the status is set by QueryVariableInfo() in user space but before the VMM copies data using WriteLinear(), an out-of-bounds read is triggered. Below is a debug log of the VMM page fault when the OOB read hits an unmapped kernel address.

Conclusion

What made these bugs particularly interesting is that they are reachable through a lesser-known interface and can be triggered by an unprivileged guest user to execute code in the host kernel. That said, since the introduction of macOS Big Sur, the Parallels proprietary hypervisor is no longer used by default. Parallels patched both of these RDPMC hypercall bugs in the recently released 16.5.0, along with many other issues reported through the ZDI program.

You can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

MindShaRE: Analysis of VMware Workstation and ESXi Using Debug Symbols from Flings

7 January 2021 at 17:27

The availability of debug symbols greatly assists a researcher in understanding a software architecture and in performing live debugging or static analysis. An end-to-end black-box analysis of a closed-source hypervisor is a time-consuming process. Microsoft has made this work easier by publishing debug symbols for most of the Hyper-V components. However, there is still no debug info available for VMware Workstation (WS) or ESXi. Considering this, the Project Zero blog posts on Adobe Reader symbols greatly inspired me to carry out a similar analysis for VMware.

This blog details how VMware Flings can be useful in obtaining some of the symbol information stripped from VMware WS and ESXi. Flings are free, short-term projects released by VMware without support, mostly as enhancements to existing products. The two Flings of interest for this analysis are VNC Server and VNC Client and ESXi ARM Edition: the former has DWARF debug information for the SVGA device implementation, and the latter has function names for many other components of the vmx worker process.

SVGA symbols in VNC Server and VNC Client Fling

The VNC Server and VNC Client Fling was released in February 2016. It is a cross-platform VNC implementation with code leveraged from VMware Workstation. The Fling has VNC server and client binaries for all major operating systems – Windows, Linux, and Mac. The Windows binary is not accompanied by a corresponding PDB debug file, but the Mac and Linux binaries have debug information embedded in them.

In order to understand the code shared by the VNC Fling and WS, I decided to compare the binaries having debug information against vmware-vmx. For the comparison to be effective, it is best to choose a WS version released around the same time as the Fling; the idea is to increase the likelihood of a similar code base as well as build environment. Since the Fling was released on February 25, 2016, the following WS and Fusion releases seemed ideal for analysis:

WS version    Fusion version    Release date
12.0.1        8.0.2             29 Oct 2015
12.1.0        8.1.0             08 Dec 2015
12.1.1        8.1.1             21 Apr 2016
12.5.0        8.5.0             08 Sep 2016

IDA’s F.L.I.R.T. (version 7.5) was my first choice for performing signature matching between executables. To generate the pattern file, I modified the IDB2PAT script published by FireEye to support 64-bit RIP-relative addressing. In RIP-relative addressing, 32-bit signed displacements (+/- 2GB) are used to reference code or data, and these 4 bytes of displacement are treated as variable bytes during signature generation. The essential part of the patch targets the find_ref_loc() function.
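
The original diff is not reproduced here; the hypothetical IDAPython sketch below illustrates the idea, with the helper name invented. The displacement is computed relative to the end of the instruction, and its location inside the instruction bytes is what gets marked as variable.

import idc
import ida_idaapi

def find_rip_rel_disp(ea, ref):
    """Return the address of the 4 displacement bytes that encode `ref`."""
    insn_len = idc.get_item_size(ea)
    # RIP-relative displacement is relative to the end of the instruction.
    disp = (ref - (ea + insn_len)) & 0xFFFFFFFF
    insn_bytes = idc.get_bytes(ea, insn_len)
    idx = insn_bytes.find(disp.to_bytes(4, 'little'))
    return ea + idx if idx >= 0 else ida_idaapi.BADADDR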

Three binaries are under consideration: mksVNCServer for Linux, mksVNCClient for Linux, and mksVNCServer for macOS. The mksVNCServer binary returned the best results during signature matching and also contained a superset of the functions available in mksVNCClient. Moreover, the availability of DWARF debug information provides rich details regarding source code, structure declarations, function inlining, and other optimizations. WS version 12.1.0, released a couple of months before the Fling, turned out to be the most promising. Here is a summary of the FLIRT signature matching:

WS version    FLIRT hits
12.0.1        40041
12.1.0        43283
12.1.1        43231
12.5.0        42998

After narrowing down the version of interest, I relied on the symbol porting feature in BinDiff to import the function and variable names from mksVNCServer.

Figure 1 - vmware-vmx after porting symbols using BinDiff

Anyone who has previously looked into VMware’s SVGA attack surface will recognize where these functions originate. If you are new to this, Wandering through the Shady Corners of VMware Workstation/Fusion and Straight outta VMware [PDF] are excellent references to start with. 

What other essential information can be ported to WS from the Fling? The type information. IDA can export typeinfo as a C header from mksVNCServer, which can then be loaded into vmware-vmx. There are some caveats to this approach: the exported C header needs a few fixes, like renaming variables that collide with C++ keywords (new, template, class, and private), rewriting certain variadic function definitions, and so forth, to be successfully parsed by IDA. Once the typeinfo is imported, function prototypes can be ported too. To accomplish this, first extract each prototype from mksVNCServer as a key-value pair of function_name:function_type, then iterate through the extracted type information and apply it to the vmware-vmx database having symbols, as sketched below.
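
A rough IDAPython sketch of this porting step, assuming function names already match between the two databases; error handling is omitted.

import json
import idautils
import idc

# Stage 1 - run inside the mksVNCServer database: dump name -> prototype.
prototypes = {}
for ea in idautils.Functions():
    proto = idc.get_type(ea)  # C prototype recovered from the DWARF info
    if proto:
        prototypes[idc.get_func_name(ea)] = proto
with open('prototypes.json', 'w') as f:
    json.dump(prototypes, f)

# Stage 2 - run inside the vmware-vmx database after the BinDiff port.
with open('prototypes.json') as f:
    prototypes = json.load(f)
for ea in idautils.Functions():
    proto = prototypes.get(idc.get_func_name(ea))
    if proto:
        idc.SetType(ea, proto + ';')  # apply the ported prototype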

Figure 2 - vmware-vmx after porting function prototypes

At this point, it is convenient to analyze vmware-vmx and mksVNCServer side by side. Moreover, there are a couple of other tools from the dwarves package [PDF] that I find useful in static analysis of available DWARF information: pahole and pfunct.

pahole was originally developed to inspect alignment holes in structures using DWARF debug information. Since mksVNCServer is compiled with debug information, pahole provides a way to analyze data structures, their sizes, and their usage in the source files. It is possible either to query a particular structure by name using -C or to dump everything, including anonymous structures, using -a and then grep for information.

Similarly, pfunct provides great insights about functions. This is especially useful in recovering details regarding inlined function definitions and local variable optimizations. Consider the case of StateFFP_TranslateSM4() below, where pfunct allows us to statically map a code block from 0xe8610 - 0xe864c (60 bytes) to AddOutputDecl().

Figure 3 - Block of inlined code belonging to AddOutputDecl()

Now what? Can we put together all this information for a better understanding of past vulnerabilities or research? Yes - the first thing that comes to mind is shader translation. In fact, StateFFP_TranslateSM4() analyzed using pfunct is one of those vulnerable functions.

WS 12.5.5, released in March 2017, fixed some vulnerabilities in shader translation. We are not going to dive into the details of the bugs again; Wandering through the Shady Corners of VMware Workstation/Fusion provides a very detailed walkthrough of the shader attack surface, the vulnerabilities found in opcode handling, and the proof-of-concepts to trigger them. I was more curious to check what the vulnerable code looks like after porting all the symbols and type information to WS 12.1.0.

Figure 4 - Vulnerabilities in StateFFP_TranslateSM4()

Clearly, the decompiled code has more information than previously available from vmware-vmx-debug. And this is just the tip of the iceberg: many more shader bugs have been fixed over the years. In the current state of GPU virtualization, shaders are probably the JavaScript of hypervisors. Given the reality of this complex and ever-growing attack surface, VMware has now introduced a sandboxed graphics renderer as a security enhancement.

At this point, one might wonder whether the debug information from the Fling is still relevant, given that it was released five years ago. I strongly believe that it is. Despite all the changes due to bug fixes and feature additions, the core design and APIs have not changed drastically. This can also be a great addition to the paper Straight outta VMware [PDF] for anyone interested in analyzing VMware's SVGA implementation.

Symbols in ESXi ARM edition

The next Fling of choice is the more recent ESXi ARM edition, released on October 6, 2020. Since ESXi ARM is bound to share a lot of code with ESXi x86, it is an easy pick for analysis compared to the VNC Fling. But how do we set up ESXi ARM? The readily available options are installation on a Raspberry Pi or emulation with QEMU. However, a more convenient option for static analysis is to just extract the vmx executable from the ISO image. To get this working, install ESXi x86 7.0 (available for free download) as a guest VM, then extract the ESXi ARM vmx executable using the vmtar utility available in ESXi x86. Note that the vmx mentioned in this section has nothing to do with Virtual Machine Extensions (VMX) but refers to the VMware worker process executable.

After successfully extracting the vmx aarch64 ELF, things did not go as hoped: the binary was completely stripped of debug information. However, the dynamic section had far more entries than one would generally see in an executable. A quick line count of readelf -s output returned a number as high as ~25k. Below is a rough comparison of the number of entries in the dynamic symbol table of ESXi for x86 and ARM (Fling version 1.1):

Executable    Entries in x86    Entries in ARM
vmx           820               25200
vmx-debug     845               25434
vmx-stats     822               30496

It looks like the aarch64 executables are compiled with the linker flag --export-dynamic/-E, which exports all non-static functions and global variables into the dynamic symbol table. Let's do a quick grep for a known attack surface, say the virtual XHCI USB controller recently patched by VMware.
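
One way to do that grep is sketched below using pyelftools (assumed to be installed), equivalent to running readelf --dyn-syms and piping through grep.

from elftools.elf.elffile import ELFFile

# Enumerate dynamic symbols and report anything XHCI-related.
with open('vmx', 'rb') as f:
    elf = ELFFile(f)
    for sym in elf.get_section_by_name('.dynsym').iter_symbols():
        if 'xhci' in sym.name.lower():
            print(sym.name)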

The results are surprisingly good. In the case of a virtual device, these function names can help us identify a code block emulating a certain hardware specification. There are also symbols available for many other low-level interfaces, such as the PhysMem family of functions mentioned in the patent for Transparent Page Sharing [TPS]. Even if a virtual device has minimal dynamic symbols (UHCI, EHCI, etc.), the presence of symbols for other low-level APIs makes it easier to understand.

Once the initial analysis is over, we can port the symbols from ESXi ARM to ESXi x86. Since BinDiff has the ability to compare executables from two different CPU architectures, this is a very realistic use case to try out this feature.   

Figure 5 - Symbols ported to ESXi x86 from ESXi ARM using BinDiff

In fact, the results turned out to be very satisfying. We have never had so many symbols for ESXi before, and this provides a good start for side-by-side analysis. Moreover, with symbols available for the vmx executable, one can understand its communication with the Virtual Machine Monitor (VMM) much better.

Regarding the VMM, a couple of observations have already been made elsewhere: an embedded VMM ELF in the vmx executable is loaded by a kernel driver (Hypervisor Framework [PDF]), and the embedded ELF has symbols (Wandering through the Shady Corners of VMware Workstation/Fusion). Dumping the VMM is a two-stage process: a loader vmmblob ELF, followed by another embedded vmmmods ELF.

Figure 6 - Embedded vmmblob loader code

Figure 7 - Embedded vmmmods VMM code

These symbols are available not only in the ESXi ARM edition but across ESXi x86 and WS as well. What I really wanted to check was how much of the VMM code overlaps with that of vmx. Can symbols in the VMM be ported to vmx? Since the ARM edition has symbols for both vmx and the VMM, it is an ideal choice for this comparison. We are particularly interested in BinDiff matches based on "name hash matching". Though around 100 entries were found, only a few had high similarity. Most other functions differ in their implementations, making it hard to port symbols from the VMM.

Figure 8 - PhysMem_Get - vmx (left) vs VMM (right)

Porting the symbols from the VMM is not a concern anyway, since the vmx executable in ESXi ARM already has them. Evidently, the time spent searching and matching the Flings has provided us with useful debug information beyond vmx-debug or the VMM. It also demonstrates how seemingly less significant pieces of software can carry a significant amount of information about production code.

Conclusion

Going forward, Flings can be a great addition for anyone analyzing WS/ESXi or other VMware products. They certainly proved to be helpful in obtaining some of the symbol information stripped from VMware WS and ESXi. These debug symbols are key to understanding how the program works and where vulnerabilities may be found. Hopefully, Flings will help your research into VMware vulnerabilities as well.

You can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.
