
MindShaRE: Analyzing BSD Kernels for Uninitialized Memory Disclosures using Binary Ninja

21 September 2022 at 16:39

Disclosure of uninitialized memory is one of the common problems faced when copying data across trust boundaries. This can happen between the hypervisor and guest OS, kernel and user space, or across the network. The most common bug pattern noticed among these cases is where a structure or union is allocated in memory, and some of the fields or padding bytes are not initialized before copying it across trust boundaries. The question is, is it possible to perform variant analysis of such bugs?

The idea here is to perform a control-flow-insensitive analysis that statically tracks all memory store operations. Any memory region that is never written to is identified as uninitialized when data from it is copied across trust boundaries.

Generalizing the code pattern for analysis

Consider the case of CVE-2018-17155, a FreeBSD kernel memory disclosure in the getcontext() and swapcontext() system calls due to a lack of structure initialization. Shown below is the patch for sys_getcontext(). The listing on the left shows the patched code. sys_swapcontext() was patched in a similar fashion.

Figure 1 - Patch for sys_getcontext() information disclosure. Vulnerable code appears on the right.

The vulnerable code declared a ucontext_t structure on the stack, wrote to some but not all fields, and finally used copyout() to copy UC_COPY_SIZE bytes of data from the structure to userland. The problem here is that not all fields are initialized, so any data occupying the uninitialized parts of the structure memory region are disclosed. To solve the problem, the patched code zeroes out the entire structure using the bzero() function.

The generalization of the above code pattern looks like this:

• A memory region (structure, union, etc.) is declared on the stack or allocated on the heap, which could be the source of uninitialized memory.
• The memory region may get fully or partially written.
• There is an API that transfers data across trust boundaries. This could be the sink for uninitialized memory.
• The API generally takes at least three parameters: source buffer, destination buffer, and size. In this case, the source of the memory is a stack offset, and the size of the transfer is a constant value. A constant transfer size means the value is either the entire size of the memory region (obtained with the sizeof operator) or some portion of it up to a fixed offset.
• The memory region may be zeroed out before usage using functions like memset() or bzero().

The sink function is application-specific. A few of the more likely sinks are copy_to_user() in the case of the Linux kernel, copyout() in the case of the BSD kernels, and send() or sendto() for network transfers, as well as any wrappers around them. The definitions of these functions are either documented or can be recovered by reverse engineering if the target is closed source.

Searching the code pattern for analysis

Once the sink function and its definition are known, we can query for calls to the sink function with a constant size argument and a source buffer pointing to a stack offset or heap memory. Querying for a pointer to stack memory is straightforward, whereas detecting heap pointers requires visiting the definition site of the source variable. Consider the definition of the copyout() function in BSD:

         copyout(const void *kaddr, void *uaddr, size_t len)

When looking for stack memory disclosures, search for cross-references to the copyout() function where kaddr is pointing to a stack offset and the len parameter is a constant.

Binary Ninja has a static data flow feature that propagates known values within a function, including stack frame offsets and type information. Using this feature, it is possible to narrow down calls to copyout() that satisfy our search criteria. To understand this better, let’s inspect the arguments passed to copyout() from sys_getcontext().

Figure 2 - sys_getcontext() invoking copyout(kaddr, uaddr, len)

The kaddr parameter, or params[0], holds a kernel stack pointer and is shown as the stack frame offset -0x398. The value of the len parameter, or params[2], is shown as the constant 0x330. Since Binary Ninja has no information regarding uaddr (params[1]), it is shown as <undetermined>. With this register type information for kaddr and len, the following query fetches all instances of calls to copyout() with a kernel stack pointer and constant size:
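A minimal sketch of such a query, assuming bv is the open BinaryView and that copyout() is resolvable by its symbol name (the names and exact API usage here are illustrative, not the published script):

    from binaryninja import MediumLevelILOperation, RegisterValueType

    copyout = bv.get_symbol_by_raw_name("copyout")
    for ref in bv.get_code_refs(copyout.address):
        llil = ref.function.get_low_level_il_at(ref.address)
        call = llil.mlil if llil else None
        if call is None or call.operation != MediumLevelILOperation.MLIL_CALL:
            continue
        if len(call.params) < 3:
            continue
        kaddr, uaddr, size = (p.value for p in call.params[:3])
        if (kaddr.type == RegisterValueType.StackFrameOffset
                and size.type == RegisterValueType.ConstantValue):
            print(f"{ref.function.name} @ {ref.address:#x}: "
                  f"kaddr={kaddr.value:#x}, len={size.value:#x}")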

Statically tracking memory stores

The core idea of the analysis is to track all the memory store operations using Binary Ninja’s static data flow capability and to propagate pointers manually using Static Single Assignment (SSA) form whenever necessary. For tracking stack memory stores in the local function scope, we rely on Low Level IL (LLIL), because Medium Level IL (MLIL) abstracts stack access and might eliminate some of the memory stores. For tracking inter-procedure store operations, where the address is passed to another function, we rely on the MLIL SSA form to propagate the pointers. The visitor class implemented to handle IL instructions is based on Josh Watson’s Emilator.

Tracking stack memory stores with LLIL

In LLIL, any instruction writing to memory is represented as an LLIL_STORE operation. It has a source and destination parameter. The idea is to linearly visit each LLIL instruction in a function and check if it is an LLIL_STORE operation having a stack frame offset as its destination. When a memory store writing to stack is identified, we will log the source offset of the write and its size. Consider a simple 8-byte memory move operation and its corresponding LLIL information provided by Binary Ninja:

Figure 3 - LLIL_STORE operation in freebsd32_sigtimedwait()

The StackFrameOffset value is the offset from the base of the stack, and the size property gives the size of the store operation. Using this information, it is possible to know which memory addresses are being written. In this case, the addresses from stack base offset -116 to -109 (8 bytes) are being initialized.
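A rough sketch of this store-tracking pass, where func is the function being analyzed and written is my own bookkeeping structure mapping each initialized stack offset to True:

    from binaryninja import LowLevelILOperation, RegisterValueType

    written = {}
    for block in func.llil:
        for instr in block:
            if instr.operation != LowLevelILOperation.LLIL_STORE:
                continue
            dest = instr.dest.value
            if dest.type == RegisterValueType.StackFrameOffset:
                # log every byte covered by this store as initialized
                for i in range(instr.size):
                    written[dest.value + i] = True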

Static function hooks and memory writing APIs

While memory store instructions are one way to initialize memory, functions like memset() and bzero() are frequently used to initialize a memory region with NULLs. Similarly, functions such as memcpy(), memmove(), bcopy(), strncpy(), and strlcpy() are also used to write to a memory region. All these functions have something in common: there is a destination memory pointer and a size to write. If the destination and size values are known, it is possible to know the memory region being written to. Consider the case of bzero(), which is used to clear stack memory in the patched sys_getcontext():

Figure 4 - Clearing stack memory using bzero()

By querying the destination pointer and size parameters, it is possible to know their respective values and hence the target memory region.

Now let us consider how the analyzer can handle CALL operations. Static hooks are handlers for functions that we intend to treat differently from other functions. For any CALL instruction with a known destination (i.e., an MLIL_CONST_PTR), the symbol is fetched and checked against the static hooks.

A JSON configuration with the function names as well as their positional parameters (destination buffer and size) is provided to the analyzer for static hooking:
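The exact schema may differ from the published scripts; an illustrative configuration mapping each hooked function to the zero-based indices of its destination and size parameters could look like this:

    {
      "memset":  { "dest": 0, "size": 2 },
      "bzero":   { "dest": 0, "size": 1 },
      "memcpy":  { "dest": 0, "size": 2 },
      "memmove": { "dest": 0, "size": 2 },
      "bcopy":   { "dest": 1, "size": 2 },
      "strncpy": { "dest": 0, "size": 2 },
      "strlcpy": { "dest": 0, "size": 2 },
      "copyin":  { "dest": 1, "size": 2 }
    }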

The copyin() function is specific to the BSD kernels. It is used to initialize kernel buffers with data from user space. Any target-specific functions to hook can be added to the JSON config and handled in visit_function_hooks() as needed.

Handling x86 REP optimization

Compilers often optimize memory-writing functions into REP instructions or a series of store operations. While store operations introduced by optimization can be handled like any other store operation, REP instructions require special handling. Static function hooks are not useful in detecting memory writes due to REP. So how do we handle such optimizations and avoid missing those memory writes? First, let’s look at how Binary Ninja translates the REP instruction in LLIL or MLIL.

Figure 5 - memcpy() optimized to REP instruction

Figure 6 - REP instruction translation in MLIL

The REP instruction repeats the string operation until RCX is 0. The direction of copy operation depends on the Direction Flag (DF), hence the branching where one branch increments the source (RSI) and destination (RDI) pointers and the other decrements them. In general, it is reasonably safe to assume that DF will be 0, and that pointers are incremented.

When linearly walking through the ILs, the translated REP instruction will look no different from other instructions. The idea is to check for GOTO instructions: for every GOTO in the IL, fetch the disassembly at the same address. If the disassembly is a REP instruction, fetch the destination pointer and size arguments and mark the memory region as initialized.
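A sketch of this check, where mark_initialized() stands in for the analyzer’s logging helper and the element-width scaling of RCX is left out for brevity:

    from binaryninja import LowLevelILOperation, RegisterValueType

    for block in func.llil:
        for instr in block:
            if instr.operation != LowLevelILOperation.LLIL_GOTO:
                continue
            disasm = bv.get_disassembly(instr.address)
            if not disasm or not disasm.startswith("rep"):
                continue
            dest = func.get_reg_value_at(instr.address, "rdi")
            count = func.get_reg_value_at(instr.address, "rcx")
            if (dest.type == RegisterValueType.StackFrameOffset
                    and count.type == RegisterValueType.ConstantValue):
                mark_initialized(dest.value, count.value)  # hypothetical helper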

The LLIL has a get_possible_reg_values() API to read register values statically. The MLIL provides a couple of APIs, get_var_for_reg() and get_ssa_var_version(), to map architecture registers to SSA variables. This is very useful when propagating values manually using SSA variables in the absence of RegisterValueType information (i.e., RegisterValueType.UndeterminedValue). Similar APIs are currently missing in LLIL and tracked as a feature request: API to get SSARegister for a register at a given LLIL instruction.

Tracking Inter-procedure memory stores with MLIL

At this point we can track memory store operations, CALL operations such as bzero(), memset(), and also deal with REP optimization. The next task is to track memory writes across function calls, as when a caller passes a memory address to a callee. The interesting challenge here is that once a stack pointer has been passed into another function, it can no longer be tracked using the register value type information (StackFrameOffset) as we did within the local function scope using LLIL (see above).

To solve this problem, we propagate the pointers within the callee function using MLIL SSA variables, just like propagating taint information. Whenever an MLIL_STORE_SSA instruction is encountered and its destination can be resolved manually from the values of SSA variables, we log the offset and size of the write. The set_function_args() function shown below iterates through the MLIL variables and assigns the value (pointer) passed by the caller:
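A sketch of what set_function_args() could look like, with ssa_values being the manual value map used by the visitor (the names are mine):

    from binaryninja import SSAVariable

    def set_function_args(callee, args, ssa_values):
        # version 0 of a parameter variable is its value at function entry
        for index, param in enumerate(callee.parameter_vars):
            if index < len(args):
                ssa_values[SSAVariable(param, 0)] = args[index]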

Once the initial SSA variables are set, we visit all the instructions linearly to propagate the pointer and log memory writes. While doing this, the most common operation performed on the pointer is addition. Therefore, it is necessary to emulate the MLIL_ADD instruction to handle pointer arithmetic. Additionally, it is important to emulate instructions such as MLIL_SUB, MLIL_LSR, and MLIL_AND to handle certain pointer-aligning operations introduced by optimizations. Here is an example of how these MLIL SSA expressions are resolved to log a memory store operation:
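An illustrative sketch of the visitor methods involved, following the Emilator-style pattern (the dispatching visit() method and the remaining operations are omitted):

    class StoreVisitor:
        def __init__(self, ssa_values, log):
            self.ssa_values = ssa_values  # values seeded by the caller
            self.log = log                # maps write offset -> size

        def visit_MLIL_VAR_SSA(self, expr):
            return self.ssa_values.get(expr.src)

        def visit_MLIL_CONST(self, expr):
            return expr.constant

        def visit_MLIL_ADD(self, expr):
            left = self.visit(expr.left)
            right = self.visit(expr.right)
            if left is None or right is None:
                return None
            return left + right

        def visit_MLIL_STORE_SSA(self, expr):
            dest = self.visit(expr.dest)
            if dest is not None:  # resolved from the propagated pointer
                self.log[dest] = expr.size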

Considering the SSA variable rax_43#65 as a manually propagated pointer value, it is possible to resolve the destination of the store operation as well as the size of the write. But when the value of the SSA variable rax_43#65 is not available, the memory write is not associated with the pointer propagated by the caller and is therefore not logged.

Handling pointer-aligning optimizations

When performing inter-procedure analysis, further optimizations were noticed in addition to the REP optimization covered in the “Handling x86 REP optimization” section above. A variable allocated on the stack will usually be aligned to meet the needs of upcoming operations. Let’s say a stack pointer is passed to memset() and the compiler inlines the call as a REP instruction. In this case, it is very likely the memory will be allocated at an aligned address so that the fastest instructions can be used during the REP operation.

However, when a pointer is received as an argument by a callee or as the return value of an allocator function, the compiler may have to generate pointer and size alignment opcodes, which can rely on branching decisions before reaching the REP instruction. Here is an example of such an optimization commonly found in the NetBSD kernel used for analysis:

Figure 7 - An example memset() optimization from NetBSD

When such branching decisions are involved, the pointer as well as the size can take multiple possible values (from the perspective of static analysis) at the point of the REP instruction. This is different from what we observed in the “Handling x86 REP optimization” section, where there is only one possible value each for the pointer and size. Our goal here is to find the actual value of the pointer and size in the absence of the pointer-aligning computations. To achieve this, a couple of SSA expressions were identified that can be used to resolve the original values:

• Search for an expression involving (ADDRESS & BYTESIZE). This could be the first use of ADDRESS before taking any conditional branches.
• Search for an expression involving (SIZE >> 3). This is where the adjusted size is passed to a REP instruction.

I had a couple of ideas in mind to track back the above expressions from the point of REP instruction, one relying entirely on SSA and the other based on dominators:

• Use get_ssa_var_definition() and get_ssa_var_uses() APIs to get a variable’s definition site and its uses.
• Alternatively, get the dominators of the basic block containing the REP instruction and visit the instructions in the dominator blocks.

The function resolve_optimization() shown below uses dominators to get the basic blocks to perform the search operation. Since the pointer is manually passed by the caller, the value is fetched from the SSA variables.
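A sketch of what resolve_optimization() might look like, where resolve_value() and possible_const_values() are hypothetical helpers that read the manually propagated SSA values and the possible constant value sets:

    from binaryninja import MediumLevelILOperation as Op

    def resolve_optimization(rep_block, ssa_values):
        pointer, sizes = None, []
        # walk every MLIL SSA block that dominates the REP block
        for block in rep_block.dominators:
            for instr in block:
                if instr.operation != Op.MLIL_SET_VAR_SSA:
                    continue
                src = instr.src
                if src.operation == Op.MLIL_AND and pointer is None:
                    # (ADDRESS & BYTESIZE): first use of the pointer
                    pointer = resolve_value(src.left, ssa_values)
                elif src.operation == Op.MLIL_LSR:
                    # (SIZE >> 3): adjusted size fed to the REP
                    sizes += possible_const_values(src.left)
        return pointer, (max(sizes) if sizes else None)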

In the case of a possible constant size value, we fetch the maximum from the list of available size values. Once both pointer and size values are available, we log the memory region as initialized.

Tracking memory stores in dynamic memory allocations

So far, all our analyses were concentrated on stack memory as the source buffer for information disclosure. This is largely due to the prevalence of stack memory disclosure bugs, as described in KLEAK: Practical Kernel Memory Disclosure Detection (PDF). What about other memory regions such as the heap? Can we model some of the heap memory disclosures too?

When looking for heap memory disclosures, the idea remains the same. We are still looking for calls to sink functions with a known size value, but instead of the source pointer being RegisterValueType.StackFrameOffset, we check for RegisterValueType.UndeterminedValue. Consider the code for sys_statfs():

Figure 8 - Dynamic memory allocation in sys_statfs()

Here the kernel pointer rdi_1#2 in copyout() is undetermined because Binary Ninja does not know what the allocator function returns. However, by using the SSA form, we can manually track back whether rdi_1#2 is holding the return value of malloc(). For example, follow the highlighted instructions in Figure 8: the variables are assigned as rax_1#1 -> r15#1 -> rdi_1#2. This information can be obtained programmatically using the MLIL get_ssa_var_definition() API. Once the definition site of an SSA variable is obtained, we can check whether the variable is initialized by a CALL operation as demonstrated below:
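A sketch of that check, walking the SSA definition chain back to a call to a known allocator. Here bv is the open BinaryView and ALLOCATORS is the JSON-derived mapping described next:

    from binaryninja import MediumLevelILOperation as Op

    def is_allocator_return(mlil_ssa, var):
        # walk back through simple copies, e.g. rdi_1#2 <- r15#1 <- rax_1#1
        while var is not None:
            definition = mlil_ssa.get_ssa_var_definition(var)
            if definition is None:
                return False
            if definition.operation == Op.MLIL_CALL_SSA:
                dest = definition.dest
                if dest.operation == Op.MLIL_CONST_PTR:
                    sym = bv.get_symbol_at(dest.constant)
                    return sym is not None and sym.name in ALLOCATORS
                return False
            if (definition.operation == Op.MLIL_SET_VAR_SSA
                    and definition.src.operation == Op.MLIL_VAR_SSA):
                var = definition.src.src
                continue
            return False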

How does the analyzer know the definition of allocator functions? We can take the same approach used for providing information regarding static function hooks (see the “Static function hooks and memory writing APIs” section above). A JSON configuration with a list of allocator functions and an index of size parameters is provided to the analyzer. For any CALL instruction with a known destination (i.e., MLIL_CONST_PTR), the symbol is fetched to check for known allocator functions. Here is a sample JSON configuration used for analysis:
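The exact function list is target-specific; an illustrative shape, mapping each allocator to the zero-based index of its size parameter, might be:

    {
      "malloc":      { "size": 0 },
      "kmem_alloc":  { "size": 0 },
      "kmem_zalloc": { "size": 0 }
    }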

Once we have established the connection between the source pointer and the allocator call, the next question is: what pointer value will be assigned as the return value of the allocator call? Stack pointers are tracked as negative offsets in Binary Ninja, as seen below:
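For example, querying the copyout() parameters from Figure 2 in the Python console yields values like these (the output format is approximate):

    >>> func.get_parameter_at(call_addr, copyout_type, 0)
    <stack frame offset -0x398>
    >>> func.get_parameter_at(call_addr, copyout_type, 2)
    <const 0x330>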

To have a generalized representation between stack and heap pointers, I decided to set the return value of a heap allocator call to the negative of the allocation size. For the malloc() call in sys_statfs(), rax_1#1 is set to -0x1d8 as the starting address. Therefore, the memory region that needs to be initialized ranges from -0x1d8 to 0 [start + size of allocation]. Even when the allocation size is undetermined, the starting address can be set to some arbitrary value such as -0x10000. All that matters here is knowing whether the contiguous memory region accessed by copyout() is initialized or not.

Filtering memory stores using dominators and post dominators

A dominator in graph theory provides information on the order of execution of basic blocks. While we have already used dominators for handling pointer-aligning optimizations in the “Handling pointer-aligning optimizations” section, this section details the usage of dominators in detecting control-flow-sensitive memory store operations.

To analyze uninitialized memory disclosures, we explore two ideas: dominators and post-dominators. A basic block X is said to dominate another basic block Y if every path to Y goes through X. A basic block Y is said to post-dominate basic block X if every path from X to any of the function’s return blocks goes through Y. Consider this example from Wikipedia:

Figure 9 - Graph demonstrating dominators and post dominators

In the provided graph, node B dominates nodes C, D, E, and F because all paths to these nodes must go through node B. By definition, every node dominates itself, so the set of all nodes dominated by node B will be B, C, D, E, and F. Also, node A dominates all the nodes in the graph. Therefore, the dominators of nodes C, D, E, F are A and B.

Similarly, when A is considered as the function entry node, with E and F being exit nodes, node B is the post-dominator of node A. This is because all paths from A to the exit nodes must go through B.

Now, how can dominators and post-dominators help us in this analysis?

We can perform dominator analysis on the callers of the sink function. The idea is to log only memory stores in basic blocks which dominate the basic block calling copyout(), that is, basic blocks which will be executed irrespective of branching decisions. Consider the code below:

Figure 10 - Dominators of basic block calling copyout()

Here there are five dominator blocks on the path from the function entry to the basic block calling copyout(). When performing dominator-based analysis, we log only memory stores within these five dominator blocks. Memory store operations in other basic blocks are skipped, since those blocks might never execute. The same is the case with callee functions: we perform an inter-procedure analysis only when the function is called from a dominator block.
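A sketch of this filtering, assuming call_site is the MLIL call to copyout() and log_stores() is the store-logging pass described earlier:

    call_block = next(
        bb for bb in func.mlil.basic_blocks
        if bb.start <= call_site.instr_index < bb.end
    )
    dominator_blocks = set(call_block.dominators)  # includes call_block itself
    for block in func.mlil.basic_blocks:
        if block in dominator_blocks:
            log_stores(block)  # hypothetical store-logging pass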

Post-dominator analysis is done on the callee function during an inter-procedure analysis. It is meant to find bugs where a callee can possibly return before initializing the memory region it is supposed to. Consider the callee function do_sys_waitid() from figure 10.

Figure 11 - Post dominators of function entry block in do_sys_waitid()

The function entry block is always executed, and two other basic blocks in do_sys_waitid() are executed irrespective of the branching decisions. Once again, memory stores and callee analysis are limited to these three basic blocks.

Dominator- and post-dominator-based analysis tries to fill the gaps in the control-flow-insensitive analysis performed by the analyzer. The general assumption here is that memory is initialized or cleared before further operations are performed, and that this initialization therefore dominates the other basic blocks. However, this assumption is not always true. For example, there are cases where individual code paths perform the same operation as done in the dominators. Moreover, when a callee returns due to an error condition, the return value could be validated by the caller before calling copyout(). Consequently, dominator-based analysis as done in this implementation is prone to large numbers of false positives.

Checking for uninitialized memory disclosures

Once all the memory store operations are statically logged with information on offset and size of write, the memory region copied out to user space using copyout() can be evaluated for uninitialized memory disclosure. Consider the call to copyout() shown below:

The source pointer is -0x398 and the size copied is 0x330 bytes. Therefore, the analyzer has to verify that all the bytes in the memory range from -0x398 to (-0x398 + 0x330) are initialized and, if not, flag that as a bug.
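This final check reduces to an interval-coverage test. A minimal sketch, reusing the written map built by the store-tracking passes:

    kaddr, size = -0x398, 0x330  # values from the copyout() call above

    uninitialized = [
        off for off in range(kaddr, kaddr + size) if off not in written
    ]
    if uninitialized:
        print(f"possible disclosure of {len(uninitialized)} uninitialized "
              f"byte(s) in [{kaddr:#x}, {kaddr + size:#x})")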

False positives and limitations

The analyzer is written with the goal of finding memory regions that never get written to in any possible code paths. False positives occur in cases when it is unable to track a memory store operation. Below are some common false positive conditions and limitations of the implementation:

• The analyzer does not emulate branching instructions. Therefore, false positives are seen in code constructs involving control flow decisions. Consider a memory region such as an array that is initialized in a loop operation. In this case, the store operation would be detected only once because the loop body is visited only once by the analyzer, and not in a loop as it would be during execution.

• Indirect calls are not resolved statically. Consequently, any memory store done during indirect calls is not tracked.

• Optimizations may make it harder to track memory stores. Some common optimizations noticed were tackled in the “Handling x86 REP optimization” and “Handling pointer aligning optimizations” sections.

• Binary Ninja may wrongly detect type information of functions used for static hooking or sink functions like copyout(). Since our analysis relies on RegisterValueType information, any failure to accurately detect the function prototype may lead to false results. Verify the type information before analysis and update if necessary.

• The analyzer looks only for code patterns where the memory source and sink function are within the same function. There is no tracking back of memory source beyond the local function scope.

• Dominator analysis is experimental. You should use it only as a guideline to perform code reviews.

When there is access to source code, some of these false positives can be resolved by changing the optimization flags or by unrolling loops to reduce branching decisions.

Analysis and results

The target kernel executable is loaded in Binary Ninja to generate the BNDB analysis database. The analyzer is then executed against the database for faster analysis. There are a couple of scripts: one for analyzing stack memory disclosures and another for analyzing sink functions with a known size and an unknown source pointer. Since the source pointer could come from a heap allocator, provide a JSON configuration with a list of allocator functions as an argument. The dominator analysis is experimental and is enabled with an optional argument when needed:

Conclusion

The scripts were tested on Binary Ninja version 2.4.2846 against FreeBSD 11.4, NetBSD 9.2, and OpenBSD 6.9 kernels. Amongst the results, code paths that are possibly reachable for an unprivileged user were evaluated. The OpenBSD bugs were found in sysctls related to multicast routing in IPv4 as well as IPv6, which are tracked as ZDI-22-073 and ZDI-22-012 respectively.

The four vulnerabilities (ZDI-22-075, ZDI-22-1036, ZDI-22-1037, ZDI-22-1067) found in NetBSD are related to syscalls supporting backward compatibility for older NetBSD releases. ZDI-22-075 and ZDI-22-1036 are information disclosures in VFS syscalls for NetBSD 3.0 and NetBSD 5.0 compatibility respectively. Details regarding the fixes can be found here. Next, ZDI-22-1037 is an information disclosure in the getkerninfo syscall in the 4.3BSD compatibility code. This bug was fixed along with many other potential issues as seen here. Finally, ZDI-22-1067 is another information disclosure related to VFS syscalls, this time in the NetBSD 2.0 compatibility code. Details regarding the fix can be found here.

The FreeBSD bug found in version 11.4 was also related to compatibility, in this case support for 32-bit binaries. However, this bug was fixed without a disclosure during a large change made for the 64-bit inode project: the uninitialized structure fields were cleared in the copy_stat function. Though this commit landed in May 2017, it was tagged for release 12.0 and above. Therefore, the bug remained unfixed in release 11.4 until that release reached EOL in September 2021, soon after our bug report.

Putting it together, most of the bugs were found in BSD’s compatibility layers. Additionally, all these bugs are stack memory disclosures. For anyone interested, the source code for the project can be found here.

 You can find me on Twitter @RenoRobertr, and follow the team on Twitter or Instagram for the latest in exploit techniques and security patches.

 

Acknowledgments and references

— Various blog posts from Trail of Bits on Binary Ninja
— Josh Watson for various projects using Binary Ninja; the visitor class implementation is based on his Emilator
— Jordan for all the code snippets, and the Binary Ninja Slack community for answering various questions
— KLEAK: Practical Kernel Memory Disclosure Detection by Thomas Barabosch and Maxime Villard
— Solving Uninitialized Stack Memory on Windows by Joe Bialek
— Building Faster AMD64 Memset Routines by Joe Bialek

Looking at Patch Gap Vulnerabilities in the VMware ESXi TCP/IP Stack

27 July 2022 at 15:14

Over the last few years, multiple VMware ESXi remote, unauthenticated code execution vulnerabilities have been publicly disclosed. Some were also found to be exploited in the wild. Since these bugs were found in ESXi’s implementation of the SLP service, VMware provided workarounds to turn off the service. VMware also disabled the service by default starting with ESXi 7.0 Update 2c. In this blog post, we explore another remotely reachable attack surface: ESXi’s TCP/IP stack implemented as a VMkernel module. The most interesting outcome of this analysis is that ESXi’s TCP/IP stack is based on FreeBSD 8.2 and does not include security patches for the vulnerabilities disclosed over the years since that release of FreeBSD.

This result also prompted us to analyze the nature of vulnerabilities disclosed in other open-source components used by VMware, such as OpenSLP and ISC-DHCP. Once again, we observed that most of the disclosed vulnerabilities had upstream patches before the disclosure.

The FreeBSD Code in the ESXi TCP/IP Stack

An initial analysis was done to locate the VMkernel module implementing the TCP/IP stack. In ESXi, the module details can be enumerated using esxcfg-info or esxcli system module commands as seen below:

Checking the function names and strings available in the module makes it clear that the code comes from FreeBSD, as shown in the following:

While the exact version of FreeBSD was not initially known, comparing the binary against the patches mentioned in advisories revealed multiple missing fixes. I performed a manual diff of FreeBSD branches 8.x, 9.x, and 10.x against the VMkernel module to narrow down the version of code used. Consider a couple of MFC (Merge From CURRENT) commits for example: 213158 and 215207. The 213158 commit was merged to FreeBSD 8.2 and later, whereas 215207 was merged to FreeBSD 8.3 and later. While 213158 was found in the VMkernel module, the 215207 commit was missing. This gave an approximate timeline of when the code was forked, which is around FreeBSD 8.2.

Porting Type Information from FreeBSD to the ESXi VMkernel Module

Once the version of the FreeBSD source is known, we can port type information from FreeBSD 8.2 to the VMkernel module. The goal here was to make function prototype and structure information available in IDA. Since creating type libraries for multiple dependent headers can be complex, I decided to rely on the FreeBSD kernel binary compiled from source with DWARF information. My first choice was to use IDA Pro’s Dump typeinfo to IDC feature. However, there were still missing structures because only assembler-level types were exported to the IDC.

The workaround for this problem is to use the TILIB technique described by @rolfrolles. Using it, it is possible to create a type library from the FreeBSD 8.2 kernel binary and then load it into the VMkernel module’s IDB. This imports all the C-level types required for further analysis with the decompiler. While the Structure and Local Types views do not list all the imported type information, we can apply the function prototypes as detailed in a previous blog post.

As a recap, begin by extracting the function prototypes from the FreeBSD kernel as key-value pairs of function_name:function_type, then iterate through the extracted type information and apply it to the VMkernel module, which has symbols.
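A hedged IDAPython sketch of this step, assuming prototypes is the extracted name-to-declaration mapping (the real script may differ):

    import idaapi
    import idc

    # prototypes: e.g. {"tcp_input": "void tcp_input(struct mbuf *, int);", ...}
    for name, decl in prototypes.items():
        ea = idc.get_name_ea_simple(name)
        if ea == idc.BADADDR:
            continue
        tif = idaapi.tinfo_t()
        # parse the C declaration and apply it to the symbol's address
        if idaapi.parse_decl(tif, None, decl, idaapi.PT_SIL):
            idaapi.apply_tinfo(ea, tif, idaapi.TINFO_DEFINITE)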

Once the prototypes are applied, the Structure and Local Types views are automatically populated. Any other types required for local variables can be further added manually using the workflow mentioned in Importing types into IDB.

Figure 1 - VMkernel module before (left) and after (right) adding FreeBSD type information

Analyzing VMkernel Debug Information from VMware Workbench

VMware Workbench is an Eclipse IDE plugin provided by VMware. Amongst many other features, it also provides debug symbols for VMkernel. Jan Harrie and Matthias Luft have documented the usage of the VMkernel debug information in their research entitled Attacking VMware NSX [Slides 34 – 37 in the PDF]. The UI of the Eclipse plugin is shown below:

Figure 2 - List of VMkernel debug symbols available in VMware Workbench

Kernel modules in ESXi communicate with VMkernel using VMK APIs. By using a VMkernel build that has debug information, it becomes easier to understand the VMK API calls made by any VMkernel module, which includes the TCP/IP stack. Moreover, the type information from VMkernel can be ported to any desired VMkernel module or even to the Virtual Machine Monitor. Here is an example of type information ported to the TCP/IP module found in ESXi 6.7 Update 3 using debug information from VMkernel version 14320388:

Figure 3 - TCP/IP module before (left) and after (right) adding VMkernel type information

As soon as the function prototypes are applied to the binary, IDA helpfully propagates the type information to many of the variables found in the code. When working with code forked from FreeBSD, it is useful to place the Hex-Rays decompilation side-by-side with the known FreeBSD source code to rename variables and add type information to variables that IDA did not detect automatically. However, in this case, we do not have access to local variable names or types. While some variables are given meaningful names by IDA, the rest were manually added using context information from the detected enums, structure names, and so forth.

VMware seems to have stopped publishing VMkernel debug information for more than a year now. The last public release for which I could find the debug information goes back to ESXi 6.7 16773714.

Missing FreeBSD Patches

At this point, we had enough information to perform a diff of the TCP/IP VMkernel module against the FreeBSD security advisories. Patches provided in the advisories were compared against ESXi’s version of the module to identify the missing bug fixes. Based on static analysis, patches for 16 vulnerabilities fixed in FreeBSD were found to be missing in ESXi. We reported these missing patches to VMware for further evaluation. Shown below is the set of missing patches that we reported to VMware and their corresponding applicability to ESXi as evaluated by VMware:

| CVE | Disclosure Date | Bug | Applicable |
| --- | --- | --- | --- |
| CVE-2013-3077 | 8/22/13 | Integer overflow in IP_MSFILTER | Yes |
| CVE-2013-5691 | 9/10/13 | Insufficient credential checks in network IOCTL | No |
| CVE-2004-0230 | 9/16/14 | Denial of Service in TCP packet processing | Yes |
| CVE-2015-1414 | 2/25/15 | Integer overflow in IGMP protocol | Yes |
| CVE-2015-2923 | 4/7/15 | Denial of Service with IPv6 Router Advertisements | Yes |
| CVE-2015-5358 | 7/21/15 | Resource exhaustion due to sessions stuck in LAST_ACK state | Yes |
| CVE-2015-1417 | 7/28/15 | Resource exhaustion in TCP reassembly | No |
| CVE-2018-6916 | 3/7/18 | IPSec validation and use-after-free | Yes |
| CVE-2018-6918 | 4/4/18 | IPSec crash or denial of service | Yes |
| CVE-2018-6922 | 8/6/18 | Resource exhaustion in TCP reassembly | No |
| CVE-2018-6923 | 8/14/18 | Resource exhaustion in IP fragment reassembly | No |
| CVE-2019-5608 | 8/6/19 | ICMPv6 / MLDv2 out-of-bounds memory access | Yes |
| CVE-2019-5611 | 8/20/19 | IPv6 remote Denial-of-Service | Yes |
| CVE-2020-7451 | 3/19/20 | TCP IPv6 SYN cache kernel information disclosure | Yes |
| CVE-2020-7457 | 7/8/20 | IPv6 socket option race condition and use after free | Yes |
| CVE-2020-7469 | 12/1/20 | ICMPv6 use-after-free in error message handling | Yes |

These issues were addressed in ESXi 7.0 Update 3f as well as ESXi 6.5 and ESXi 6.7.

Analysis of OpenSLP in ESXi  

The next component considered for analysis is the SLPD service in ESXi. ESXi’s SLP service is a fork of OpenSLP version 1.0.0. The version information and build configuration details can be fetched as below:

Just like the VMkernel modules, the SLPD service executable has symbols but no type information. Using the available function names, we observed that VMware’s forked code is not entirely based on version 1.0.0: it also includes some functions added later, including some from version 2.0.0. There are also differences in some of the structure definitions as well as function prototypes. These differences become apparent as soon as you port type information from OpenSLP and certain functions do not decompile cleanly.

To fetch the type information, build OpenSLP from source and then use IDA Pro’s Create C header + Parse C header feature to populate the local types. The C header parses without many errors. This is a quick workaround compared to creating type libraries from multiple dependent headers. Note that ESXi versions 7.0 and above run the SLPD service as a 64-bit executable. Because of this, care must be taken when using the created C header interchangeably between ESXi 7.0 and other versions, as the header may require some modifications to data sizes, such as size_t being unsigned int versus unsigned long long.

Timeline of Upstream Patches vs Disclosures

In the case of OpenSLP, there is enough public information regarding the vulnerabilities affecting ESXi and their relevant upstream patches, most notably the analysis published by Lucas Leong of CVE-2020-3992 and CVE-2021-21974. For the scope of this blog, we will consider the following set of remote code execution vulnerabilities:

| CVE | VMSA | Bug |
| --- | --- | --- |
| CVE-2015-5177 | VMSA-2015-0007 | Double Free |
| CVE-2019-5544 | VMSA-2019-0022 | Heap Overflow |
| CVE-2020-3992 | VMSA-2020-0023 | Use-After-Free |
| CVE-2021-21974 | VMSA-2021-0002 | Heap Overflow |

CVE-2015-5177 is a double-free vulnerability that occurs when processing a message buffer across functions: SLPDProcessMessage, ProcessDAAdvert, and SLPDKnownDAAdd. An upstream patch to fix the issue in SLPDKnownDAAdd() was already available.

Similarly, CVE-2020-3992 is a use-after-free vulnerability, again found in the handling of the message buffer across the functions SLPDProcessMessage, ProcessDAAdvert, and SLPDKnownDAAdd. However, this vulnerability does not affect the upstream version of OpenSLP. This bug can be traced back to CVE-2015-5177 based on the information provided in the VMSA.

VMware releases updates in the form of VIBs (vSphere Installation Bundles). Installing the VIBs one by one on ESXi to perform patch diffing of the SLPD binary can be a tedious process. Instead, binaries can be extracted directly from the VIB to speed up the process. The relevant VIBs can be downloaded from the ESXi Patch Tracker, which lists them based on the timeline of release. Consider a diff between ESXi 6.7 build 8169922 and 8941472:

Figure 4 - ESXi Patch Tracker with VIBs

To replicate this, download the esx-base VIB to extract the slpd binary using the vmtar utility. The vmtar utility can be copied from ESXi and executed in Ubuntu since it does not have any ESXi-specific dependencies:

Similarly, in the case of the ESXi 6.7 GA ISO image, extract the S.V00 file from VMware-VMvisor-Installer-6.7.0-8169922.x86_64.iso:

A patch diff between slpd binaries confirms that CVE-2020-3992 was introduced when patching CVE-2015-5177 in ESXi versions 6.0 and above. The fix for CVE-2015-5177, in this case, was different from that of the working upstream fix in SLPDKnownDAAdd():

Figure 5 - Bindiff between ESXi 6.7.0 8169922 and 8941472

CVE-2020-3992 was in fact fixed twice (VMSA-2020-0023.1) since the initial patch did not completely address the issue. Additionally, this bug was also reported to be exploited in the wild by the Cybersecurity and Infrastructure Security Agency (CISA).

CVE-2021-21974, a heap overflow in SLPParseSrvUrl() leading to RCE, also had an upstream patch. This fix was part of a large code change in OpenSLP.

CVE-2019-5544 was the only bug that affected both the upstream and the VMware fork of OpenSLP at the time of disclosure. Though the bug has already been disclosed and fixed by many distros, the patch for this bug is still missing in the OpenSLP GitHub repository. This was another bug exploited in the wild.

Putting it all together, here is the timeline of disclosure vs availability of patches:

| CVE | Disclosure Date | Patch Availability | Patches |
| --- | --- | --- | --- |
| CVE-2015-5177 | 10/1/15 | 6/8/11 | 53bbd77 |
| CVE-2019-5544 | 12/5/19 | N/A | N/A |
| CVE-2020-3992 | 10/20/20 | 6/8/11 | 53bbd77 |
| CVE-2021-21974 | 2/23/21 | 3/1/06 | 3c8e45d |

Analysis of ISC-DHCP in Workstation

The next component considered for analysis is the DHCP service of VMware Workstation, which is based on ISC-DHCP version 2.0. You can refer to the blog post Wandering through the Shady Corners of VMware Workstation/Fusion for an earlier analysis of this attack surface. It is noted that, though the codebase is a couple of decades old, VMware has backported fixes for known vulnerabilities. Keeping this in mind, only the recently disclosed bugs, CVE-2019-5540 and CVE-2020-3947, were analyzed.

Unlike the earlier analyses, where the executables had symbols, the vmnet-dhcpd binary is completely stripped. The first task is to identify as many function names as possible to carry out the analysis. Looking for binaries with symbols led us to the very first release – VMware 1.0 Build 160. The vmnet-dhcpd binary from the first release of VMware had symbols and type information in stabs format.

To match the available symbols with recent releases of the Workstation DHCP server, I relied on the updated version of Rizzo published by Quarkslab. Since Rizzo can be very slow on large binaries, only the string-reference signatures were used in detection. With this, Rizzo identified around 69 functions using 226 matching strings available in the VMware 1.0 Build 160 vmnet-dhcpd binary. A somewhat similar result can be achieved using the strings in ISC-DHCP 2.0. For anyone interested, the historical releases of ISC-DHCP can be found here.

During the patch diffing process, bug fixes in vmnet-dhcpd are easy to spot since the binary undergoes very limited changes per release. The first bug, CVE-2019-5540, is an information disclosure vulnerability due to an uninitialized field in the ICMP packet structure. This issue can be narrowed down to an upstream patch in the function icmp_echorequest().

The next bug, CVE-2020-3947, is a use-after-free vulnerability when handling the DHCPRELEASE packet. During the handling of a DHCPDISCOVER packet from a client, ISC-DHCP allocates memory in the heap. If the client identifier is long, it stores the resultant pointer as part of the lease structure. Otherwise, uid_buf is used for storage. The relevant code can be found in the ack_lease function of the server/dhcp.c source file:

Later when a DHCPRELEASE packet is received, the lease structure is adjusted. First, a copy of lease is made in the function release_lease and then supersede_lease is called. This code can be found in common/memory.c:

Note that the uid pointer is now found in both the old and the newly copied lease structures when it is passed to supersede_lease. In supersede_lease, the comp->uid pointer is freed (provided that it points to memory other than uid_buf, that is, in the case of a long client identifier):

Later, the lease->uid entry is reused. See the code below assigning comp->uid = lease->uid, which by this time points to freed memory:

Finally, the comp structure that has a pointer to the freed uid buffer is passed to write_lease where the use-after-free happens. The bug can be triggered by adding a long dhcp-client-identifier entry to the DHCP client configuration and then initiating a DHCP request followed by a DHCP release using dhclient:

This issue was fixed in the release_lease function as part of a bigger code change by introducing a lease_copy function. The copied lease structure no longer holds a reference to the uid buffer of the old lease. Instead, a new allocation is requested and assigned. The release_lease function received further changes during later updates for handling the failover protocol. VMware’s fix for this bug is different from that of the upstream, possibly due to the divergence of the codebase.

Here is the timeline of disclosure vs availability of patches:

| CVE | Disclosure Date | Patch Availability | Patches |
| --- | --- | --- | --- |
| CVE-2019-5540 | 11/12/19 | 6/25/98 | 34436cc |
| CVE-2020-3947 | 3/12/20 | 5/17/00 | 20916ca |

The IDA Python scripts used for the analysis can be found here.

Conclusion

Over the last few months, we analyzed three different VMware components: the ESXi networking stack, the ESXi SLPD service, and the Workstation DHCP server. The three components share the fact that they are forks of open-source code. Unlike open-source packages that can be updated to a new version with relative ease, updating a fork takes significantly more effort. This is because various challenges arise when cherry-picking patch commits.

In the ESXi FreeBSD networking stack, even the vulnerabilities with known CVE identifiers remained unfixed – the forked code was never tracked for vulnerabilities. Better tracking of FreeBSD disclosures is a good place to start and can help resolve some of the issues. Also, considering the criticality of the component, it might be possible for VMware to work with FreeBSD Security Officers to get prior information on vulnerabilities as per their Information handling policies:

“The FreeBSD Security Officer has close working relationships with a number of other organizations, including third-party vendors that share code with FreeBSD… Frequently vulnerabilities may extend beyond the scope of the FreeBSD implementation, and (perhaps less frequently) may have broad implications for the global networking community. Under such circumstances, the Security Officer may wish to disclose vulnerability information to these other organizations.”

In most cases, vendor advisories only list the supported versions of the affected software. Even if a legacy version is not listed in the advisory, the patch might still be applicable for the forked code.

In the case of OpenSLP and ISC-DHCP forks, the patch gap vulnerabilities come from bugs that do not have CVE identifiers. Some of the bug fixes have clear commit messages about the issue, whereas other fixes are done as part of large code changes or refactoring. Either way, it is a challenging process and requires a trained set of eyes to pick up security patches when the vulnerabilities are not explicitly called out. This also shows the significance of requesting CVE identifiers for security fixes since companies largely rely on them for cherry-picking patches.

As we showed, a couple of ESXi SLP patches had further issues despite the availability of working upstream fixes. To avoid this, it is important to evaluate a bug report not just against the fork but also against the upstream project. If a fix is found upstream, developers should evaluate it. Upstream fixes are in most cases a safer bet than custom patches, which may be written without a comprehensive understanding of the code. However, this process of adopting patches can be further complicated if the codebases have diverged considerably.

The most important concern with forked code is the lifetime of bugs. When open-source code is used and updated as a package to the latest version, all the bug fixes are applied (i.e., vulnerabilities with or without CVE identifiers). In a forked codebase, when a patch is missed during cherry-picking, it will remain unfixed until the code is updated as a whole or someone finds and reports the issue. For example, the ISC-DHCP bugs were fixed in VMware’s DHCP server a couple of decades after the upstream fixes were available. In fact, the ISC-DHCP uninitialized memory information disclosure was fixed upstream even before the launch of CVE. Overall, forked code requires more attention compared to that of packages.

Considering the challenges of adopting security fixes and gaps in the tracking process, exploitable bugs might be still lying around without a fix. There are different types of patch gaps as well, so expect to hear more about these in the future. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

Clang Checkers and CodeQL Queries for Detecting Untrusted Pointer Derefs and Tainted Loop Conditions

23 February 2022 at 17:49

In the first blog of the series, we saw how CodeQL and Clang checkers can be used to find bugs in MySQL Cluster involving untrusted size arguments leading to buffer overflows or out-of-bounds array accesses. Though the majority of bugs were out-of-bounds array accesses, there were also bugs involving untrusted data being used in loop conditions or cast to pointers. In this final blog of the series, we experiment with CodeQL’s IR and Clang checkers for detecting such bug classes.

Defining taint sources - CSA vs CodeQL

The first part of the problem is defining the taint sources. The Clang Static Analyzer (CSA) provides an experimental checker, alpha.security.taint.TaintPropagation, for performing static taint analysis. Implemented as GenericTaintChecker, it has a set of built-in taint sources. Since the Signal data used by message handlers does not come directly from these built-in taint sources, it is necessary to find an alternative approach to define taint sources. Lucas Leong had a quick approach to work around the problem: rewrite all accesses to signal->theData as calls to getenv(). Since the return value of getenv() is considered tainted by GenericTaintChecker, taint propagation works after the rewrite. Refer to his blog post “MindShaRE: When MySQL Cluster Encounters Taint Analysis” for details on using existing taint checkers and CodeQL to find memory corruption bugs.

One has to be careful with this approach to avoid missing potential taint sources. Though MySQL Cluster defines APIs like Signal::getDataPtr() to fetch the Signal data pointer, various handlers don’t follow this convention. Instead, signal->theData is accessed in numerous other ways, from offset 0 or from various other offsets. I rewrote some common patterns using find and sed. This may not be exhaustive but provides sufficient taint sources to experiment with the checkers.

The situation is much different when working with CodeQL, where defining taint sources is far more flexible. All we have to do is override the isSource() predicate to something like this:

CWE-822: Untrusted Pointer Dereference

In an untrusted pointer dereference vulnerability, data from an untrusted source is cast to a pointer for further use. The impact of the bug depends on the usage of the cast pointer. This section provides details about detecting such pointer load operations using CSA and CodeQL.

Detecting untrusted pointer dereference using CSA

To detect this vulnerability using CSA, I relied on the path-sensitive checker callback check::Location, which fires every time a program accesses memory for read or write. The information below is from the checker documentation:

The SVal symbolic value Loc points to the memory region being accessed. IsLoad indicates whether the access is a read or a write. To check if a memory load is happening from a tainted memory region, we can check if Loc is tainted. If so, we can use the statement S to get the type of the expression:

Whenever the expression type is a pointer, it is logged as a bug. Scanning MySQL Cluster with this checker gave about 86 results, which is a significantly high number and includes false positives. Here is what the scan results look like:

The large number of results is due to the fact that any pointer derived from a tainted value (via pointer arithmetic, array indexing, etc.) is also considered tainted, irrespective of whether the tainted value is constrained by validation. In some cases, OOB reads which load pointers are also reported. Understanding the nature of false-positive results significantly improves the triage experience.

Considering the high number of results, we can sort them by “File”, which is likely to organize the results by feature; for a less familiar code base, sorting by “Path Length” can help identify code paths reachable with minimum complexity.

scan-view results sorted by File

scan-view results sorted by Path Length

CSA reported a number of bugs in the code handling DBDIH block with a path length of 1. Here is an example bug report:

Using a known bug pattern, it is possible to apply filters in addition to the taint check to reduce the results. For example, we can restrict the AST statements to pointer loads that are plain assignment statements like the one above. The checker can be tweaked as per the nature of bugs in the codebase.

Analyzing memory loads using CodeQL IR

To perform a similar analysis in CodeQL, one can possibly rely on expression classes such as PointerDereferenceExpr, VariableAccess, FieldAccess and its subtypes. But in this case, I was curious to explore CodeQL’s Intermediate Representation (IR) to track memory loads. Since the documentation around CodeQL’s IR is limited, I used the Instruction class to dump the IR for further analysis. Here is a partial dump of IR for the line EmulatedJamBuffer * jambuf = (EmulatedJamBuffer*)req->jamBufferPtr in function execDIH_SCAN_TAB_REQ():

The CodeQL query used to dump the IR along with source code information:

Consider the below set of instructions in the IR:

Here the LoadInstruction relies on two operands: a source address operand (r17303_4) and a source value operand (m17300_12). The source value operand is a MemoryOperand, which describes the memory address accessed by the instruction, whose value gets copied to the result register (r17303_5). Moreover, the result is also associated with type information. The idea was to filter all tainted memory load operations whose return type is a pointer. However, such a query resulted in an unusably large number of results.

Adding constraints using overlap relationship and virtual variables

Consulting CodeQL’s IR SSA Construction documentation, I found a couple of features that could be useful for refining the query: the overlap relationship and VirtualVariables.

Overlap defines the relationship between the definition of a memory location and its usage. The most interesting relationship for this analysis is MustExactlyOverlap - the set of bits written by the definition is identical to the set of bits read by the use, and the data type of both the definition and the use are the same. For more clarity on the overlap relationship, refer to the SSA test case found in the repository. In the case of untrusted pointer load bugs, it is very likely that there will be a mismatch of data type between definition and usage (type casting) or that the number of bits read will be a subset of the definition. Therefore, we can skip Load operations where the source value operand has an exactly overlapping relationship with its definition. The getOperandMemoryLocation() predicate provides information regarding the MemoryLocation read by a memory operand. Consider the output of the following query:

The memory operands marked with a tilde (“~”) are the ones that do not have an exactly overlapping relationship. This can also be checked using the isDefinitionInexact() predicate. Putting all this together, we can override the isSink() predicate to get a meaningful number of results to work with.

Many of the false-positive results were due to multiple load operations performed using some misidentified pointer. These results can be either skipped or removed by adding further constraints to the query, such as ignoring certain variable names in the sink or checking for AST patterns.

In the case of MySQL Cluster, one such constraint explored for filtering memory access to the Signal structure is VirtualVariable. As per the documentation, “Each MemoryLocation is associated with exactly one VirtualVariable. A VirtualVariable represents a set of MemoryLocations such that any two MemoryLocations that overlap have the same VirtualVariable.” VirtualVariable information can be fetched from a MemoryLocation using the getVirtualVariable() predicate.

When the variable names are consistent, we can add a constraint to consider only Load operations where the memory operand points to the signal variable. A more generic option is to fetch the type information of the virtual variable to check whether it is a Signal structure. Such a query is still restrictive (though not as restrictive as matching the variable name) but significantly reduces false positives and returns results involving jamBufferPtr:

CWE-606: Unchecked Input for Loop Condition

In CWE-606: Unchecked Input for Loop Condition, values from an untrusted source are used for loop termination conditions. This may lead to a DoS or other issues depending on the operations done in the loop body. This section provides details about detecting such tainted loop conditions using CSA and CodeQL.

Detecting tainted loop condition using CSA

Unlike with tainted memory loads, I couldn’t find any path-sensitive callback that triggers on loop conditions. Moreover, AST-based matchers without path sensitivity are not useful in this case. Therefore, I relied on the check::BranchCondition callback, which fires every time control-flow branching occurs during analysis. The following information is from the checker documentation:

The idea here is, whenever the callback triggers due to a conditional statement, to walk up the AST using ParentMap and check for a loop statement. Here is what an example AST looks like for a while loop:

For loop operations, we are only interested in three AST statement classes: WhileStmtClass, DoStmtClass, and ForStmtClass. Below is the loop detection code used in the checker:

Once we know that the condition statement triggering the callback is part of a loop statement, we can check if it is tainted. Specifically, I wanted to look for some common code patterns involving induction variables compared against untrusted values, or untrusted values used as induction variables to decide on loop termination. Consider a couple of common code patterns below that involve explicit binary comparison operations:

• The induction variable is compared against a potentially untrusted value. For example, the RHS could be tainted while the LHS is not.
• The induction variable itself could be untrusted. Here the LHS value is compared against the constant 0 on the RHS.

 The checker specifically looks for these cases to make decisions on bugs. When RHS is constant 0, check if LHS is tainted. Otherwise, check if RHS is tainted:

Another common code pattern is unary operations in loop statements, especially while and do while loops:

No special breakdown is necessary in case of unary operations in the way we handled binary operations. However, in certain cases, implicit Boolean conversions result in implicit conditional statements received by the check::BranchCondition callback. Here is what the AST looks like:

To handle these, if any non-binary conditional operation has a Boolean outcome, it is possibly due to implicit casting, and hence we retrieve the sub-expression associated with the cast expression.

Evaluating constraints on tainted variables

While it is possible to know whether an input is tainted, there is no way to know if the tainted input value is constrained by some validation. Without this information, the checker generates a lot of false-positive results. To solve this problem, I used the solution provided by Andrew Ruef in the Trail of Bits blog post Using Static Analysis and Clang to Find Heartbleed. Since Signal data is treated as an unsigned 32-bit integer in most cases, I queried whether the variable can take a value greater than 0x10000. If it can, the checker considers the variable unconstrained and logs the bug.

Configuring analyzer-max-loop in CSA

Clang analyzer has a configuration parameter, analyzer-max-loop, that controls the number of times a basic block in a loop gets executed during analysis. By default, this value is 4.

So why does this matter for our analysis? Consider the code below:

Here, buffer->index is validated, and the loop operation should be considered safe. But the checker still reports a false-positive bug.

What seems to be happening here is that the symbolic expression gets evaluated analyzer-max-loop times, i.e., once for each visit to the basic block:

        (reg_$3<unsigned int Element{SymRegion{conj_$2{char *, LC2, S26616, #1}},0 S64b,struct index_buf}.index>) > 65536U

The analyzer seems to evaluate that the decrement operation can underflow, resulting in a value greater than 0x10000, and therefore reports it as a bug. I'm not entirely sure of the right way to solve this issue, but a quick workaround is to set analyzer-max-loop to 1. This can also be done in scan-build using the maxloop parameter. With this setting, the expression is evaluated only once and the bug is not reported.

The scan reported around 46 bugs, with some valid ones including previously found issues like ZDI-CAN-14488, ZDI-CAN-15120, and ZDI-CAN-15121.

Here is an example bug report for the vulnerability in Dbdict::execLIST_TABLES_CONF():

Detecting tainted loop condition using CodeQL

Using the analysis done earlier for CSA, an equivalent query in CodeQL can be implemented using the Loop class. The isSink() predicate essentially looks as follows:

Building and Performing the Scan

In order to build clang with custom checkers, copy the source files to the clang/lib/StaticAnalyzer/Checkers directory in the LLVM toolchain. Then add the filenames to CMakeLists.txt:

Also update the clang/include/clang/StaticAnalyzer/Checkers/Checkers.td file to add the package information:

Once you build clang, the new checkers will be listed alongside the other alpha checkers.

Now we can use scan-build to perform the analysis. Use the maxloop and use-analyzer configurations if necessary.

Building with the checkers enabled can be slow. To speed up the scan, limit the target path by either modifying the Makefile as necessary or using the custom Makefile by Lucas. All of this analysis was performed on clang 12.0 and MySQL Cluster 8.0.25.

Scanning with CodeQL requires creating a database, followed by running any queries we are interested in against the created database.

The source code for the Clang checkers and CodeQL queries can be found here.

Acknowledgments and References

•      MindShaRE: When MySQL Cluster Encounters Taint Analysis
•      Clang Static Analyzer - A Checker Developer's Guide
•      A Memory Model for Static Analysis of C Programs
•      Using Static Analysis and Clang to Find Heartbleed
•      Source code of existing clang checkers
•      Vulnerability hunting with Semmle QL
•      CodeQL IR SSA Construction

Conclusion

We hope you’ve enjoyed this look at finding bugs using Clang Static Analyzer, CodeQL, and Binary Ninja. As ZDI Vulnerability Analysts, these have proved helpful in finding new bugs. If you use these (or other) tools to find bugs of your own, consider submitting them to our program. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.


Static Taint Analysis using Binary Ninja: A Case Study of MySQL Cluster Vulnerabilities

15 February 2022 at 17:04

Taint analysis is an effective technique for finding vulnerabilities, even in large codebases. My colleague, Lucas Leong, recently demonstrated how Clang Static Analyzer and CodeQL can be used to model and find vulnerabilities in MySQL NDB Cluster using taint analysis. Largely inspired by his work, I wanted to try something similar but using Binary Ninja since it can also work with closed-source programs.

These are a few things I had in mind while working:

•      Identify vulnerabilities due to uses of untrusted values without bounds checking
•      Taint propagation and filtering should be control-flow sensitive
•      Support inter-procedure analysis

I approached this as a graph reachability problem, for which Tainted Flow Analysis on e-SSA-form Programs served as an excellent reference. All the analysis in this article is based on MySQL Cluster 8.0.25 and Binary Ninja 2.4.2846.

Defining Taint Sources

To get taint analysis working, it is essential to define the taint sources clearly. MySQL Cluster has a message passing architecture, and interesting taint sources are the messages themselves. This section provides details on message handling and how the message handlers can be identified for analysis.

MySQL NDB Cluster Signals

MySQL NDB Cluster defines functionalities as "blocks" and the messages passed between them as "signals". The NDB blocks are implemented as C++ classes, and each block registers multiple signal handlers during initialization, which are also methods of the class. Most of the vulnerabilities reported were in these message handlers, also known as signal handlers.

All the blocks inherit the SimulatedBlock class to register their signals using addRecSignal(), which invokes the SimulatedBlock::addRecSignalImpl() method. The registered signal handlers are of type ExecSignalLocal, which takes a single argument. Interested readers can refer to blog posts on Ndb software architecture by Frazer Clement and Code Reading Notes – MySQL Cluster by Steinway Wu for further details. The scope of this article is limited to entry points to signal handlers. Below is an example of the code of the NDBCNTR block registering a signal handler:

The "Signal" object that each handler receives contains untrusted data. The signal handlers can access the data via signal->getDataPtr() or a few other methods. The handlers can also pass the "Signal" object on to other functions. There are a couple of ways to proceed here: either analyze any function that takes a Signal as an argument, or analyze only the actual signal handlers by cross-referencing calls to SimulatedBlock::addRecSignalImpl() and then let inter-procedure analysis take care of the rest. I chose to start with the former, since the inter-procedure analysis was implemented at a later stage.

The Signal object is 0x8030 bytes in size and not all bytes should be considered tainted. We should only define a small region of memory of the object as tainted so that only memory reads from the tainted region are propagated. Marking the entire structure as tainted will lead to a lot of false positives. In this case, the signal’s tainted data starts at offset 0x28 and any memory loads from this offset are marked tainted. Both Signal::getDataPtr() and Signal::getDataPtrSend() return a pointer to this memory.

Porting type information from IDA Pro to Binary Ninja

The executable under analysis is “ndbd”, which is the NDB Cluster Data Node Daemon built with DWARF debug information. In order to find functions that take a pointer to a Signal object as an argument, check the type information of all functions as follows:
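A minimal sketch of that check, assuming bv is the open BinaryView and the parameter types are accurate (an illustration, not the repository's exact code):

        # Enumerate functions whose parameters include a Signal pointer.
        signal_funcs = []
        for func in bv.functions:
            for param in func.type.parameters:
                if param.type is not None and "Signal" in str(param.type):
                    signal_funcs.append(func)
                    break
        print(f"{len(signal_funcs)} candidate functions take a Signal argument")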

However, at the moment, Binary Ninja does not handle DWARF information as robustly as IDA Pro does. Another issue with Binary Ninja is its failure to detect the “this” argument when analyzing C++ executables. As a result, argument detection will not be accurate, breaking our taint source analysis. An easy fix is to import type information from IDA Pro into Binary Ninja. For example, the Dblqh::prepareContinueAfterBlockedLab() method has the following type information as per IDA Pro:

The same function looks different in Binary Ninja. In this case, the “this” pointer is missing, and Signal becomes the first argument. Marking “arg1” as a taint source makes the entire analysis wrong.

Since we are only interested in the correct argument position and the type information of the Signal argument, we fix this using the scripts provided in the ida2bn directory:

Once the type information is fixed, we can identify functions and mark taint sources using the Signal argument. More details on working with types in Binary Ninja are documented in Working with Types, Structures, and Symbols.

Taint Propagation and Filtering

The goals of taint propagation are simple: when a variable is assigned a value from the Signal data, mark it as tainted. If any other variable is derived from the tainted variable, also mark it tainted, and so on. The challenge comes when there are sanitizers. Say a variable is tainted, and in some code path there is a validation for that variable. In that code path, the variable is no longer tainted. Taint propagation should be control-flow sensitive to avoid over-tainting and false positives. This section details how I approached the problem using Binary Ninja's IL and SSA form. For a thorough reading on the topic, please refer to the blog posts Breaking Down Binary Ninja's Low-Level IL and Vulnerability Modeling with Binary Ninja.

Binary Ninja ILs and SSA form

Binary Ninja supports a variety of Intermediate Languages (ILs), such as Low-Level IL (LLIL), Medium Level IL (MLIL), and High-Level IL (HLIL). Since MLIL abstracts away stack memory accesses into variables and associates parameters with call sites, I found it the most suitable for inter-procedure taint analysis. Moreover, it is better documented than HLIL.

Another powerful feature supported is the Single Static Assignment (SSA) form of the available ILs. In SSA form, each variable is defined only once. When the variable is reassigned to another value, a new version of the variable is created. Therefore, it is easy to track a tainted variable at any point in a function. Consider this minimalistic example: when variable x is reassigned a new value, a new version of the variable is created in the SSA form:
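To illustrate (this fragment is mine, not the original figure):

        x = 1          ->   x#1 = 1
        x = x + 2      ->   x#2 = x#1 + 2
        y = x          ->   y#1 = x#2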

SSA variable def-use chain

Binary Ninja provides get_ssa_var_definition() and get_ssa_var_uses() APIs to get a variable’s definition site and their uses respectively. Consider the MLIL SSA code snippet of Thrman::execOVERLOAD_STATUS_REP() method below:

Here, arg2 is a pointer to a Signal object. At the address 0x00784165, the SSA variable “rax#1” is loaded with a tainted value from [arg2#0 + 0x28]. The MLIL instructions that use the tainted SSA variable rax#1 can be fetched as below:
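A short sketch of fetching those uses with the APIs above, reusing the address and variable from this example:

        # Assumes bv is the BinaryView for ndbd with analysis complete.
        func = bv.get_functions_containing(0x00784165)[0]
        mlil_ssa = func.mlil.ssa_form

        # The MLIL_SET_VAR_SSA instruction at 0x00784165 defines rax#1.
        instr = func.get_low_level_il_at(0x00784165).mlil.ssa_form
        tainted_var = instr.dest
        for use in mlil_ssa.get_ssa_var_uses(tainted_var):
            print(hex(use.address), use)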

These APIs form the building blocks of our taint analysis going further.

Taint propagation with SSA def-use chain

Binary Ninja’s ILs are structured as an expression tree such that operands of an operation can be composed of other operations. Consider the graph generated for the below MLIL SSA instruction by the BNIL Instruction Graph plugin:

The MLIL_SET_VAR_SSA operation marks the definition of a new SSA variable, which sets the dest variable to the result of the src expression. The src expression could be composed of many other operations. In this case, MLIL_ADD adds an offset 0x28 to the base address of Signal and then MLIL_LOAD_SSA reads the value from the address computed using MLIL_ADD. Effective taint propagation requires visiting all the MLIL SSA operations for every instruction expression. Josh Watson’s emILator and IL instruction counting by Jordan are good examples for visiting and processing MLIL SSA instruction expressions. What does the taint propagation algorithm look like?

        • Visit all MLIL SSA instructions in the function linearly
        • For any MLIL_SET_VAR_SSA operation, resolve the src expression to check if it is tainted data
        • If the src operand returns tainted data, get the uses of the dest SSA variable with get_ssa_var_uses()
        • Visit the instructions that use the tainted SSA variable and propagate taint when MLIL_SET_VAR_SSA is encountered
        • Once an instruction taints a variable, mark it as visited and never visit it again (see the sketch below)
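A condensed sketch of this worklist-style propagation (my reconstruction; the analyzer in the repository is more elaborate):

        from binaryninja import MediumLevelILOperation as Op

        def expr_is_tainted(expr, tainted):
            # An SSA variable read is tainted if the variable is tainted; any
            # other operation is tainted if one of its sub-expressions is.
            if expr.operation == Op.MLIL_VAR_SSA:
                return expr.src in tainted
            return any(expr_is_tainted(op, tainted)
                       for op in expr.operands if hasattr(op, "operation"))

        def propagate(mlil_ssa, seed_vars):
            # seed_vars: SSA variables loaded directly from Signal's tainted region
            tainted, visited = set(seed_vars), set()
            worklist = [u for v in seed_vars for u in mlil_ssa.get_ssa_var_uses(v)]
            while worklist:
                instr = worklist.pop()
                if instr.instr_index in visited:
                    continue
                if (instr.operation == Op.MLIL_SET_VAR_SSA
                        and expr_is_tainted(instr.src, tainted)):
                    visited.add(instr.instr_index)
                    tainted.add(instr.dest)
                    worklist.extend(mlil_ssa.get_ssa_var_uses(instr.dest))
            return tainted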

Constraints on SSA variables

Once we have the taint propagation algorithm in place, how do we handle sanitizers on the tainted variables? We are only interested in code paths without any validations. With this in mind, let’s revisit our taint propagation algorithm, which relies on the def-use chain. Def-use chains are sequential statements of code; therefore, taint propagation is not control-flow sensitive. Here is an example to demonstrate the issue:

The "value" variable passed to the function is tainted and gets used in two different code paths. Along the code path executing the basic block at 0x1184, the variable is validated and considered clean. get_ssa_var_uses() for this variable returns three instructions:

Processing these three instructions linearly would lead to the incorrect conclusion that the validation precedes both usages of the tainted value. In reality, only one instruction is protected; the other two are vulnerable. This problem can be solved by taking control flow into account.

Control-flow sensitive propagation using constraint graph

The MediumLevelILInstruction class has an il_basic_block property to get the basic block information of the MLIL instruction.

Using this property, we can fetch the basic blocks of the SSA variable's definition and uses, a set that also includes the basic blocks where the validations are done; these are referred to as "constraint" blocks. Some properties of these basic blocks are as follows:

•      The definition block always dominates all uses of the SSA variable.
•      The basic block that has the definition can contain constraints. The same applies to any basic blocks of the def-use chain.
•      A definition block is always reachable and hence all the instructions in it are reachable too.

Considering this as a graph reachability problem, the question is, can we reach all the instructions in the def-use chain of the SSA variable in the presence of a constraint block? To answer this, we build a constraint graph from the CFG of the function and use a pathfinding algorithm on it:

•      Remove the outgoing edges of constraint blocks from the CFG. This is our constraint graph.
•      Find if a path exists between the definition basic block and other basic blocks of the def-use chain in the constraint graph.
•      If any def-use basic blocks are not reachable, then those instructions are not used for taint propagation.

Since each assignment is unique in SSA representation, we maintain a per-variable dictionary of the necessary information including the constraint graph for later analysis. Here is a sample pseudocode to find reachable blocks in the presence of constraint blocks:
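Something along these lines, where cfg is a networkx DiGraph built from the function's MLIL basic blocks (illustrative, not the exact code):

        import networkx as nx

        def reachable_use_blocks(cfg, constraint_blocks, def_block, use_blocks):
            # Sever the outgoing edges of every constraint block, then test
            # reachability from the definition block, which is always reachable.
            g = cfg.copy()
            for block in constraint_blocks:
                g.remove_edges_from(list(g.out_edges(block)))
            return [b for b in use_blocks
                    if b == def_block or nx.has_path(g, def_block, b)]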

To find if an SSA variable is tainted at a given instruction, all we need to do is check its reachable def-use chain:
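For example, assuming a hypothetical per-variable dictionary with a "reachable_blocks" entry:

        def is_tainted_at(taint_info, ssa_var, instr):
            # The variable is tainted at this instruction only if the
            # instruction's basic block survived constraint filtering.
            info = taint_info.get(ssa_var)
            return info is not None and instr.il_basic_block in info["reachable_blocks"]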

Arithmetic operations as filters

Other than explicit filters, there are also arithmetic operations that can act as taint filters. For example, AND or Logical Shift Right (LSR) operations may place constraints on a value. In such circumstances, a heuristic can be used to filter out undesired results. Consider the example below:

Here, the tainted value is not explicitly compared against anything, but operations such as LSR and AND limit the input range. This is where I found the possible_values property very useful. Binary Ninja’s data flow analysis can provide possible values for an expression:
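A sketch of such a heuristic; the 0x10000 bound here is illustrative:

        from binaryninja import RegisterValueType

        MAX_UNCONSTRAINED = 0x10000

        def value_range_limited(expr):
            # Ask Binary Ninja's data flow analysis for the possible values of
            # an MLIL expression; AND/LSR arithmetic often yields a bounded range.
            pv = expr.possible_values
            if pv.type in (RegisterValueType.UnsignedRangeValue,
                           RegisterValueType.SignedRangeValue):
                return all(r.end < MAX_UNCONSTRAINED for r in pv.ranges)
            if pv.type == RegisterValueType.ConstantValue:
                return pv.value < MAX_UNCONSTRAINED
            return False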

Handling Transitively Tainted Variables

The first stage of analysis propagates taint data and generates information such as a list of tainted variables, constraints placed on each SSA variable, and their respective constraint subgraphs.

During the second stage of analysis, we explore the relationships between the tainted SSA variables. Why is this important? We are only looking for unbounded SSA variables. While any direct constraints placed on an SSA variable are already handled in the first stage, the constraints placed on SSA variables that are propagated transitively are not yet handled. Consider the example below:

The index#0 variable could be tainted and not constrained. However, the derived variable constrained_var#1 is validated, indirectly placing constraints on the index variables. Therefore index#3 is not tainted during the memory access at 0x11f2. Here is another example:

Here, index#1, index#2, rax#1, and constrained_var#1 are copies or direct assignments of the variable index#0. When the variable constrained_var#1 is validated, the other variables are also effectively validated. Not analyzing the effect of constraints on derived variables or copies of variables leads to false positives. This section details ideas on handling constraints on related variables.

Constraints on transitively related SSA variables

After the taint propagation phase is over, we iterate through all the tainted variables that have constraints on them. For every variable with constraints, we find its child variables and parent variables.

•      Variables that are derived from a given variable are called child variables.
•      Variables from which a given variable is derived are called parent variables.

To check if the constraints on the variable under consideration have any effect on its parent or child variables, we perform the below checks:

•      Pick the constraint subgraph for the SSA variable under consideration.
•      Check if the definitions of child variables are reachable from the definition of the current variable.
·       If not, the constraint is placed before defining the child variable in the CFG, and therefore none of the basic blocks in the def-use chain are tainted.
·       If yes, the constraint is placed after defining the child variable in the CFG. In this case, also check if all basic blocks of the def-use chain of the child variable are reachable from the definition of child. Mark the non-reachable blocks as clean.

•      For each parent variable, get its list of child variables. All the child variables of the current variable are also child variables of the parent, but a parent might have other child variables too. The child variables already visited in the previous step can be skipped. Now check whether the definitions of these child variables are reachable from the definition of the parent variable and, just as in the previous step, mark the non-reachable blocks as clean.

This way, the non-reachable basic blocks are removed from the tainted entries of variables related to a constrained variable. Using the tags associated with a variable, we can also choose whether to analyze only derived variables or direct assignments as well.

While propagating taint, two different tags were used. A direct memory load from tainted memory using MLIL_LOAD_SSA returns an (offset, size) pair as its tag, which gets associated with the destination variable. For any derived variable, the destination variable is instead associated with a marker rather than the (offset, size) pair. Consider the code:

The variable rcx_1#2 is tainted with the tag [0x2c, 0x4], the (offset, size) pair. The same is the case with rbx#1, which is a direct assignment. However, rbx_1#2, derived from the expression rbx#1 u>> 5, is tainted with the tag [0xdeadbeef, 0xdeadbeef]. Using this tag type information, it is possible to identify the nature of the taint propagation.

Constraints on SSA variables from multiple taint sources

While propagating taint, we mark a destination variable as tainted if any of the source variables are tainted, including in the PHI function. During filtering in the second stage, if a variable is constrained, we apply the constraints to all its related variables. However, when a derived variable (child) comes from more than one independent taint source (parents) and only one of the parent variables is validated, the child variable is also considered validated, which is not desirable. Consider the example below:

Let’s say x and y are coming from two independent taint sources and index is a sum of both, hence a derived variable. When x gets validated, index can still be tainted since y is not validated. The previous algorithm does not take this into account.

To solve this problem, I considered associating each derived tainted variable with its actual source variables, referred to as root variables, and maintaining copies of the def-use chain per root variable. For example, variable index#3 has two roots: x#0 and y#0. For each root, a copy of the reachable blocks is maintained in the taint information associated with index#3. When x#1 is validated, only the x#0 copy of index#3 is marked not reachable, and the y#0 copy is still considered tainted. A dependency graph of variables is built to establish these relationships.

Establishing SSA variable relationship using a dependency graph

In a variable dependency graph, any tainted variable in the function is represented as a node. When a variable is derived from another variable, it forms a directed edge from the parent variable (node) to the child variable (node).

In order to establish a relationship between variables, the definition site of all the tainted variables is visited using get_ssa_var_definition(). When any of the source variables in the MLIL expression are tainted, create an edge connection in the graph. A variable tainted during MLIL_LOAD_SSA operation does not have a parent node or incoming edges and therefore becomes the root node.
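A sketch of the construction, assuming the set of tainted SSA variables from the first stage:

        import networkx as nx

        def build_dependency_graph(mlil_ssa, tainted_vars):
            dg = nx.DiGraph()
            tainted = set(tainted_vars)
            for var in tainted:
                dg.add_node(var)
                site = mlil_ssa.get_ssa_var_definition(var)
                if site is None:
                    continue  # e.g., arguments have no definition site
                for src in site.vars_read:  # SSA variables read at the definition
                    if src in tainted:
                        dg.add_edge(src, var)  # parent -> child
            return dg

        # Root variables (direct loads from Signal memory) have no parents:
        # roots = [n for n in dg.nodes if dg.in_degree(n) == 0]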

Such a dependency graph will have many weakly connected components because each memory load from the tainted memory region will be assigned to a new variable and therefore a new node in the graph. Put simply, each memory load creates a subgraph along with its derived variables. A subgraph might connect with another when a variable is derived out of more than one root node. Here is a sample dependency graph from the function Dbtux::execTUX_ADD_ATTRREQ():

Another property to note is that dependency graphs are not necessarily Directed Acyclic Graphs (DAG). This is because loops can be introduced by a circular dependency of variables in PHI functions. Consider the below SSA representation of a loop operation:

Here, the value of counter#2 depends on counter#1 or counter#4 through a PHI function, and the predecessor block decides its outcome. Further down in the loop, counter#4 depends on counter#2. This relationship is represented as a cycle in the dependency graph.

Once the dependency graph is generated, it is easy to get the root variables associated with any tainted variables. Moreover, child and parent variables for any given variable can be fetched for handling transitive relationships. The only missing part now is the forward propagation of tainted information to other functions.

Static Function Hooks and Inter-procedure Taint Propagation

All the MLIL_CALL_SSA and MLIL_TAILCALL_SSA instructions with tainted arguments are processed once the analysis of the current function is finished. For any CALL instructions with a known destination (e.g., MLIL_CONST_PTR), the symbol is fetched to check for static hooks. Here is the code snippet:
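A minimal reconstruction of that check (not necessarily the exact snippet from the repository):

        from binaryninja import MediumLevelILOperation as Op

        def call_symbol(bv, call_instr):
            # Only direct calls (constant destination) can be matched against
            # the static hook configuration by name.
            dest = call_instr.dest
            if dest.operation != Op.MLIL_CONST_PTR:
                return None
            sym = bv.get_symbol_at(dest.constant)
            return sym.name if sym else None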

Static hooks are handlers to functions that we intend to handle differently compared to other functions. Consider a call to the libc memcpy function, where taint propagation is not necessary but only interested in checking for tainted size, source, or destination arguments. In order to provide this information to the analyzer and make it configurable, a JSON config with function names and arguments is used as below:
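The exact schema lives in the repository; a hypothetical shape consistent with the memcpy description below would be:

        import json

        # Hypothetical hook config: function name -> argument slots to check
        # for taint, indexed from 0.
        FUNCTION_HOOKS = json.loads("""
        {
            "memcpy":  ["arg0", "arg1", "arg2"],
            "memmove": ["arg0", "arg1", "arg2"]
        }
        """)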

The arguments to check are indexed from 0. In the case of memcpy, all 3 parameters are marked for analysis. The argument index provided in the JSON config is checked against the tainted SSA variables. For example, arg2 in the config maps to an SSA argument variable associated with memcpy’s size argument.

Static hooks can also be used to mark an output variable or return value of a function to be tainted and further propagated. However, this is not currently implemented since function-specific details are not considered. When necessary, the visitor handler for the MLIL_SET_VAR_SSA operation can be reused for implementing backward taint propagation during CALL operations. For any other function without hooks, taint information is propagated by marking the target function’s variable as tainted.

Tracing Vulnerabilities from Reachable Blocks

Once the taint propagation and filtering phases are over, the last phase of analysis involves iterating through all tainted variables and checking the reachable blocks for potential sinks. Based on the bugs already reported, I chose to look for vulnerabilities involving Out-Of-Bounds (OOB) memory access, buffer overflows during function calls to APIs such as memcpy, untrusted inputs cast to a pointer, and tainted loop counters. The rest of this section details additional detection strategies.

OOB reads and writes

The majority of vulnerabilities in MySQL Cluster were OOB read and write memory access bugs due to the missing validation of untrusted array indexes. To detect these bugs, we can specifically consider any MLIL_LOAD_SSA or MLIL_STORE_SSA instructions as sinks. Here is an example code from Dbdih::execGET_LATEST_GCI_REQ():

Here, rax#1 is tainted, hence the read operation using MLIL_LOAD_SSA can be considered an OOB read condition. Similarly, consider another case from Thrman::execOVERLOAD_STATUS_REP():

Here again, rax#1 is tainted, hence write operation using MLIL_STORE_SSA can be considered an OOB write condition.
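Putting the two cases together, a sketch of the sink detection might look like this (is_tainted_at comes from the filtering phase):

        from binaryninja import MediumLevelILOperation as Op

        def find_oob_sinks(mlil_ssa, is_tainted_at):
            for instr in mlil_ssa.instructions:
                if instr.operation == Op.MLIL_STORE_SSA:
                    addr_expr, kind = instr.dest, "OOB write"
                elif (instr.operation == Op.MLIL_SET_VAR_SSA
                        and instr.src.operation == Op.MLIL_LOAD_SSA):
                    addr_expr, kind = instr.src.src, "OOB read"
                else:
                    continue
                # Any tainted variable feeding the address expression is a hit.
                if any(is_tainted_at(v, instr) for v in addr_expr.vars_read):
                    print(f"{kind} at {instr.address:#x}: {instr}")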

API buffer overflows

Static function hooks are used to detect buffer overflows caused by a lack of validation of arguments passed to functions like memcpy, memmove, etc. This mechanism is described in the section "Static Function Hooks and Inter-procedure Taint Propagation" above. Essentially, if any of the interesting parameters of a hooked function are tainted, we log it as a potential vulnerability.

Untrusted pointer dereferences

In some cases, I noticed that MySQL Cluster converts untrusted input to a pointer and then dereferences it. To identify this, I relied on Binary Ninja's type information. The MLIL variable object has a Type property that returns the Type object associated with a variable. A Type object's type can be accessed using the type_class property. Here the pattern is that the source points to a tainted memory region within a Signal structure, and the destination variable is of type PointerTypeClass. The Type object also has a confidence property, as seen below:
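For illustration, a check along these lines (my sketch):

        from binaryninja import TypeClass

        def looks_like_pointer_sink(dest_var):
            # Flag variables that Binary Ninja types as a pointer with maximum
            # confidence when they receive tainted Signal data.
            t = dest_var.type
            return (t is not None
                    and t.type_class == TypeClass.PointerTypeClass
                    and t.confidence == 255)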

The maximum confidence value for a variable type is 255. To reduce false positives, the analyzer only considers type information having the maximum confidence.

Tainted control flow operations in a loop

Loop termination conditions depending on tainted variables can lead to interesting bugs. Binary Ninja’s MLIL does not provide information on loops, therefore the alternative was to rely on HLIL to detect tainted loop conditions. The HLIL of a loop in Cmvmi::execEVENT_SUBSCRIBE_REQ() looks like the example below:

The trouble here is that we have implemented the entire taint propagation using MLIL, and Binary Ninja does not provide a mapping between MLIL and HLIL. Therefore, even if loops can be detected, the challenge is to know if a tainted MLIL variable maps to a HLIL variable used in a loop condition.

As a workaround, the HLIL instruction has a condition property that fetches the condition statement associated with a loop. The address of this condition statement can be mapped to an MLIL_IF instruction.

Therefore, if any of the MLIL_IF instructions are tainted and are a part of a HLIL loop condition, then the analyzer logs it as a potential bug.
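A sketch of that mapping, assuming the addresses of tainted MLIL_IF instructions were collected during propagation:

        from binaryninja import HighLevelILOperation as HOp

        LOOP_OPS = (HOp.HLIL_WHILE, HOp.HLIL_DO_WHILE, HOp.HLIL_FOR)

        def tainted_loop_conditions(func, tainted_if_addrs):
            for instr in func.hlil.instructions:
                if instr.operation not in LOOP_OPS:
                    continue
                # The loop's condition expression maps back to MLIL by address.
                if instr.condition.address in tainted_if_addrs:
                    print(f"tainted loop condition at {instr.condition.address:#x}")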

Experiments with Dominance Relationships

A dominance relationship provides information on the order of execution of basic blocks. A basic block X is said to dominate another basic block Y if all paths to Y go through X. Let's take the example from Wikipedia:

In the provided graph, node B dominates the nodes C, D, E and F because all paths to these nodes must go through node B. By definition, every node dominates itself. So, the full set of nodes that are dominated by node B is B, C, D, E and F. There is also a related concept called the strict dominator, which does not consider the node in question. Therefore, the set of all nodes that are strictly dominated by node B is C, D, E and F.

Binary Ninja's BasicBlock object has dominators and strict_dominators properties, which provide information regarding dominance relations within a function.

What about using the readily available dominance properties in Binary Ninja to handle taint constraints, instead of relying on graph reachability algorithms from the networkx package?

Mapping constraints to dominator blocks

In order to check if any basic blocks in a def-use chain of an SSA variable are reachable, we can follow the steps below:

•     Find all constraint blocks associated with a variable.
•     Get all the basic blocks that reference the variable using the def-use chain.
•     For each basic block, check if it is strictly dominated by a constraint block. If so, the variable is considered validated for that basic block, and the block is treated as not reachable (see the sketch below).
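A sketch of this dominator-based check:

        def validated_by_dominators(constraint_blocks, use_block):
            # A use block is considered validated if any constraint block
            # strictly dominates it.
            return any(c in use_block.strict_dominators
                       for c in constraint_blocks)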

Going back to the same example, the index gets validated in a constraint block that dominates the basic block containing the memory access. Therefore, by checking the def-use blocks against the constraint blocks' dominance, it is possible to establish reachability.

False positives with dominators

While the dominance relationship was promising and gives good results, it does give rise to certain false positives. Consider the following CFG, where validation is done in two different program paths:

Here, the index is validated in two different program paths before reaching a potential sink block. However, neither of the two constraint blocks that perform the validation dominates the sink block. In such cases, since the constraint blocks cannot be mapped to any dominators, the potential sink block will be considered vulnerable on the read access at address 0x11ba. This is a false-positive result.

The same is the case with the branch_dependence property, which returns a dictionary mapping branch instruction indexes to the branching decision taken to reach the instruction. When both the True and False branches dominate the instruction's basic block, we do not get information regarding reachability.

A general observation from the scan results is that most constraints are part of dominator blocks. Very few are validated across multiple code paths, producing false positives. Since path-finding algorithms relying on the definition and usage of variables eliminate these false-positive results, I preferred them over dominators. However, the code is still in the repository for experimental purposes.

Notes on Analysis

The target ndbd executable is loaded in Binary Ninja to generate the BNDB analysis database. Then the analyzer is executed against ndbd.bndb for faster analysis:

        python3 mysql_bugs.py --function_hooks functions_to_hook.json ndbd.bndb

Though not optimized for speed, the analyzer runs in about 4-5 minutes and returns 195 results. Some of the results are duplicates because a single vulnerability in a helper function might be reached by multiple handlers. The analyzer was able to find most of the issues already known, as well as a few that were not previously known: ZDI-CAN-15120, ZDI-CAN-15121, and ZDI-CAN-15122. However, there is a high triage cost associated with static analysis results, especially when the target is less familiar. Luckily, my colleague Lucas had already spent a fair amount of time on the codebase, making it easier to triage the results.

Source code for the project can be found here.

Acknowledgments and References

•     Thanks to Lucas Leong for all the discussions and triage. His blog post MindShaRE: When MySQL Cluster Encounters Taint Analysis is an excellent reference for performing static taint analysis on MySQL Cluster.
•     Tainted Flow Analysis on e-SSA-form Programs which served as an excellent reference for taint analysis as a graph reachability problem.
•     Various blog posts from Trail of Bits on Binary Ninja.
•     Josh Watson for various projects using Binary Ninja. The visitor class implementation is based on emILator.
•     Jordan for all the code snippets and the Binary Ninja slack community for answering various questions.

Conclusion

I hope you have enjoyed this look at using Binary Ninja to find vulnerabilities through taint analysis. In a few days, I’ll be back to discuss using Clang Static Analyzer (CSA) for detecting untrusted pointer dereferences and tainted loop conditions. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

