
Analysis of a Parallels Desktop Stack Clash Vulnerability and Variant Hunting using Binary Ninja

9 September 2021 at 14:59

Parallels Desktop uses a paravirtual PCI device called the “Parallels ToolGate” for communication between guest and host OS. This device is identified by Vendor ID 0x1AB8 and Device ID 0x4000 in a Parallels guest.

The guest driver provided as part of Parallels Tools and the host virtual device communicate using a ToolGate messaging protocol. To provide a summary, the guest driver prepares a message and writes the physical address of the message to TG_PORT_SUBMIT [PORT IO ADDRESS+0x8]. The host then maps the guest provided physical address, parses the message, and transfers control to the respective request handlers for further processing. Many of the bugs received by the ZDI program in Parallels are in these request handlers.

ToolGate Interface and Protocol Format

The ToolGate request format is a variable-size structure that can span multiple pages. The guest sends data to the host as inline bytes or pointers to paged buffers by writing the physical address of the TG_PAGED_REQUEST structure to the IO port of the virtual device.

Figure 1 - Variable size TG_PAGED_REQUEST structure in guest memory

Figure 2 - Variable size TG_PAGED_BUFFER structure in TG_PAGED_REQUEST

The host then maps the page pointed to by the physical address, prepares a host buffer based on the information provided by the guest, and then invokes the request handler. Request handlers may use inline bytes, paged buffers, or both for reading and writing data. These data buffers are accessed using a set of functions as roughly defined below:

TG_DataBuffer *TG_GetBuffer(TG_ReqBuffer *buf, uint32_t index, uint32_t writable)
uint32_t TG_ReadBuffer(TG_DataBuffer *buf, uint64_t offset, void *dst, size_t size)
uint32_t TG_WriteBuffer(TG_DataBuffer *buf, uint64_t offset, void *src, size_t size)
void *TG_GetInlineBytes(TG_DataBuffer *buf)

Return of the Stack Clash Vulnerability

The STARLabs submission (ZDI-21-937) for Pwn2Own 2021 is an uncontrolled memory allocation vulnerability in which a guest-provided size value is used for a stack allocation. If the size provided by the guest is larger than the total size of the stack, it is possible to shift the Stack Pointer (RSP) into other regions of process memory (e.g., the stack region of another thread).

Figure 3 - Normal stack operation (left) vs stack jumping due to large allocation (right)

When handling TG_REQUEST_INVSHARING (0x8420), Parallels Desktop does not validate the ByteCount value provided by the guest as part of TG_PAGED_BUFFER. When used for a stack allocation, this untrusted size value can shift the stack top of the thread processing the ToolGate request into another victim thread's stack, overwriting its contents. There is a 4KB guard page below the stack, but it is possible to jump over this small region without ever accessing it. Qualys published a detailed paper titled The Stack Clash on this bug class back in 2017, which also led to various compiler mitigations to prevent such guard page jumps.

Below is the vulnerable section of code from Parallels Desktop 16.1.3. During the call to TG_ReadBuffer(), the stack memory of another thread can be overwritten with guest-controlled values.

Figure 4 - Vulnerability in TG_REQUEST_INVSHARING handling

Backward Compatibility and Compiler Mitigations

The most interesting question here is what happened to the stack clash compiler mitigation in Apple Clang? When a variable-size allocation is made on the stack, as with alloca(value) or char buffer[value], the Apple Clang compiler instruments the allocation with ___chkstk_darwin() to validate the size of the allocation request. ___chkstk_darwin() essentially extends the stack by PAGE_SIZE at a time and probes each newly allocated page to check that the new stack top is accessible. If the guard page is reached, the probing step fails, leading to a safe crash. It is no longer possible to shift the stack pointer to an arbitrary location.
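
The probing behavior can be modeled in a few lines of Python. This is a toy model of the idea, not Apple's implementation: the stack is grown one page at a time and each new page is touched, so a guard page triggers a fault instead of being jumped over.

```python
PAGE_SIZE = 0x1000

def probed_alloca(stack_top, size, accessible):
    """Toy model of stack probing; accessible(page) -> bool, page = addr // PAGE_SIZE."""
    addr = stack_top
    remaining = size
    while remaining > 0:
        step = min(PAGE_SIZE, remaining)
        addr -= step
        # touch the newly allocated page; a guard page faults here
        if not accessible(addr // PAGE_SIZE):
            raise MemoryError("probe hit the guard page")  # safe crash
        remaining -= step
    return addr
```

Without the per-page probe, the same size would simply be subtracted from RSP in one step, landing past the guard page.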

Figure 5 - Sample code with stack clash mitigation

It is clear that Parallels Desktop did not have this mitigation enabled, because there is no call to ___chkstk_darwin() during the variable-size stack allocation. At this point there are a couple of open questions:

        -- Did Parallels disable the mitigation using the -fno-stack-check compiler flag?
        -- Did they use an old build configuration that disabled the mitigation?

macOS's otool can be used to get a fair amount of information regarding the build environment. Specifically, the LC_VERSION_MIN_MACOSX load command can provide information regarding the macOS versions supported. Here is the output of otool -l prl_vm_app:

Figure 6 - LC_VERSION_MIN_MACOSX  information of prl_vm_app

The SDK used is 10.15 (Catalina), which is quite new. Moreover, Parallels has also set the minimum supported macOS version to 10.13, which makes it compatible with High Sierra. This aligns with the compatibility information provided in their KB article. Still, does backward compatibility with 10.13 disable the compiler mitigation? Here is a comparison of sample code compiled with and without -mmacosx-version-min=10.13:

Figure 7 - Backward compatibility with 10.13 disables ___chkstk_darwin() (right)

It is unclear whether Parallels explicitly disabled ___chkstk_darwin() using -fno-stack-check, but setting -mmacosx-version-min=10.13 has the same effect and silently drops the mitigation. The same behavior is also observed with -mmacosx-version-min=10.14 (Mojave). Interestingly, GCC inlines the stack probe check instead of depending on an external library function. That external dependency is the likely culprit here, as compiling in Apple Clang with the backward compatibility flags (macosx-version-min, target, etc.) ended up removing the mitigation.

That said, Mojave (10.14.6) did not give any symbol errors when running executables with calls to ___chkstk_darwin(), while many such issues can be found in High Sierra (10.13.6). We noticed that, in older compilers (e.g., Apple LLVM version 10.0.1 on Mojave), the stack clash mitigation is not enabled by default unless the -fstack-check flag is explicitly provided. Recent compilers, on the other hand, seem to drop the mitigation entirely when compiling with macosx-version-min for any version below 10.15. This can be worked around by providing both the -fstack-check and -mmacosx-version-min flags together, although High Sierra compatibility remains questionable. The highlight is that using macosx-version-min alone can make the bug exploitable even on the latest versions of macOS.

Figure 8 - Apple Clang calls ___chkstk_darwin (left) vs GCC mitigation inlined (right)

Variant hunting using Binary Ninja

The next question is: are there other similar bugs in Parallels Desktop, and how can we automate the search? The size of a stack frame is generally known at compile time. Moreover, any operations that shift the stack pointer can also be tracked. Binary Ninja has a static data flow capability which keeps track of the stack frame offset at any point in a function. However, when there is a variable-size allocation on the stack, the stack offsets cannot be determined statically. This exact property can be used to look for variants of the bug.

Figure 9 - Known stack offset (left) vs Undetermined value after alloca() (right)

Consider index 88 in the above Low-Level IL, where the RSP register is loaded.

Here, the new stack top is calculated using a guest-provided size and loaded into RSP. Binary Ninja provides the get_possible_reg_values() and get_possible_reg_values_after() Python APIs to fetch statically determined values for registers. The register values are also associated with type information (RegisterValueType). Here is the stack frame offset value in RSP before and after the load operation:

RSP is always associated with the StackFrameOffset RegisterValueType. However, when the RSP value is not known, it is marked as UndeterminedValue. With this value type information, we can search for all references to TG_ReadBuffer() where the RSP value is undetermined. If RSP is undetermined before the call to TG_ReadBuffer(), it can be deduced that a variable-size stack allocation was made prior to the call.
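
The query can be distilled into a small sketch. It assumes we have already collected, for each call site of TG_ReadBuffer(), the RegisterValueType name that Binary Ninja's get_possible_reg_values() reports for RSP just before the call; the helper name and data shape are illustrative.

```python
UNDETERMINED = "UndeterminedValue"

def find_variants(callsites):
    """callsites: iterable of (address, rsp_value_type) pairs."""
    # an undetermined RSP before the call implies an earlier
    # variable-size stack allocation in the same function
    return [addr for addr, value_type in callsites if value_type == UNDETERMINED]
```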

The above query yielded three results: one was the Pwn2Own submission, and the other two were previously unknown vulnerabilities.

      0x1001c6e35 - ZDI-21-1056 - TG_REQUEST_GL_CREATE_CONTEXT
      0x10025591a - ZDI-21-1055 - TG_REQUEST_DIRECT3D_CREATE_CONTEXT
      0x10080bcd9 - ZDI-21-937 - (Pwn2Own) - TG_REQUEST_INVSHARING

Figure 10 - Pwn2Own bug and its variants found using Binary Ninja

Conclusion

Our research has shown that, in some cases, compiling for backward compatibility can silently drop mitigations, thus making an entire bug class exploitable. Perhaps vendors should consider releasing separate binaries in such cases?

We also took a look at how Binary Ninja's static data flow capability and Python APIs can be useful in automating bug-finding tasks. If you find any such vulnerabilities, consider submitting them to our program. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

Static Taint Analysis using Binary Ninja: A Case Study of MySQL Cluster Vulnerabilities

15 February 2022 at 17:04

Taint analysis is an effective technique for finding vulnerabilities, even in large codebases. My colleague, Lucas Leong, recently demonstrated how Clang Static Analyzer and CodeQL can be used to model and find vulnerabilities in MySQL NDB Cluster using taint analysis. Largely inspired by his work, I wanted to try something similar but using Binary Ninja since it can also work with closed-source programs.

These are a few things I had in mind while working:

•      Identify vulnerabilities due to uses of untrusted values without bounds checking
•      Taint propagation and filtering should be control-flow sensitive
•      Support inter-procedure analysis

I approached this as a graph reachability problem, for which Tainted Flow Analysis on e-SSA-form Programs served as an excellent reference. All the analysis in this article is based on MySQL Cluster 8.0.25 and Binary Ninja 2.4.2846.

Defining Taint Sources

To get taint analysis working, it is essential to define the taint sources clearly. MySQL Cluster has a message passing architecture, and interesting taint sources are the messages themselves. This section provides details on message handling and how the message handlers can be identified for analysis.

MySQL NDB Cluster Signals

MySQL NDB Cluster defines functionalities as “blocks” and the messages passed between them as “signals”. The NDB blocks are implemented as C++ classes, and each block registers multiple signal handlers during initialization, which are also methods of the class. Most of the vulnerabilities reported were in these message handlers, also known as signal handlers.

All the blocks inherit the SimulatedBlock class to register their signals using addRecSignal(), which invokes the SimulatedBlock::addRecSignalImpl() method. The registered signal handlers are of type ExecSignalLocal, which takes a single argument. Interested readers can refer to blog posts on Ndb software architecture by Frazer Clement and Code Reading Notes – MySQL Cluster by Steinway Wu for further details. The scope of this article is limited to entry points to signal handlers. Below is an example of the code of the NDBCNTR block registering a signal handler:

The “Signal” object that each handler receives contains untrusted data. The signal handlers can access the data as signal->getDataPtr() or with a few other methods. The handlers can also further pass the “Signal” object to other functions. There are a couple of ways to proceed here. You can either analyze any function that takes Signal as an argument or analyze only the actual signal handlers by cross-referencing calls to SimulatedBlock::addRecSignalImpl() and then let inter-procedure analysis take care of the rest. I chose to start with the former since the inter-procedure analysis was implemented at a later stage.

The Signal object is 0x8030 bytes in size and not all bytes should be considered tainted. We should only define a small region of memory of the object as tainted so that only memory reads from the tainted region are propagated. Marking the entire structure as tainted will lead to a lot of false positives. In this case, the signal’s tainted data starts at offset 0x28 and any memory loads from this offset are marked tainted. Both Signal::getDataPtr() and Signal::getDataPtrSend() return a pointer to this memory.

Porting type information from IDA Pro to Binary Ninja

The executable under analysis is “ndbd”, which is the NDB Cluster Data Node Daemon built with DWARF debug information. In order to find functions that take a pointer to a Signal object as an argument, check the type information of all functions as follows:

However, at the moment, Binary Ninja does not handle DWARF information as robustly as IDA Pro does. Another issue with Binary Ninja is its failure to detect the “this” argument when analyzing C++ executables. As a result, argument detection will not be accurate, breaking our taint source analysis. An easy fix is to import type information from IDA Pro into Binary Ninja. For example, the Dblqh::prepareContinueAfterBlockedLab() method has the following type information as per IDA Pro:

The same function looks different in Binary Ninja. In this case, the “this” pointer is missing, and Signal becomes the first argument. Marking “arg1” as a taint source makes the entire analysis wrong.

Since we are only interested in the correct argument position and the type information of the Signal argument, we fix it using the scripts provided in the ida2bn directory:

Once the type information is fixed, we are good to identify functions and mark taint sources using the Signal argument. More details on working with types in Binary Ninja are documented in Working with Types, Structures, and Symbols.

Taint Propagation and Filtering

The goals of taint propagation are simple: when a variable is assigned a value from the Signal data, mark it as tainted. If any other variable is derived from the tainted variable, also mark it tainted, and so on. The challenge comes when there are sanitizers. Say a variable is tainted, and in some code path there is a validation for that variable. In this case, the variable is no longer tainted in that code path. Taint propagation should be control-flow sensitive to avoid over tainting and false positives. This section details how I approached the problem using Binary Ninja’s IL and SSA form. For a thorough reading on the topic, please refer to blog posts Breaking Down Binary Ninja’s Low-Level IL and Vulnerability Modeling with Binary Ninja.

Binary Ninja ILs and SSA form

Binary Ninja supports a variety of Intermediate Languages (IL) like Low-Level IL (LLIL), Medium Level IL (MLIL), and High-Level IL (HLIL). Since MLIL abstracts away stack memory access with variables and has parameters associated with call sites, I found it more suitable to perform inter-procedure taint analysis. Moreover, it is better documented than HLIL.

Another powerful feature supported is the Single Static Assignment (SSA) form of the available ILs. In SSA form, each variable is defined only once. When the variable is reassigned to another value, a new version of the variable is created. Therefore, it is easy to track a tainted variable at any point in a function. Consider this minimalistic example: when variable x is reassigned a new value, a new version of the variable is created in the SSA form:

SSA variable def-use chain

Binary Ninja provides the get_ssa_var_definition() and get_ssa_var_uses() APIs to get a variable's definition site and its uses, respectively. Consider the MLIL SSA code snippet of the Thrman::execOVERLOAD_STATUS_REP() method below:

Here, arg2 is a pointer to a Signal object. At the address 0x00784165, the SSA variable “rax#1” is loaded with a tainted value from [arg2#0 + 0x28]. The MLIL instructions that use the tainted SSA variable rax#1 can be fetched as below:

These APIs form the building blocks of our taint analysis going further.

Taint propagation with SSA def-use chain

Binary Ninja’s ILs are structured as an expression tree such that operands of an operation can be composed of other operations. Consider the graph generated for the below MLIL SSA instruction by the BNIL Instruction Graph plugin:

The MLIL_SET_VAR_SSA operation marks the definition of a new SSA variable, which sets the dest variable to the result of the src expression. The src expression could be composed of many other operations. In this case, MLIL_ADD adds an offset 0x28 to the base address of Signal and then MLIL_LOAD_SSA reads the value from the address computed using MLIL_ADD. Effective taint propagation requires visiting all the MLIL SSA operations for every instruction expression. Josh Watson’s emILator and IL instruction counting by Jordan are good examples for visiting and processing MLIL SSA instruction expressions. What does the taint propagation algorithm look like?

        • Visit all MLIL SSA instructions in the function linearly
        • For any MLIL_SET_VAR_SSA operation, resolve the src expression to check if it is tainted data
        • If the src operand returns tainted data, get the uses of the dest SSA variable with get_ssa_var_uses()
        • Visit the instructions that use the tainted SSA variable and propagate taint when MLIL_SET_VAR_SSA is encountered
        • Once an instruction taints a variable, mark it as visited and never visit again
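
The steps above can be sketched over a toy IR, where each instruction is a (dest, srcs) pair standing in for MLIL_SET_VAR_SSA; names and data shapes are illustrative, and control flow is deliberately ignored here since filtering is handled later with the constraint graph.

```python
def propagate_taint(instructions, sources):
    """instructions: ordered list of (dest_var, [src_vars]); sources: initially tainted vars."""
    tainted = set(sources)
    visited = set()
    changed = True
    while changed:
        changed = False
        for i, (dest, srcs) in enumerate(instructions):
            if i in visited:
                continue  # once an instruction taints a variable, never visit it again
            if any(s in tainted for s in srcs):
                tainted.add(dest)  # MLIL_SET_VAR_SSA with a tainted src expression
                visited.add(i)
                changed = True
    return tainted
```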

Constraints on SSA variables

Once we have the taint propagation algorithm in place, how do we handle sanitizers on the tainted variables? We are only interested in code paths without any validations. With this in mind, let’s revisit our taint propagation algorithm, which relies on the def-use chain. Def-use chains are sequential statements of code; therefore, taint propagation is not control-flow sensitive. Here is an example to demonstrate the issue:

The “value” variable passed to the function is tainted and gets used in two different code paths. Along the code path executing the basic block at 0x1184, the variable is validated and considered clean. The get_ssa_var_uses() for the variable returns 3 instructions:

Processing these 3 instructions linearly would lead to the incorrect conclusion that the validation precedes both usages of the tainted value. In reality, only one instruction is protected. The other two are vulnerable. This problem can be solved by taking control flow into account.

Control-flow sensitive propagation using constraint graph

The MediumLevelILInstruction class has an il_basic_block property to get the basic block information of the MLIL instruction.

Using this property, we can fetch the basic blocks of SSA variable definition and SSA variable uses, which also includes the basic blocks where the validations are done. The basic blocks are also referred to as the “constraint” blocks. Some properties of these basic blocks are as follows:

•      The definition block always dominates all uses of the SSA variable.
•      The basic block that has the definition can contain constraints. The same applies to any basic blocks of the def-use chain.
•      A definition block is always reachable and hence all the instructions in it are reachable too.

Considering this as a graph reachability problem, the question is, can we reach all the instructions in the def-use chain of the SSA variable in the presence of a constraint block? To answer this, we build a constraint graph from the CFG of the function and use a pathfinding algorithm on it:

•      Remove the outgoing edges of constraint blocks from the CFG. This is our constraint graph.
•      Find if a path exists between the definition basic block and other basic blocks of the def-use chain in the constraint graph.
•      If any def-use basic blocks are not reachable, then those instructions are not used for taint propagation.

Since each assignment is unique in SSA representation, we maintain a per-variable dictionary of the necessary information including the constraint graph for later analysis. Here is a sample pseudocode to find reachable blocks in the presence of constraint blocks:
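
A minimal BFS sketch of this check, with illustrative block identifiers: constraint blocks keep their node but lose their outgoing edges, and we walk forward from the definition block.

```python
from collections import deque

def reachable_from_definition(cfg_succ, def_block, constraint_blocks):
    """cfg_succ: dict block -> list of successor blocks in the original CFG."""
    seen = {def_block}
    queue = deque([def_block])
    while queue:
        block = queue.popleft()
        if block in constraint_blocks:
            continue  # outgoing edges removed: constraint blocks are dead ends
        for succ in cfg_succ.get(block, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen
```

Any def-use basic block absent from the returned set lies behind a validation and is excluded from taint propagation.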

To find if an SSA variable is tainted at a given instruction, all we need to do is check its reachable def-use chain:

Arithmetic operations as filters

Besides explicit filters, there are also arithmetic operations that can act as taint filters. For example, AND or Logical Shift Right (LSR) operations may place constraints on a value. In such circumstances, a heuristic can be used to filter out undesired results. Consider the example below:

Here, the tainted value is not explicitly compared against anything, but operations such as LSR and AND limit the input range. This is where I found the possible_values property very useful. Binary Ninja’s data flow analysis can provide possible values for an expression:
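
To illustrate why these operations act as filters, here is a toy range computation assuming a 32-bit tainted input; the bound check mimics the kind of decision possible_values enables, without using Binary Ninja itself.

```python
U32_MAX = 0xFFFFFFFF

def max_after_lsr(shift, max_value=U32_MAX):
    # logical shift right caps the maximum possible result
    return max_value >> shift

def max_after_and(mask):
    # x & mask can never exceed mask
    return mask

def is_effectively_constrained(max_possible, bound):
    # treat the variable as filtered when its whole range fits the bound
    return max_possible < bound
```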

Handling Transitively Tainted Variables

The first stage of analysis propagates taint data and generates information such as a list of tainted variables, constraints placed on each SSA variable, and their respective constraint subgraphs.

During the second stage of analysis, we explore the relationships between the tainted SSA variables. Why is this important? We are only looking for unbounded SSA variables. While any direct constraints placed on an SSA variable are already handled in the first stage, the constraints placed on SSA variables that are propagated transitively are not yet handled. Consider the example below:

The index#0 variable could be tainted and not constrained. However, the derived variable constrained_var#1 is validated, indirectly placing constraints on the index variables. Therefore index#3 is not tainted during the memory access at 0x11f2. Here is another example:

Here, index#1, index#2, rax#1, and constrained_var#1 are copies or direct assignments of the variable index#0. When the variable constrained_var#1 is validated, the other variables are also validated. Not analyzing the effect of constraints on derived variables or copies of variables leads to false positives. This section details ideas on handling constraints on related variables.

Constraints on transitively related SSA variables

After the taint propagation phase is over, we iterate through all the tainted variables that have constraints on them. For every variable with constraints, we find its child variables and parent variables.

•      Variables that are derived from a given variable are called child variables.
•      Variables from which a given variable is derived are called parent variables.

To check if the constraints on the variable under consideration have any effect on its parent or child variables, we perform the below checks:

•      Pick the constraint subgraph for the SSA variable under consideration.
•      Check if the definitions of child variables are reachable from the definition of the current variable.
·       If not, the constraint is placed before defining the child variable in the CFG, and therefore none of the basic blocks in the def-use chain are tainted.
·       If yes, the constraint is placed after defining the child variable in the CFG. In this case, also check if all basic blocks of the def-use chain of the child variable are reachable from the definition of child. Mark the non-reachable blocks as clean.

•      For each parent variable, get its list of child variables. While all the child variables of the current variable are also child variables of the parent, parent variables might have other child variables too. The child variables already visited in the previous step can be skipped. Now check if the definitions of child variables are reachable from the definition of parent variable. For a yes or no, repeat the same as mentioned in the previous step to mark the non-reachable blocks as clean.
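
The checks above can be sketched for a single child variable; block identifiers are illustrative, and `reachable` stands for the set computed from the constrained variable's constraint subgraph.

```python
def constrain_child(reachable, child_def_block, child_use_blocks):
    """reachable: blocks reachable within the constraint subgraph."""
    if child_def_block not in reachable:
        # constraint precedes the child's definition: its whole def-use chain is clean
        return set()
    # constraint comes after the definition: keep only still-reachable use blocks
    return {b for b in child_use_blocks if b in reachable}
```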

This way the non-reachable basic blocks are removed from tainted entries of variables related to a constrained variable. We can also choose to perform analysis on variables that are derived or just direct assignments using tags associated with a variable.

While propagating taint, two different tags were used: a direct memory load using MLIL_LOAD_SSA from tainted memory returns an (offset, size) pair as a tag, which gets associated with the destination variable, whereas any derived variable is associated with a marker instead of the (offset, size) pair. Consider the code:

The variable rcx_1#2 is tainted with the tag [0x2c, 0x4], which is the (offset, size) pair. The same goes for rbx#1, which is a direct assignment. However, rbx_1#2, derived from the expression rbx#1 u>> 5, is tainted with the tag [0xdeadbeef, 0xdeadbeef]. Using the different tag types, it is possible to identify the nature of taint propagation.

Constraints on SSA variables from multiple taint sources

While propagating taint, we mark a destination variable as tainted if any of the source variables are tainted, including the PHI function. During filtering in the second stage, if a variable is constrained, we apply the constraints to all its related variables. But when the derived variable (child) comes from more than one independent taint source (parents) and only one of the parent variables is validated, the child variable is also considered validated, which is not desirable. Consider the example below:

Let’s say x and y are coming from two independent taint sources and index is a sum of both, hence a derived variable. When x gets validated, index can still be tainted since y is not validated. The previous algorithm does not take this into account.

To solve this problem, I considered associating each derived tainted variable with the actual source variables, referred to as root variables, and maintaining copies of the def-use chain per root variable. For example, variable index#3 has two roots – x#0 and y#0. For each root, maintain a copy of the reachable blocks in the taint information associated with index#3. When x#1 is validated, only the x#0 copy of index#3 is marked not reachable, and the y#0 copy is still considered tainted. A dependency graph of variables is built to establish these relationships.
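
A sketch of this per-root bookkeeping, with illustrative variable and block names: each root keeps its own copy of the reachable blocks, validating one root clears only that copy, and a variable stays tainted at a block while any copy still contains it.

```python
def make_taint_entry(roots, reachable_blocks):
    # one independent copy of the reachable set per root variable
    return {root: set(reachable_blocks) for root in roots}

def apply_root_constraint(entry, root, cleaned_blocks):
    # a validation on one root only affects that root's copy
    entry[root] -= cleaned_blocks

def tainted_at(entry, block):
    return any(block in blocks for blocks in entry.values())
```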

Establishing SSA variable relationship using a dependency graph

In a variable dependency graph, any tainted variable in the function is represented as a node. When a variable is derived from another variable, it forms a directed edge from the parent variable (node) to the child variable (node).

In order to establish a relationship between variables, the definition site of all the tainted variables is visited using get_ssa_var_definition(). When any of the source variables in the MLIL expression are tainted, create an edge connection in the graph. A variable tainted during MLIL_LOAD_SSA operation does not have a parent node or incoming edges and therefore becomes the root node.
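
This construction can be sketched with a plain dict of parent edges, mapping each tainted variable to the tainted sources of its definition; variables defined by a load from the Signal region have no parents and become roots. The walk tolerates cycles, such as those introduced by PHI functions, and all names are illustrative.

```python
def find_roots(parents, var):
    """parents: dict var -> list of tainted source vars in its definition."""
    seen, stack, roots = set(), [var], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue  # guard against PHI-induced cycles
        seen.add(v)
        srcs = parents.get(v, [])
        if not srcs:
            roots.add(v)  # no incoming edges: a root node
        else:
            stack.extend(srcs)
    return roots
```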

Such a dependency graph will have many weakly connected components because each memory load from the tainted memory region will be assigned to a new variable and therefore a new node in the graph. Put simply, each memory load creates a subgraph along with its derived variables. A subgraph might connect with another when a variable is derived out of more than one root node. Here is a sample dependency graph from the function Dbtux::execTUX_ADD_ATTRREQ():

Another property to note is that dependency graphs are not necessarily Directed Acyclic Graphs (DAG). This is because loops can be introduced by a circular dependency of variables in PHI functions. Consider the below SSA representation of a loop operation:

Here, the value of counter#2 depends on counter#1 or counter#4, which is a PHI function. The predecessor block decides the outcome of the function. Further down in the loop, counter#4 depends on counter#2. This relationship will be represented as a cycle in a dependency graph.

Once the dependency graph is generated, it is easy to get the root variables associated with any tainted variables. Moreover, child and parent variables for any given variable can be fetched for handling transitive relationships. The only missing part now is the forward propagation of tainted information to other functions.

Static Function Hooks and Inter-procedure Taint Propagation

All the MLIL_CALL_SSA and MLIL_TAILCALL_SSA instructions with tainted arguments are processed once the analysis of the current function is finished. For any CALL instructions with a known destination (e.g., MLIL_CONST_PTR), the symbol is fetched to check for static hooks. Here is the code snippet:

Static hooks are handlers for functions that we intend to treat differently from other functions. Consider a call to the libc memcpy function, where taint propagation is not necessary; we are only interested in checking for tainted size, source, or destination arguments. In order to provide this information to the analyzer and make it configurable, a JSON config with function names and arguments is used as below:

The arguments to check are indexed from 0. In the case of memcpy, all 3 parameters are marked for analysis. The argument index provided in the JSON config is checked against the tainted SSA variables. For example, arg2 in the config maps to an SSA argument variable associated with memcpy’s size argument.
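
A sketch of the hook lookup, assuming a JSON config of the shape described above (function name mapped to the 0-indexed arguments to examine); the function and variable names are illustrative.

```python
import json

# illustrative config: all three memcpy/memmove parameters are interesting
HOOK_CONFIG = json.loads('{"memcpy": [0, 1, 2], "memmove": [0, 1, 2]}')

def check_hooked_call(symbol, call_args, tainted_vars):
    """call_args: SSA argument variables at the call site, in order."""
    findings = []
    for idx in HOOK_CONFIG.get(symbol, []):
        if idx < len(call_args) and call_args[idx] in tainted_vars:
            findings.append((symbol, idx, call_args[idx]))
    return findings
```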

Static hooks can also be used to mark an output variable or return value of a function to be tainted and further propagated. However, this is not currently implemented since function-specific details are not considered. When necessary, the visitor handler for the MLIL_SET_VAR_SSA operation can be reused for implementing backward taint propagation during CALL operations. For any other function without hooks, taint information is propagated by marking the target function’s variable as tainted.

Tracing Vulnerabilities from Reachable Blocks

Once the taint propagation and filtering phases are over, the last phase of analysis involves iterating through all tainted variables and checking the reachable blocks for potential sinks. Based on the bugs already reported, I chose to look for vulnerabilities involving Out-Of-Bounds (OOB) memory access, buffer overflows during function calls to APIs such as memcpy, untrusted inputs casted to a pointer, and tainted loop counters. The rest of this section details additional detection strategies.

OOB reads and writes

The majority of vulnerabilities in MySQL Cluster were OOB read and write memory access bugs due to the missing validation of untrusted array indexes. To detect these bugs, we can specifically consider any MLIL_LOAD_SSA or MLIL_STORE_SSA instructions as sinks. Here is an example code from Dbdih::execGET_LATEST_GCI_REQ():

Here, rax#1 is tainted, hence the read operation using MLIL_LOAD_SSA can be considered an OOB read condition. Similarly, consider another case from Thrman::execOVERLOAD_STATUS_REP():

Here again, rax#1 is tainted, so the write operation performed by MLIL_STORE_SSA can be flagged as a potential OOB write.
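
The sink check boils down to something like the simplified model below. Instructions are mocked as dicts here; in the real analyzer they are Binary Ninja MediumLevelILInstruction objects in SSA form:

```python
# Simplified model of the OOB sink check: any MLIL_LOAD_SSA / MLIL_STORE_SSA
# whose address expression uses a tainted SSA variable is flagged.
def find_oob_sinks(instructions, tainted_vars):
    sinks = []
    for insn in instructions:
        if insn["op"] not in ("MLIL_LOAD_SSA", "MLIL_STORE_SSA"):
            continue
        # Flag the access if the address computation uses any tainted variable
        if tainted_vars & set(insn["addr_vars"]):
            kind = "OOB read" if insn["op"] == "MLIL_LOAD_SSA" else "OOB write"
            sinks.append((insn["address"], kind))
    return sinks

insns = [
    {"op": "MLIL_LOAD_SSA",  "address": 0x11AA, "addr_vars": ["rax#1"]},
    {"op": "MLIL_STORE_SSA", "address": 0x11BA, "addr_vars": ["rcx#3"]},
]
print(find_oob_sinks(insns, {"rax#1"}))   # only the load through rax#1 is flagged
```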

API buffer overflows

Static function hooks are used to detect buffer overflows caused by missing validation of arguments passed to functions like memcpy, memmove, etc. This mechanism is described in the section “Static Function Hooks and Inter-procedure taint propagation” above. Essentially, if any of the interesting parameters of a hooked function are tainted, we log it as a potential vulnerability.

Untrusted pointer dereferences

In some cases, I noticed that MySQL Cluster converts untrusted input to a pointer and then dereferences it. To identify this, I relied on Binary Ninja’s type information. The MLIL variable object has a Type property that returns the Type object associated with a variable. A Type object’s type can be accessed using the type_class property. Here the pattern is that the source points to a tainted memory region within a Signal structure, and the destination variable is of type PointerTypeClass. The Type object also has a confidence property, as seen below:

The maximum confidence value for a variable type is 255. To reduce false positives, the analyzer only considers type information having the maximum confidence.
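
Put together, the filter looks roughly like the sketch below. Variables are mocked as dicts; in the analyzer they are Binary Ninja variables carrying Type objects:

```python
# Simplified filter: report a tainted destination variable as an untrusted
# pointer dereference only when it has a pointer type at maximum confidence.
MAX_CONFIDENCE = 255

def untrusted_pointer_candidates(tainted_vars):
    return [v["name"] for v in tainted_vars
            if v["type_class"] == "PointerTypeClass"
            and v["confidence"] == MAX_CONFIDENCE]

vars_ = [
    {"name": "ptr#2",   "type_class": "PointerTypeClass", "confidence": 255},
    {"name": "idx#4",   "type_class": "IntegerTypeClass", "confidence": 255},
    {"name": "maybe#7", "type_class": "PointerTypeClass", "confidence": 128},
]
print(untrusted_pointer_candidates(vars_))   # only ptr#2 passes both filters
```

Dropping low-confidence types trades a few missed findings for a much cleaner result set.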

Tainted control flow operations in a loop

Loop termination conditions depending on tainted variables can lead to interesting bugs. Binary Ninja’s MLIL does not provide information on loops, therefore the alternative was to rely on HLIL to detect tainted loop conditions. The HLIL of a loop in Cmvmi::execEVENT_SUBSCRIBE_REQ() looks like the example below:

The trouble here is that we have implemented the entire taint propagation using MLIL, and Binary Ninja does not provide a mapping between MLIL and HLIL. Therefore, even if loops can be detected, the challenge is to know whether a tainted MLIL variable maps to an HLIL variable used in a loop condition.

As a workaround, the HLIL instruction has a condition property that fetches the condition statement associated with a loop. The address of this condition statement can be mapped to a MLIL_IF instruction.

Therefore, if any of the MLIL_IF instructions are tainted and are a part of a HLIL loop condition, then the analyzer logs it as a potential bug.
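
The workaround can be modeled as a simple address intersection, as in the sketch below. The addresses are mocked here; in the analyzer they come from the HLIL loop's condition property and the tainted MLIL_IF instructions:

```python
# Simplified model of the MLIL/HLIL workaround: the address of a HLIL loop's
# condition statement matches the address of a MLIL_IF instruction, so a
# tainted MLIL_IF sitting at a loop-condition address is flagged.
def tainted_loop_conditions(hlil_loop_cond_addrs, tainted_mlil_if_addrs):
    return sorted(set(hlil_loop_cond_addrs) & set(tainted_mlil_if_addrs))

loops   = [0x2040, 0x20F0]   # hypothetical addresses of HLIL while/for conditions
tainted = {0x20F0, 0x3000}   # hypothetical addresses of tainted MLIL_IF instructions
print([hex(a) for a in tainted_loop_conditions(loops, tainted)])
```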

Experiments with Dominance Relationships

A dominance relationship provides information on the order of execution of basic blocks. A basic block X is said to dominate another basic block Y if every path to Y goes through X. Let’s take the example from Wikipedia:

In the provided graph, node B dominates the nodes C, D, E and F because all paths to these nodes must go through node B. By definition, every node dominates itself. So, the full set of nodes that are dominated by node B is B, C, D, E and F. There is also a related concept called the strict dominator, which does not consider the node in question. Therefore, the set of all nodes that are strictly dominated by node B is C, D, E and F.
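
Dominator sets can be computed with the standard iterative dataflow algorithm. Below is a small sketch over a graph reconstructed to match the description (the exact edges, including the E to B back edge, are assumed for illustration):

```python
def dominators(edges, entry):
    """Iterative dataflow: dom(n) = {n} union intersection of dom(p) over preds p."""
    nodes = {entry} | {n for e in edges for n in e}
    preds = {n: {a for a, b in edges if b == n} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = ({n} | set.intersection(*(dom[p] for p in preds[n]))
                   if preds[n] else {n})
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Edges assumed to match the description: every path from entry A to
# C, D, E and F passes through B; E -> B is a loop back edge.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("B", "F"),
         ("C", "E"), ("D", "E"), ("E", "B")]
dom = dominators(edges, "A")
print(sorted(n for n in dom if "B" in dom[n]))   # nodes dominated by B
```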

Binary Ninja’s BasicBlock object has dominators and strict_dominators properties, which provide information regarding dominance relations in a function.

What about using the dominance properties already available in Binary Ninja to handle taint constraints, instead of relying on graph reachability algorithms from the networkx package?

Mapping constraints to dominator blocks

In order to check if any basic blocks in a def-use chain of an SSA variable are reachable, we can follow the steps below:

•     Find all constraint blocks associated with a variable.
•     Get all the basic blocks that reference the variable using the def-use chain.
•     For each basic block, check if it is strictly dominated by a constraint block. If yes, the variable is considered validated for that basic block and considered not reachable.
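
The steps above can be sketched as follows, with basic blocks reduced to plain identifiers and the strict-dominator sets assumed to come from Binary Ninja's strict_dominators property:

```python
# Sketch of the dominator-based reachability check: a use of a tainted
# variable counts as validated (unreachable) if its basic block is strictly
# dominated by at least one constraint block.
def unvalidated_uses(use_blocks, constraint_blocks, strict_dominators):
    """strict_dominators maps each block to the set of blocks strictly dominating it."""
    return [b for b in use_blocks
            if not (strict_dominators.get(b, set()) & set(constraint_blocks))]

# Hypothetical function: "sink" is reached only through the "check" block,
# while "other" can be reached without passing through any validation.
strict_doms = {"sink": {"entry", "check"}, "other": {"entry"}}
print(unvalidated_uses(["sink", "other"], ["check"], strict_doms))
```

Only the uses that survive this filter are reported as potentially vulnerable.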

Going back to the same example, the index gets validated in <mlil block: [email protected]> which dominates the <mlil block: [email protected]>. Therefore, by checking a constraint block against dominators it is possible to establish reachability.

False positives with dominators

While the dominance relationship was promising and gives good results, it does give rise to certain false positives. Consider the following CFG, where validation is done in two different program paths:

Here, the index is validated in two different program paths before hitting a potential sink block <mlil block: [email protected]>. However, none of the constraint blocks that perform validation <mlil block: [email protected]> and <mlil block: [email protected]> are dominators. In such cases, since constraint blocks cannot be mapped to any dominators, the potential sink block <mlil block: [email protected]> will be considered vulnerable on read access at address 0x11ba. This is a false-positive result.

The same is the case with the branch_dependence property, which returns a dictionary of branch instruction index and the branching decision taken to reach the instruction. When both the True and False branches dominate the instruction basic block, we do not get information regarding reachability.

A general observation from the scan results is that most constraints live in dominator blocks; only a few are validated across multiple code paths and produce false positives. Since path-finding algorithms based on the definition and usage of variables eliminate these false-positive results, I preferred them over dominators. However, the code is still in the repository for experimental purposes.

Notes on Analysis

The target ndbd executable is loaded in Binary Ninja to generate the BNDB analysis database. Then the analyzer is executed against ndbd.bndb for faster analysis:

        python3 mysql_bugs.py --function_hooks functions_to_hook.json ndbd.bndb

Though not optimized for speed, the analyzer runs for about 4-5 minutes and returns 195 results. Some of the results are duplicates because a single vulnerability in a helper function might get used by multiple handlers. The analyzer was able to find most of the issues already known as well as a few that were not previously known: ZDI-CAN-15120, ZDI-CAN-15121 and ZDI-CAN-15122. However, there is a high triage cost associated with static analysis results, especially when the target is less familiar. Luckily, my colleague Lucas had already spent a fair amount of time on the codebase making it easier to triage the results.

Source code for the project can be found here.

Acknowledgments and References

•     Thanks to Lucas Leong for all the discussions and triage. His blog post MindShaRE: When MySQL Cluster Encounters Taint Analysis is an excellent reference for performing static taint analysis on MySQL Cluster.
•     Tainted Flow Analysis on e-SSA-form Programs which served as an excellent reference for taint analysis as a graph reachability problem.
•     Various blog posts from Trail of Bits on Binary Ninja.
•     Josh Watson for various projects using Binary Ninja. The visitor class implementation is based on emILator.
•     Jordan for all the code snippets and the Binary Ninja slack community for answering various questions.

Conclusion

I hope you have enjoyed this look at using Binary Ninja to find vulnerabilities through taint analysis. In a few days, I’ll be back to discuss using Clang Static Analyzer (CSA) for detecting untrusted pointer dereferences and tainted loop conditions. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

Static Taint Analysis using Binary Ninja: A Case Study of MySQL Cluster Vulnerabilities

Clang Checkers and CodeQL Queries for Detecting Untrusted Pointer Derefs and Tainted Loop Conditions

23 February 2022 at 17:49

In the first blog of the series, we saw how CodeQL and Clang checkers can be used to find bugs in MySQL Cluster involving untrusted size arguments leading to buffer overflows or out-of-bounds array accesses. Though the majority of bugs were out-of-bounds array accesses, there were also bugs involving untrusted data being used in loop conditions or cast to pointers. In this final blog of the series, we experiment with CodeQL’s IR and Clang checkers for detecting such bug classes.

Defining taint sources - CSA vs CodeQL

The first part of the problem is defining the taint sources. Clang Static Analyzer (CSA) provides an experimental checker alpha.security.taint.TaintPropagation for performing static taint analysis. Implemented as GenericTaintChecker, it has a set of built-in taint sources. Since the Signal data used by message handlers does not come directly from these built-in taint sources, it is necessary to find an alternative approach to define taint sources. Lucas Leong had a quick approach to work around the problem: rewrite all accesses to signal->theData with getenv(). Since the return value of getenv() is considered tainted by GenericTaintChecker, taint propagation works after the rewrite. Refer to his blog post “MindShaRE: When MySQL Cluster Encounters Taint Analysis” for details on using existing taint checkers and CodeQL to find memory corruption bugs.

One has to be careful with this approach to avoid missing potential taint sources. Though MySQL Cluster has defined APIs like Signal::getDataPtr() to fetch the Signal data pointer, various handlers don’t follow the standard. Instead, signal->theData is accessed in numerous other ways from offset 0 or from various other offsets. I rewrote some common patterns using find and sed. This may not be exhaustive but provides sufficient taint sources to experiment with the checkers.

The situation is much different when working with CodeQL, where defining taint sources is far more flexible. All we have to do is override the isSource() predicate to something like this:

CWE-822: Untrusted Pointer Dereference

In an untrusted pointer dereference vulnerability, data from an untrusted source is cast to a pointer for further use. The impact of the bug depends on the usage of the resulting pointer. This section provides details about detecting such pointer load operations using CSA and CodeQL.

Detecting untrusted pointer dereference using CSA

To detect this vulnerability using CSA, I relied on the path-sensitive checker callback check::Location, which fires every time a program accesses memory for a read or write. The information below is from the checker documentation:

The SVal symbolic value Loc points to the memory region being accessed. IsLoad indicates whether the access is a read or a write. To check if a memory load is happening from a tainted memory region, we can check if Loc is tainted. If so, we can use the statement S to get the type of the expression:

Whenever the expression type is a pointer, it is logged as a bug. Scanning MySQL Cluster with this checker gave about 86 results, which is a significant number and includes false positives. Here is what the scan results look like:

The large number of results is due to the fact that any pointer derived from a tainted value (via pointer arithmetic, array indexing, etc.) is also considered tainted, regardless of whether the tainted value has been constrained by validation. In some cases, OOB reads that load pointers are also reported. Understanding the nature of the false-positive results significantly improves the triage experience.

Considering the high number of results, we can sort them by “File”, which tends to organize the results by feature; for a less familiar code base, sorting by “Path Length” can help identify code paths reachable with minimal complexity.

scan-view results sorted by File

scan-view results sorted by Path Length

CSA reported a number of bugs in the code handling DBDIH block with a path length of 1. Here is an example bug report:

Using a known bug pattern, it is possible to apply filters that reduce the results beyond the taint check alone. For example, we can restrict the check to pointer loads that appear in plain assignment statements like the one above. The checker can be tweaked to match the nature of bugs in the codebase.

Analyzing memory loads using CodeQL IR

To perform a similar analysis in CodeQL, one can possibly rely on expression classes such as PointerDereferenceExpr, VariableAccess, FieldAccess and its subtypes. But in this case, I was curious to explore CodeQL’s Intermediate Representation (IR) to track memory loads. Since the documentation around CodeQL’s IR is limited, I used the Instruction class to dump the IR for further analysis. Here is a partial dump of IR for the line EmulatedJamBuffer * jambuf = (EmulatedJamBuffer*)req->jamBufferPtr in function execDIH_SCAN_TAB_REQ():

The CodeQL query used to dump the IR along with source code information:

Consider the below set of instructions in the IR:

Here the LoadInstruction relies on two operands: a source address operand (r17303_4) and a source value operand (m17300_12). The source value operand is a MemoryOperand. The MemoryOperand describes the memory address accessed by the instruction, whose value gets copied to the result register (r17303_5). Moreover, the result is also associated with type information. The idea was to filter all tainted memory load operations whose return type is a pointer. However, such a query resulted in an unusably large number of results.

Adding constraints using overlap relationship and virtual variables

Consulting CodeQL’s IR SSA Construction documentation, I found a couple of features that could be useful for refining the query: the overlap relationship and VirtualVariables.

Overlap defines the relationship between the definition of a memory location and its usage. The most interesting relationship for this analysis is MustExactlyOverlap - the set of bits written by the definition is identical to the set of bits read by the use, and the data type of both the definition and the use are the same. For more clarity on the overlap relationship, refer to the SSA test case found in the repository. In the case of untrusted pointer load bugs, it is very likely that there will be a mismatch of data type between definition and usage (type casting) or that the number of bits read will be a subset of the definition. Therefore, we can skip Load operations where the source value operand has an exactly overlapping relationship with its definition. The getOperandMemoryLocation() predicate provides information regarding the MemoryLocation read by a memory operand. Consider the output of the following query:

The memory operands that are marked with a tilde “~” are the ones that do not have an exactly overlapping relationship. This can also be checked using the isDefinitionInexact() predicate. Putting all this together, we can override the isSink() predicate to get a meaningful number of results to work with.

Many of the false-positive results were due to multiple load operations performed using some misidentified pointer. These results can be either skipped or removed by adding further constraints to the query, such as ignoring certain variable names in the sink or checking for AST patterns.

In the case of MySQL Cluster, one such constraint explored for filtering memory access to the Signal structure is VirtualVariable. As per the documentation, “Each MemoryLocation is associated with exactly one VirtualVariable. A VirtualVariable represents a set of MemoryLocations such that any two MemoryLocations that overlap have the same VirtualVariable.” VirtualVariable information can be fetched from a MemoryLocation using the getVirtualVariable() predicate.

When the variable names are consistent, we can add a constraint to consider only Load operations where the memory operand points to the signal variable. A more generic option is to fetch the type information of the virtual variable to check whether it is a Signal structure. Such a query is still restrictive (though not as much as matching on variable names) but significantly reduces false positives and returns results involving jamBufferPtr:

CWE-606: Unchecked Input for Loop Condition

In CWE-606: Unchecked Input for Loop Condition, values from an untrusted source are used for loop termination conditions. This may lead to a DoS or other issues depending on the operations done in the loop body. This section provides details about detecting such tainted loop conditions using CSA and CodeQL.

Detecting tainted loop condition using CSA

Unlike tainted memory loads, I couldn’t find any path-sensitive callback to trigger on loop conditions. Moreover, AST-based matchers without path-sensitivity are not useful in this case. Therefore, I relied on the check::BranchCondition callback which fires every time a control flow branching occurs during analysis. The following information is from the checker documentation:

The idea here is, whenever the callback triggers due to a conditional statement, walk up the AST using ParentMap and check for a loop statement. Here is what an example AST looks like for a while loop:

For loop operations we are only interested in 3 AST statement classes – WhileStmtClass, DoStmtClass and ForStmtClass. Below is the loop detection code used in the checker:

Once we know that the condition statement triggering the callback is part of a loop statement, we can check if it is tainted. Specifically, I wanted to look for some common code patterns involving induction variables compared against untrusted values, or untrusted values used as induction variables to decide on loop termination. Consider a couple of common code patterns below that involve explicit binary comparison operations:

•      The induction variable is compared against a potentially untrusted value. For example, RHS could be tainted and LHS is not.
•      The induction variable could be potentially untrusted. Here LHS value is compared against constant 0 on RHS.

The checker specifically looks for these cases when deciding whether to report a bug. When the RHS is the constant 0, check whether the LHS is tainted; otherwise, check whether the RHS is tainted:
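
That decision rule, restated in Python for clarity (the actual checker implements it in C++ over Clang AST nodes):

```python
# Sketch of the checker's decision rule for binary loop conditions:
# - RHS is the constant 0 (e.g. `while (count-- > 0)`): the induction
#   variable on the LHS must not be tainted;
# - otherwise (e.g. `for (i = 0; i < count; i++)`): the loop bound on the
#   RHS must not be tainted.
def loop_condition_is_suspicious(lhs_tainted, rhs_is_zero_const, rhs_tainted):
    if rhs_is_zero_const:
        return lhs_tainted
    return rhs_tainted

# Tainted bound compared against an induction variable -> suspicious
print(loop_condition_is_suspicious(lhs_tainted=False,
                                   rhs_is_zero_const=False,
                                   rhs_tainted=True))
```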

Another common code pattern is unary operations in loop statements, especially while and do while loops:

Unary operations do not require the special-case handling we applied to binary operations. However, in certain cases, implicit Boolean conversions result in implicit conditional statements being received by the check::BranchCondition callback. Here is what the AST looks like:

To handle these, if any non-binary conditional operation has a Boolean outcome, it is likely due to implicit casting, so we fetch the sub-expression associated with the expression.

Evaluating constraints on tainted variables

While it is possible to know whether an input is tainted, there is no way to know whether the tainted input value is constrained by some validation. Without this information, the checker is going to generate a lot of false-positive results. To solve this problem, I used the solution provided by Andrew Ruef in the blog post Trail of Bits - Using Static Analysis and Clang to Find Heartbleed. Since Signal data is treated as an unsigned 32-bit integer in most cases, I queried whether the variable can take a value greater than 0x10000. If it can, the variable is considered unconstrained and the bug is logged.
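
The constraint query can be restated in Python as follows. The interval is assumed to come from the analyzer's symbolic engine; the actual checker asks the constraint manager whether the state where value > 0x10000 is feasible:

```python
# Sketch of the range-based constraint check: a tainted value is treated as
# unconstrained when the solver says it can still exceed 0x10000.
LIMIT = 0x10000

def is_unconstrained(value_range):
    """value_range is the (min, max) interval derived for the tainted value."""
    _, upper = value_range
    return upper > LIMIT

print(is_unconstrained((0, 0xFFFFFFFF)))   # no validation seen: flagged
print(is_unconstrained((0, 0x100)))        # bounded by a prior check: not flagged
```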

Configuring analyzer-max-loop in CSA

Clang analyzer has a configuration parameter to choose the number of times a basic block in a loop gets executed. By default, this value is 4.

So how does it matter in our analysis? Consider the below code:

Here, buffer->index is validated and the loop operation is considered safe. But the checker still reports a false-positive bug.

What seems to be happening is that the symbolic expression gets evaluated analyzer-max-loop times, that is, once for each visit to the basic block.

The analyzer seems to conclude that the decrement operation can underflow, resulting in a value greater than 0x10000, and therefore reports it as a bug. I’m not entirely sure of the right way to solve this issue, but a quick workaround is to set analyzer-max-loop to 1. This can also be done in scan-build using the maxloop parameter. With this setting, the expression is evaluated only once and the bug is not reported.

        (reg_$3<unsigned int Element{SymRegion{conj_$2{char *, LC2, S26616, #1}},0 S64b,struct index_buf}.index>) > 65536U

The scan reported around 46 bugs, with some valid ones including previously found issues like ZDI-CAN-14488, ZDI-CAN-15120, ZDI-CAN-15121.

Here is the example bug report for vulnerability in Dbdict::execLIST_TABLES_CONF():

Detecting tainted loop condition using CodeQL

Using the analysis done earlier for CSA, an equivalent query in CodeQL can be implemented using the Loop class. The isSink() predicate essentially looks as follows:

Building and Performing the Scan

In order to build clang with custom checkers, copy the source files to the clang/lib/StaticAnalyzer/Checkers directory in the LLVM toolchain. Then add the filenames to the CMakeLists:

Also update the clang/include/clang/StaticAnalyzer/Checkers/Checkers.td file to add the package information:

Once clang is built, the new checkers will be listed alongside the other alpha checkers.

Now we can use scan-build to perform the analysis. Use the maxloop and use-analyzer configurations if necessary.

Building with the checkers enabled can be slow. To speed up the scan, limit the target path by either modifying the Makefile as necessary or using the custom Makefile by Lucas. All of this analysis was performed with clang 12.0 and MySQL Cluster 8.0.25.

Scanning with CodeQL requires creating a database, followed by running any queries we are interested in against the created database.

The source code for the Clang checkers and CodeQL queries can be found here.

Acknowledgments and References

•      MindShaRE: When MySQL Cluster Encounters Taint Analysis
•      Clang Static Analyzer - A Checker Developer's Guide
•      A Memory Model for Static Analysis of C Programs
•      Using Static Analysis and Clang to Find Heartbleed
•      Source code of existing clang checkers
•      Vulnerability hunting with Semmle QL
•      CodeQL IR SSA Construction

Conclusion

We hope you’ve enjoyed this look at finding bugs using Clang Static Analyzer, CodeQL, and Binary Ninja. As ZDI Vulnerability Analysts, these have proved helpful in finding new bugs. If you use these (or other) tools to find bugs of your own, consider submitting them to our program. Until then, you can find me on Twitter @RenoRobertr, and follow the team for the latest in exploit techniques and security patches.

