πŸ”’
❌
There are new articles available, click to refresh the page.
Before yesterdaySentinelLabs

Another Brick in the Wall: Uncovering SMM Vulnerabilities in HP Firmware

10 March 2022 at 13:00

By Assaf Carlsbad & Itai Liba

Executive Summary

  • SentinelLabs has discovered 6 high severity flaws in HP’s UEFI firmware impacting HP laptops and desktops.
  • Attackers may exploit these vulnerabilities to locally escalate to SMM privileges.
  • SentinelLabs findings were proactively reported to HP on Aug 18, 2021, and are tracked as:
    • CVE-2022-23956, marked with a CVSS score of 8.2
    • CVE-2022-23953, marked with a CVSS score of 7.9
    • CVE-2022-23954, marked with a CVSS score of 7.9
    • CVE-2022-23955, marked with a CVSS score of 7.9
    • CVE-2022-23957, marked with a CVSS score of 7.9
    • CVE-2022-23958, marked with a CVSS score of 7.9
  • HP has released a security update to its customers to address these vulnerabilities.
  • At this time, SentinelOne has not discovered evidence of in-the-wild abuse.

Hello and welcome back to yet another post in our blog post series covering UEFI & SMM security. This is the 6th (!) entry in the series, and it’s a good spot to pause for a second and look back to better estimate the vast distance we covered: from the baby steps of merely dumping and peeking at UEFI firmware, through the development of emulation infrastructure for it, and up to the point where we learned how to proactively hunt for SMM vulnerabilities. This post will continue where we left last time and will further explore SMM vulnerabilities, albeit from a slightly different angle.

So far, the SMM bug hunting methodology we came up with is mostly manual and goes roughly as follows:

  1. Obtain a UEFI firmware image of interest, either by dumping it from the SPI flash or, when possible, downloading it directly from the vendor’s website.
  2. Extract the encapsulated SMM binaries via tools such as UEFITool or UEFIExtract.
  3. Open the SMM images one by one in IDA and analyze them using efiXplorer, while keeping a keen eye for vulnerable code patterns like the ones described in the previous part.

Needless to say, this process is extremely slow, inaccurate, and cumbersome. After doing it repetitively over and over again, we were so unsatisfied with it that we decided to take the intuition and rules-of-thumb we developed and codify them into the form of an automated tool. The outcome of this endeavor is an IDA-based vulnerability scanner for SMM binaries we named Brick. For the benefit of the firmware security community, we decided to publish it as an open-source project that is readily available on GitHub.

In this post, we’ll introduce readers to Brick, its internal architecture, and its bug-hunting capabilities. Afterward, we’ll present a case study where we demonstrate how Brick was used to discover 6 different vulnerabilities affecting the firmware of some HP laptops. By doing so, we hope to encourage more people in the community to contribute back to Brick, as well as to educate the readers about the potential strengths (and weaknesses) of automated vulnerability hunting.

Enjoy the read!

Automated SMM Vulnerability Hunting Using Brick

As we said, Brick was developed to pinpoint certain vulnerabilities and anti-patterns inside SMM binaries. To effectively pull this off, its execution lifecycle is split into three different phases: harvest phase, analysis phase, and summary phase. Following is a detailed description of each phase.

Figure 1 – Schematic overview of Brick’s execution lifecycle

Harvest Phase

In the vast majority of cases, it’s most useful to give Brick a complete UEFI firmware image to scan. Doing so allows the researcher to β€œsqueeze” the most vulnerabilities out of it while also gaining a bird-eye view of the code quality of the firmware as a whole. Alas, a typical UEFI firmware image is a complex beast that contains much more than SMM binaries. Among other things, it usually includes other stuff such as

  • Non-SMM executable modules for the different boot phases (PEI/DXE/etc.)
  • Microcode updates for the target CPU to be applied during early boot
  • Various Authenticated Code Modules (ACMs) signed by Intel, such as Boot Guard and BIOS Guard
  • A store for NVRAM variables
  • And much more
Figure 2 – SMM binaries are by no means the only file type stored inside a firmware image

Because of that, our first task is to separate the wheat from the chaff. In Brick’s terminology, this is accomplished by the harvest phase. During this phase, Brick will parse the firmware image and extract out of it just the SMM binaries we’re interested in.

To invoke Brick and kickstart the harvest phase, just pass the full path of the firmware’s image to the Brick.py script:

Figure 3a – The harvest phase in action

Internally, the harvest phase is implemented by offloading most of the actual work to two external tools/libraries:

The reason we use two different solutions for this phase is that we encountered several cases where one of them struggled to properly parse a UEFI image, while the other succeeded without any hurdles. Thus, the strategy of using one of them and falling back to the other in case of failure gives us just the right amount of redundancy we need to successfully handle the vast majority of firmware images encountered in the wild.

At the end of the harvest phase β€” given that all went well β€” the output directory should contain several dozens of SMM binaries waiting for further examination.

Figure 3b – The output directory at the end of the harvest phase

Note that in addition to full UEFI firmware images, Brick also supports other input formats in case you want to limit bug hunting to a narrower scope. These include

  • A single executable file (e.g. foo.efi)
  • Directory that contains multiple SMM binaries
  • UEFI capsule update package
  • Various other options (see the source code for the complete list of supported options).

Analysis Phase

At this point, we have a directory filled with the SMM binaries we’re interested in analyzing. The rough idea was to open each SMM binary in IDA and β€” after the initial autoanalysis completes β€” run some custom IDAPython scripts on top of it to do the actual bug hunting. This must be done intelligently, as a naive solution for this problem would suffer from two severe downsides:

  1. Analyzing SMM binaries one at a time is not very efficient performance-wise. For this reason, we should strive at parallelizing the whole process while taking advantage of multiple CPU cores.
  2. IDA is mostly used in an interactive fashion, and while there exists a batch mode for non-interactive usage, it’s often overlooked as it’s not very convenient to use.

Luckily for us, it didn’t take us too long to bump into a project called idahunt that solves these exact two problems. Put in the author’s own words:

β€œidahunt is a framework to analyze binaries with IDA Pro and hunt for things in IDA Pro. It is a command-line tool to analyze all executable files recursively from a given folder. It executes IDA in the background so you don’t have to manually open each file. It supports executing external IDA Python scripts.”

Figure 4 – Overview of using idahunt to speed up the scanning process

The IDAPython scripts executed by idahunt on behalf of Brick are known as Brick modules and come in three different flavors:

  • Processing modules, which are in charge of doing some initial preparatory work and handling some of the shenanigans of UEFI.
  • Hunting modules that employ a wide range of heuristics to pinpoint potential vulnerabilities. Usually, there exists a dedicated module for each of the vulnerability classes described earlier.
  • Informational modules emit valuable information about the target image that is not necessarily tied to vulnerabilities. This includes, for example, the list of unrecognized UEFI protocols consumed by the image.
Figure 5 – Overview of the various Brick modules

While developing these Brick modules, we found the raw IDAPython API to be a bit rough at times, so for the most part the modules were developed on top of a wrapper framework called Bip. One of the major highlights of this framework is that it also exposes wrapper functions for the Hex-Rays Decompiler API, which allows writing analysis routines in a fairly high-level notion.

Figure 6 – The analysis phase, running 8 concurrent IDA instances in the background

Summary Phase

After all SMM images in the input directory were scanned, Brick will move on to collect the output emitted by individual modules and merge them into a single, browsable HTML report.

Note that in addition to the scan’s verdict, the report file also includes links to some useful resources such as the annotated IDB file (necessary for validating the correctness of the results), the raw IDA log file (useful for troubleshooting and debugging), as well as a separate report file generated by efiXplorer.

Figure 7 – Portion of a Brick report produced for some firmware image

Case Study – Using Brick to Uncover HP Firmware Vulnerabilities

Throughout the past year, we were using Brick extensively to review various firmware images from almost all leading manufacturers in the industry. So far, this campaign is definitely paying off and has already given birth to no less than 13 different CVEs (see Appendix A). In this case study, we would like to put a spotlight on several such vulnerabilities found while auditing one particular firmware image from HP (version 01.04.01 Rev.A for HP ProBook 440 G8 Notebook). After Brick’s scan was completed, we opened the resulting report file and were faced with a rather intriguing entry:

Figure 8 – The SMM module 0155.efi does not validate certain nested pointers

This entry immediately drew our attention because, if confirmed correct, it means that the SMI handler installed by the SMM image 0155.efi does not validate certain pointers that are nested within its communication buffer. As we explained in the previous post, that in turn implies the handler can be exploited by attackers to corrupt or disclose the contents of SMRAM.

In this section, we’ll elaborate on how Brick managed to find such vulnerability in a completely automated fashion. For that, we’ll walk you through the internal workings of some Brick modules that were involved in making this verdict. Note that due to the medium of a written article, the case study will be presented using snapshots of the IDA database, before and after each module invocation. In reality, however, all modules will be executed automatically one after another, without any user interaction.

Preprocessor

The first Brick module that is called to handle any input file is called the preprocessor. The preprocessor sets up the ground for the next modules in the chain and takes care of the following:

  • Making the .text section read-write, which prevents the decompiler from performing some excessive optimizations.
  • Discovering functions that the initial auto-analysis missed (based on codatify).
  • Scraping the edk2 and edk2-platforms repositories for protocol header files and attempting to import them into the IDA database. The net result is that the database is filled with a plethora of UEFI protocol definitions:
Figure 9 – Some UEFI protocols that were imported from EDK2 by the preprocessor

efiXplorer

Right after the preprocessor, Brick moves on to load and run the efiXplorer plugin. As we mentioned countless times throughout the series, efiXplorer has tons of functionality and serves as the de-facto standard way of analyzing UEFI binaries with IDA. Among other things, it takes care of the following:

  • Locating and renaming known UEFI GUIDs
  • Locating and renaming calls to UEFI boot/runtime services
  • Applying correct types for interface pointers
Figure 10 – Pseudocode from a decompiled function before efiXplorer was invoked
Figure 11 – The same function, after efiXplorer analysis

Last but not least, efiXplorer is also capable of locating and renaming SMI handlers. In its recent editions, it prefixes all CommBuffer-based SMIs with SmiHandler’, and all legacy software SMIs with β€˜SwSmiHandler’. As can be seen, in the case of 0155.efi, only one SMI handler seems to exist:

Figure 12 – the SMI handler found by efiXplorer

Postprocessor

Following efiXplorer, control is passed to the postprocessor. The postprocessor is a module that is in charge of completing the analysis performed earlier by efiXplorer. Among other things, this includes:

  • Locating SMI handlers that efiXplorer might have missed
  • Fixing the function prototype for some UEFI services such as GetVariable()/SetVariable()
  • Renaming function arguments

In the context of this case study, the most important feature of the postprocessor is the handling of calls to EFI_SMM_ACCESS2_PROTOCOL. In a nutshell, this protocol is used to control the visibility of SMRAM on the platform. As such, it exposes the respective methods to open, close, and lock SMRAM.

Figure 13 – interface definition for EFI_SMM_ACCESS2_PROTOCOL, source: Step to UEFI

In addition to those, this protocol also exposes a method called GetCapabilities(), which can be used by clients to query the memory controlled for the exact location of SMRAM in physical memory. Upon return, this function fills in an array of EFI_SMRAM_DESCRIPTOR structures that informs the caller what regions of SMRAM exist, what is their size, state (open vs. close), etc.

Figure 14 – documentation of the GetCapabilities() function, source: Step to UEFI

In EDK2 and its derived implementations, the common practice is to store these EFI_SMRAM_DESCRIPTORS as global variables so that they could be consumed by other functions in the future. As part of its operation, the postprocessor scans the input file for calls to GetCapabilities() and marks the SMRAM descriptors in a way that will make it easy to recover them afterward. This includes both retyping them as 'EFI_SMRAM_DESCRIPTOR *' as well as renaming them to have a unique, known prefix. The significance of this operation will be clarified shortly.

Figure 15 – Calling GetCapabilities(), before running the postprocessor
Figure 16 – Same code, after applying the postprocessor

Reconstructing the CommBuffer

Initially, the type assigned to the CommBuffer in the SMI handler’s signature is VOID *. This is adequate, as the structure of the CommBuffer is not known in advance and it’s the responsibility of the handler to correctly interpret it. Still, figuring out the internal layout of the Communication Buffer will be of great aid because it will let us know whether or not it contains nested pointers.

Usually, such tasks are completed manually as part of the reverse engineering process, but in Brick we needed to pull this off automatically. The two most prominent and successful IDA plugins for doing so are HexRaysPyTools and HexRaysCodeXplorer. Based on our experience, HexRaysPyTools produced more accurate results, while HexRaysCodeXplorer is better suited for non-interactive use. Eventually, the scriptability capabilities of HexRaysCodeXplorer tipped the scale in its favor and so it was incorporated into Brick.

Figure 17 – HexRaysCodeXplorer can be invoked from an IDAPython script

At this stage, all SMI handlers present in the image were already identified so Brick can iterate over them and invoke HexRaysCodeXplorer on the associated CommBuffer to reconstruct its internal structure. Doing so for the SMI handler from 0155.efi yields the following structure, which holds two members (field_18 and field_28) that are presumably pointers by themselves:

Figure 18 – the reconstructed structure of the Comm Buffer

How did HexRaysCodeXplorer get to this conclusion? To answer this question, let’s take a closer look at the handler’s code itself:

Figure 19 – The SMI handler forwards CommBuffer->field_18 to sub_17AC

As can be seen, during the course of its operation, the handler passes CommBuffer->field_18 as the 2nd argument to the function sub_17AC. This function, in turn, forwards it to CopyMem(), where it is used as the destination buffer. Based on the signature of CopyMem(), we know the destination buffer is in fact a pointer. That means the argument for sub_17AC is also a pointer by itself and therefore β€” due to the transitivity of assignments β€” CommBuffer->field_18 must be a pointer as well! The same logic also applied to field_28, even though we won’t show it here.

Figure 20 – The 2nd argument is forwarded as the destination buffer for CopyMem()

Resolving SmmIsBufferOutsideSmmValid

Now that it knows the CommBuffer does contain some nested pointers, Brick moves on and checks if these pointers are being sanitized properly. That is a two-fold operation:

  1. Locating the function SmmIsBufferOutsideSmmValid() in the input binary.
  2. If found, check that it is aptly used to sanitize the nested pointers.

Let’s start with resolving SmmIsBufferOutsideSmmValid(). As we mentioned in the previous part, SmmIsBufferOutsideSmmValid() is statically linked to the binary and thus locating it is not a trivial problem. To pull this off, we compiled a heuristic comprised of three conditions. Brick will iterate over all of the functions in the IDA database and try to find a function that matches all three. The heuristic goes as follows:

  1. The function at hand must receive two integer arguments – the first used as the buffer’s address and the second as its size. With the help of Bip’s API, checking for these properties is rather trivial:
    def check_arguments(f: BipFunction):
        # The arguments of the function must match (EFI_PHYSICAL_ADDRESS, UINT64)
        if (f.type.nb_args == 2 and \
            isinstance(f.type.get_arg_type(0), BTypeInt) and \
            isinstance(f.type.get_arg_type(1), BTypeInt)):
            return True
        else:
            return False
    

    Figure 21 – Matching the arguments of SmmIsbufferOutsideSmmValid

  2. The function at hand must return a BOOLEAN value. From the perspective of the decompiler, BOOLEAN values are just plain integers, so if we want to make this distinction we must go over all the return statements in the function and check if the returned value is a member of the set {0,1}. In Bip, this can also be accomplished very easily:
    def check_return_type(f: BipFunction):
        if not isinstance(f.type.return_type, BTypeInt):
            # Return type is not something derived from an integer.
            return False
     
        def inspect_return(node: CNodeStmtReturn):
            if not isinstance(node.ret_val, CNodeExprNum) or node.ret_val.value not in (0, 1):
                # Not a boolean value.
                return False
     
        # Run 'inspect_return' on all return statements in the function.
        return f.hxcfunc.visit_cnode_filterlist(inspect_return, [CNodeStmtReturn])
    

    Figure 22 – Checking the function actually returns a BOOLEAN value

  3. Lastly, we know that SmmIsBufferOutsideSmmValid() uses an array of EFI_SMRAM_DESCRIPTORS to keep track of active SMRAM ranges, so we expect the candidate function to reference at least one of them. Because global EFI_SMRAM_DESCRIPTORS were already marked earlier by the postprocessor, checking for xrefs between the function and the descriptors becomes straightforward:
    def references_smram_descriptor(f: BipFunction):
        # The function must reference at least one SMRAM descriptor.
        for smram_descriptor in BipElt.get_by_prefix('gSmramDescriptor'):
            if f in smram_descriptor.xFuncTo:
                return True
        else:
            # No xref to SMRAM descriptor.
            return False
    

    Figure 23 – Checking the function references an EFI_SMRAM_DESCRIPTOR

Are these heuristics bulletproof and guarantee they will always match SmmIsBufferOutsideSmmValid() in the binary? Of course not! But more often than not they do the trick, and that’s what matters. In the HP case, the heuristics didn’t fail and managed to find a proper match:

Figure 24 – Matching SmmIsBufferOutsideSmmValid()

Nested Pointers Validation

Once SmmIsBufferOutsideSmmValid() is matched, Brick verifies it is being used properly by the SMI handler. For that, it iterates over all calls to SmmIsBufferOutsideSmmValid() and tries to deduce if all nested pointers are being covered by it. In 0155.efi, it notices there is only one call to SmmIsBufferOutsideSmmValid() that is used to validate field_28. That implies no validation takes place over the second nested pointer, namely field_18, so it flags the handler as vulnerable.

Figure 25 – SmmIsBufferOutsideSmmValid validates one field while neglecting the other one

To be fair, we were quite lucky to encounter such a clear-cut case as the one above. If the control flow was a bit more convoluted, there is a decent chance Brick’s verdict would become more ambiguous.

Impact

We already saw that depending on the exact flow the handler takes, it might end up calling sub_17AC. This function gets an argument that is derived from CommBuffer->field_18 and will later forward it as the destination address for CopyMem(). The contents of the CommBuffer are fully controllable by the attacker and, leveraging the missing validation, he or she can craft a buffer whose field_18 points to an arbitrary SMRAM address of their choice. As a result, the SMRAM region pointed to by that address will get corrupted by the time CopyMem() gets called.

Figure 26 - CommBuffer->field18 is passed from SmiHandler through sub_17AC and ends up at CopyMem
Figure 26 – CommBuffer->field18 is passed from SmiHandler through sub_17AC and ends up at CopyMem

How to cause the handler to actually call sub_17AC, and how to promote this memory corruption into an arbitrary code execution in SMM are left as exercises to the diligent reader.

Low SMRAM Corruption

In addition to the nested pointer vulnerability present in 0155.efi, the HP firmware image also suffered from 5 additional, less severe issues that enable attackers to corrupt the low portion of SMRAM. All five vulnerabilities are isomorphic to each other, so we’ll focus on the simplest case found in 017D.efi:

Figure 27 – Low SMRAM corruption discovered in 017D.efi

As we mentioned in the previous post, these vulnerabilities arise when an SMI handler writes data to the communication buffer without first validating its size. Attackers can place the CommBuffer just below SMRAM, which will cause unintended corruption once the handler performs the write to it.

We also noted that SMI handlers can shield themselves from these problems by performing one or both of the following actions:

  1. Calling SmmIsBufferOutsideSmmValid on the CommBuffer with the exact size expected by the handler.
  2. Dereferencing the provided CommBufferSize argument (a pointer to an integer value holding the size of the buffer), then comparing the result against the expected size.

Therefore, to detect this class of vulnerabilities, Brick searches for SMI handlers that omit both checks. Unlike the previous case, this time the heuristics employed to resolve SmmIsBufferOutsideSmmValid() bore no fruit, so Brick simply assumes it’s absent from the binary and moves on to check if CommBufferSize is being dereferenced. This is achieved by traversing the AST associated with the handler, looking for nodes that correspond to a dereference operation (cot_ptr in the Hex-Rays terminology). The child node of a dereference operation in the tree represents the variable being dereferenced, so Brick can check if it’s CommBufferSize.

Figure 28 – The portion of the AST that corresponds to dereferencing CommBufferSizejust above

If such a pair of nodes is found, it tells us that the C source code for the handler contained the expression: *CommBufferSize, so we can assume the programmer intended to compare that value against some anticipated size.

Figure 29 – The corresponding C source code for the dereference operator

Using Bip, implementing this heuristic is easy and only takes a handful of Python lines:

def dereferences_CommBufferSize(handler: BipFunction):
    # CommBufferSize is the 3rd argument of the SMI handler
    CommBufferSize = handler.hxcfunc.args[2]
 
    if not CommBufferSize._lvar.used:
        # CommBufferSize is not touched at all.
        return False
 
    def inspect_dereference(node: CNodeExprPtr):
        child = node.ops[0].ignore_cast
 
        if isinstance(child, CNodeExprVar) and child.lvar == CommBufferSize:
            # This is confusing, we return False just to signal the search to stop.
            return False
 
    # Run 'insepct_dereference' on all dereference expressions in the function.
    return not handler.hxcfunc.visit_cnode_filterlist(inspect_dereference, [CNodeExprPtr])

Figure 29 – Implementing the heuristic in python

Unfortunately, this heuristic yields no results, so Brick now knows CommBufferSize is not being dereferenced and as a result marks the handler as vulnerable.

Figure 31 - Brick’s assessment of 017D.efi
Figure 30 – Brick’s assessment of 017D.efi

Conclusion

As can be judged by the number of CVEs it has already generated, we believe Brick is a very promising project that takes a big step in the right direction of harnessing automation to streamline the bug hunting process. This feeling we have was even reinforced recently when a related project called FwHunt was released. FwHunt attempts to solve the same set of problems as Brick, only using strict rule-sets rather than more relaxed heuristics.

Using automation, rules, heuristics, and other static code analysis techniques to crack through complex problems are very much desirable, but it’s always important to remember that reality is more complex than how we describe it. As such, occasional edge conditions that cause Brick and other automated tools to generate false positives and false negatives from time to time are inevitable.

That is perfectly acceptable, as long as we keep in mind that these tools were never intended to fully replace a human analyst, but rather empower him to handle larger and larger quantities of data. Eventually, it’s not the tool itself that makes the difference, but rather the human being that chooses how to use it, on what targets, and how to interpret its findings.

If you’re interested in learning more about the subject, come attend the upcoming Insomnihack conference, where we will be delivering a talk about some more SMM vulnerabilities, found this time in the Intel codebase.

See you there!

Appendix A – List of CVEs by Brick

CVE ID CVSS score Vendor
CVE-2021-36342 7.5 Dell
CVE-2021-44346 ? Gigabyte
CVE-2021-0157 8.2 Intel
CVE-2021-0158 8.2 Intel
CVE-2021-42055 6.8 ASUS
CVE-2021-3599 6.7 Lenovo
CVE-2021-3786 5.5 Lenovo
CVE-2022-23956 8.2 HP
CVE-2022-23953 7.9 HP
CVE-2022-23954 7.9 HP
CVE-2022-23955 7.9 HP
CVE-2022-23957 7.9 HP
CVE-2022-23958 7.9 HP

Appendix B – References and Further Reading

Zen and the Art of SMM Bug Hunting | Finding, Mitigating and Detecting UEFI Vulnerabilities

3 March 2022 at 16:51

It’s been almost a full year since we published the last part of our UEFI blog posts series. During that period, the firmware security community has been was more active than ever and produced several high-quality publications. Notable examples include the discovery of new UEFI implants such as MoonBounce and ESPecter, and the recent disclosure of no less than 23 high-severity BIOS vulnerabilities by Binarly.

Here at SentinelOne, we haven’t been sitting idle either. In the past year, we tried our hand at hunting down and exploiting SMM vulnerabilities. After spending several months doing so, we noticed some repetitive anti-patterns in SMM code and developed a pretty good intuition regarding the potential exploitability of bugs. Eventually, we managed to conclude 2021 after having disclosed 13 such vulnerabilities, affecting most of the well-known OEMs in the industry. In addition, several more vulnerabilities are still moving through the responsible disclosure pipeline and should go public soon.

In this blog post, we would like to share the knowledge, tools, and methods we developed to help uncover these SMM vulnerabilities. We hope that by the time you finish reading this article, you too will be able to find such firmware vulnerabilities yourselves. Please note that this article assumes a solid knowledge of SMM terminology and internals, so if your memory needs a refresher we highly recommend reading the articles in the Further Reading section before proceeding. And now, let’s get started.

Classes of SMM Vulnerabilities

While in theory SMM code is isolated from the outside world, in reality, there are many circumstances in which non-SMM code can trigger and even affect code running inside SMM. Because SMM has a complex architecture with lots of β€œmoving parts” in it, the attack surface is pretty vast and contains among other things data passed in communication buffers, NVRAM variables, DMA-capable devices, and so on.

In the following section, we will go through some of the more common SMM security vulnerabilities. For each vulnerability type, we will provide a brief description, some recommended mitigations as well as a strategy for detecting it while reversing. Note that the list of vulnerabilities is not exhaustive and contains only vulnerabilities that are specific to the SMM environment. For that reason, it will not include more generic bugs such as stack overflows and double-frees.

SMM Callouts

The most basic SMM vulnerability class is known as an β€œSMM callout”. This occurs whenever SMM code calls a function located outside of the SMRAM boundaries (as defined by the SMRRs). The most common callout scenario is an SMI handler that tries to invoke a UEFI boot service or runtime service as part of its operation. Attackers with OS-level privileges can modify the physical pages where these services live prior to triggering the SMI, thus hijacking the privileged execution flow once the affected service is called.

Figure 1 – Schematic overview of an SMM callout, source: CanSecWest 2015

Mitigation

Besides the obvious approach of not writing such faulty code in the first place, SMM callouts can also be mitigated at the hardware level. Starting from the 4th generation of the Core microarchitecture (Haswell) Intel CPUs support a security feature called SMM_Code_Chk_En. If this security feature is turned on, the CPU is prohibited from executing any code located outside the SMRAM region once it enters SMM. One can think of this feature as the SMM equivalent of Supervisor Mode Execution Prevention (SMEP).

Querying for the status of this mitigation can be done by executing the smm_code_chk module from CHIPSEC.

Figure 2 – Using chipsec to query for the hardware mitigation against SMM callouts

Detection

Static detection of SMM callouts is pretty straightforward. Given an SMM binary, we should analyze it while looking for SMI handlers that have some execution flow that leads to calling a UEFI boot or runtime service. This way, the problem of finding SMM callouts is reduced to the problem of searching the call graph for certain paths. Luckily for us, no additional effort is required at all since this heuristic is already implemented by the excellent efiXplorer IDA plugin.

As we mentioned in previous posts in the series, efiXplorer is a one-stop-shop and serves as the de-facto standard way of analyzing UEFI binaries with IDA. Among other things, it takes care of the following:

  • Locating and renaming known UEFI GUIDs
  • Locating and renaming SMI handlers
  • Locating and renaming UEFI boot/runtime services
  • Recent versions of efiXplorer use the Hex-Rays decompiler to improve analysis. One such feature is the ability to assign the correct type to interface pointers passed to methods such as LocateProtocol() or its SMM counterpart SmmLocateProtocol().

A note to Ghidra users: We also want to add that the Ghidra plugin efiSeek takes care of all the changes in the list above. However, it doesn’t include the UI elements like the protocols window and the vulnerability detection capabilities offered by efiXplorer.

After analysis of the input file is complete, efiXplorer will move on to inspect all calls carried out by SMI handlers, which yields a curated listing of potential callouts:

Figure 3 – Callouts found by efiXplorer
Figure 4 – sub_7F8 is reachable from an SMI handler but still calls a boot service located outside of SMRAM

For the most part, this heuristic works great, but we’ve encountered several edge cases where it might generate some false positives as well. The most common one is caused due to the usage of EFI_SMM_RUNTIME_SERVICES_TABLE. This is a UEFI configuration table that exposes the exact same functionality as the standard EFI_RUNTIME_SERVICES_TABLE, with the only significant difference being that, unlike its β€œstandard” counterpart, it resides in SMRAM and is therefore suitable to be consumed by SMI handlers. Many SMM binaries often re-map the global RuntimeServices pointer to the SMM-specific implementation after completing some boilerplate initialization tasks:

Figure 5 – Remapping the global RuntimeService pointer to the SMM-compatible implementation

Calling runtime services via the re-mapped pointer yields a situation that appears to be a callout at first glance, though a closer examination will prove otherwise. To overcome this, analysts should always search the SMM binary for the GUID identifying EFI_SMM_RUNTIME_SERVICES_TABLE. If this GUID is found, chances are that most of the callouts involving UEFI runtime services are false positives. This does not apply to callouts involving boot services, though.

Figure 6 – A false positive caused by calling GetVariable() via the re-mapped RuntimeService pointer

Another source of potential false positives is various wrapper functions which are β€œdual-mode”, meaning they can be called from both SMM and non-SMM contexts. Internally, these functions dispatch a call to an SMM service if the caller is executing in SMM, and dispatches a call to the equivalent boot/runtime service otherwise. The most common example we’ve seen in the wild is FreePool() from EDK2, which calls gSmst->SmmFreePool() if the buffer to be freed resides in SMRAM, or calls gBs->FreePool() otherwise.

Figure 7 – The FreePool() utility functions from EDK2 is a common source of false positives

As this example demonstrates, bug hunters should be aware of the fact that static code analysis techniques are having a hard time determining that certain code paths won’t be executed in practice, and as such are likely to flag this as a callout. Some tips and tricks for identifying this function in compiled binaries will be conveyed in the Identifying Library Functions section.

Low SMRAM Corruption

Description

Under normal circumstances, the communication buffer used to pass arguments to the SMI handler must not overlap with SMRAM. The rationale for this restriction is quite simple: if that wasn’t the case, any time the SMI handler would write some data into the comm buffer β€” for example, in order to return a status code to the caller β€” it would also modify some portion of SMRAM along the way, which is undesirable.

Figure 8 – This situation should not occur

In EDK2, the function responsible for checking whether or not a given buffer overlaps with SMRAM is called SmmIsBufferOutsideSmmValid(). This function gets called on the communication buffer upon each SMI invocation in order to enforce this restriction.

Figure 9 – EDK2 forbids the comm buffer from overlapping with SMRAM

Alas, since the size of the communication buffer is also under the attacker’s control this check on its own is not enough to guarantee sound protection and some additional responsibilities lay on the shoulders of the firmware developers. As we will see shortly, many SMI handlers fail here and leave a gap attackers can exploit to violate this restriction and corrupt the bottom portion of SMRAM. To understand how, let’s take a closer look at a concrete example:

Figure 10 – A vulnerable SMI handler

Above we have a real-life, very simple SMI handler. We can divide its operation into 4 discrete steps:

  1. Sanity checking the arguments.
  2. Reading the value of the MSR_IDT_MCR5 register into a local variable.
  3. Computing a 64-bit value out of it, then writing the result back to the communication buffer.
  4. Return to the caller.

The astute reader might be aware of the fact that during step 3, an 8-byte value is written to the Comm Buffer, but nowhere during step 1 does the code check for the prerequisite that the buffer is at least 8 bytes long. Because this check is omitted, an attacker can exploit this by:

  1. Placing the Comm Buffer in a memory location as adjacent as possible to the base of SMRAM (say SMRAM – 1).
  2. Set the size of the Comm Buffer to a small enough integer value, say 1 byte.
  3. Trigger the vulnerable SMI. Schematically, the memory layout would look as follows:
Figure 11 – Memory layout at the time of SMI invocation

As far as SmmEntryPoint is concerned, the Comm Buffer is just 1 byte long and does not overlap with SMRAM. Because of that, SmmIsBufferOutsideSmmValid() will succeed and the actual SMI handler will be called. During step 3, the handler will blindly write a QWORD value into the Comm Buffer, and by doing so it will unintentionally write over the lower 7 bytes of SMRAM as well.

Figure 12 – Memory layout at the time of corruption

Based on EDK2, the bottom portion of TSEG (the de-facto standard location for SMRAM), contains a structure of type SMM_S3_RESUME_STATE whose job is to control recovery from the S3 sleep state. As can be seen below, this structure contains a plethora of members and function pointers whose corruption can benefit the attacker.

Figure 13 – Definition for the SMM_S3_RESUME_STATE object, source: EDK2

Mitigation

To mitigate this class of vulnerabilities, SMI handlers must explicitly check the size of the provided communication buffer and bailout in case the actual size differs from the expected size. This can be achieved in one of two ways:

  1. Dereferencing the provided CommBufferSize argument and then comparing it to the expected size. This method works because we already saw that SmmEntryPoint calls SmmIsBufferOutsideSmmValid(CommBuffer, *CommBufferSize), which guarantees *CommBufferSize bytes of the buffer are located outside of SMRAM.

    Figure 14 – Mitigating low SMRAM corruption can be achieved simply by checking the CommBufferSize argument

  2. Calling SmmIsBufferOutsideSmmValid() on the Comm Buffer again, this time with the concrete size expected by the handler.

Detection

To detect this class of vulnerabilities, we should be looking for SMI handlers that don’t properly check the size of the Comm Buffer. That suggests the handler does not perform any of the following:

  1. Dereferences the CommBufferSize argument.
  2. Calls SmmIsBufferOutsideSmmValid() on the communication buffer.

Condition 1 is straightforward to check because efiXplorer already takes care of locating SMI handlers and assigning them their correct function prototype. Condition 2 is also easy to validate, but the crux is this: since SmmIsBufferOutsideSmmValid() is statically linked to the code, we must be able to identify it in the compiled binary. Some tips and tricks for doing so can be found in the next section.

Arbitrary SMRAM Corruption

Description

While certainly a big step forward in our analysis of SMM vulnerabilities, the previous bug class still suffers from several significant limitations that hinder it from being easily exploited in real-life scenarios. A better, more powerful exploitation primitive will allow us to corrupt arbitrary locations within SMRAM, not only those that are adjacent to the bottom.

Such exploitation primitives can often be found in SMI handlers whose communication buffers contain nested pointers. Since the internal layout of the communication buffer is not known apriori, it is the responsibility of the SMI handler itself to correctly parse and sanitize it, which usually boils down to calling SmmIsBufferOutsideSmmValid() on nested pointers and bailing out if one of them happens to overlap with SMRAM. A textbook example for properly checking these conditions can be found in the SmmLockBox driver from EDK2:

Figure 15 – the sub-handler for SmmLockBoxSave sanitizes nested pointers

To report back to the OS that certain best practices have been implemented in SMM, a modern UEFI firmware usually creates and populates an ACPI table called the Windows SMM Mitigations Table, or WSMT for short. Among other things, the WSMT maintains a flag called COMM_BUFFER_NESTED_PTR_PROTECTION that, if present, asserts that no nested pointers are used by SMI handlers without prior sanitization. This table can be dumped and parsed using the chipsec module common.wsmt:

Figure 16 – Using CHIPSEC to dump and parse the contents of the WSMT table

Unfortunately, practice has shown that more often than not, the correlation between reported mitigations and reality is scarce at best. Even when the WSMT is present and reports all the supported mitigations as active, it’s not uncommon to discover SMM drivers that completely forget to sanitize the communication buffer. Leveraging this, attackers can trigger the vulnerable SMI with a nested pointer pointing to SMRAM memory. Depending on the nature of the particular handler, this can result in either corruption of the specified address or disclosure of sensitive information read from that address. Let’s take a look at an example.

Figure 17 – An SMI handler that does not sanitize nested pointers, leaving it vulnerable to memory corruption attacks

In the snippet above, we have an SMI handler that gets some arguments via the communication buffer. Based on the decompiled pseudocode, we can deduce that the first byte of the buffer is interpreted as an OpCode field that instructs the handler what it should do next (1). As can be seen (2), valid values for this field are either 0, 2, or 3. If the actual value differs from those, the default clause (3) will be executed. In this clause, an error code is written to the memory location pointed to by the 2nd field of the comm buffer. Since this field is under the attacker’s control along with the entire contents of the communication buffer, he or she can set it up as follows prior to triggering the SMI:

Figure 18 – Contents of the communication buffer that lead to SMRAM corruption

As the handler executes, the value of the OpCode field will force it to fall back into the default clause, while the address field will be selected in advance by the attacker depending on the exact portion of SMRAM he or she wants to corrupt.

Mitigation

To mitigate this class of vulnerabilities, the SMI handler must sanitize any pointer value passed in the communication buffer prior to using it. The pointer validation can be performed in one of two ways:

  • Calling SmmIsBufferOutsideSmmValid(): As was already mentioned, SmmIsBufferOutsideSmmValid() is a utility function provided by EDK2 that checks whether or not a given buffer overlaps with SMRAM. Using it is the recommended way to sanitize external input pointers.
  • Alternatively, some UEFI implementations based on the AMI codebase don’t use SmmIsBufferOutsideSmmValid(), but rather expose a similar functionality via a dedicated protocol called AMI_SMM_BUFFER_VALIDATION_PROTOCOL. Besides the semantic differences of calling a function versus utilizing a UEFI protocol, both approaches work roughly the same. Please check out the next section to learn how to correctly import this protocol definition into IDA.
    Figure 19 – AMI_SMM_BUFFER_VALIDATION_PROTOCOL is action

Detection

The basic idea to detect this class of vulnerabilities is to look for SMI handlers that don’t call SmmIsBufferOutsideSmmValid() or utilize the equivalent AMI_SMM_BUFFER_VALIDATION_PROTOCOL. However, some edge cases must also be taken into consideration. Failing to do so might introduce unwanted false positives or false negatives.

  1. Calling SmmIsBufferOutsideSmmValid() on the comm buffer itself: this merely guarantees that the comm buffer does not overlap with SMRAM (see Low SMRAM corruption below), but it says nothing about the nested pointers. As a result, when trying to assess the robustness of a handler against rouge pointer values, these cases should not be taken into consideration.
  2. Not using nested pointers at all: Some SMI handlers might not call SmmIsBufferOutsideSmmValid() simply because the communication buffer does not hold any nested pointers, but rather other data types such as integers, boolean flags, etc. To distinguish between this benign case from the vulnerable case, we must be able to figure out the internal layout of the communication buffer.

    While this can be done manually as part of the reverse engineering process, fortunately for us, nowadays automatic type reconstruction is far from being science fiction, and various tools for doing so are readily available as off-the-shelf solutions. The two most prominent and successful IDA plugins in this category are HexRaysPyTools and HexRaysCodeXplorer. Using any of these tools lets you transform raw pointer access notation such as the following:

    Figure 20 – SMI handler using the raw CommBuffer

    Into a more friendly and comprehensible point-to-member notation:

    Figure 21 – SMI handler using the reconstructed CommBuffer

    Even more importantly, these plugins keep track of how individual fields are being accessed. Based on the access pattern, they are fully capable of reconstructing the layout of the containing structure. This includes extrapolating the number of members, their respective sizes, types, attributes, and so on. When applied to the Comm Buffer, this method lets you quickly discover if it holds any nested pointers.

    Figure 22 – The reconstructed CommBuffer as extrapolated by HexRaysCodeXplorer. Notice this structure holds two members which are nested pointers

TOCTOU attacks

Description

Sometimes, even calling SmmIsBufferOutsideSmmValid() on nested pointers is not enough to make an SMI handler fully secure. The reason for this is that SMM was not designed with concurrency in mind and as a result, it suffers from some inherent race conditions, the most prominent one being TOCTOU attacks against the communication buffer. Because the comm buffer itself resides outside of SMRAM, its contents can change while the SMI handler is executing. This fact has serious security implications as it means double-fetches from it won’t necessarily yield the same values.

In an attempt to remedy this, SMM in multiprocessing environments follows what’s known as an β€œSMI rendezvous”. In a nutshell, once a CPU enters SMM a dedicated software preamble will send an Inter-Processor-Interrupt (IPI) to all other processors in the system. This IPI will cause them to enter SMM as well and wait there for the SMI to complete. Only then can the first processor safely call the handler function to actually service the SMI.

This scheme is highly effective in preventing other processors from meddling with the communication buffer while it is being used, but of course, CPUs are not the only entities that have access to the memory bus. As any OS 101 course teaches you, nowadays many hardware devices are capable of acting as DMA agents, meaning they can read/write memory without going through the CPU at all. These are great news performance-wise but are terribly bad news as far as firmware security is concerned.

Figure 23 – DMA-aware hardware can modify the contents of the comm buffer while an SMI is executing, source: Dell Firmware Security

To see how DMA operations can assist exploitation, let’s take a look at the following snippet taken from a real-life SMI handler:

Figure 24 – SMI handler that is vulnerable to a TOCTOU attack

As can be seen, this handler references a nested pointer that we named field_18 in at least 3 different locations:

  1. First, its value is retrieved from the comm buffer and saved into a local variable in SMRAM.
  2. Then, SmmIsBufferOutsideSmmValid() is called on the local variable to make sure it does not overlap SMRAM.
  3. If deemed safe, the nested pointer is re-read from the comm buffer and then passed to CopyMem() as the destination argument.

As was mentioned earlier, nothing guarantees consecutive reads from the comm buffer will necessarily yield the same value. That means an attacker can issue this SMI with the pointer referencing a perfectly safe location outside of SMRAM:

Figure 25 – Initial layout of the communication buffer at the time of issuing the SMI

However, right after the SMI validates the nested pointer and just before it is being fetched again, there exists a small window of opportunity where a DMA attack can modify its value to point somewhere else. Knowing that the pointer will soon be passed to CopyMem(), the attacker could make it point to an address in SMRAM he wants to corrupt.

Figure 26 – A malicious DMA device can modify the pointer inside the CommBuffer to point somewhere else, potentially to SMRAM memory

Mitigation

If configured properly by the firmware, SMRAM should be shielded from tampering by DMA devices. To make sure that’s the case on your machine, run the smm_dma module from CHIPSEC.

Figure 27 – Checking that SMRAM is protected from DMA attacks

Because of that, mitigating TOCTOU vulnerabilities can be performed merely by copying data from the communication buffer into local variables that reside in SMRAM. Like always, a good reference for the proper coding style is EDK2:

Figure 28 – Copying data from the comm buffer into local variables in SMRAM, source: SmmLockBox.c

Once all the required pieces of data are copied into SMRAM that way, DMA attacks won’t be able to influence the execution flow of SMI handlers:

Figure 29 – If configured properly, SMRAM should be protected from tampering by DMA devices

Detection

Detecting TOCTOU vulnerabilities in SMI handlers requires reconstructing the internal layout of the communication buffer, then counting how many times each field is being fetched. If the same field is being fetched twice or more by the same execution flow, chances are the respective handler is susceptible to such attacks. The severity of these issues greatly depends on the types of individual fields, with pointer fields being the most acute ones. Again, properly reconstructing the structure of the Comm Buffer greatly helps in assessing the potential risk.

CSEG-only Aware Handlers

Description

As was mentioned by previous posts in the series, the de-facto standard location for SMRAM memory is the β€œTop Memory Segment”, often abbreviated as TSEG. Still, on many machines, a separate SMRAM region called CSEG (Compatibility Segment) co-exists with TSEG for compatibility reasons. Unlike TSEG whose location in physical memory can be programmed by the BIOS, the location of the CSEG region is fixed to the address range 0xA0000-0xBFFFF. Some legacy SMI handlers were designed with only CSEG in mind, a fact that can be abused by attackers. Below is an example of one such handler:

Figure 30 – An SMI handler with some CSEG-specific protections

Unlike the handlers we reviewed so far, this SMI handler does not get its arguments via the communication buffer. Instead, it uses the EFI_SMM_CPU_PROTOCOL to read registers from the SMM save state, created automatically by the CPU upon entering SMM. Therefore, the potential attack surface in this example is not the communication buffer, but rather the general-purpose registers of the CPU, whose values can be set almost arbitrarily prior to issuing the SMI.

The handler goes as follows:

  1. First, it reads the values of the ES and EBX registers from the save state.
  2. Then, it computes a linear address from them using the formula: 16 * ES + (EBX & 0xFFFF).
  3. Finally, it checks that the computed address does not fall within the bounds of CSEG. If the address is considered safe, it is passed as an argument to the function at 0x3020.

Note that the handler essentially re-implements common utility functions such as SmmIsBufferOutsideSmmValid(), only it does so in a poor way that completely neglects SMRAM segments other than CSEG. Theoretically, attackers can set the ES and BX registers such that the computed linear address will point to some other SMRAM region such as TSEG and will surely pass the safety checks imposed by the handler.

In practice, however, chances are this vulnerability is not realistically exploitable. The reason for this is that the maximal linear address we can reach is limited to 16 * 0xFFFF + 0xFFFF == 0x10FFEF, and experience shows that TSEG is usually located at much higher addresses. Nevertheless, it is a good thing to be aware of such handlers and the danger they impose.

Mitigation

Mitigating these vulnerabilities is entirely up to the developers of the SMI handler.

Detection

A good strategy to pinpoint these cases is to look for SMI handlers that make use of β€œmagic numbers” that reference some unique characteristics of CSEG. These include immediate values such as 0xA0000 (the physical base address of CSEG), 0x1FFFF (its size), and 0xBFFFF (last addressable byte). Based on our experience, a function that uses two or more of these values is likely to have some CSEG-specific behavior and must be examined carefully to assess its potential risk.

SetVariable() Information Disclosure

Description

All the bug classes described so far were centered around hijacking the SMM execution flow and corrupting SMM memory. Yet another very important category of vulnerabilities revolves around disclosing the contents of SMRAM. It is a known fact that SMRAM cannot be read from outside of SMM, which is why it is sometimes used by the firmware to store secrets that must be kept hidden from the outside world. In addition to that, disclosing the contents of SMRAM can also help with the exploitation of other vulnerabilities that require accurate knowledge of the memory layout.

A common scenario for SMRAM disclosure happens when SMM code tries to update the contents of an NVRAM variable. In UEFI, updating an NVRAM variable is not an atomic operation, but rather a composite one made out of the following steps:

  1. Allocating a stack buffer that will hold the data associated with the variable.
  2. Using the GetVariable() service to read the contents of the variable into the stack buffer.
  3. Performing all the required modifications on the stack buffer.
  4. Using the SetVariable() service to write the modified stack buffer back to NVRAM.
Figure 31 – UEFI code that demonstrates updating a UEFI variable. Source: TCGSmm

When calling GetVariable(), note that the 4th parameter is used as an input-output argument. Upon entry, this argument signifies the number of bytes the caller is interested in reading, while on return it is set to the number of bytes that were read from NVRAM in practice. In case the actual size of the variable matches the expected one, both values should be the same.

A problem arises when developers implicitly assume the size of a variable to be immutable. Due to this assumption, they completely ignore the number of bytes read by GetVariable() and just pass a hardcoded size to SetVariable() when writing the updated contents:

Figure 32 – the code above implicitly assumes the size of CpuSetup will always be 0x101A, so it doesn’t bother to check the number of bytes actually read by GetVariable()

Since the contents of some NVRAM variables (at least those that have the EFI_VARIABLE_RUNTIME_ACCESS attribute) can be modified from the operating system, they can be abused to trigger information disclosures in SMM while also serving simultaneously as the exfiltration channel. Let’s see how this can be done in practice.

First, the attacker would use an OS-provided API function such as SetFirmwareEnvironmentVariable() to truncate the variable, thus making it shorter than expected. Then, it will move on to trigger the vulnerable SMI handler. The SMI handler will:

  1. Allocate the stack-based buffer. Like any other stack-based allocation this buffer is uninitialized by default, meaning it holds leftovers from previous function calls that took place in SMM.
    Figure 33 – Side-by-side depiction of the NVRAM variable and the stack buffer (phase 1)
  2. Call the GetVariable() service to read the contents of the variable into the stack buffer. Normally, the size of the variable is equal to the size of the stack buffer, but since the attacker just truncated the variable in NVRAM, the buffer is surely longer. This in turn means it will continue to hold some uninitialized bytes even after GetVariable() returns.
    Figure 34 – Side-by-side depiction of the NVRAM variable and the stack buffer (phase 2)
  3. Modify the stack buffer in memory.
    Figure 35 – Side-by-side depiction of the NVRAM variable and the stack buffer (phase 3)
  4. Call the SetVariable() service to write back the modified stack buffer into NVRAM. Because this call is done using the hardcoded, constant size of the stack buffer, it will also write to NVRAM its uninitialized part.
    Figure 36 – Side-by-side depiction of the NVRAM variable and the stack buffer (phase 4)

To complete the process, the attacker can now use an API function such as GetFirmwareEnvironmentVariable() to fully disclose the contents of the variable, including the bytes that originate from the uninitialized portion.

Mitigation

The moral of this story is that NVRAM variables are not to be trusted blindly and should be taken into account when reasoning about the attack surface of the handler. If applicable, use compiler flags such as InitAll to make sure stack buffers will be zero-initialized. More tactically, when updating the contents of NVRAM variables the code must always take into account the actual size of the variable and not rely on a static, pre-computed value.

Yet another possible direction to mitigate these issues is to limit access to NVRAM variables. This can be done either by removing the EFI_VARIABLE_RUNTIME_ACCESS attribute entirely or using a protocol such as EDKII_VARIABLE_LOCK_PROTOCOL to make variables read-only.

Detection

It’s reasonable to assume that an NVRAM variable update operation will take place during the course of one function. That means we can usually ignore scenarios in which one function reads the variable and another one writes it. To locate these functions, after analyzing the input file with efiXplorer, navigate to the β€œservices” tab and search for pairs of calls where SetVariable() is immediately followed by GetVariable():

Figure 37 – Searching for pairs of calls to GetVariable() and SetVariable()

For each such pair of calls, check that:

  1. Both calls originate from the same function
  2. Both calls operate on the same NVRAM variable
  3. The size argument passed to SetVariable() is an immediate value
Figure 38 – Simple heuristics to detect SMRAM info leaks

Identifying Library Functions

This post freely references library functions such as FreePool() and SmmIsBufferOutsideSmmValid() and naively assumes we can locate them without any hassle. The problem is these functions are statically linked to the binary, and normally SMM images are stripped of any debug symbols before being shipped to end-users. Due to that, locating them inside the IDA database is quite challenging.

During our work, we researched multiple approaches to tackle this problem, including automated diffing using Diaphora as well as experimentation with some lesser-known plugins such as rizzo and fingermatch. Eventually, we decided to stick to the KISS principle and perform the matching using plain and simple heuristics that take into consideration some of the unique characteristics of the target function. Below are some rules-of-thumb for matching the functions referenced earlier. Note that we assume the binary was already analyzed by efiXplorer, which makes things a bit easier.

FreePool

Identifying FreePool() is pretty straightforward. All it takes is to scan the IDA database for a function that:

  • Receives one integer argument.
  • Conditionally, calls one of gBs->FreePool() or gSmst->FreePool() (but never both)
  • Forwards its input argument to both of these services
  • Figure 39 – Simple heuristic to pinpoint FreePool()

SmmIsBufferOutsideSmmValid

Identification of SmmIsBufferOutsideSmmValid() is a bit trickier. To successfully pull this off, we need to have some background information about a UEFI protocol called EFI_SMM_ACCESS2_PROTOCOL. This protocol is used to manage and query the visibility of SMRAM on the platform. As such, it exposes the respective methods to open, close, and lock SMRAM.

Figure 40 – Interface definition for EFI_SMM_ACCESS2_PROTOCOL, source: Step to UEFI

In addition to those, this protocol also exports a method called GetCapabilities(), which can be used by clients to figure out exactly where SMRAM lives in physical memory.

Figure 41 – Documentation of the GetCapabilities() function, source: Step to UEFI

Upon return, this function fills an array of EFI_SMRAM_DESCRIPTOR structures that tell the caller what regions of SMRAM are available, what is their size, state, etc.

Figure 42 – Output of a sample program that uses EFI_SMM_ACCESS2_PROTOCOL to query SMRAM ranges, source: Step to UEFI

In EDK2, the common practice is to store these EFI_SMRAM_DESCRIPTORS as global variables so that other functions could easily access them in the future. As you probably guessed, one of these functions is no other than SmmIsBufferOutsideSmmValid(), which iterates over the descriptors list to decide if the caller-provided buffer is safe:

Figure 43 – Source code for SmmIsBufferOutsideSmmValid, source: SmmMemLib.c

Taking this into consideration, our strategy to identify SmmIsBufferOutsideSmmValid() would be that of reverse lookup – first, we’ll find the global SMRAM descriptors initialized by EFI_SMM_ACCESS2_PROTOCOL and only then, based on the functions that use them, deduce who’s the most promising candidate to be SmmIsBufferOutsideSmmValid().

Technically, one can do so by following these simple steps:

  • Go to the β€œprotocols” tab in efiXplorer and double click EFI_SMM_ACCESS2_PROTOCOL. This will cause IDA to jump to the location where this GUID is utilized (usually the call to LocateProtocol)
    Figure 44 – Searching for EFI_SMM_ACCESS2_PROTOCOL in IDA
  • Click on the protocol’s interface pointer (EfiSmmAccess2Protocol) and hit β€˜x’ to search for its xrefs:
    Figure 45 – Listing the cross-references to EfiSmmAccess2Protocol
  • For each call to GetCapabilities(), check if the 3rd parameter (the SMRAM descriptor) is a global variable. If it is, do the following:
    • Hit β€˜n’ to rename it according to some naming convention (say, SmramDescriptor_XXX, where XXX is an ordinal) to allow for easy reference in the future
    • Hit β€˜y’ and set its variable type to EFI_SMRAM_DESCRIPTOR *

    Figure 46 – Renaming and setting the type for the SMRAM descriptors

  • Now check the following criteria for each function in the database.
    1. The function must receive two integer arguments
    2. The function must return a boolean value. From the perspective of the decompiler, boolean values are just plain integers, so to make this distinction we should go over all the return statements in the function and check that the returned value is a member of the set {0,1}.
    3. The function must reference one of the SMRAM descriptors that were marked in the previous step

If all three conditions are met, chances are the function you’re looking at is actually SmmIsBufferOutsideSmmValid():

Figure 47 – Locating SmmIsBufferOutsideSmmValid() in compiled SMM binaries using simple heuristics

AMI_SMM_BUFFER_VALIDATION_PROTOCOL

Currently, efiXplorer does not support the definition of AMI_SMM_BUFFER_VALIDATION_PROTOCOL out of the box, so we must import the protocol definition separately.

Figure 48 – AMI_SMM_BUFFER_VALIDATION is not supported out of the box

To accomplish this, follow these steps:

  1. Download the protocol header file from GitHub and save it locally.
  2. Open an IDAPython prompt and run the following snippet:
    Figure 49 – Defining some C macros to enable importing the protocol header

    This is necessary because the header file makes use of several macros and typedefs that must be #defined manually before importing it.
  3. Navigate to the File->Import C header file menu to import the header.
    Figure 50 – Importing the header file
  4. Run again efiXplorer (hotkey: CTRL+ALT+E) and notice how the decompilation output suddenly changes:
    Figure 51 – AMI_SMM_BUFFER_VALIDATION is now recognized

Conclusion

β€œThe more you look, the more you see.”
– Robert M. Pirsig,Β  Zen and the Art of Motorcycle Maintenance

Firmware-level attacks seem to pose a significant challenge to the security community. As part of the everlasting cat-and-mouse game between attackers and defenders, threat actors are starting to shift their spotlight to the firmware, considered by many the soft belly of the IT stack. In recent years, awareness of firmware threats is constantly increasing and some promising approaches are emerging to combat them:

  • Hardware vendors such as Intel, are constantly adding more security features to each new line of CPUs. The important advantage of these features is that they’re baked into the hardware and are capable of eliminating certain bug classes from the ground up (or at least make exploitation much harder). The downside with this approach is that due to the fragmented nature of the industry, not every feature that is supported by the hardware gets widespread adoption from the software side. While certain features such as Secure Boot, Boot Guard, and BIOS Guard are highly popular and can be found in the majority of commodity machines, other features such as STM (SMI Transfer Monitor, a technology which was intended to de-privilege SMM) were left as merely a PoC.
  • OS vendors such as Microsoft are collaborating intensely with leading OEMs to help bridge the gap between firmware security and OS security, a mandatory move given their long-term vision of harnessing virtualization to protect every Windows machine. The outcome of these endeavors is the line of Secured-Core PCs, which come preloaded with security features and configurations that are aimed at narrowing down the firmware attack surface as well as constricting the damage in case of an attack.
  • EDR vendors also contribute their part and are starting to tap into the firmware and provide visibility into the SPI flash memory and the EFI system partition. This approach is great for spotting IOCs of known firmware implants, but unfortunately is rather restricted when it comes to detecting the underlying vulnerabilities that enabled the infection in the first place.

Even in the face of these advancements, firmware security still bears lots and lots of issues, design flaws, and of course vulnerabilities to uncover. The ability of the security community to successfully pull this off depends on three fundamental pillars: knowledge, tooling, and diligence.

In this blog post, we were focused on promoting knowledge by shedding light on unfamiliar territory. In the next post, we’ll cover tooling and reveal:

  • How we automated the bug hunting process to the degree that finding SMM vulnerabilities is merely a matter of running a Python script
  • Some real-life examples of vulnerabilities we found, affecting most well-known OEMs in the industry.

As for diligence, unfortunately, no known recipe exists for producing such human qualities. It is, therefore, the responsibility of each and every one of us to just try our best and make sure that no stone is left unturned in this exciting and challenging domain.

Further Reading

  • There are no more articles
❌