When we think about V8 exploits, the first things that come to mind are probably related to sophisticated browser zero-day exploit chains. While the browser may be the most interesting target for V8 exploits, we shouldn’t forget that this open-source JavaScript engine is also embedded into countless projects other than the browser. And where a JavaScript engine is used across a security boundary to execute potentially untrusted code, security issues may arise.
One such issue affected the massively popular Dota 2 video game. Dota used an outdated build of v8.dll that was compiled in December 2018. It’s no surprise that this build was vulnerable to a range of CVEs, many of them even being known exploited vulnerabilities with public proof-of-concept (PoC) exploits. We discovered that one of these vulnerabilities, CVE-2021-38003, was exploited in the wild in four custom game modes published within the game. Since V8 was not sandboxed in Dota, the exploit on its own allowed for remote code execution against other Dota players.
We disclosed our findings to the developer of Dota 2, Valve. In response, Valve pushed an update for Dota on January 12, upgrading the old and vulnerable version of V8. This update took effect immediately, since Dota has to be up to date for players to participate in online games. Valve also took additional action, by taking down the offending custom game modes, notifying the affected players, and introducing new mitigations to reduce the game’s attack surface.
Dota 2 is a MOBA game that was initially released on July 9, 2013. Despite being almost 10 years old (or perhaps 20 if counting the original Dota 1), it is still attracting a large player base of around 15 million active monthly players. Like other popular games, Dota is a complex piece of software under the hood, assembled from multiple separate components. One component that will be of particular interest to us is the Panorama framework. This is a framework designed by Valve itself to enable user interface development using the well-known web triad of HTML, CSS, and JavaScript. The JavaScript part here was problematic because it got executed by the vulnerable version of V8. Thus, malicious JavaScript could exploit a V8 vulnerability and gain control over the victim’s machine. This wouldn’t be such an issue for the unmodified game, because by default, only legitimate Valve-authored scripts should get executed. However, Dota is very open to customization by the player community, which opens the doors for threat actors to attempt to sneak malicious pieces of JavaScript to their unsuspecting victims.
Customization of Dota can take many forms: There are custom wearable in-game items, announcer packs, loading screens, chat emoticons, and more. Crucially, there are also custom community-developed game modes. These are essentially brand-new games that leverage Dota’s powerful game engine to allow anyone with a bit of programming experience to implement their ideas for a game. Custom game modes play an important role in Dota, and Valve is well aware of the benefits of letting players express their creativity by developing custom game modes. After all, Dota itself started out as a game mode for Warcraft III: The Frozen Throne. This might be why game modes can be installed with a single click from within the game. As a result, there are thousands of game modes available, some of which are extremely popular. For instance, DOTA AUTO CHESS was played by over 10 million players.
Before a game mode can be played by regular players, it must be published on the Steam store. The publishing process includes a verification performed by Valve. While this could potentially weed out some malicious game modes, no verification process is perfect. As we’ll show later, at least four malicious game modes managed to slip through. We believe the verification process exists mostly for moderation reasons to prevent inappropriate content from getting published. There are many ways to hide a backdoor within a game mode, and it would be very time-consuming to attempt to detect them all during verification.
The main game logic of custom game modes is coded in Lua. This is executed on the game server, which can either be the host player’s machine or a dedicated server owned by Valve. For client-side scripting, there is JavaScript from the Panorama framework. This is mainly used to control user interface elements, such as scoreboards or quest status bars. JavaScript is executed by the V8 engine, and there’s full support for many advanced features, including WebAssembly execution. In addition, there’s also a Dota-specific API that exposes additional functionality. Particularly interesting was the $.AsyncWebRequest function, which, in combination with eval, could have been used to backdoor a game mode so that it could execute arbitrary additional JavaScript code downloaded from the internet. Perhaps it was this concern that caused the $.AsyncWebRequest function to be deprecated and eventually removed altogether. However, there are ways around this. For instance, the web request can be made by the server-side Lua code, with the response passed to the client-side JavaScript using game event messaging APIs.
Just Testing
We discovered four malicious custom game modes published on the Steam store, all developed by the same author. The first game mode (id 1556548695) is particularly interesting, as it appears that it is where the attacker only tested the exploit, judging from the lack of an actual payload attached to it. Interestingly enough, the attacker also used this game mode to test various other techniques, leaving in commented-out code or unused functions. This offered us a great opportunity to understand the attacker’s thought process.
The Steam page of the custom game mode where the attacker tested the exploit.
As can be seen in the above screenshot, the attacker was very transparent about the nature of this game mode, naming it test addon plz ignore and even going as far as using the description to urge other players not to download this game mode. While this might seem like an expression of good faith, we’ll show shortly that in the other three malicious game modes, the same attacker took the exact opposite approach and tried to make the malicious code as stealthy as possible.
The JavaScript exploit in this custom game mode can be found inside overthrow_scoreboard.vjs_c. This used to be a legitimate JavaScript file implementing scoreboard functionality, but the attacker replaced its content with an exploit for CVE-2021-38003. This vulnerability was originally discovered as a zero-day by Google researchers Clément Lecigne and Samuel Groß, when it was used in the wild in an exploit chain against a fully-patched Samsung phone.
There are now public PoCs and write-ups for this CVE. However, these weren’t available in March 2022, when the attacker last updated the game mode. This means they had to develop a large portion of the exploit themselves (even if there was a public PoC at that time, the attacker would still need to possess some technical skills to backport it to the outdated V8 build that Dota was using). Even so, the core of the exploit was provided in the Chromium bug tracker entry for the CVE. There is a snippet of code that can trigger the vulnerability to leak the purportedly inaccessible TheHole object and then use this leaked object to corrupt the size of a map. The attacker took this snippet and pasted it into their exploit, building up the rest of the exploit on top of this corrupted map.
The core of the exploit that triggers CVE-2021-38003 to leak TheHole object. Note the yay! at the end — that’s simply an expression of joy and it’s in no way necessary for the exploit to work.
Interestingly, the exploit contains a large amount of commented-out code and debug prints. This further suggests that the attacker had to put a lot of effort into weaponizing the vulnerability. The attacker-developed part of the exploit starts by using the corrupted map to corrupt the length of an array, achieving a relative read/write primitive. Then, it corrupts an ArrayBuffer backing store pointer in order to gain an arbitrary read/write primitive. There is no addrof function, as addresses are leaked by placing the target object at a known offset from the corrupted array and then using the relative read primitive. Finally, with the arbitrary read/write in place, the exploit uses a well-known WebAssembly trick to execute custom shellcode. We have tested the whole exploit locally against Dota and can confirm that it worked.
A JavaScript snippet taken from the exploit. Note the debug prints, comments, and commented-out code.
Apart from this JavaScript exploit, the custom game mode also contains another interesting file, which is ominously named evil.lua. This is where the attacker tested the capabilities of the server-side Lua execution. See the Lua snippet below where the attacker tested the following in particular:
Logging
Dynamic compilation of additional Lua code (loadstring)
Determining the exact version of the Lua interpreter
Executing arbitrary system commands (whoami)
Coroutine creation
Network connectivity (HTTP GET requests)
A Lua snippet taken from evil.lua.
Unfortunately, we do not have access to the full update history of this particular game mode. Therefore, it’s possible that some interesting code from previous versions is no longer present in the version that we analyzed. We can at least see from the changelog that there were nine updates to this game mode, all of them happening either in November 2018 or March 2022. Since the exploited JavaScript vulnerability was only discovered in 2021, we assume that the game mode initially started out as a legitimate game and that the malicious functionality was only added in the March 2022 updates.
The Backdoor
After discovering this first malicious game mode, we were of course wondering whether there are more such exploits out there. Since the attacker did not bother reporting the vulnerability to Valve, we found it likely that they would have malicious intentions and attempt to exploit it at a larger scale. As a result, we developed a script that downloaded all the JavaScript files from all the custom game modes published on the Steam store. This yielded us gigabytes of JavaScript that we could query for suspicious code patterns.
It didn’t take long to discover three more malicious game modes, all by the same author (who also happened to be the author of the previously analyzed test addon plz ignore game mode). These game modes were named Overdog no annoying heroes (id 2776998052), Custom Hero Brawl (id 2780728794), and Overthrow RTZ Edition X10 XP (id 2780559339). Interestingly, the same author also published a fifth game mode named Brawl in Petah Tiqwa (id 1590547173), which did not include any malicious code (to our great surprise).
The Steam page of one of the backdoored custom game modes.
The malicious code in these new three game modes is much more subtle. There is no file named evil.lua nor any JavaScript exploit directly visible in the source code. Instead, there’s just a simple backdoor consisting of only about twenty lines of code. This backdoor can execute arbitrary JavaScript downloaded via HTTP, giving the attacker not only the ability to hide the exploit code, but also the ability to update it at their discretion without having to update the entire custom game mode (and going through the risky game mode verification process).
The backdoor starts with the JavaScript code sending a custom ClientReady event to the server. This is to signal to the server that there is a new victim game client, waiting to receive the JavaScript payload. The Lua code on the server registered a listener for the ClientReady event. When it receives this event, it makes an HTTP GET request to its C&C server to fetch the JavaScript payload. This payload is expected in the response body, and it’s forwarded to the client-side JavaScript in a custom event named test.
The Lua part of the backdoor, which is executed on the game server.
When the client-side JavaScript receives this test event, it unwraps the payload, dynamically creates a new function out of it, and immediately executes it. On a high level, this is clearly just a simple downloader capable of executing arbitrary JavaScript downloaded from the C&C server. The cooperation of client-side JavaScript and server-side Lua code was only necessary because JavaScript was no longer allowed to directly access the internet.
The JavaScript part of the backdoor, which is executed on the game clients.
At the time that we discovered this backdoor, the C&C server was no longer responding. Even so, we can confidently assume that this backdoor was intended to download the JavaScript exploit for CVE-2021-38003. This is because all three backdoored game modes were updated by the same author within 10 days after said author introduced the JavaScript exploit into their first malicious game mode. However, we remain unsure about whether there was any malicious shellcode attached to the exploit. After all, the use of ngrok for C&C is slightly unconventional and could suggest that the attacker only tested the backdoor functionality. One way or another, we can say that this attack was not very large in scale. According to Valve, under 200 players were affected.
Parting Thoughts
After discovering the four malicious game modes, we tried to hunt for more — unfortunately, our trail went cold. Therefore, it’s not clear what the attacker’s ultimate intentions were. However, we believe that they were not exactly pure research intentions, for two main reasons. First, the attacker did not report the vulnerability to Valve (which would generally be considered a nice thing to do). Second, the attacker tried to hide the exploit in a stealthy backdoor. Regardless, it’s also possible that the attacker didn’t have purely malicious intentions either, since such an attacker could arguably abuse this vulnerability with a much larger impact.
For example, a malicious attacker could attempt to take over a popular custom game mode. Many game modes are neglected by their original developers, so the attacker could try something as simple as promising to fix bugs and continue development for free. After some number of legitimate updates, the attacker could try to sneak in the JavaScript backdoor. Since game modes are updated automatically in the background, the unsuspecting victim players would not have a lot of opportunities to defend themselves.
Alternatively, the attacker could search for other ways to exploit the vulnerability without involving any custom game modes. For instance, the attacker could try to look for a separate XSS vulnerability to chain with the V8 exploit. Such an XSS vulnerability could allow the attacker to execute arbitrary JavaScript within the remote victim’s Panorama instance. The V8 exploit could then be used to break out of the Panorama framework. Note that Panorama is also heavily used in the game’s main menu, so depending on the nature of the XSS vulnerability, this could have a blast radius as big as the 15 million monthly players.
Before we sign off, we would like to thank Valve for quickly addressing our reports. We hope that they will continue updating V8 in the future and reduce the patch gap as much as possible. Valve also shared with us some plans about additional mitigations, and we will be most excited to see those implemented in practice. Due to the potential impact, we would also recommend very careful vetting of future updates for popular custom games.
We can also appreciate that Valve made the decision to publish custom game modes on Steam even though it might put more responsibility on their shoulders. Ultimately, this is a net positive for the overall players’ security, due to the fact that Valve can moderate the published game modes and take down malicious ones. Many other games don’t have such integrated support for custom games, so players resort to downloading mods from random third-party sites, which are often known to bundle malware.
There are various tricks malware authors use to make malware analysts’ jobs more difficult. These tricks include obfuscation techniques to complicate reverse engineering, anti-sandbox techniques to evade sandboxes, packing to bypass static detection, and more. Countless deceptive tricks used by various malware strains in-the-wild have beendocumented over the years. However, few of these tricks are implemented in a typical piece of malware, despite the many available tricks.
The subject of this blog post, a backdoor we dubbed Roshtyak, is not your typical piece of malware. Roshtyak is full of tricks. Some are well-known, and some we have never seen before. From a technical perspective, the lengths Roshtyak takes to protect itself are extremely interesting. Roshtyak belongs to one of the best-protected malware strains we have ever seen. We hope by publishing our research and analysis of the malware and its protection tricks we will help fellow researchers recognize and respond to similar tricks, and harden their analysis environments, making them more resistant to the evasion techniques described.
Roshtyak is the DLL backdoor used by Raspberry Robin, a worm spreading through infected removable drives. Raspberry Robin is extremely prevalent. We protected over 550K of our users from the worm this year. Due to its high prevalence, it should be no surprise that we aren’t the only ones taking note of Raspberry Robin.
Red Canary’s researchers published the first analysis of Raspberry Robin in May 2022. In June, Symantec published a report describing a mining/clipboard hijacking operation, which reportedly made the cybercriminals at least $1.7 million. Symantec did not link the malicious operation to Raspberry Robin. Nevertheless, we assess with high confidence that what they analyzed was Raspberry Robin. This assessment is based on C&C overlaps, strong malware similarity, and coinfections observed in our telemetry. Cybereason, Microsoft, and Cisco published further reports in July/August 2022. Microsoft reported that Raspberry Robin infections led to DEV-0243 (a.k.a Evil Corp) pre-ransomware behavior. We could not confirm this connection using our telemetry. Still, we find it reasonable to believe that the miner payload is not the only way Raspberry Robin infections are being monetized. Other recentreports also hint at a possible connection between Raspberry Robin and Evil Corp.
A map showing the number of users Avast protected from Raspberry Robin
There are many unknowns about Raspberry Robin, despite so many published reports. What are the ultimate objectives behind the malware? Who is responsible for Raspberry Robin? How did it become so prevalent? Unfortunately, we do not have answers to all these questions. However, we can answer an important question we saw asked multiple times: What functionality is hidden inside the heavily obfuscated DLL (or Roshtyak as we call it)? To answer this question, we fully reverse engineered a Roshtyak sample, and present our analysis results in this blog post.
Overview
Roshtyak is packed in as many as 14 protective layers, each heavily obfuscated and serving a specific purpose. Some artifacts suggest the layers were originally PE files but were transformed into custom encrypted structures that only the previous layers know how to decrypt and load. Numerous anti-debugger, anti-sandbox, anti-VM, and anti-emulator checks are sprinkled throughout the layers. If one of these checks successfully detects an analysis environment, one of four actions are taken.
The malware calls TerminateProcess on itself to avoid exhibiting any further malicious behavior and to keep the subsequent layers encrypted.
Roshtyak crashes on purpose. This has the same effect as terminating itself, but it might not be immediately clear if the crash was intentional or because of a bug thanks to Roshtyak’s obfuscated nature.
The malware enters an infinite loop on purpose. Since the loop itself is located in obfuscated code and spans thousands of instructions, it might be hard to determine if the loop is doing something useful or not.
The most interesting case is when the malware reacts to a successful check by unpacking and loading a fake payload. This happens in the eighth layer, which is loaded with dozens of anti-analysis checks. The result of each of these checks is used to modify the value of a global variable. There are two payloads encrypted in the data section of the eighth layer: the real ninth layer and a fake payload. The real ninth layer will get decrypted only if the global variable matches the expected value after all the checks have been performed. If at least one check succeeded in detecting an analysis environment, the global variable’s value will differ from the expected value, causing Roshtyak to unpack and execute the fake payload instead.
Roshtyak’s obfuscation causes even relatively simple functions to grow into large proportions. This necessitates some custom deobfuscation tooling if one wants to reverse engineer it within a reasonable timeframe.
The fake payload is a BroAssist (a.k.a BrowserAssistant) adware sample. We believe this fake payload was intended to mislead malware analysts into thinking the sample is less interesting than it really is. When a reverse engineer focuses on quickly unpacking a sample, it might look like the whole sample is “just” an obfuscated piece of adware (and a very old one at that), which could cause the analyst to lose interest in digging deeper. And indeed, it turns out that these fake payload shenanigans can be very effective. As can be seen on the screenshot below, it fooled at least one researcher, who misattributed the Raspberry Robin worm, because of the fake BrowserAssistant payload.
A security researcher misattributing Raspberry Robin because of the fake payload. This is not to pick on anyone, we just want to show how easy it is to make a mistake like this given Roshtyak’s trickery and complexity.
The Bag of Tricks
For the sake of keeping this blog post (sort of) short and to the point, let’s get straight into detailing some of the more interesting evasion techniques employed by Roshtyak.
Segment registers
Early in the execution, Roshtyak prefers to use checks that do not require calling any imported functions. If one of these checks is successful, the sample can quietly exit without generating any suspicious API calls. Below is an example where Roshtyak checks the behavior of the gs segment register. The check is designed to be stealthy and the surrounding garbage instructions make it easy to overlook.
A stealthy detection of single-stepping. Only the underscored instructions are useful.
The first idea behind this check is to detect single-stepping. Before the above snippet, the value of cx was initialized to 2. After the pop ecx instruction, Roshtyak checks if cx is still equal to 2. This would be the expected behavior because this value should propagate through the stack and the gs register under normal circumstances. However, a single step event would reset the value of the gs selector, which would result in a different value getting popped into ecx at the end.
But there is more to this check. As a side effect of the two push/pop pairs above, the value of gs is temporarily changed to 2. After this check, Roshtyak enters a loop, counting the number of iterations until the value of gs is no longer 2. The gs selector is also reset after a thread context switch, so the loop essentially counts the number of iterations until a context switch happens. Roshtyak repeats this procedure multiple times, averages out the result, and checks that it belongs to a sensible range for a bare metal execution environment. If the sample runs under a hypervisor or in an emulator, the average number of iterations might fall outside of this range, which allows Roshtyak to detect undesirable execution environments.
Roshtyak also checks that the value of the cs segment register is either 0x1b or 0x23. Here, 0x1b is the expected value when running on native x86 Windows, while 0x23 is the expected value under WoW64.
APC injection through a random ntdll gadget
Roshtyak performs some of its functionality from separate processes. For example, when it communicates with its C&C server, it spawns a new innocent-looking process like regsvr32.exe. Using shared sections, it injects its comms module into the address space of the new process. The injected module is executed via APC injection, using NtQueueApcThreadEx.
Interestingly, the ApcRoutine argument (which marks the target routine to be scheduled for execution) does not point to the entry point of the injected module. Instead, it points to a seemingly random address inside ntdll. Taking a closer look, we see this address was not chosen randomly but that Roshtyak scanned the code section of ntdll for pop r32; ret gadgets (excluding pop esp, because pivoting the stack would be undesirable) and picked one at random to use as the ApcRoutine.
A random pop r32; ret gadget used as the entry point for APC injection
Looking at the calling convention for the ApcRoutine reveals what’s going on. The pop instruction makes the stack pointer point to the SystemArgument1 parameter of NtQueueApcThreadEx and so the ret instruction effectively jumps to wherever SystemArgument1 is pointing. This means that by abusing this gadget, Roshtyak can treat SystemArgument1 as the entry point for the purpose of APC injection. This obfuscates the control flow and makes the NtQueueApcThreadEx call look more legitimate. If someone hooks this function and inspects the ApcRoutine argument, the fact that it is pointing into the ntdll code section might be enough to convince them that the call is not malicious.
Checking read/write performance on write-combined memory
In this next check, Roshtyak allocates a large memory buffer with the PAGE_WRITECOMBINE flag. This flag is supposed to modify cache behavior to optimize sequential write performance (at the expense of read performance and possibly memory ordering). Roshtyak uses this to detect if it’s running on a physical machine. It conducts an experiment where it first writes to the allocated buffer and then reads from the allocated buffer, all while measuring the read/write performance using a separate thread as a counter. This experiment is repeated 32 times and the check is passed only if write performance was at least six times higher than read performance most of the times. If the check fails, Roshtyak intentionally selects a wrong RC4 key, which results in failing to properly decrypt the next layer.
Hiding shellcode from plain sight
The injected shellcode is interestingly hidden, too. When Roshtyak prepares for code injection, it first creates a large section and maps it into the current process as PAGE_READWRITE. Then, it fills the section with random data and places the shellcode at a random offset within the random data. Since the shellcode is just a relatively small loader followed by random-looking packed data, the whole section looks like random data.
A histogram of the bytes inside the shared section. Note that it looks almost random, the most suspicious sign is the slight overrepresentation of null bytes.
The section is then unmapped from the current process and mapped into the target process, where it is executed using the above-described APC injection technique. The random data was added in an attempt to conceal the existence of the shellcode. Judging only from the memory dump of the target process, it might look like the section is full of random data and does not contain any valid executable code. Even if one suspects actual valid code somewhere in the middle of the section, it will not be easy to find its exact location.
The start of the shellcode within the shared section. It might be hard to pinpoint the exact start address because it unconventionally starts on an odd bt instruction.
Ret2Kernel32
Roshtyak makes a point of cleaning up after itself. Whenever a certain string or piece of memory is no longer needed, Roshtyak wipes and/or frees it in an attempt to destroy as much evidence as possible. The same holds for Roshtyak’s layers. Whenever one layer finishes its job, it frees itself before passing execution onto the next layer. However, the layer cannot just simply free itself directly. The whole process would crash if it called VirtualFree on the region of memory it’s currently executing from.
Roshtyak, therefore, frees the layer through a ROP chain executed during layer transitions to avoid this problem. When a layer is about to exit, it constructs a ROP chain on the stack and returns into it. An example of such a ROP chain can be seen below. This chain starts by returning into VirtualFree and UnmapViewOfFile to release the previous layer’s memory. Then, it returns into the next layer. The return address from the next layer is set to RtlExitUserThread, to safeguard execution.
A simple ROP chain consisting of VirtualFree -> UnmapViewOfFile -> next layer -> RtlExitUserThread
MulDiv bug
MulDiv is a function exported by kernel32.dll, which takes three signed 32-bit integers as arguments. It multiplies the first two arguments, divides the multiplication result by the third argument, and returns the final result rounded to the nearest integer. While this might seem like a simple enough function, there’s an ancient sign extension bug in Microsoft’s implementation. This bug is sort of considered a feature now and might never get fixed.
Roshtyak is aware of the bug and tests for its presence by calling MulDiv(1, 0x80000000, 0x80000000). On real Windows machines, this triggers the bug and MulDiv erroneously returns 2, even though the correct return value should be 1, because (1 * -2147483648) / -2147483648 = 1. This allows Roshtyak to detect emulators that do not replicate the bug. For example, this successfully detects Wine, which, funnily enough, contains a different bug, which makes the above call return 0.
Tampering with return addresses stored on the stack
There are also tricks designed to obfuscate function calls. As shown in the previous section, Roshtyak likes to call functions using the ret instruction. This next trick is similar in that it also manipulates the stack so a ret instruction can be used to jump to the desired address.
To achieve this, Roshtyak scans the current thread’s stack for pointers into the code section of one of the previous layers (unlike the other layers, this one was not freed using the ROP chain technique). It replaces all these pointers with the address it wants to call. Then it lets the code return multiple times until a ret instruction encounters one of the hijacked pointers, redirecting the execution to the desired address.
Exception-based checks
Additionally, Roshtyak contains checks that set up a custom vectored exception handler and intentionally trigger various exceptions to ensure they all get handled as expected.
Roshtyak sets up a vectored exception handler using RtlAddVectoredExceptionHandler. This handler contains custom handlers for selected exception codes. A top-level exception handler is also registered using SetUnhandledExceptionFilter. This handler should not be called in the targeted execution environments (none of the intentionally triggered exceptions should fall through the vectored exception handler). So this top-level handler just contains a single call to TerminateProcess. Interestingly, Roshtyak also uses ZwSetInformationProcess to set SEM_FAILCRITICALERRORS using the ProcessDefaultHardErrorMode class. This ensures that even if the exception somehow is passed all the way to the default exception handler, Windows would not show the standard error message box, which could alert the victim that something suspicious is going on.
When everything is set up, Roshtyak begins generating exceptions. The first exception is generated by a popf instruction, directly followed by a cpuid instruction (shown below). The value popped by the popf instruction was crafted to set the trap flag, which should, in turn, raise a single-step exception. On a physical machine, the exception would trigger right after the cpuid instruction. Then, the custom vectored exception handler would take over and move the instruction pointer away from the C7 B2 opcodes, which mark an invalid instruction. However, under many hypervisors, the single-step exception would not be raised. This is because the cpuid instruction forces a VM exit, which might delay the effect of the trap flag. If that is the case, the processor will raise an illegal instruction exception when trying to execute the invalid opcodes. If the vectored exception handler encounters such an exception, it knows that it is running under a hypervisor. A variation of this technique is described thoroughly in a blog post by Palo Alto Networks. Please refer to it for more details.
The exception-based check using popf and cpuid to detect hypervisors
Another exception is generated using the two-byte int 3 instruction (CD 03). This instruction is followed by garbage opcodes. The int 3 here raises a breakpoint exception, which is handled by the vectored exception handler. The vectored exception handler doesn’t really do anything to handle the exception, which is interesting. This is because by default, when Windows handles the two-byte int 3 instruction, it will leave the instruction pointer in between the two instruction bytes, pointing to the 03 byte. When disassembled from this 03 byte, the garbage opcodes suddenly start making sense. We believe this is a check against some overeager debuggers, which could “fix” the instruction pointer to point after the 03 byte.
Moreover, the vectored exception handler checks the thread’s CONTEXT and makes sure that registers Dr0 through Dr3 are empty. If they are not, the process is being debugged using hardware breakpoints. While this check is relatively common in malware, the CONTEXT is usually obtained using a call to a function like GetThreadContext. Here, the malware authors took advantage of CONTEXT being passed as an argument to the exception handler, so they did not need to call any additional API functions.
Large executable mappings
This next check is interesting mostly because we are not sure what it’s really supposed to check (in other words, we’d be happy to hear your theories!). It starts with Roshtyak creating a large PAGE_EXECUTE_READWRITE mapping of size 0x386F000. Then it maps this mapping nine times into its own address space. After this, it memsets the mapping to 0x42 (opcode for inc edx), except for the last six bytes, which are filled with four inc ecx instructions and jmp dword ptr [ecx] (see below). Next, it puts the nine base addresses of the mapped views into an array, followed by an address of a single ret instruction. Finally, it points ecx into this array and calls the first mapped view, which results in all the mapped views being called sequentially until the final ret instruction. After the return, Roshtyak validates that edx got incremented exactly 0x1FBE6FCA times (9 * (0x386F000 - 6)).
The end of the large mapped section. The jmp dword ptr [ecx] instruction is supposed to jump to the start of the next mapped view.
Our best guess is that this is yet another anti-emulator check. For example, in some emulators, mapped sections might not be fully implemented, so the instructions written into one instance of the mapped view might not propagate to the other eight instances. Another theory is the check could be done to request large amounts of memory that emulators might fail to provide. After all, the combined size of all the views is almost half of the standard 32-bit user mode address space.
Detecting process suspension
This trick abuses an undocumented thread creation flag in NtCreateThreadEx to detect when Roshtyak’s main process gets externally suspended (which could mean that a debugger got attached). This flag essentially allows a thread to keep running even when PsSuspendProcess gets called. This is coupled with another trick abusing the fact that the thread suspend counter is a signed 8-bit value, which means that it maxes out at 127. Roshtyak spawns two threads, one of which keeps suspending the other one until the suspend counter limit is reached. After this, the first thread keeps periodically suspending the other one and checking if the call to NtSuspendThread keeps failing with STATUS_SUSPEND_COUNT_EXCEEDED. If it does not, the thread must have been externally suspended and resumed (which would leave the suspend counter at 126, so the next call to NtSuspendThread would succeed). Not getting this error code would be suspicious enough for Roshtyak to quit using TerminateProcess. This entire technique is described in more detail in a blog post by Secret Club. We believe that’s where the authors of Roshtyak got this trick from. It’s also worth mentioning Roshtyak uses this technique only on Windows builds 18323 (19H1) and later because the undocumented thread creation flag was not implemented on prior builds.
Indirect registry writes
Roshtyak performs many suspicious registry operations, for example, setting up the RunOnce key for persistence. Since modifications to such keys are likely to be monitored, Roshtyak attempts to circumvent the monitoring. It first generates a random registry key name and temporarily renames the RunOnce key to the random name using ZwRenameKey. Once renamed, Roshtyak adds a new persistence entry to the temporary key before finally renaming it back to RunOnce. This method of writing to the registry can be easily detected, but it might bypass some simple hooking-based monitoring methods.
Similarly, there are multiple methods Roshtyak uses to delete files. Aside from the apparent call to NtDeleteFile, Roshtyak is able to effectively delete a file by setting FileDispositionInformation or FileRenameInformation in a call to ZwSetInformationFile. However, unlike the registry modification method, this doesn’t seem to be implemented in order to evade detection. Instead, Roshtyak will try these alternative methods if the initial call to NtDelete file fails.
Checking VBAWarnings
The VBAWarnings registry value controls how Microsoft Office behaves when a user opens a document containing embedded VBA macros. If this value is 1 (meaning “Enable all macros”), macros are executed by default, even without the need for any user interaction. This is a common setting for sandboxes, which are designed to detonate maldocs automatically. On the other hand, this setting is uncommon for regular users, who generally don’t go around changing random settings to make themselves more vulnerable (at least most of them don’t). Roshtyak therefore uses this check to differentiate between sandboxes and regular users and refuses to run further if the value of VBAWarnings is 1. Interestingly, this means that users, who for whatever reason have lowered their security this way, are immune to Roshtyak.
Command line wiping
Roshtyak’s core is executed with very suspicious command lines, such as RUNDLL32.EXE SHELL32.DLL,ShellExec_RunDLL REGSVR32.EXE -U /s "C:\Users\<REDACTED>\AppData\Local\Temp\dpcw.etl.". These command lines don’t look particularly legitimate, so Roshtyak attempts to hide them during execution. It does this by wiping command line information collected from various sources. It starts by calling GetCommandLineA and GetCommandLineW and wiping both of the returned strings. Then it attempts to wipe the string pointed to by PEB->ProcessParameters->CommandLine (even if this points to a string that has already been wiped). Since Roshtyak is often running under WoW64, it also calls NtWow64QueryInformationProcess64 to obtain a pointer to PEB64 to wipe ProcessParameters->CommandLine obtained by traversing this “second” PEB. While the wiping of the command lines was probably meant to make Roshtyak look more legitimate, the complete absence of any command line is also highly unusual. This was noticed by the Red Canary researchers in their blog post, where they proposed a detection method based on these suspiciously empty command lines.
Roshtyak’s core process, as shown by Process Explorer. Note the suspiciously empty command line.
Additional tricks
Aside from the techniques described so far, Roshtyak uses many less sophisticated tricks that are commonly found in other malware as well. These include:
Hiding threads using ThreadHideFromDebugger (and verifying that the threads really got hidden using NtQueryInformationThread)
Patching DbgBreakPoint in ntdll
Detecting user inactivity using GetLastInputInfo
Checking fields from PEB (BeingDebugged, NtGlobalFlag)
Checking fields from KUSER_SHARED_DATA (KdDebuggerEnabled, ActiveProcessorCount, NumberOfPhysicalPages)
Checking the names of all running processes (some are compared by hash, some by patterns, and some by character distribution)
Hashing the names of all loaded modules and checking them against a hardcoded blacklist
Verifying the main process name is not too long and doesn’t match known names used in sandboxes
Using the cpuid instruction to check hypervisor information and the processor brand
Using poorly documented COM interfaces
Checking the username and computername against a hardcoded blacklist
Checking for the presence of known sandbox decoy files
Checking MAC addresses of own adapters against a hardcoded blacklist
Checking MAC addresses from the ARP table (using GetBestRoute to populate it and GetIpNetTable to inspect it)
Calling ZwQueryInformationProcess with ProcessDebugObjectHandle, ProcessDebugFlags, and ProcessDebugPort
Checking DeviceId of display devices (using EnumDisplayDevices)
Checking ProductId of \\.\PhysicalDrive0 (using IOCTL_STORAGE_QUERY_PROPERTY)
Checking for virtual hard disks (using NtQuerySystemInformation with SystemVhdBootInformation)
Checking the raw SMBIOS firmware table (using NtQuerySystemInformation with SystemFirmwareTableInformation)
Setting up Defender exclusions (both for paths and processes)
Removing IFEO registry keys related to process names used by the malware
Obfuscation
We’ve shown many anti-analysis tricks that are designed to prevent Roshtyak from detonating in undesirable execution environments. These tricks alone would be easy to patch or bypass. What makes analyzing Roshtyak especially lethal is the combination of all these tricks with heavy obfuscation and multiple layers of packing. This makes it very difficult to study the anti-analysis tricks statically and figure out how to pass all the checks in order to get Roshtyak to unpack itself. Furthermore, even the main payload received the same obfuscation, which means that statically analyzing Roshtyak’s core functionality also requires a great deal of deobfuscation.
In the rest of this section, we’ll go through the main obfuscation techniques used by Roshtyak.
A random code snippet from Roshtyak. As can be seen, the obfuscation makes the raw output of the Hex-Rays decompiler practically incomprehensible.
Control flow flattening
Control flow flattening is one of the most noticeable obfuscation techniques employed by Roshtyak. It is implemented in an unusual way, giving the control flow graphs of Roshtyak’s functions a unique look (see below). The goal of control flow flattening is to obscure control flow relations between individual code blocks.
Control flow is directed by a 32-bit control variable, which tracks the execution state, identifying the code block to be executed. This control variable is initialized at the start of each function to refer to the starting code block (which is frequently a nop block). The control variable is then modified at the end of each code block to identify the next code block that should be executed. The modification is performed using some arithmetic instructions, such as add, sub, or xor.
There is a dispatcher using the control variable to route execution into the correct code block. This dispatcher is made up of if/else blocks that are circularly linked into a loop. Each dispatcher block takes the control variable and masks it using arithmetic instructions to check if it should route execution into the code block that it is guarding. What’s interesting here is there are multiple points of entry from the code blocks into the dispatcher loop, giving the control flow graphs the jagged “sawblade” look in IDA.
Branching is performed using a special code block containing an imul instruction. It relies on the previous block to compute a branch flag. This branch flag is multiplied using the imul instruction with a random constant, and the result is added, subbed, or xored to the new control variable. This means that after the branch block, the control variable will identify one of the two possible succeeding code blocks, depending on the value that was computed for the branch flag.
Control flow graph of a function obfuscated using control flow flattening
Function activation keys
Roshtyak’s obfuscated functions expect an extra argument, which we call an activation key. This activation key is used to decrypt all local constants, strings, variables, etc. If a function is called with a wrong activation key, the decryption results in garbage plaintext, which will most likely cause Roshtyak to get stuck in an infinite loop inside the control flow dispatcher. This is because all constants used by the dispatcher (the initial value of the control variable, the masks used by the dispatcher guards, and the constants used to jump to the next code block) are encrypted with the activation key. Without the correct activation key, the dispatcher simply does not know how to dispatch.
Reverse engineering a function is practically impossible without knowing the correct activation key. All strings, buffers, and local variables/constants remain encrypted, all cross-references are lost, and worse, there is no control flow information. Only individual code blocks remain, with no way to know how they relate to each other.
Each obfuscated function has to be called from somewhere, which means the code calling the function has to supply the correct activation key. However, obtaining the activation key is not that easy. First, call targets are also encrypted with activation keys, so it’s impossible to find where a function is called from without knowing the right activation keys. Second, even the supplied activation key is encrypted with the activation key of the calling function. And that activation key got encrypted with the activation key of the next calling function. And so on, recursively, all the way until the entry point function.
This brings us to how to deobfuscate the mess. The activation key of the entry point function must be there in plaintext. Using this activation key, it is possible to decrypt the call targets and activation keys of functions that are called directly from this entry point function. Applying this method recursively allows us to reconstruct the full call graph along with the activation keys of all the functions. The only exceptions would be functions that were never called and were left in by the compiler. These functions will probably remain a mystery, but since the sample does not use them, they are not that important from a malware analyst’s point of view.
Variable masking
Some variables are not stored in plaintext form but are masked using one or more arithmetic instructions. This means that if Roshtyak is not actively using a variable, it keeps the variable’s value in an obfuscated form. Whenever Roshtyak needs to use the variable, it has to first unmask it before it can use it. Conversely, after Roshtyak uses the variable, it converts it back into the masked form. This masking-based obfuscation method slightly complicates tracking variables during debugging and makes it harder to search memory for a known variable value.
Loop transformations
Roshtyak is creative with some loop conditions. Instead of writing a loop like for (int i = 0; i < 1690; i++), it transforms the loop into e.g. for (int32_t i = 0x06AB91EE; i != 0x70826068; i = i * -0x509FFFF + 0xEC891BB1). While both loops will execute exactly 1690 times, the second one is much harder to read. At first glance, it is not clear how many iterations the second loop executes (and if it even terminates). Tracking the number of loop iterations during debugging is also much harder in the second case.
Packing
As mentioned, Roshtyak’s core is hidden behind multiple layers of packing. While all the layers look like they were originally compiled into PE files, all but the strictly necessary data (entry point, sections, imports, and relocations) were stripped away. Furthermore, Roshtyak supports two custom formats for storing the stripped PE file information, and the layers take turns on what format they use. Additionally, parts of the custom formats are encrypted, sometimes using keys generated based on the results of various anti-analysis checks.
This makes it difficult to unpack Roshtyak’s layers statically into a standalone PE file. First, one would have to reverse engineer the custom formats and figure out how to decrypt the encrypted parts. Then, one would have to reconstruct the PE header, the sections, the section headers, and the import table (the relocation table doesn’t need to be reconstructed since relocations can just be turned off). While this is all perfectly doable (and can be simplified using libraries like LIEF), it might take a significant amount of time. Adding to this that the layers are sometimes interdependent, it might be easier to just analyze Roshtyak dynamically in memory.
A section header in one of the custom PE-like file formats: raw_size corresponds to SizeOfRawData, raw_size + virtual_padding_size is effectively VirtualSize. There is no VirtualAddress or PointerToRawData equivalent because the sections are loaded sequentially.
Other obfuscation techniques
In addition to the above-described techniques, Roshtyak also uses other obfuscation techniques, including:
Junk instruction insertion
Import hashing
Frequent memory wiping
Mixed boolean-arithmetic obfuscation
Redundant threading
Heavy polymorphism
Core Functionality
Now that we’ve described how Roshtyak protects itself, it might be interesting to also go over what it actually does. Roshtyak’s DLL is relatively large, over a megabyte, but its functionality is surprisingly simple once you eliminate all the obfuscation. Its main purpose is to download further payloads to execute. In addition, it does the usual evil malware stuff, namely establishing persistence, escalating privileges, lateral movement, and exfiltrating information about the victim.
Persistence
Roshtyak first generates a random file name in %SystemRoot%\Temp and moves its DLL image there. The generated file name consists of two to eight random lowercase characters concatenated with a random extension chosen from a hardcoded list. The PRNG used to generate this file name is seeded with the volume serial number of C:\. The sample we analyzed hardcoded seven extensions (.log, .tmp, .loc, .dmp, .out, .ttf, and .etl). We observed other extensions being used in other samples, suggesting this list is somewhat dynamic. With a small probability, Roshtyak will also use a randomly generated extension. Once fully constructed, the full path to the Roshtyak DLL might look like e.g. C:\Windows\Temp\wcdp.etl.
After the DLL image is moved to the new filesystem path, Roshtyak stomps its Modified timestamp to the current system time. It then proceeds to set up a RunOnce(Ex) registry key to actually establish persistence. The registry entry is created using the previously described indirect registry write technique. The command inserted into the key might look as follows:
There are a couple of things to note here. First, regsvr32 doesn’t care about the extensions of the DLLs it loads, allowing Roshtyak to hide under an innocent-looking extension such as .log. Second, the /s parameter puts regsvr32 into silent mode. Without it, regsvr32 would complain that it did not find an export named DllUnregisterServer. Finally, notice the trailing period character at the end of the path. This period is removed during path normalization, so it practically has no effect on the command. We are not exactly sure what the author’s original intention behind including this period character is. It looks like it could have been designed to trick some anti-malware software into not being able to connect the persistence entry with the payload on the filesystem.
By default, Roshtyak uses the HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce key for persistence. However, under some circumstances (such as when it detects that Kaspersky is running by checking for a process named avp.exe) the key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnceEx will be used instead. The RunOnceEx key is capable of loading a DLL, so when using this key, Roshtyak specifies shell32.dll directly, omitting the use rundll32.
A RunOnceEx persistence entry established by Roshtyak
Privilege escalation
Roshtyak uses both UAC bypasses and regular EoP exploits in an attempt to elevate its privileges. Unlike many other pieces of malware, which just blindly execute whatever UAC bypasses/exploits the authors could find, Roshtyak makes efforts to figure out if the privilege escalation method is even likely to be successful. This was probably implemented to lower the chances of detection due to the unnecessary usage of incompatible bypasses/exploits. For UAC bypasses, this involves checking the ConsentPromptBehaviorAdmin and ConsentPromptBehaviorUser registry keys. For EoP exploits, this is about checking the Windows build number and patch level.
Besides checking the ConsentPromptBehavior(Admin|User) keys, Roshtyak performs other sanity checks to ensure that it should proceed with the UAC bypass. Namely, it checks for admin privileges using CheckTokenMembership with the SID S-1-5-32-544 (DOMAIN_ALIAS_RID_ADMINS). It also inspects the value of the DbgElevationEnabled flag in KUSER_SHARED_DATA.SharedDataFlags. This is an undocumented flag that is set if UAC is enabled. Finally, there are AV checks for BitDefender (detected by the module atcuf32.dll), Kaspersky (process avp.exe), and our own Avast/AVG (module aswhook.dll). If one of these AVs is detected, Roshtyak avoids selected UAC bypass techniques, presumably the ones that might result in detection.
As for the actual UAC bypasses, there are two main methods implemented. The first is an implementation of the aptly named ucmDccwCOM method from UACMe. Interestingly when this method is executed, Roshtyak temporarily masquerades its process as explorer.exe by overwriting FullDllName and BaseDllName in the _LDR_MODULE structure corresponding to the main executable module. The payload launched by this method is a randomly named LNK file, dropped into %TEMP% using the IShellLink COM interface. This LNK file is designed to relaunch the Roshtyak DLL, through LOLBins such as advpack or register-cimprovider.
The second method is more of a UAC bypass framework than a specific bypass method, because multiple UAC bypass methods follow the same simple pattern: first registering some specific shell open command and then executing an autoelevating Windows binary (which internally triggers the shell open command). For instance, a UAC bypass might be accomplished by writing a payload command to HKCU\Software\Classes\ms-settings\shell\open\command and then executing fodhelper.exe from %windir%\system32. Basically, the same bypass can be achieved by substituting the pair ms-settings/fodhelper.exe with other pairs, such as mscfile/eventvwr.exe. Roshtyak uses the following six pairs to bypass UAC:
Class
Executable
mscfile
eventvwr.exe
mscfile
compmgmtlauncher.exe
ms-settings
fodhelper.exe
ms-settings
computerdefaults.exe
Folder
sdclt.exe
Launcher.SystemSettings
slui.exe
Let’s now look at the kernel exploits (CVE-2020-1054 and CVE-2021-1732) Roshtyak uses to escalate privileges. As is often the case in Roshtyak, these exploits are stored encrypted and are only decrypted on demand. Interestingly, once decrypted, the exploits turn out to be regular PE files with completely valid headers (unlike the other layers in Roshtyak, which are either in shellcode form or stored in a custom stripped PE format). Moreover, the exploits lack the obfuscation given to the rest of Roshtyak, so their code is immediately decompilable, and only some basic string encryption is used. We don’t know why the attackers left these exploits so exposed, but it might be due to the difference in bitness. While Roshtyak itself is x86 code (most of the time running under WoW64), the exploits are x64 (which makes sense considering they exploit vulnerabilities in 64-bit code). It could be that the obfuscation tools used by Roshtyak’s authors were designed to work on x86 and are not portable to x64.
Snippet from Roshtyak’s exploit for CVE-2020-1054, scanning through IsMenu to find the offset to HMValidateHandle.
To execute the exploits, Roshtyak spawns (the AMD64 version of) winver.exe and gets the exploit code to run there using the KernelCallbackTable injection method. Roshtyak’s implementation of this injection method essentially matches a public PoC, with the biggest difference being the usage of slightly different API functions due to the need for cross-subsystem injection (e.g. NtWow64QueryInformationProcess64 instead of NtQueryInformationProcess or NtWow64ReadVirtualMemory64 instead of ReadProcessMemory). The code injected into winver.exe is not the exploit PE itself but rather a slightly obfuscated shellcode, designed to load the exploit PE into memory.
The kernel exploits target certain unpatched versions of Windows. Specifically, CVE-2020-1054 is only used on Windows 7 systems where the revision number is not higher than 24552. On the other hand, the exploit for CVE-2021-1732 runs on Windows 10, with the targeted build number range being from 16353 to 19042. Before exploiting CVE-2021-1732, Roshtyak also scans through installed update packages to see if a patch for the vulnerability is installed. It does this by enumerating the registry keys under HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\Packages and checking if the package for KB4601319 (or higher) is present.
Lateral movement
When it comes to lateral movement, Roshtyak simply uses the tried and tested PsExec tool. Before executing PsExec, Roshtyak ensures it makes sense to run it by checking for a SID matching the “well-known” WinAccountDomainAdminsSid group. If domain admin rights are not detected, Roshtyak skips its lateral movement phase entirely.
Roshtyak attempts to get around detection by setting Defender exclusions, as PsExec is often flagged as a hacktool (for good reasons). It sets a path exclusion for %TEMP% (where it will drop PsExec and other files used for lateral movement). Later, it sets up a process exclusion for the exact path from which PsExec will be executed.
While we would expect PsExec to be bundled inside Roshtyak, it turns out Roshtyak downloads it on demand from https://download.sysinternals[.]com/files/PSTools.zip. The downloaded zip archive is dropped into %TEMP% under a random name with the .zip extension. PsExec is then unzipped from this archive using the Windows Shell COM interface (IShellDispatch) into a randomly named .exe file in %TEMP%.
The payload to be executed by PsExec is a self-extracting package created by a tool called IExpress. This is an archaic installer that’s part of Windows, which is probably why it’s used, since Roshtyak can rely on it already being on the victim machine. The installer generation is configured by a text file using the Self Extraction Directive (SED) syntax.
Roshtyak’s IExpress configuration template
Roshtyak uses a SED configuration template with three placeholders (%1, %2, and %3) that it substitutes with real values at runtime. As seen above, the configuration template was written in mixed-case, which is frequently used in Raspberry Robin in general. Once the SED configuration is prepared, it is written into a randomly named .txt file in %TEMP%. Then, iexpress is invoked to generate the payload using a command such as C:\Windows\iexpress.exe /n /q <path_to_sed_config>. The generated payload is dumped into a randomly named .exe file in %TEMP%, as configured by the TargetName directive (placeholder %1).
Once the payload is generated, Roshtyak proceeds to actually run PsExec. There are two ways Roshtyak can execute PsExec. The first one uses the command <path_to_psexec> \\* -accepteula -c -d -s <path_to_payload>. Here, the \\* wildcard instructs PsExec to run the payload on all computers in the current domain. Alternatively, Roshtyak might run the command <path_to_psexec> @<path_to_target_file> -accepteula -c -d -s <path_to_payload>. Here, the target_file is a text file containing a specific list of computers to run the payload on. Roshtyak builds this list by enumerating Active Directory objects using API functions exported from activeds.dll.
Profiling the victim
USB worms tend to have a life of their own. Since their worming behavior is usually completely automated, the threat actor who initially deployed the worm doesn’t necessarily have full control over where it spreads. This is why it’s important for threat actors to have the worm beacon back to their C&C servers. With a beaconing mechanism in place, the threat actor can be informed about all the machines under their control and can use this knowledge to manage the worm as a whole.
The outgoing beaconing messages typically contain some information about the infected machine. This helps financially-motivated cybercriminals decide on how to best monetize the infection. Roshtyak is no exception to this, and it collects a lot of information about each infected victim. Roshtyak concatenates all the collected information into a large string, using semicolons as delimiters. This large string is then exfiltrated to one of Roshtyak’s C&C servers. The exfiltrated pieces of information are listed below, in order of concatenation.
External IP address (obtained during a Tor connectivity check)
A string hardcoded into Roshtyak’s code, e.g. AFF123 (we can’t be sure what’s the meaning behind this, but it looks like an affiliate ID)
A 16-bit hash of the DLL’s PE header (with some fields zeroed out) xored with the lower 16 bits of its TimeDateStamp. The TimeDateStamp appears to be specially crafted so that the xor results in a known value. This could function as a tamper check or a watermark.
Creation timestamp of the System Volume Information folder on the system drive
Active processes (NtQuerySystemInformation(SystemProcessInformation))
Screenshot encoded in base64 (gdi32 method)
Beaconing
Once collected, Roshtyak sends the victim profile to one of its C&C servers. The profile is sent over the Tor network, using a custom comms module Roshtyak injects into a newly spawned process. The C&C server processes the exfiltrated profile and might respond with a shellcode payload for the core module to execute.
Let’s now take a closer look at this whole process. It’s worth mentioning that before generating any malicious traffic, Roshtyak first performs a Tor connectivity check. This is done by contacting 28 legitimate and well-known .onion addresses in random order and checking if at least one of them responds. If none of them respond, Roshtyak doesn’t even attempt to contact its C&C, as it would most likely not get through to it anyway.
As for the actual C&C communication, Roshtyak contains 35 hardcoded V2 onion addresses (e.g. ip2djbz3xidmkmkw:53148, see our IoC repository for the full list). Like during the connectivity check, Roshtyak iterates through them in random order and attempts to contact each of them until one responds. Note that while V2 onion addresses are officially deprecated in favor of V3 addresses (and the Tor Browser no longer supports them in its latest version) they still appear to be functional enough for Roshtyak’s nefarious purposes.
Roshtyak’s hardcoded C&C addresses
The victim profile is sent in the URL path, appended to the V2 onion address, along with the / character. As the raw profile might contain characters forbidden for use in URLs, the profile is wrapped in a custom structure and encoded using Base64. The very first 0x10 bytes of the custom structure serve as an encryption key, with the rest of the structure being encrypted. The custom structure also contains a 64-bit hash of the victim profile, which presumably serves as an integrity check. Interestingly, the custom structure might get its end padded with random bytes. Note that the full path could be pretty large, as it contains a doubly Base64-encoded screenshot. The authors of Roshtyak were probably aware that the URL path is not suitable for sending large amounts of data and decided to cap the length of the victim profile at 0x20000 bytes. If the screenshot makes the exfiltrated profile larger than this limit, it isn’t included.
When the full onion URL is constructed, Roshtyak goes ahead to launch its Tor comms module. It first spawns a dummy process to host the comms module. This dummy process is randomly chosen and can be one of dllhost.exe, regsvr32.exe, or rundll32.exe. The comms module is injected into the newly spawned process using a shared section, obfuscated through the previously described shellcode hiding technique. The comms module is then executed via NtQueueApcThreadEx, using the already discussed ntdll gadget trick. The injected comms module is a custom build of an open-source Tor library packed in three additional protective shellcode layers.
The core module communicates with the comms module using shared sections as an IPC mechanism. Both modules synchronously use the same PRNG with the same seed (KUSER_SHARED_DATA.Cookie) to generate the same section name. Both then map this named section into their respective address spaces and communicate with each other by reading/writing to it. The data read/written into the section is encrypted with RC4 (the key also generated using the synchronized PRNGs).
The communication between the core module and the comms module follows a simple request/response pattern. The core module writes an encrypted onion URL (including the URL path to exfiltrate) into the shared section. The comms module then decrypts the URL and makes an HTTP request over Tor to it. The core module waits for the comms module to write the encrypted HTTP response back to the shared section. Once it’s there, the core module decrypts it and unwraps it from a custom format (which includes decrypting it yet again and computing a hash to check the payload’s integrity). The decrypted payload might include a shellcode for the core module to execute. If the shellcode is present, the core module allocates a huge chunk of memory, hides the shellcode there using the shellcode hiding technique, and executes it in a new thread. This new thread is hidden using the NtSetInformationThread -> ThreadHideFromDebugger technique (including a follow-up anti-hooking check using NtGetInformationThread to confirm that the NtSetInformationThread call did indeed succeed).
Conclusion
In this blog post, we took a technical deep dive into Roshtyak, the backdoor payload associated with Raspberry Robin. The main focus was to describe how to deal with Roshtyak’s protection mechanisms. We showed some never-before-seen anti-debugger/anti-sandbox/anti-VM tricks and discussed Roshtyak’s heavy obfuscation. We also described Roshtyak’s core functionality. Specifically, we detailed how it establishes persistence, escalates privileges, moves laterally, and uses Tor to download further payloads.
We have to admit that reverse engineering Roshtyak was certainly no easy task. The combination of heavy obfuscation and numerous advanced anti-analysis tricks made it a considerable challenge. Nick Harbour, if you’re looking for something to repurpose for next year’s final Flare-On challenge, this might be it.
We recently discovered a zero-day vulnerability in Google Chrome (CVE-2022-2294) when it was exploited in the wild in an attempt to attack Avast users in the Middle East. Specifically, a large portion of the attacks took place in Lebanon, where journalists were among the targeted parties.
The vulnerability was a memory corruption in WebRTC that was abused to achieve shellcode execution in Chrome’s renderer process. We reported this vulnerability to Google, who patched it on July 4, 2022.
Based on the malware and TTPs used to carry out the attack, we can confidently attribute it to a secretive spyware vendor of many names, most commonly known as Candiru. (A name the threat actors chose themselves, inspired by a horrifying parasitic fish of the same name.)
After Candiru was exposed by Microsoft and CitizenLab in July 2021, it laid low for months, most likely taking its time to update its malware to evade existing detection. We’ve seen it return with an updated toolset in March 2022, targeting Avast users located in Lebanon, Turkey, Yemen, and Palestine via watering hole attacks using zero-day exploits for Google Chrome. We believe the attacks were highly targeted.
Exploit Delivery and Protection
There were multiple attack campaigns, each delivering the exploit to the victims in its own way.
In Lebanon, the attackers seem to have compromised a website used by employees of a news agency. We can’t say for sure what the attackers might have been after, however often the reason why attackers go after journalists is to spy on them and the stories they’re working on directly, or to get to their sources and gather compromising information and sensitive data they shared with the press.
Interestingly, the compromised website contained artifacts of persistent XSS attacks, with there being pages that contained calls to the Javascript function alert along with keywords like test. We suppose that this is how the attackers tested the XSS vulnerability, before ultimately exploiting it for real by injecting a piece of code that loads malicious Javascript from an attacker-controlled domain. This injected code was then responsible for routing the intended victims (and only the intended victims) to the exploit server, through several other attacker-controlled domains.
The malicious code injected into the compromised website, loading further Javascript from stylishblock[.]com
Once the victim gets to the exploit server, Candiru gathers more information. A profile of the victim’s browser, consisting of about 50 data points, is collected and sent to the attackers. The collected information includes the victim’s language, timezone, screen information, device type, browser plugins, referrer, device memory, cookie functionality, and more. We suppose this was done to further protect the exploit and make sure that it only gets delivered to the targeted victims. If the collected data satisfies the exploit server, it uses RSA-2048 to exchange an encryption key with the victim. This encryption key is used with AES-256-CBC to establish an encrypted channel through which the zero-day exploits get delivered to the victim. This encrypted channel is set up on top of TLS, effectively hiding the exploits even from those who would be decrypting the TLS session in order to capture plaintext HTTP traffic.
Exploits and Vulnerabilities
We managed to capture a zero-day exploit that abused a heap buffer overflow in WebRTC to achieve shellcode execution inside a renderer process. This zero-day was chained with a sandbox escape exploit, which was unfortunately further protected and we were not able to recover it. We extracted a PoC from the renderer exploit and sent it to Google’s security team. They fixed the vulnerability, assigning it CVE-2022-2294 and releasing a patch in Chrome version 103.0.5060.114 (Stable channel).
While the exploit was specifically designed for Chrome on Windows, the vulnerability’s potential was much wider. Since the root cause was located in WebRTC, the vulnerability affected not only other Chromium-based browsers (like Microsoft Edge) but also different browsers like Apple’s Safari. We do not know if Candiru developed exploits other than the one targeting Chrome on Windows, but it’s possible that they did. Our Avast Secure Browser was patched on July 5. Microsoft adopted the Chromium patch on July 6, while Apple released a patch for Safari on July 20. We encourage all other WebRTC integrators to patch as soon as possible.
At the end of the exploit chain, the malicious payload (called DevilsTongue, a full-blown spyware) attempts to get into the kernel using another zero-day exploit. This time, it is targeting a legitimate signed kernel driver in a BYOVD (Bring Your Own Vulnerable Driver) fashion. Note that for the driver to be exploited, it has to be first dropped to the filesystem (Candiru used the path C:\Windows\System32\drivers\HW.sys) and loaded, which represents a good detection opportunity.
The driver is exploited through IOCTL requests. In particular, there are two vulnerable IOCTLs: 0x9C40648C can be abused for reading physical memory and 0x9C40A4CC for writing physical memory. We reported this to the driver’s developer, who acknowledged the vulnerability and claimed to be working on a patch. Unfortunately, the patch will not stop the attackers, since they can just continue to exploit the older, unpatched driver. We are also discussing a possible revocation, but that would not be a silver bullet either, because Windows doesn’t always check the driver’s revocation status. Driver blocklisting seems to be the best solution for now.
One of the vulnerable ioctl handlers
While there is no way for us to know for certain whether or not the WebRTC vulnerability was exploited by other groups as well, it is a possibility. Sometimes zero-days get independently discovered by multiple groups, sometimes someone sells the same vulnerability/exploit to multiple groups, etc. But we have no indication that there is another group exploiting this same zero-day.
Because Google was fast to patch the vulnerability on July 4, Chrome users simply need to click the button when the browser prompts them to “restart to finish applying the update.” The same procedure should be followed by users of most other Chromium-based browsers, including Avast Secure Browser. Safari users should update to version 15.6.
In October 2021, we discovered that the Magnitude exploit kit was testing out a Chromium exploit chain in the wild. This really piqued our interest, because browser exploit kits have in the past few years focused mainly on Internet Explorer vulnerabilities and it was believed that browsers like Google Chrome are just too big of a target for them.
#MagnitudeEK is now stepping up its game by using CVE-2021-21224 and CVE-2021-31956 to exploit Chromium-based browsers. This is an interesting development since most exploit kits are currently targeting exclusively Internet Explorer, with Chromium staying out of their reach.
We’ve been monitoring the exploit kit landscape very closely since our discoveries, watching out for any new developments. We were waiting for other exploit kits to jump on the bandwagon, but none other did, as far as we can tell. What’s more, Magnitude seems to have abandoned the Chromium exploit chain. And while Underminer still continues to use these exploits today, its traditional IE exploit chains are doing much better. According to our telemetry, less than 20% of Underminer’s exploitation attempts are targeting Chromium-based browsers.
This is some very good news because it suggests that the Chromium exploit chains were not as successful as the attackers hoped they would be and that it is not currently very profitable for exploit kit developers to target Chromium users. In this blog post, we would like to offer some thoughts into why that could be the case and why the attackers might have even wanted to develop these exploits in the first place. And since we don’t get to see a new Chromium exploit chain in the wild every day, we will also dissect Magnitude’s exploits and share some detailed technical information about them.
Exploit Kit Theory
To understand why exploit kit developers might have wanted to test Chromium exploits, let’s first look at things from their perspective. Their end goal in developing and maintaining an exploit kit is to make a profit: they just simply want to maximize the difference between money “earned” and money spent. To achieve this goal, most modern exploit kits follow a simple formula. They buy ads targeted to users who are likely to be vulnerable to their exploits (e.g. Internet Explorer users). These ads contain JavaScript code that is automatically executed, even when the victim doesn’t interact with the ad in any way (sometimes referred to as drive-by attacks). This code can then further profile the victim’s browser environment and select a suitable exploit for that environment. If the exploitation succeeds, a malicious payload (e.g. ransomware or a coinminer) is deployed to the victim. In this scenario, the money “earned” could be the ransom or mining rewards. On the other hand, the money spent is the cost of ads, infrastructure (renting servers, registering domain names etc.), and the time the attacker spends on developing and maintaining the exploit kit.
Modus operandi of a typical browser exploit kit
The attackers would like to have many diverse exploits ready at any given time because it would allow them to cast a wide net for potential victims. But it is important to note that individual exploits generally get less effective over time. This is because the number of people susceptible to a known vulnerability will decrease as some people patch and other people upgrade to new devices (which are hopefully not plagued by the same vulnerabilities as their previous devices). This forces the attackers to always look for new vulnerabilities to exploit. If they stick with the same set of exploits for years, their profit would eventually reduce down to almost nothing.
So how do they find the right vulnerabilities to exploit? After all, there are thousands of CVEs reported each year, but only a few of them are good candidates for being included in an exploit kit. Weaponizing an exploit generally takes a lot of time (unless, of course, there is a ready-to-use PoC or the exploit can be stolen from a competitor), so the attackers might first want to carefully take into account multiple characteristics of each vulnerability. If a vulnerability scores well across these characteristics, it looks like a good candidate for inclusion in an exploit kit. Some of the more important characteristics are listed below.
Prevalence of the vulnerability The more users are affected by the vulnerability, the more attractive it is to the attackers.
Exploit reliability Many exploits rely on some assumptions or are based on a race condition, which makes them fail some of the time. The attackers obviously prefer high-reliability exploits.
Difficulty of exploit development This determines the time that needs to be spent on exploit development (if the attackers are even capable of exploiting the vulnerability). The attackers tend to prefer vulnerabilities with a public PoC exploit, which they can often just integrate into their exploit kit with minimal effort.
Targeting precision The attackers care about how hard it is to identify (and target ads to) vulnerable victims. If they misidentify victims too often (meaning that they serve exploits to victims who they cannot exploit), they’ll just lose money on the malvertising.
Expected vulnerability lifetime As was already discussed, each vulnerability gets less effective over time. However, the speed at which the effectiveness drops can vary a lot between vulnerabilities, mostly based on how effective is the patching process of the affected software.
Exploit detectability The attackers have to deal with numerous security solutions that are in the business of protecting their users against exploits. These solutions can lower the exploit kit’s success rate by a lot, which is why the attackers prefer more stealthy exploits that are harder for the defenders to detect.
Exploit potential Some exploits give the attackers System, while others might make them only end up inside a sandbox. Exploits with less potential are also less useful, because they either need to be chained with other LPE exploits, or they place limits on what the final malicious payload is able to do.
Looking at these characteristics, the most plausible explanation for the failure of the Chromium exploit chains is the expected vulnerability lifetime. Google is extremely good at forcing users to install browser patches: Chrome updates are pushed to users when they’re ready and can happen many times in a month (unlike e.g. Internet Explorer updates which are locked into the once-a-month “Patch Tuesday” cycle that is only broken for exceptionally severe vulnerabilities). When CVE-2021-21224 was a zero-day vulnerability, it affected billions of users. Within a few days, almost all of these users received a patch. The only unpatched users were those who manually disabled (or broke) automatic updates, those who somehow managed not to relaunch the browser in a long time, and those running Chromium forks with bad patching habits.
A secondary reason for the failure could be attributed to bad targeting precision. Ad networks often allow the attackers to target ads based on various characteristics of the user’s browser environment, but the specific version of the browser is usually not one of these characteristics. For Internet Explorer vulnerabilities, this does not matter that much: the attackers can just buy ads for Internet Explorer users in general. As long as a certain percentage of Internet Explorer users is vulnerable to their exploits, they will make a profit. However, if they just blindly targeted Google Chrome users, the percentage of vulnerable victims might be so low, that the cost of malvertising would outweigh the money they would get by exploiting the few vulnerable users. Google also plans to reduce the amount of information given in the User-Agent string. Exploit kits often heavily rely on this string for precise information about the browser version. With less information in the User-Agent header, they might have to come up with some custom version fingerprinting, which would most likely be less accurate and costly to manage.
Now that we have some context about exploit kits and Chromium, we can finally speculate about why the attackers decided to develop the Chromium exploit chains. First of all, adding new vulnerabilities to an exploit kit seems a lot like a “trial and error” activity. While the attackers might have some expectations about how well a certain exploit will perform, they cannot know for sure how useful it will be until they actually test it out in the wild. This means it should not be surprising that sometimes, their attempts to integrate an exploit turn out worse than they expected. Perhaps they misjudged the prevalence of the vulnerabilities or thought that it would be easier to target the vulnerable victims. Perhaps they focused too much on the characteristics that the exploits do well on: after all, they have reliable, high-potential exploits for a browser that’s used by billions. It could also be that this was all just some experimentation where the attackers just wanted to explore the land of Chromium exploits.
It’s also important to point out that the usage of Internet Explorer (which is currently vital for the survival of exploit kits) has been steadily dropping over the past few years. This may have forced the attackers to experiment with how viable exploits for other browsers are because they know that sooner or later they will have to make the switch. But judging from these attempts, the attackers do not seem fully capable of making the switch as of now. That is some good news because it could mean that if nothing significant changes, exploit kits might be forced to retire when Internet Explorer usage drops below some critical limit.
CVE-2021-21224
Let’s now take a closer look at the Magnitude’s exploit chain that we discovered in the wild. The exploitation starts with a JavaScript exploit for CVE-2021-21224. This is a type confusion vulnerability in V8, which allows the attacker to execute arbitrary code within a (sandboxed) Chromium renderer process. A zero-day exploit for this vulnerability (or issue 1195777, as it was known back then since no CVE ID had been assigned yet) was dumped on Github on April 14, 2021. The exploit worked for a couple of days against the latest Chrome version, until Google rushed out a patch about a week later.
It should not be surprising that Magnitude’s exploit is heavily inspired by the PoC on Github. However, while both Magnitude’s exploit and the PoC follow a very similar exploitation path, there are no matching code pieces, which suggests that the attackers didn’t resort that much to the “Copy/Paste” technique of exploit development. In fact, Magnitude’s exploit looks like a more cleaned-up and reliable version of the PoC. And since there is no obfuscation employed (the attackers probably meant to add it in later), the exploit is very easy to read and debug. There are even very self-explanatory function names, such as confusion_to_oob, addrof, and arb_write, and variable names, such as oob_array, arb_write_buffer, and oob_array_map_and_properties. The only way this could get any better for us researchers would be if the authors left a couple of helpful comments in there…
Interestingly, some parts of the exploit also seem inspired by a CTF writeup for a “pwn” challenge from *CTF 2019, in which the players were supposed to exploit a made-up vulnerability that was introduced into a fork of V8. While CVE-2021-21224 is obviously a different (and actual rather than made-up) vulnerability, many of the techniques outlined in that writeup apply for V8 exploitation in general and so are used in the later stages of the Magnitude’s exploit, sometimes with the very same variable names as those used in the writeup.
The core of the exploit, triggering the vulnerability to corrupt the length of vuln_array
The root cause of the vulnerability is incorrect integer conversion during the SimplifiedLowering phase. This incorrect conversion is triggered in the exploit by the Math.max call, shown in the code snippet above. As can be seen, the exploit first calls foofunc in a loop 0x10000 times. This is to make V8 compile that function because the bug only manifests itself after JIT compilation. Then, helper["gcfunc"] gets called. The purpose of this function is just to trigger garbage collection. We tested that the exploit also works without this call, but the authors probably put it there to improve the exploit’s reliability. Then, foofunc is called one more time, this time with flagvar=true, which makes xvar=0xFFFFFFFF. Without the bug, lenvar should now evaluate to -0xFFFFFFFF and the next statement should throw a RangeError because it should not be possible to create an array with a negative length. However, because of the bug, lenvar evaluates to an unexpected value of 1. The reason for this is that the vulnerable code incorrectly converts the result of Math.max from an unsigned 32-bit integer 0xFFFFFFFF to a signed 32-bit integer -1. After constructing vuln_array, the exploit calls Array.prototype.shift on it. Under normal circumstances, this method should remove the first element from the array, so the length of vuln_array should be zero. However, because of the disparity between the actual and the predicted value of lenvar, V8 makes an incorrect optimization here and just puts the 32-bit constant 0xFFFFFFFF into Array.length (this is computed as 0-1 with an unsigned 32-bit underflow, where 0 is the predicted length and -1 signifies Array.prototype.shift decrementing Array.length).
A demonstration of how an overwrite on vuln_array can corrupt the length of oob_array
Now, the attackers have successfully crafted a JSArray with a corrupted Array.length, which allows them to perform out-of-bounds memory reads and writes. The very first out-of-bounds memory write can be seen in the last statement of the confusion_to_oob function. The exploit here writes 0xc00c to vuln_array[0x10]. This abuses the deterministic memory layout in V8 when a function creates two local arrays. Since vuln_array was created first, oob_array is located at a known offset from it in memory and so by making out-of-bounds memory accesses through vuln_array, it is possible to access both the metadata and the actual data of oob_array. In this case, the element at index 0x10 corresponds to offset 0x40, which is where Array.length of oob_array is stored. The out-of-bounds write therefore corrupts the length of oob_array, so it is now too possible to read and write past its end.
The addrof and fakeobj exploit primitives
Next, the exploit constructs the addrof and fakeobj exploit primitives. These are well-known and very powerful primitives in the world of JavaScript engine exploitation. In a nutshell, addrof leaks the address of a JavaScript object, while fakeobj creates a new, fake object at a given address. Having constructed these two primitives, the attacker can usually reuse existing techniques to get to their ultimate goal: arbitrary code execution.
A step-by-step breakdown of the addrof primitive. Note that just the lower 32 bits of the address get leaked, while %DebugPrint returns the whole 64-bit address. In practice, this doesn’t matter because V8 compresses pointers by keeping upper 32 bits of all heap pointers constant.
Both primitives are constructed in a similar way, abusing the fact that vuln_array[0x7] and oob_array[0] point to the very same memory location. It is important to note here that vuln_array is internally represented by V8 as HOLEY_ELEMENTS, while oob_array is PACKED_DOUBLE_ELEMENTS (for more information about internal array representation in V8, please refer to this blog post by the V8 devs). This makes it possible to write an object into vuln_array and read it (or more precisely, the pointer to it) from the other end in oob_array as a double. This is exactly how addrof is implemented, as can be seen above. Once the address is read, it is converted using helper["f2ifunc"] from double representation into an integer representation, with the upper 32 bits masked out, because the double takes 64 bits, while pointers in V8 are compressed down to just 32 bits. fakeobj is implemented in the same fashion, just the other way around. First, the pointer is converted into a double using helper["i2ffunc"]. The pointer, encoded as a double, is then written into oob_array[0] and then read from vuln_array[0x7], which tricks V8 into treating it as an actual object. Note that there is no masking needed in fakeobj because the double written into oob_array is represented by more bits than the pointer read from vuln_array.
The arbitrary read/write exploit primitives
With addrof and fakeobj in place, the exploit follows a fairly standard exploitation path, which seems heavily inspired by the aforementioned *CTF 2019 writeup. The next primitives constructed by the exploit are arbitrary read/write. To achieve these primitives, the exploit fakes a JSArray (aptly named fake in the code snippet above) in such a way that it has full control over its metadata. It can then overwrite the fake JSArray’s elements pointer, which points to the address where the actual elements of the array get stored. Corrupting the elements pointer allows the attackers to point the fake array to an arbitrary address, and it is then subsequently possible to read/write to that address through reads/writes on the fake array.
Let’s look at the implementation of the arbitrary read/write primitive in a bit more detail. The exploit first calls the get_arw function to set up the fake JSArray. This function starts by using an overread on oob_array[3] in order to leak map and properties of oob_array (remember that the original length of oob_array was 3 and that its length got corrupted earlier). The map and properties point to structures that basically describe the object type in V8. Then, a new array called point_array gets created, with the oob_array_map_and_properties value as its first element. Finally, the fake JSArray gets constructed at offset 0x20 before point_array. This offset was carefully chosen, so that the the JSArray structure corresponding to fake overlaps with elements of point_array. Therefore, it is possible to control the internal members of fake by modifying the elements of point_array. Note that elements in point_array take 64 bits, while members of the JSArray structure usually only take 32 bits, so modifying one element of point_array might overwrite two members of fake at the same time. Now, it should make sense why the first element of point_array was set to oob_array_map_and_properties. The first element is at the same address where V8 would look for the map and properties of fake. By initializing it like this, fake is created to be a PACKED_DOUBLE_ELEMENTS JSArray, basically inheriting its type from oob_array.
The second element of point_array overlaps with the elements pointer and Array.length of fake. The exploit uses this for both arbitrary read and arbitrary write, first corrupting the elements pointer to point to the desired address and then reading/writing to that address through fake[0]. However, as can be seen in the exploit code above, there are some additional actions taken that are worth explaining. First of all, the exploit always makes sure that addrvar is an odd number. This is because V8 expects pointers to be tagged, with the least significant bit set. Then, there is the addition of 2<<32 to addrvar. As was explained before, the second element of point_array takes up 64 bits in memory, while the elements pointer and Array.length both take up only 32 bits. This means that a write to point_array[1] overwrites both members at once and the 2<<32 just simply sets the Array.length, which is controlled by the most significant 32 bits. Finally, there is the subtraction of 8 from addrvar. This is because the elements pointer does not point straight to the first element, but instead to a FixedDoubleArray structure, which takes up eight bytes and precedes the actual element data in memory.
A dummy WebAssembly program that will get hollowed out and replaced by Magnitude’s shellcode
The final step taken by the exploit is converting the arbitrary read/write primitive into arbitrary code execution. For this, it uses a well-known trick that takes advantage of WebAssembly. When V8 JIT-compiles a WebAssembly function, it places the compiled code into memory pages that are both writable and executable (there now seem to be some new mitigations that aim to prevent this trick, but it is still working against V8 versions vulnerable to CVE-2021-21224). The exploit can therefore locate the code of a JIT-compiled WebAssembly function, overwrite it with its own shellcode and then call the original WebAssembly function from Javascript, which executes the shellcode planted there.
Magnitude’s exploit first creates a dummy WebAssembly module that contains a single function called main, which just returns the number 42 (the original code of this function doesn’t really matter because it will get overwritten with the shellcode anyway). Using a combination of addrof and arb_read, the exploit obtains the address where V8 JIT-compiled the function main. Interestingly, it then constructs a whole new arbitrary write primitive using an ArrayBuffer with a corrupted backing store pointer and uses this newly constructed primitive to write shellcode to the address of main. While it could theoretically use the first arbitrary write primitive to place the shellcode there, it chooses this second method, most likely because it is more reliable. It seems that the first method might crash V8 under some rare circumstances, which makes it not practical for repeated use, such as when it gets called thousands of times to write a large shellcode buffer into memory.
There are two shellcodes embedded in the exploit. The first one contains an exploit for CVE-2021-31956. This one gets executed first and its goal is to steal the SYSTEM token to elevate the privileges of the current process. After the first shellcode returns, the second shellcode gets planted inside the JIT-compiled WebAssembly function and executed. This second shellcode injects Magniber ransomware into some already running process and lets it encrypt the victim’s drives.
CVE-2021-31956
Let’s now turn our attention to the second exploit in the chain, which Magnitude uses to escape the Chromium sandbox. This is an exploit for CVE-2021-31956, a paged pool buffer overflow in the Windows kernel. It was discovered in June 2021 by Boris Larin from Kaspersky, who found it being used as a zero-day in the wild as a part of the PuzzleMaker attack. The Kaspersky blog post about PuzzleMaker briefly describes the vulnerability and the way the attackers chose to exploit it. However, much more information about the vulnerability can be found in a two–part blog series by Alex Plaskett from NCC Group. This blog series goes into great detail and pretty much provides a step-by-step guide on how to exploit the vulnerability. We found that the attackers behind Magnitude followed this guide very closely, even though there are certainly many other approaches that they could have chosen for exploitation. This shows yet again that publishing vulnerability research can be a double-edged sword. While the blog series certainly helped many defend against the vulnerability, it also made it much easier for the attackers to weaponize it.
The vulnerability lies in ntfs.sys, inside the function NtfsQueryEaUserEaList, which is directly reachable from the syscall NtQueryEaFile. This syscall internally allocates a temporary buffer on the paged pool (the size of which is controllable by a syscall parameter) and places there the NTFS Extended Attributes associated with a given file. Individual Extended Attributes are separated by a padding of up to four bytes. By making the padding start directly at the end of the allocated pool chunk, it is possible to trigger an integer underflow which results in NtfsQueryEaUserEaList writing subsequent Extended Attributes past the end of the pool chunk. The idea behind the exploit is to spray the pool so that chunks containing certain Windows Notification Facility (WNF) structures can be corrupted by the overflow. Using some WNF magic that will be explained later, the exploit gains an arbitrary read/write primitive, which it uses to steal the SYSTEM token.
The exploit starts by checking the victim’s Windows build number. Only builds 18362, 18363, 19041, and 19042 (19H1 – 20H2) are supported, and the exploit bails out if it finds itself running on a different build. The build number is then used to determine proper offsets into the _EPROCESS structure as well as to determine correct syscall numbers, because syscalls are invoked directly by the exploit, bypassing the usual syscall stubs in ntdll.
Check for the victim’s Windows build number
Next, the exploit brute-forces file handles, until it finds one on which it can use the NtSetEAFile syscall to set its NTFS Extended Attributes. Two attributes are set on this file, crafted to trigger an overflow of 0x10 bytes into the next pool chunk later when NtQueryEaFile gets called.
Specially crafted NTFS Extended Attributes, designed to cause a paged pool buffer overflow
When the specially crafted NTFS Extended Attributes are set, the exploit proceeds to spray the paged pool with _WNF_NAME_INSTANCE and _WNF_STATE_DATA structures. These structures are sprayed using the syscalls NtCreateWnfStateName and NtUpdateWnfStateData, respectively. The exploit then creates 10 000 extra _WNF_STATE_DATA structures in a row and frees each other one using NtDeleteWnfStateData. This creates holes between _WNF_STATE_DATA chunks, which are likely to get reclaimed on future pool allocations of similar size.
With this in mind, the exploit now triggers the vulnerability using NtQueryEaFile, with a high likelihood of getting a pool chunk preceding a random _WNF_STATE_DATA chunk and thus overflowing into that chunk. If that really happens, the _WNF_STATE_DATA structure will get corrupted as shown below. However, the exploit doesn’t know which _WNF_STATE_DATA structure got corrupted, if any. To find the corrupted structure, it has to iterate over all of them and query its ChangeStamp using NtQueryWnfStateData. If the ChangeStamp contains the magic number 0xcafe, the exploit found the corrupted chunk. In case the overflow does not hit any _WNF_STATE_DATA chunk, the exploit just simply tries triggering the vulnerability again, up to 32 times. Note that in case the overflow didn’t hit a _WNF_STATE_DATA chunk, it might have corrupted a random chunk in the paged pool, which could result in a BSoD. However, during our testing of the exploit, we didn’t get any BSoDs during normal exploitation, which suggests that the pool spraying technique used by the attackers is relatively robust.
The corrupted _WNF_STATE_DATA instance. AllocatedSize and DataSize were both artificially increased, while ChangeStamp got set to an easily recognizable value.
After a successful _WNF_STATE_DATA corruption, more _WNF_NAME_INSTANCE structures get sprayed on the pool, with the idea that they will reclaim the other chunks freed by NtDeleteWnfStateData. By doing this, the attackers are trying to position a _WNF_NAME_INSTANCE chunk after the corrupted _WNF_STATE_DATA chunk in memory. To explain why they would want this, let’s first discuss what they achieved by corrupting the _WNF_STATE_DATA chunk.
The _WNF_STATE_DATA structure can be thought of as a header preceding an actual WnfStateData buffer in memory. The WnfStateData buffer can be read using the syscall NtQueryWnfStateData and written to using NtUpdateWnfStateData. _WNF_STATE_DATA.AllocatedSize determines how many bytes can be written to WnfStateData and _WNF_STATE_DATA.DataSize determines how many bytes can be read. By corrupting these two fields and setting them to a high value, the exploit gains a relative memory read/write primitive, obtaining the ability to read/write memory even after the original WnfStateData buffer. Now it should be clear why the attackers would want a _WNF_NAME_INSTANCE chunk after a corrupted _WNF_STATE_DATA chunk: they can use the overread/overwrite to have full control over a _WNF_NAME_INSTANCE structure. They just need to perform an overread and scan the overread memory for bytes 03 09 A8, which denote the start of their _WNF_NAME_INSTANCE structure. If they want to change something in this structure, they can just modify some of the overread bytes and overwrite them back using NtUpdateWnfStateData.
The exploit scans the overread memory, looking for a _WNF_NAME_INSTANCE header. 0x0903 here represents the NodeTypeCode, while 0xA8 is a preselected NodeByteSize.
What is so interesting about a _WNF_NAME_INSTANCE structure, that the attackers want to have full control over it? Well, first of all, at offset 0x98 there is _WNF_NAME_INSTANCE.CreatorProcess, which gives them a pointer to _EPROCESS relevant to the current process. Kaspersky reported that PuzzleMaker used a separate information disclosure vulnerability, CVE-2021-31955, to leak the _EPROCESS base address. However, the attackers behind Magnitude do not need to use a second vulnerability, because the _EPROCESS address is just there for the taking.
Another important offset is 0x58, which corresponds to _WNF_NAME_INSTANCE.StateData. As the name suggests, this is a pointer to a _WNF_STATE_DATA structure. By modifying this, the attackers can not only enlarge the WnfStateData buffer but also redirect it to an arbitrary address, which gives them an arbitrary read/write primitive. There are some constraints though, such as that the StateData pointer has to point 0x10 bytes before the address that is to be read/written and that there has to be some data there that makes sense when interpreted as a _WNF_STATE_DATA structure.
The StateData pointer gets first set to _EPROCESS+0x28, which allows the exploit to read _KPROCESS.ThreadListHead (interestingly, this value gets leaked using ChangeStamp and DataSize, not through WnfStateData). The ThreadListHead points to _KTHREAD.ThreadListEntry of the first thread, which is the current thread in the context of Chromium exploitation. By subtracting the offset of ThreadListEntry, the exploit gets the _KTHREAD base address for the current thread.
With the base address of _KTHREAD, the exploit points StateData to _KTHREAD+0x220, which allows it to read/write up to three bytes starting from _KTHREAD+0x230. It uses this to set the byte at _KTHREAD+0x232 to zero. On the targeted Windows builds, the offset 0x232 corresponds to _KTHREAD.PreviousMode. Setting its value to SystemMode=0 tricks the kernel into believing that some of the thread’s syscalls are actually originating from the kernel. Specifically, this allows the thread to use the NtReadVirtualMemory and NtWriteVirtualMemory syscalls to perform reads and writes to the kernel address space.
The exploit corrupting _KTHREAD.PreviousMode
As was the case in the Chromium exploit, the attackers here just traded an arbitrary read/write primitive for yet another arbitrary read/write primitive. However, note that the new primitive based on PreviousMode is a significant upgrade compared to the original StateData one. Most importantly, the new primitive is free of the constraints associated with the original one. The new primitive is also more reliable because there are no longer race conditions that could potentially cause a BSoD. Not to mention that just simply calling NtWriteVirtualMemory is much faster and much less awkward than abusing multiple WNF-related syscalls to achieve the same result.
With a robust arbitrary read/write primitive in place, the exploit can finally do its thing and proceed to steal the SYSTEM token. Using the leaked _EPROCESS address from before, it finds _EPROCESS.ActiveProcessLinks, which leads to a linked list of other _EPROCESS structures. It iterates over this list until it finds the System process. Then it reads System’s _EPROCESS.Token and assigns this value (with some of the RefCnt bits masked out) to its own _EPROCESS structure. Finally, the exploit also turns off some mitigation flags in _EPROCESS.MitigationFlags.
Now, the exploit has successfully elevated privileges and can pass control to the other shellcode, which was designed to load Magniber ransomware. But before it does that, the exploit performs many cleanup actions that are necessary to avoid blue screening later on. It iterates over WNF-related structures using TemporaryNamesList from _EPROCESS.WnfContext and fixes all the _WNF_NAME_INSTANCE structures that got overflown into at the beginning of the exploit. It also attempts to fix the _POOL_HEADER of the overflown _WNF_STATE_DATA chunks. Finally, the exploit gets rid of both read/write primitives by setting _KTHREAD.PreviousMode back to UserMode=1 and using one last NtUpdateWnfStateData syscall to restore the corrupted StateData pointer back to its original value.
Fixups performed on previously corrupted _WNF_NAME_INSTANCE structures
Final Thoughts
If this isn’t the first time you’re hearing about Magnitude, you might have noticed that it often exploits vulnerabilities that were previously weaponized by APT groups, who used them as zero-days in the wild. To name a few recent examples, CVE-2021-31956 was exploited by PuzzleMaker, CVE-2021-26411 was used in a high-profile attack targeting security researchers, CVE-2020-0986 was abused in Operation Powerfall, and CVE-2019-1367 was reported to be exploited in the wild by an undisclosed threat actor (who might be DarkHotel APT according to Qihoo 360). The fact that the attackers behind Magnitude are so successful in reproducing complex exploits with no public PoCs could lead to some suspicion that they have somehow obtained under-the-counter access to private zero-day exploit samples. After all, we don’t know much about the attackers, but we do know that they are skilled exploit developers, and perhaps Magnitude is not their only source of income. But before we jump to any conclusions, we should mention that there are other, more plausible explanations for why they should prioritize vulnerabilities that were once exploited as zero-days. First, APT groups usually know what they are doing[citation needed]. If an APT group decides that a vulnerability is worth exploiting in the wild, that generally means that the vulnerability is reliably weaponizable. In a way, the attackers behind Magnitude could abuse this to let the APT groups do the hard work of selecting high-quality vulnerabilities for them. Second, zero-days in the wild usually attract a lot of research attention, which means that there are often detailed writeups that analyze the vulnerability’s root cause and speculate about how it could get exploited. These writeups make exploit development a lot easier compared to more obscure vulnerabilities which attracted only a limited amount of research.
As we’ve shown in this blog post, both Magnitude and Underminer managed to successfully develop exploit chains for Chromium on Windows. However, none of the exploit chains were particularly successful in terms of the number of exploited victims. So what does this mean for the future of exploit kits? We believe that unless some new, hard-to-patch vulnerability comes up, exploit kits are not something that the average Google Chrome user should have to worry about much. After all, it has to be acknowledged that Google does a great job at patching and reducing the browser’s attack surface. Unfortunately, the same cannot be said for all other Chromium-based browsers. We found that a big portion of those that we protected from Underminer were running Chromium forks that were months (or even years) behind on patching. Because of this, we recommend avoiding Chromium forks that are slow in applying security patches from the upstream. Also note that some Chromium forks might have vulnerabilities in their own custom codebase. But as long as the number of users running the vulnerable forks is relatively low, exploit kit developers will probably not even bother with implementing exploits specific just for them.
Finally, we should also mention that it is not entirely impossible for exploit kits to attack using zero-day or n-day exploits. If that were to happen, the attackers would probably carry out a massive burst of malvertising or watering hole campaigns. In such a scenario, even regular Google Chrome users would be at risk. The damage done by such an attack could be enormous, depending on the reaction time of browser developers, ad networks, security companies, LEAs, and other concerned parties. There are basically three ways that the attackers could get their hands on a zero-day exploit: they could either buy it, discover it themselves, or discover it being used by some other threat actor. Fortunately, using some simple math we can see that the campaign would have to be very successful if the attackers wanted to recover the cost of the zero-day, which is likely to discourage most of them. Regarding n-day exploitation, it all boils down to a race if the attackers can develop a working exploit sooner than a patch gets written and rolled out to the end users. It’s a hard race to win for the attackers, but it has been won before. We know of at least twocases when an n-day exploit working against the latest Google Chrome version was dumped on GitHub (this probably doesn’t need to be written down, but dumping such exploits on GitHub is not a very bright idea). Fortunately, these were just renderer exploits and there were no accompanying sandbox escape exploits (which would be needed for full weaponization). But if it is possible to win the race for one exploit, it’s not unthinkable that an attacker could win it for two exploits at the same time.