SentinelLabs

Radare2 Power Ups | Delivering Faster macOS Malware Analysis With r2 Customization

31 May 2023 at 13:55

In previous posts, we’ve explored how analysts can use radare2 (aka r2) for macOS malware triage, work around anti-analysis tricks, decrypt encrypted strings, and generate function signatures and YARA rules. Like most reversing tools, radare2 can be customized and extended to increase the analyst’s productivity and make analysis and triage much faster.

In this fifth post in the series, we look at some effective ways to power up r2, providing practical examples to get you started on the path to making radare2 even more productive for macOS malware analysis. We’ll cover automation and customization via aliases, macros and functions. Along the way, we’ll also explore how we can effectively implement binary and function diffing with radare2.

Power Up Your .radare2rc Config File With Aliases & Macros

Just as most shells have a “run commands” config file (e.g., .bashrc, .zshrc), r2 has a ~/.radare2rc file in which you can define environment variables, aliases and macros. This file doesn’t exist by default, so you need to create it when you make your first customizations.

It’s often said that one of the obstacles to adopting r2 is the steep learning curve, a large part of which is getting muscle-memory familiar with r2’s cryptic commands. One very fast way to flatten that curve is to define macros and aliases for new commands as you learn them – naming any hard-to-remember native commands with your own labels.

Aliases and macros are also useful for chaining oft-used commands together. If you find yourself always running the same commands as you work through your initial triage of a sample, you can save yourself some time and typing by combining those commands into one or more aliases or macros.

An r2 customization to find the entrypoint of x86 dylibs

We will look at some useful examples below, but first let’s understand the syntax for aliases and macros.

An alias is defined with a name prefixed by a $ sign, an = operator, and a value in single quotes. Values can be one or more commands, separated by a semi-colon. For example, if you struggle to remember r2’s rather cryptic command names, you could replace them with more memorable command names of your own. Create a file at ~/.radare2rc, add the following line and then save the file.

$libs='il'

Start a new r2 session. Now, typing $libs at the r2 prompt will run the il command. You can still use il directly as well – as the name suggests, aliases are just alternative names, not replacements, for existing commands.

The $libs alias prints out the linked dynamic libraries in an executable file

From the Official Radare2 book, we learn that macros are written inside parentheses with each command separated by a semi-colon. The first item in the list is the macro name. By way of example, rather than having a $libs alias, why not print out sections and linked libraries at the same time? This example would do just that:

(secs; iS; il)

Macros are called with the syntax .(macro) like so:

Calling a macro in r2 to print out a binary’s sections and linked libraries

It’s easy to see how you can build on this idea. I use a macro called .(meta) to give me all the basic info about a file’s structure as soon as I’ve loaded it into radare2.

Get all the info you need about a file with the meta macro

This macro provides the file hashes in various algorithms, the compiled language, file size, sections, section entropy and the load commands. If the file under analysis is UPX packed, it will also indicate that, and if the source code is Go, it displays the Go Build ID string. The macro is defined as follows; feel free to adopt or adapt it for your needs:

(meta; it; i~file; i~class; i~arch; i~lang; rh; iS md5,entropy; ih~cmd~!cmdsize; il; izz | grep -e Go\ build\ ID -we upx;)
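The section entropy reported by iS md5,entropy is worth understanding: values close to the maximum often point to compressed or encrypted content, such as a packed section. As a rough illustration of what that number measures, here is a minimal Go sketch of Shannon entropy over a byte slice. It assumes the familiar 0 to 8 bits-per-byte scale, and the function name shannonEntropy is our own, not part of r2.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of data in bits per byte,
// from 0 (a single repeated byte) to 8 (all 256 byte values equally common).
func shannonEntropy(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	var counts [256]int
	for _, b := range data {
		counts[b]++
	}
	n := float64(len(data))
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	uniform := make([]byte, 256)
	for i := range uniform {
		uniform[i] = byte(i) // every byte value appears exactly once
	}
	fmt.Printf("repetitive: %.2f\n", shannonEntropy([]byte("AAAAAAAA"))) // prints 0.00
	fmt.Printf("uniform:    %.2f\n", shannonEntropy(uniform))            // prints 8.00
}
```

Real sections sit somewhere between these two extremes; code typically scores well below the maximum, while encrypted or compressed payloads push toward it.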

Within the .(meta) macro, notice the command sequence ih~cmd~!cmdsize. This warrants a little explanation. Readers of our previous posts on r2 and macOS malware may recall that the tilde is r2’s internal grep function. The tilde followed by an exclamation mark ~!<expression> filters out the given expression, equivalent to grep -v. You can see the difference in the following image.

Filtering wanted and unwanted information with r2’s ~ command

Moreover, note that the .(meta) macro calls out to the system grep utility as well. The ability to utilize any command line utility on the system from within r2 is one of its major advantages over other reversing platforms.
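To make the ih~cmd~!cmdsize chain concrete, here is a simplified Go model of what the two filters do: ~ keeps lines containing a substring (case sensitive, whereas ~+ would ignore case), and ~! drops them, like grep -v. It is only a sketch of the idea; r2’s grep supports much more (column selection, counting and so on), and the sample lines in main are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// grepLines keeps the lines of text that contain pattern, like r2's ~pattern
// (a plain, case-sensitive substring match).
func grepLines(text, pattern string) []string {
	var out []string
	for _, line := range strings.Split(text, "\n") {
		if strings.Contains(line, pattern) {
			out = append(out, line)
		}
	}
	return out
}

// grepOutLines drops the lines that contain pattern, like r2's ~!pattern or grep -v.
func grepOutLines(lines []string, pattern string) []string {
	var out []string
	for _, line := range lines {
		if !strings.Contains(line, pattern) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up header lines, loosely modeled on load command output.
	sample := "cmd LC_SEGMENT_64\ncmdsize 0x48\nfilesize 0x4000\ncmd LC_MAIN\ncmdsize 0x18"
	// Equivalent of ih~cmd~!cmdsize: keep "cmd" lines, then drop "cmdsize" lines.
	for _, line := range grepOutLines(grepLines(sample, "cmd"), "cmdsize") {
		fmt.Println(line) // cmd LC_SEGMENT_64, then cmd LC_MAIN
	}
}
```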

Passing Arguments to radare2 Macros

Many of the things you can do with macros you could also do with aliases, and vice versa; it’s largely a matter of personal preference. However, note that macros have one neat superpower – you can pass arguments to them.

Here’s a good example: r2 has a command for diffing or comparing code within a sample, either as hex or disassembly (cc and ccd). For some reason (I’m sure there’s a perfectly good one), this command counterintuitively displays the output from the first address given to the right of the output from the second address given. We can ‘correct’ this with a macro that takes the addresses as arguments but swaps their order when it passes them to cc.

(diffs x y; cc $1 @ $0)
The cc command places the output of the first address to the right of the second address. The .(diffs) macro fixes this

Incidentally, the cc command (or our reimplementation of it in a macro) can be very useful for finding common code within samples when writing YARA or other hunting rules, a topic we’ll discuss a bit further below.

Finding IP Address Patterns and Other Useful Artifacts

To find IP address patterns and other useful artifacts in a binary, you can create macros with search regexes.

Here are a few examples to get you started.

Find IP Address Patterns:

(ip; /e /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/)
A sample of Atomic Stealer quickly gives up its C2 with the help of the .(ip) macro
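One caveat with the .(ip) regex is that it accepts any dotted quad, including impossible addresses like 999.1.1.1 and version strings that happen to have four parts. If you post-process the hits outside r2, a validation pass helps. Here is a minimal Go sketch, assuming you already have the raw bytes to search; findIPv4 is our own name, and it reuses the same expression as the macro before filtering with net.ParseIP.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// ipPattern is the same expression used by the .(ip) macro.
var ipPattern = regexp.MustCompile(`\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}`)

// findIPv4 returns the dotted-quad strings in data that also parse as valid
// IPv4 addresses (the regex on its own would accept 999.1.1.1).
func findIPv4(data []byte) []string {
	var out []string
	for _, m := range ipPattern.FindAll(data, -1) {
		if ip := net.ParseIP(string(m)); ip != nil && ip.To4() != nil {
			out = append(out, string(m))
		}
	}
	return out
}

func main() {
	data := []byte("connect to 192.168.1.10 or 999.1.1.1 as a fallback")
	fmt.Println(findIPv4(data)) // [192.168.1.10]
}
```

Version strings such as 10.15.7.1 still pass the check, so the results remain leads to verify rather than confirmed C2 addresses.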

Find Interesting Strings

Search for places where an executable gathers user and local environment information.

(reg; /e /home/i; /e /getenv/i; /e /Users/)

You can automate different searches for XOR instructions with the following r2 macro:

(xor ;  f~xor | sort -k 2 -n; /e /xor byte/i; izz~+xor)
The LockBit for Mac ransomware uses an XOR key of 0x39

Testing a File Against Local YARA Rules

For the following two macros, you will need YARA installed locally on the host. This can be done with MacPorts, Homebrew or by installing from GitHub and following the instructions here.

With YARA installed, it is easy to call it from within r2 to see if a rule you’ve created for a sample will fire. This is a great way to develop and test rules on the fly as you triage new samples.

On my analysis machines, I have my rules stored in a subdirectory of /usr/local/bin, so my macro looks like this:

(yara; !yara -s /usr/local/bin/scan_machos/myyara.yara `o.`)

As yara is an external command, it is prefixed by an exclamation point !. This is how to tell the r2 shell that we want to call an external command line utility, a very useful feature that allows you to bring in all the power of the command line utilities at your disposal directly into r2. The -s option allows us to see which strings hit (and how many times). See man yara for more options. The `o.` command at the end of the macro is an r2 command that returns the file name of the currently loaded binary.

A simple YARA rule to detect Geacon samples called from the r2 command line

Since Apple’s own built-in malware blocking tool XProtect also uses YARA rules, you can create a macro to see whether Apple has a rule for your sample. To create an .(xp) macro to check files against Apple’s XProtect database signatures file (remember: YARA must be installed first), use the following macro:

(xp; !yara -w /Library/Apple/System/Library/CoreServices/XProtect.bundle/Contents/Resources/XProtect.yara `o.`)

Don’t be surprised, however, if you don’t get many matches: XProtect’s YARA signature database is thin at best.

Print Your Customizations when radare2 Starts Up

By now, you might be starting to collect quite a list of macros and aliases. How to remember them all? There are a couple of built-in ways, and we’ll also look at one last .radare2rc customization to help us out with this, too.

From within r2, you can see all defined aliases and macros by typing $* and (*, respectively.

Printing out aliases and macros with their values

We can also have r2 print our entire config file when it starts up by adding a further customization. At the end of the .radare2rc file, try something like this:

echo ENV: ; !cat -v /Users//.radare2rc | sed -e '$ d'; echo;

The sed command after the pipe prevents the last line of the file from being printed – an optional customization you can ignore if you wish. You could also just add the $* and (* commands above to the config file instead, but I like to see the whole file as a reminder of the entire environment.

It can be helpful to automatically print the entire config file out as r2 starts up

These examples should be enough to get you started creating useful aliases and macros to help speed along your own analysis.

How to Diff Binaries and Binary Functions with radare2

Aliases and macros are useful shortcuts – the command line equivalent to GUI apps’ hotkeys and key chords – but there are other, more powerful ways we can customize radare2 and drive it with custom functions and scripts.

As an example, let’s add the following function to our shell config file (e.g., ~/.zshrc or ~/.bashrc):

rfunc() {
  # Diff all functions in two binaries and highlight the UNMATCH entries
  radiff2 -AC -t 100 "$1" "$2" 2> /dev/null | grep -E --color '\bUNMATCH\b|$'
}

This leverages a radare2 tool called radiff2. This tool (among a bunch of others) is installed as part of the radare2 suite. With the function added to our shell config, we’ll start a new Terminal session and call the function directly from the command line rather than from within r2.

$ rfunc file1 file2

The rfunc() function tells us which functions match, which do not, and which are new between any two given binaries. Here’s part of the output from two very different variants of Atomic Stealer:

Two variants of Atomic Stealer. The sendlog function exfiltrates user data
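The egrep pattern in rfunc works because \bUNMATCH\b|$ matches every line (via $) but only colors the whole word UNMATCH. If you would rather have that highlighting in a small program you can extend later, for instance to count unmatched functions, here is a minimal Go equivalent. It assumes radiff2 -C labels differing functions with the word UNMATCH, as the grep pattern implies; the program name highlight in the usage comment is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// unmatchWord matches UNMATCH as a whole word, so MATCH and UNMATCHED are left alone.
var unmatchWord = regexp.MustCompile(`\bUNMATCH\b`)

// highlightUnmatched wraps each whole-word UNMATCH in ANSI bold red and
// returns every other line unchanged, like grep -E --color '\bUNMATCH\b|$'.
func highlightUnmatched(line string) string {
	return unmatchWord.ReplaceAllString(line, "\x1b[1;31m${0}\x1b[0m")
}

func main() {
	// Usage sketch: radiff2 -AC -t 100 file1 file2 2>/dev/null | highlight
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // allow long output lines
	for sc.Scan() {
		fmt.Println(highlightUnmatched(sc.Text()))
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```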

To get a graphical output of how two functions differ, let’s begin by using radiff2 directly. This utility has many options and we’ll only explore a few here, but it is well worth digging into deeper.

You can compare two functions or offset addresses in two binaries with the following syntax:

$ radiff2 -g offset1,offset2 file1 file2

Or, in case both binaries use the same function name, e.g., sym._main.sendlog in our example above, you can simply provide the function name instead of the addresses:

$ radiff2 -g <function_name> file1 file2

In this example, I’ll compare the main function of two samples of Genieo adware.

Genieo samples of varying sizes

As shown in the image above, the files are quite different sizes.

$ radiff2 -g main a1219451eacd57f5ca0165681262478d4b4f829a7f7732f75884d06c2287ef6a 80573de5d79f580c32b43c82b59fbf445b91d6e106b3a4f2f67f2a84f4944433
Partial output of radiff2’s graphical diff engine

However, the output shows us that the main functions are structured identically and differ only in terms of offset addresses and certain hard-coded values. This kind of information is extremely helpful for creating effective signatures for a malware family.

As radiff2 outputs to the Terminal, display can sometimes be tricky. It’s possible to leverage Graphviz and the dot and xdot utilities to produce more readable graphs. Though a deep dive into Graphviz takes us beyond the scope of this post, try installing xdot with brew install xdot and playing around with options such as these:

$ radiff2 -md -g <function_name> file1 file2 | xdot -

As xdot is Python based, I’ve found it can sometimes be temperamental when it comes to escaping strings passed from radiff2, and it occasionally spits out “unknown op code” errors. When this happens, one way to sidestep xdot and Python is as follows:

$ radiff2 -md -g <function_name> file1 file2 > main.dot
$ dot -Tpng main.dot -o main.png
$ open main.png

These can produce graphical diffs such as the following:


Of course, once you hit on one or more graph workflows that work for you, you can then add them as functions to your shell config file for maximum convenience. Here’s an example:

rdiff () {
	if [ "$#" -eq 4 ]
	then
		radiff2 -A -md -g -t 100 $1,$2 $3 $4 2> /dev/null | tail -n +28 | sed 's/fillcolor="lightgray"/fillcolor="lightblue"/g' | sed 's/fillcolor="yellow",color="black"/fillcolor="#F4C2C2",color="lightgray"/g' | sed 's/"Courier"/"Poppins"/g' | sed 's/color="black"/color="lightgray"/g' | xdot -
	elif [ "$#" -eq 3 ]
	then
		radiff2 -A -md -g -t 100 $1 $2 $3 2> /dev/null | tail -n +28 | sed 's/fillcolor="lightgray"/fillcolor="lightblue"/g' | sed 's/fillcolor="yellow",color="black"/fillcolor="#F4C2C2",color="lightgray"/g' | sed 's/"Courier"/"Poppins"/g' | sed 's/color="black"/color="lightgray"/g' | xdot -
	else
		echo "Wrong number of arguments supplied."
	fi
}

This function accepts either three arguments (a function name and two file paths) or four (two offsets and two file paths); beware that there’s minimal error checking. Two other things are worth noting. First, via the -A option, radiff2 passes the files to r2 for analysis, which can improve radiff2‘s diffing output. However, recall that our earlier customization has r2 print out our config file when it runs. We don’t want this output passed to xdot (or dot) or it will cause errors. In my case, my .radare2rc file is 27 lines long, so I use tail -n +28 to start printing from the 28th line. That number will need to be adjusted for the length of your own .radare2rc config file, and you’ll need to remember to update the function whenever an edit to the config file changes its length. Second, note the series of sed commands. These are a quick and dirty way to alter the default colors of the output, so adjust or remove them to your liking.
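If keeping the tail offset in sync with your config file becomes a chore, one alternative is a small filter that skips everything up to the first line beginning with digraph and then applies the same color substitutions as the sed chain. This is a sketch under an assumption you should check against your own output: that radiff2’s dot output starts with a digraph line. The program name dotclean is hypothetical; you would swap it in for the tail and sed stages before xdot -.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// recolor applies the substitutions from the sed chain in rdiff in a single
// pass; for these particular patterns the result matches running the seds in order.
var recolor = strings.NewReplacer(
	`fillcolor="lightgray"`, `fillcolor="lightblue"`,
	`fillcolor="yellow",color="black"`, `fillcolor="#F4C2C2",color="lightgray"`,
	`"Courier"`, `"Poppins"`,
	`color="black"`, `color="lightgray"`,
)

// cleanDotLines drops every line before the first one starting with "digraph"
// (such as a banner printed by .radare2rc) and recolors the remaining lines.
func cleanDotLines(lines []string) []string {
	var out []string
	inGraph := false
	for _, line := range lines {
		if !inGraph && strings.HasPrefix(strings.TrimSpace(line), "digraph") {
			inGraph = true
		}
		if inGraph {
			out = append(out, recolor.Replace(line))
		}
	}
	return out
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // dot lines can be long
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, line := range cleanDotLines(lines) {
		fmt.Println(line)
	}
}
```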

Conclusion

In this post we’ve seen how we can power up radare2 by means of aliases, macros and functions. We’ve learned how these shortcuts and automations can allow us to make r2 easier and more productive to use.

That’s not all there is to powering up radare2, however, as we have yet to explore driving radare2 with scripts via r2pipe to do deeper analysis, decrypt strings and other advanced functions. We cover that in the next post, and if you didn’t already, check out our earlier posts on radare2 as well!

Automating String Decryption and Other Reverse Engineering Tasks in radare2 With r2pipe

21 June 2023 at 13:52

In the previous post in this series, we looked at powering up radare2 with aliases and macros to make our work more productive, but sometimes we need the ability to automate more complex tasks, extend our analyses by bringing in other tools, or process files in batches. Most reverse engineering platforms have some kind of scripting engine to help achieve this kind of heavy lifting and radare2 does, too. In this post, we’ll learn how to drive radare2 with r2pipe and tackle three different challenges that are common to RE automation: decrypting strings, applying comments, and processing files in batches.

Scripting radare2 with C, Go, Swift, Perl, Python, Ruby…

No matter what language you’re most comfortable working in, there’s a good chance that r2pipe supports it. There are 22 supported languages, though they are not all supported equally.

Programming languages supported by radare2’s r2pipe
Programming languages supported by radare2’s r2pipe

C, NodeJS, Python and Swift are the most well-supported languages, but I tend to use Go for speed and brevity, and it lets me hack scripts together rather haphazardly to achieve what I need. When scripting your own reversing sessions, there’s little need to worry about the niceties of programming style or convention as we would do when shipping code for production or other purposes. Although performance can be improved by doing things in one language rather than another, that’s something I rarely need to worry about in practice in my reversing work.

All that’s a preamble to saying that you can – and probably should! – write better scripts than those I’ll show here, but these examples will serve as a good introduction to how you can easily hack your way around problems thanks to r2’s shell integration to get a working solution without worrying too much about “the right” or “the best” way to do it.

Automated String Decryption in OSX.Fairytale

We’ll use a sample of OSX.Fairytale to illustrate automated string decryption. Though I’ll be using Go, you can easily apply the same techniques in whatever other language you prefer.

Like many simple malware families, Fairytale encrypts strings with a combination of base64 and a hard coded XOR key. In this case, the XOR key is 0x30.

OSX.Fairytale uses 0x30 as a hard coded key for XOR decryption

Once we have determined the XOR key, there are various simple ways to decrypt a given string or even the whole binary (e.g., CyberChef, or writing your own decryption function), but our eventual aim is to add comments to the disassembly (as well as learn a few useful tricks), so we’ll take a different approach.

Note that radare2 comes with a useful little tool called rahash2, which, among other things, can decrypt strings. Here’s an example you can run on the command line:

% rahash2 -D base64 -s 'H1JZXh9cUUVeU1hTRFw=' | rahash2 -D xor -S 0x30 -
/bin/launchctl%

As we discussed in the previous post, we could easily make this into a function in our .zshrc file. However, one drawback with that approach is r2 won’t let us call such functions from the r2 prompt. We can solve that by creating a standalone executable and saving it in our path, like so:

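#!/bin/zsh
# rxorb: XOR-decrypt a string with rahash2, optionally base64-decoding it first.
# The unquoted $(...) is word-split by zsh, so multi-line output is echoed on one line.
if [ "$#" -eq 2 ]; then
	# rxorb <key> <string>
	echo $(rahash2 -D xor -S "$1" -s "$2")
elif [ "$#" -eq 3 ] && [ "$1" = "-b" ]; then
	# rxorb -b <key> <base64 string>
	echo $(rahash2 -D base64 -s "$3" | rahash2 -D xor -S "$2" -)
elif [ "$#" -eq 1 ]; then
	cat <<'EOF'
USAGE:
	rxorb 0x30 "\|YRBQBI"
	Use '-b' to base64 decode the string before the xor:
	rxorb -b 0x30 FXAffFlSQlFCSR98UUVeU1hxV1VeREMfFXAeQFxZQ0Q=
EOF
else
	echo "INPUT ERROR, type 'rxorb help' for help."
fi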

Saving this as /usr/local/bin/rxorb and giving it executable permissions (e.g., via chmod +x) will make it available to us both on the command line and from within r2, once we open a new shell and a new r2 session.

Calling rxorb from within r2 to decrypt individual strings

Great, we now have a general string decryption tool: we can feed it a key and a ciphertext string, and specify whether the ciphertext needs to be base64 decoded before being XOR’d with the key. This alone will take care of a lot of use cases!

However, while this works well for manual decryption, it becomes tedious for anything more than a few strings. What would be much better is if we could simply type one command that would iterate over encrypted strings in the binary and either print out all the decrypted text or comment the code where the string is referenced. Ideally, our solution should give us the option to do both.

Let’s see how we can implement that by leveraging radare2’s scripting engine, r2pipe (aka r2p).

Building the Script

We’ll call the Go program “decode.go”, and the first part of it requires importing the r2pipe package from GitHub.

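package main

import (
	"fmt"

	"github.com/radareorg/r2pipe-go"
)

// With an empty path, NewPipe connects to the r2 session that launched us.
var r2p, _ = r2pipe.NewPipe("") // Declare r2p as a global

// check panics on any unexpected error.
func check(err error) {
	if err != nil {
		panic(err)
	}
}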

After the imports, we declare a global variable r2p, which provides a pipe to the r2 instance when we call it from within an r2 session. This global will allow us to send and receive commands to the r2 session. We also implement a generic error function for use throughout the code.

Next, we’ll implement a decrypt function. We could (and probably should) write a native version of this, but since we already have a decrypt function using rahash2 above, we’ll reuse that. This will also allow us to see and solve some other common challenges we might face in other scenarios.

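// decryptStrAtLoc reads the string at loc and decrypts it with rxorb,
// writing the plaintext to a temporary file.
func decryptStrAtLoc(loc string, key string) {
	psCmd := fmt.Sprintf("ps @ %s", loc) // [1]
	str, err := r2p.Cmd(psCmd)
	check(err)
	decodeCmd := fmt.Sprintf("!rxorb -b %s %s > /tmp/rxorb.txt", key, str) // [2]
	_, err = r2p.Cmd(decodeCmd)
	check(err)
}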

The decryptStrAtLoc() function does most of the work in our program. As parameters, it takes an address and the XOR key. We’ve chosen not to return the decrypted string to the caller but instead consume it within the function. We’ll see why shortly.

For each command we want to pass to the r2 session, we first format the command as a string, then pass the command to r2p. Thus, [1] formats a command that returns the bytes at the given address as a string. At [2], we format a command that decodes the string by passing it to the rxorb utility we wrote earlier.

As r2pipe’s Go implementation doesn’t support easy capture of stderr and stdout, we write this to a temporary file, which we’ll consume in the next part of the code. Had we chosen to implement the XOR decryption natively in our code, we could have avoided that, but seeing how to deal with stdout when using r2pipe and Go is a useful exercise for other scripts.
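For comparison, here is a minimal sketch of the native route: base64-decode, then XOR every byte with the single-byte key, exactly the two steps the rahash2 pipeline performs. The function names are our own. Inside the r2pipe script, the string returned by ps @ addr could be passed to decodeB64Xor instead of shelling out to rxorb, which removes the need for the temporary file (you would still need to quote the result safely before handing it to CCu).

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// xorDecrypt XORs every byte of data with a single-byte key.
func xorDecrypt(data []byte, key byte) []byte {
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ key
	}
	return out
}

// decodeB64Xor base64-decodes s and then XORs the result with key,
// mirroring rahash2 -D base64 piped into rahash2 -D xor.
func decodeB64Xor(s string, key byte) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	return string(xorDecrypt(raw, key)), nil
}

func main() {
	plain, err := decodeB64Xor("H1JZXh9cUUVeU1hTRFw=", 0x30)
	if err != nil {
		panic(err)
	}
	fmt.Println(plain) // /bin/launchctl
}
```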

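// writeCommentAtLoc reads the decrypted text back from the temp file and adds it
// as a comment at loc; sed wraps the text in double quotes for the r2 shell.
func writeCommentAtLoc(loc string) {
	readCmd := fmt.Sprintf("CCu `!cat -v /tmp/rxorb.txt | sed 's/\\(.*\\)/\"\\1\"/g'` @ %s", loc)
	_, err := r2p.Cmd(readCmd)
	check(err)
}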

Our decoded string is now sitting in a file in /tmp. In the function above we do two things with one command: we read the string back from the file and we write it out as a comment at the given disassembly address in the file under analysis. The sed code is another workaround, wrapping the string in quotes so that any special characters in it do not get interpreted by the r2 shell when we pass it back.

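// printCommentAtLoc prints one line of disassembly at loc, including the new comment.
func printCommentAtLoc(loc string) {
	pdCmd := fmt.Sprintf("pd 1 @ %s", loc) // [3]
	pdStr, err := r2p.Cmd(pdCmd)
	check(err)
	fmt.Println(pdStr)
}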

We next implement a function that will print out the disassembly along with the commented string to the r2 prompt. At [3], the “pd 1” command tells r2 to print one line of disassembly from the given address.

Finally, we implement our main() function that will call all this code as well as handle cleaning up the temporary file now that we’re done.

func main() {
	defer r2p.Close() // release the pipe to r2 when main returns

	key := "0x30"
	addr, err := r2p.Cmd("s") // [4] 's' = return current address
	check(err)
	decryptStrAtLoc(addr, key)
	writeCommentAtLoc(addr)
	printCommentAtLoc(addr)

	// clean up the temp file
	if _, err := r2p.Cmd("!rm /tmp/rxorb.txt"); err != nil {
		fmt.Println(err)
	}
}

Note that at [4], due to the simplicity of the command, we just supplied the command directly to r2p.Cmd rather than format a separate string. The entire script can be found here.

Using the Script

To use the script, build the decode.go program and take a note of the output path. Open an r2 session with the target binary and at the prompt type:

#!pipe /usr/local/bin/godec/decode # change the path to suit

If you hit return now, you’ll likely see an error and then some disassembly.

The script returns an error from sed

That’s because we have executed the script while located at an address that does not contain any strings to consume. Let’s find an encrypted string and try again. The r2 command izz~== will output any strings in the binary that contain “==” – a common padding for base64-encoded strings.

 Executing izz~== at the r2 prompt

Let’s seek to location 0x100016bdb to test our decryption program.

We can see that our decoder has appended a comment containing the decrypted string, which looks like the beginning of a LaunchAgent or LaunchDaemon plist. Great! Let’s try again, this time feeding it all the strings that contain “==” in one go. Try this:

#!pipe /usr/local/bin/godec/decode @@=`izz~==[2]`

Here’s an example of the output:

At this point, since the #!pipe command is awkward to remember and type out every time, you might want to create an alias and/or macro for that.

$dec=#!pipe /usr/local/bin/godec/decode
(script x;  #!pipe $0)

The $dec alias allows us to call this particular script easily, while the script macro allows us to pass in any script path as an argument to the #!pipe command.

Note that we didn’t decode all encrypted strings in the binary. We could iterate over all strings (including non-encrypted ones) with something like $dec @@=`izz~cstring` but that will lead to errors. The right way to approach this would be to add code to our program that determines whether the string at the current address is a valid base64 encoded string or not. We’ll leave that as an exercise for the reader.
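As a starting point for that exercise, one option is a cheap pre-check before any decryption is attempted. The sketch below is our own heuristic, not a complete solution: it accepts a string only if it is non-empty, a multiple of four characters long and decodes cleanly as standard padded base64. Short ordinary words can still pass, so a true result means “worth trying” rather than proof.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// looksLikeBase64 is a cheap pre-check: s must be non-empty, a multiple of
// four characters long, and decode cleanly as standard padded base64.
func looksLikeBase64(s string) bool {
	s = strings.TrimSpace(s)
	if s == "" || len(s)%4 != 0 {
		return false
	}
	_, err := base64.StdEncoding.DecodeString(s)
	return err == nil
}

func main() {
	// "test" also passes, which is why this is only a heuristic.
	for _, s := range []string{"H1JZXh9cUUVeU1hTRFw=", "/usr/bin/open", "test"} {
		fmt.Printf("%q -> %v\n", s, looksLikeBase64(s))
	}
}
```

A second filter, such as checking that the decrypted bytes are mostly printable, would cut the false positives further.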

Our script could also do with some other improvements: passing the key as an argument would make it more reusable, and of course, there are many points where we lazily use r2 to shell out rather than using Go’s own os package, but for now, this simple script will handle the job it was intended for and is simple to repurpose or build on.
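For the first of those improvements, the key needs converting from a command-line argument into a byte, for example if the XOR moves into Go. Here is a minimal sketch; parseKey is our own name, and whether #!pipe forwards extra words on its line as arguments to your program is an assumption worth checking on your setup. Base 0 lets strconv infer hex, octal or decimal from the prefix, and the bit size of 8 rejects keys larger than one byte.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseKey converts a key such as "0x30" or "48" into a single byte.
func parseKey(s string) (byte, error) {
	v, err := strconv.ParseUint(s, 0, 8)
	if err != nil {
		return 0, fmt.Errorf("invalid XOR key %q: %w", s, err)
	}
	return byte(v), nil
}

func main() {
	keyArg := "0x30" // fall back to the OSX.Fairytale key
	if len(os.Args) > 1 {
		keyArg = os.Args[1]
	}
	key, err := parseKey(keyArg)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Printf("using key %#x\n", key)
}
```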

Running a Script Without an Interactive radare2 Prompt

Sometimes you just need to run a script and get the results without an interactive r2 prompt. You can tell r2 to execute a script on a binary with the -i flag, which runs the script after the binary is loaded, or with -I, which runs it before the file is opened. The -q option tells r2 to quit after running the script.

r2 -q -i <script file> <binary>

You can also do the same thing with commands, aliases and macros directly without using a script, using the -c option. For example, this will print out the result of the meta macro without leaving you in an r2 session:

r2 -qc ".(meta)" /bin/ls

Batch Processing Files with a radare2 Script

If you want to process a number of files without having to start an r2 session for each one, you can pass the file path to your script as an argument when you call r2pipe as follows:

package main

import (
	"fmt"
	"os"

	"github.com/radareorg/r2pipe-go"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Println("Usage: Provide path to a binary.")
		os.Exit(1)
	}

	argPath := os.Args[1]
	r2p, err := r2pipe.NewPipe(argPath) // spawn a new r2 instance on the file
	if err != nil {
		panic(err)
	}
	defer r2p.Close()

	if _, err := r2p.Cmd("aaa"); err != nil { // run analysis
		panic(err)
	}

	// do your stuff
	// write results to file or stdout
}

You can now process all files in a folder from the command line with something like:

% for i in ./*; do my_r2pipe_script "$i"; done

Conclusion

In this post, we’ve learned a number of useful skills. We’ve seen how to automate tasks like grabbing disassembly, adding comments, and decoding strings, and we have navigated some of the complexities of dealing with stdout when using Go to drive r2pipe.

We’ve looked at how to pass file paths as arguments and how to run scripts, commands and macros without opening an interactive radare2 session. With a good understanding of the r2 commands explored throughout this series, you should now be able to readily adapt these skills to other automation tasks.

References and Further Reading

R2pipe – The Official Radare2 Book
Radare2-r2pipe-api repository
Radare2 Python Scripting
Automating RE Using r2pipe
Decrypting Mirai configuration With radare2
Running r2Pipe Python in batch
Scripting r2 with Pipes

Bloated Binaries | How to Detect and Analyze Large macOS Malware Files

29 August 2023 at 13:48

It wasn’t so long ago that malware authors, much like software developers, were concerned about the size of their code, aiming to keep it as small and compact as possible. Small binaries are less noticeable and can be slipped inside other files or shipped in benign code, attachments and even images. Smaller executables take up less space on disk, are faster to transfer over the wire, and – if they’re written efficiently – can execute their malicious instructions with less tax on the host CPU. In days of small disk drives, slow network connections and underpowered chips, such concerns made good sense and helped malware to avoid detection.

In today’s computer environments, however, storage, bandwidth and processor power are rarely in short supply, and as a result both legitimate programs and malware have increased greatly in size.

While malware executables of several megabytes are now so common they are hardly worthy of mention, some recent malicious programs have taken the invitation to bloat to a new extreme. Malware binaries weighing in at 50MB or more are now widely in use by macOS malware authors, and binaries over 100MB can also be found in some campaigns, typically those involving cryptominers. Such massive file sizes can cause detection problems for some kinds of AV solutions and create triage and reversing challenges for malware analysts.

In this post, we dig into the phenomenon of massive malware binaries on macOS, explaining why they are becoming more common, the problems they cause for detection and analysis, and how defenders can successfully deal with them.

How Widespread are Large macOS Malware Binaries?

It is possible to get a feel for how common large malicious binaries are by hunting in public malware repositories like VirusTotal and filtering by size. For example, if we search for Mach-O binaries over 35MB recognized as malware by 5 or more vendors, the search today returns 524 hits.

Increasing the file size to 50MB or more returns 113 hits, with many of the files returned being samples of Atomic Stealer.

Malicious Mach-O files over 50MB (Source: VirusTotal)

Around 7 samples in the 75MB to 100MB size range are examples of OSX.EvilQuest malware. Adjusting our search to file sizes of 100MB or more returns over 20 files detected as malware by five or more vendors; many of these are miners, including a coinminer executable weighing in at 345MB.

A macOS malware executable over 300MB (Source: VirusTotal)

However, the problem is wider than just those files that vendors currently recognize as malware. Both detection solutions and analysts have to determine whether an unknown sample is suspicious or malicious, and if we look at the number of Mach-O binaries on VT in general that are over 35MB, we find almost 100,000 samples, with the number of samples over 100MB currently at almost 50,000.

(Source: VirusTotal)

We can even find a single Mach-O binary on VirusTotal with a file size of 600MB. Are there individual binaries larger than that? Almost certainly, but VirusTotal has a file size upload limit of 650MB, so above that we have a data blindspot for both legitimate and malicious files.

From the data we do have, it is clear large executables are a widespread phenomenon, but why are threat actors turning to bloated binaries and what problems do they cause for enterprise security?

Why Are Threat Actors Turning to Supersized Binaries?

There are a number of reasons why threat actors may choose to distribute malware in oversized binaries. Some large binaries, such as those of the BirdMiner (aka LoudMiner) cryptominer, are the result of bundling emulation environments such as QEMU in the malware.

Samples of LoudMiner containing the Linux QEMU emulation environment

Other large binaries are caused by using cross-platform programming languages like Go and Rust. In order to ensure these programs will run on the intended platform, the runtime, libraries and all other dependencies are compiled into the final payload.

In addition, Apple’s switch to ARM from Intel has resurrected the Universal/FAT binary format, in which two architectures are now compiled into a single binary to ensure that the same program will work regardless of whether the user runs it on an Intel Mac or an Apple silicon Mac. Any binary compiled into the Universal format is effectively doubled in size.
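To see how a Universal binary packs one full slice per architecture, here is a minimal Python sketch that reads the big-endian fat header (magic 0xCAFEBABE, a slice count, then one 20-byte fat_arch record per architecture, as laid out in Apple's mach-o/fat.h) and lists each slice's offset and size. It assumes the 32-bit fat header only and names just the two CPU types relevant here; any other type is shown as a raw hex value.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # 32-bit fat header magic (the 64-bit variant, 0xCAFEBABF, is not handled)
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}

def fat_slices(data: bytes) -> list:
    """Return (arch, offset, size) for each slice described in a Universal binary's header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat/Universal binary")
    slices = []
    for i in range(nfat_arch):
        # fat_arch: cputype, cpusubtype, offset, size, align (five big-endian 32-bit fields)
        cputype, _subtype, offset, size, _align = struct.unpack_from(">iiIII", data, 8 + i * 20)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices
```

In practice you would pass the first few kilobytes of a sample, e.g. `fat_slices(open(path, "rb").read(4096))`; summing the slice sizes shows how much of the file each architecture accounts for.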

As we shall see in the next section, in some cases threat actors may simply bloat files with junk code to defeat file scanners with file size limits or to thwart analysis by malware researchers.

What Problems Do Outsized Binaries Cause For Detection and Analysis?

Massive individual binaries are a relatively recent phenomenon and they cause a headache for traditional AV scanners that rely on either computing a file’s hash or scanning it for malicious content. The larger the binary the longer it takes to scan, and when scanning across numerous files on a file system, the end result can be a sluggish, unresponsive system as the AV software increasingly hogs the host CPU to complete its task.

The performance problems associated with file scanning are historically one of the most oft-cited reasons for complaints from users and something that the industry has attempted to solve in various ways.

One typical solution employed by many AV scanners is to limit the maximum file size the scanner will accept. In the days when few legitimate programs reached more than 20MB that may have seemed like an acceptable compromise, but given today’s bloated binaries, that’s clearly no longer viable: it would mean that a lot of known malware would go undetected. Threat actors have even been known to bloat files with junk code precisely to defeat file size limits of scanners and malware repositories like VirusTotal, which as we noted above has a max file size upload limit of 650MB.
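To make that blind spot concrete, the sketch below partitions a list of files into those a size-capped scanner would examine and those it would silently skip. The cap is a parameter; the 20MB figure in the usage line simply echoes the one mentioned above and is not any particular product's setting. The `size_of` hook is an illustrative addition so the logic can be exercised without real files.

```python
import os

def split_by_size_limit(paths, max_bytes, size_of=os.path.getsize):
    """Partition paths into (scanned, skipped) for a scanner that ignores files over max_bytes."""
    scanned, skipped = [], []
    for path in paths:
        # Anything over the cap is never inspected: a detection blind spot
        (skipped if size_of(path) > max_bytes else scanned).append(path)
    return scanned, skipped
```

Usage: `scanned, skipped = split_by_size_limit(paths, 20 * 1024 * 1024)`. Every path in `skipped` is a file such a scanner would never look inside, however malicious it is.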

Massive files are not just a problem for detection software, but also for researchers, reverse engineers and malware analysts. With tens of megabytes of code to analyze, most of which is benign, junk or part of a standard runtime like Go, analysts can have a difficult time identifying exactly which parts of a binary are malicious. This can hamper efforts to find other, possibly undetected, malware samples using the same or similar code and allow threat actors to extend their campaigns without detection.

How to Detect Malware Hidden Inside Massive Binaries

Fortunately, there are solutions to the problem of massive binaries both for detection and analysis. The problems inherent in relying solely on file scanning have been well understood by vendors such as SentinelOne and were part of the paradigm shift that caused such solutions to adopt behavioral detection.

In contrast to a file scanning engine, a behavioral engine examines what a binary does when it is executed rather than examining the file’s content prior to execution. A behavioral approach allows a solution to avoid scanning large numbers of files or very large files and instead determine whether a running process is involved in malicious activity. Solutions like SentinelOne can thus detect and kill malware regardless of how it is packaged or how large the file is.

Security software that combines multiple detection mechanisms including behavioral and machine learning detection engines is now the standard for enterprise security.


SentinelOne’s Behavioral Engine Detecting Atomic Stealer

How to Analyze Large macOS Malware Binaries

Large binaries present malware analysts with a number of challenges. In this section, we will briefly describe a useful technique for finding interesting code among hundreds of thousands of lines of disassembly leveraging YARA and radare2.

Threat hunters are most familiar with using YARA to determine if a sample file contains strings or bytes similar to other known malware families, but we can also use the same technique to find interesting code typical of malware TTPs. Take the following YARA rule, for example:

This rule returns a match if the binary contains certain strings related to disabling or modifying tools or other processes on a device, a typical anti-analysis and evasion technique. We can create a list of rules with various TTP indicators to help us to statically determine what capabilities a file has that may be related to malware behavior. Here is another example of a rule to indicate a binary that contains code related to system discovery.
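At its core, a rule like this is string matching plus bookkeeping of where each string occurs. As a rough illustration only (pure Python, not YARA, and without YARA's conditions or modifiers), the sketch below searches a binary for indicator strings grouped under rule names and records every offset. The rule names and strings in `EXAMPLE_RULES` are hypothetical examples, not taken from the SentinelLabs rule set.

```python
def match_indicators(data: bytes, rules: dict) -> dict:
    """Return {rule_name: [(offset, indicator), ...]} for every indicator found in data."""
    hits = {}
    for rule, indicators in rules.items():
        found = []
        for needle in indicators:
            # Record every occurrence of the indicator, not just the first
            start = data.find(needle)
            while start != -1:
                found.append((start, needle))
                start = data.find(needle, start + 1)
        if found:
            hits[rule] = sorted(found)
    return hits

# Hypothetical indicators, for illustration only
EXAMPLE_RULES = {
    "ttp_kill_process": [b"killall", b"pkill"],
    "ttp_system_discovery": [b"sysctl", b"system_profiler", b"hw.model"],
}
```

Usage: `match_indicators(open(path, "rb").read(), EXAMPLE_RULES)`. This reads the whole file into memory, which is acceptable for a sketch but is exactly the cost that grows with massive binaries.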

We can run our YARA rule set on a given binary from within a radare2 session and, by leveraging YARA’s -m and -s switches, obtain a list of possible TTPs and their offsets for further investigation.

Possible TTPs of Malware sample 1909e84ac796730b119c44c676a730e09fce5ded

In this example we create a radare2 macro to run our YARA TTP ruleset over the currently loaded file. Invoked as .(ttp ms), the macro runs the equivalent of:

yara -ms ttp.yara <path to sample>

In radare2, the macro can be defined locally within the current r2 session or, more usefully, globally in the .radare2rc config file as:

(ttp x;  !yara -$0w <path to>/ttp.yara `o.`)
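With -s, yara prints each matching string beneath the rule that matched, typically in the form `0xOFFSET:$identifier: data` (with -m, the rule line also carries the rule's metadata in brackets). Assuming that layout, a short parser can collect the offsets per rule so they can be revisited during triage. The function name is hypothetical, and the offsets are file offsets, so they may need converting to virtual addresses before seeking to them in r2.

```python
def parse_yara_matches(output: str) -> dict:
    """Group string-match offsets from `yara -s` output under the rule that produced them."""
    matches = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("0x") and ":$" in line:
            # String match line, e.g. "0x1f40:$s1: killall"
            if current is not None:
                matches[current].append(int(line.split(":", 1)[0], 16))
        else:
            # Rule line: the rule name comes first; metadata (-m) and the target path follow
            current = line.split()[0]
            matches.setdefault(current, [])
    return matches
```

The input could come from `subprocess.run(["yara", "-ms", "ttp.yara", path], capture_output=True, text=True).stdout`.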

We provide a starter YARA rule set here that other macOS malware analysts can use as a base from which to develop their own more comprehensive ttp.yara file.

The SentinelLabs starter rule set for statically detecting macOS malware TTPs

Conclusion

Massive binaries are becoming increasingly common on the macOS platform and defenders need strategies for dealing with them. Malware authors have embraced the idea of distributing huge binaries in part as a tactic for defense evasion and anti-analysis and in part as a result of turning to cross-platform languages that pack a runtime, libraries and other dependencies into the final payload.

Organizations can detect large malicious binaries by turning to solutions that include behavioral detection and do not rely solely on file scanning. Analysts can implement techniques such as those discussed above to help them triage massive macOS malware samples faster and more efficiently.

YARA Rule set

https://github.com/SentineLabs/macos-ttps-yara

11 Ways to Tweak radare2 for Faster and Easier macOS Malware Analysis

31 October 2023 at 15:08

Our recent eBook on how to use radare2 (r2) for macOS malware analysis focused on providing analysts with a series of guided use cases for typical tasks like string decryption, anti-evasion and automation. Aimed at those seeking to power-up their macOS malware analysis skills, the guide contains lots of tips on using r2, but mostly focuses on working through malware samples exemplifying typical challenges.

In this post, somewhat inspired by a similar post on Ghidra, we look at lowering the learning curve and supercharging productivity for those new to or recently converted to using the r2 platform. While the default settings in r2 may be fine for basic reverse engineering, there is a lot of simple customization we can and should do for a better malware analysis workflow.

Explore and Change the Default Theme

Environment is everything when you need to concentrate and focus, and nothing contributes to this more than the UI appearance and theme. Fortunately, r2 comes packed with a bunch of themes built in which can also be customized, so you don’t need to worry about downloading or installing third-party plugins or code.

First, we’ll see how to explore the available themes, then we’ll see how to set one as the default theme for every launch.

On the r2 command line, type eco, then a space, then press Tab. You’ll see a list of the built-in theme names.

r2 themes

Explore how the different themes look by typing eco followed by the name of the theme, hitting return, then executing pdf, x, or V to see the result. Rinse and repeat until you find one you like.

eco monokai; pdf; x
r2's monokai theme

Once you have your chosen theme, the next step is to make it the default. Exit r2 or open a separate Terminal window and use the following command line to create, or append to, the config file at the default location ~/.radare2rc. I used ‘smyck’ here, but change it to suit your preference.

cd; echo eco smyck >> .radare2rc

After executing the command, quit and restart r2 to see the change. The prompt can be customized within the chosen theme. Play around with different foreground / background color combinations with variations of:

ec prompt white green
ec prompt cyan darkgray
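Appending with `echo ... >> .radare2rc` is unconditional, so running the same line twice leaves a duplicate entry in the config. If you script your setup, a small helper can append only the lines that are not already present. This is a minimal sketch; the function name is hypothetical and it treats each config line as an exact string.

```python
from pathlib import Path

def add_rc_lines(rc_path: Path, lines: list) -> list:
    """Append config lines to an r2 rc file, skipping any already present; return what was added."""
    text = rc_path.read_text() if rc_path.exists() else ""
    existing = set(text.splitlines())
    # dict.fromkeys drops duplicates within `lines` while preserving order
    new = [line for line in dict.fromkeys(lines) if line not in existing]
    if new:
        # Start on a fresh line if the file does not already end with a newline
        prefix = "" if not text or text.endswith("\n") else "\n"
        with rc_path.open("a") as f:
            f.write(prefix + "\n".join(new) + "\n")
    return new
```

Usage: `add_rc_lines(Path.home() / ".radare2rc", ["eco smyck", "e cfg.fortunes=false"])` can be re-run safely.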

Turn Off the Jokes!

You may or may not enjoy the “fortune cookies” that appear on each launch of radare2. Some can be funny, others less so, depending on your taste. Be wary that if you’re sharing screenshots of your r2 sessions either publicly or privately, the ‘jokes’ may cause offense to others if you inadvertently capture them.

We can turn them off with a simple command added to our config file.

cd; echo e cfg.fortunes=false >> .radare2rc

Turn On (and Off) the Comments!

r2 comes with some built-in help for new reverse engineers or even experienced reversers who are learning a new architecture.

Compare the default display of the pdf command:

r2 comments

You will likely not want comments on all the time, as they can be distracting, but it can be really useful to turn them on when you come across an unfamiliar instruction or operand.

We can add a couple of aliases to our config file that will allow us to use the commands “$conn” and “$coff” to quickly toggle comments. Add the following commands to the .radare2rc file, and restart r2.

$coff='e asm.describe = false'
$conn='e asm.describe = true'

Indent Code Blocks for Better Visibility

radare2 helps reverse engineers visualize control flow in a variety of ways, one of which is indenting blocks in the disassembly to show nested code.

By default, this is turned off and all blocks appear at the same tabular offset, as in the example below.

Block indentation off

We can make it easier to quickly visualize the relationship between blocks of code by turning code indent on.

Indentation on

You could make a pair of aliases to toggle this setting as we did with comments, substituting the value ‘true’ with ‘false’, but for my part I never see a need to turn it off, so I just add the following to my config file.

cd; echo e asm.indent=true >> .radare2rc
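If you do want toggles for several boolean settings, a tiny generator can emit matching on/off alias pairs in the same `$name='e var=value'` form used for the comment toggles. The function and the alias names you pass in are hypothetical; `ind` in the usage line is just an example.

```python
def toggle_aliases(short: str, eval_var: str) -> list:
    """Build r2 alias definitions that switch a boolean eval variable on and off."""
    return [
        f"${short}on='e {eval_var}=true'",
        f"${short}off='e {eval_var}=false'",
    ]
```

For example, `toggle_aliases("ind", "asm.indent")` returns `$indon='e asm.indent=true'` and `$indoff='e asm.indent=false'`, ready to paste into .radare2rc.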

Make r2’s Help More Helpful

Help in r2 is summoned with the ? command, but it can be tough finding what we need sometimes. It would make life easier if we could easily grep all the help for a search term of interest.

To do so, add the following code to the .radare2rc config file:

(help x; ?*~$0)

Now, restart r2 and load a binary, say /bin/ls for simplicity, and compare the output of searching for help on the keyword ‘crypto’:

A macro to make searching the help doc easier

Our macro is just a shortcut for ? followed by a wildcard and then grepping for our search term, but it’s a lot easier to remember .(help <searchterm>).

Note that for multi-word search terms, you must escape any spaces in the search string.

.(help hexdump\ columns)
Spaces in the search term need to be escaped
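Why the escape matters can be shown with a simplified model of macro invocation: arguments are split on spaces and substituted positionally for $0, $1 and so on, so an unescaped space would hand only the first word to $0. This is an illustrative assumption about the behavior, not radare2's actual parser, and both function names are hypothetical.

```python
import re

def split_macro_args(arg_string: str) -> list:
    """Split on unescaped spaces; a backslash-escaped space stays inside one argument."""
    parts = re.split(r"(?<!\\) +", arg_string.strip())
    return [p.replace("\\ ", " ") for p in parts if p]

def expand_macro(body: str, args: list) -> str:
    """Substitute $0, $1, ... in a macro body with positional arguments."""
    def sub(m):
        i = int(m.group(1))
        # Leave placeholders without a matching argument untouched
        return args[i] if i < len(args) else m.group(0)
    return re.sub(r"\$(\d+)", sub, body)
```

Under this model, `expand_macro("?*~$0", split_macro_args(r"hexdump\ columns"))` yields `?*~hexdump columns`, while the unescaped form passes only `hexdump` to $0. The same substitution explains the YARA macro: `-$0w` invoked with `ms` becomes `-msw`.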

Set the Block Size

Block size is the number of bytes that r2 commands like px operate on and print by default. It’s set to 0x100 (256 bytes), but sometimes that’s not enough to see everything of interest.

The block size can be changed within a session on the command line with b <size>, e.g.

b 0x200
Use the previous macro to get more help about block sizes

A simple alias in our config file is useful for printing out extended block size in one shot:

$x='b 0x200; px'
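Because the block size is a byte count, the number of hexdump rows follows from the row width. The sketch below renders a block in a roughly px-like layout (not r2's exact formatting) assuming 16 bytes per row: 0x100 bytes give 16 rows and 0x200 bytes give 32. The function is hypothetical and only illustrates the arithmetic.

```python
def hexdump(data: bytes, base: int = 0, block_size: int = 0x100, width: int = 16) -> list:
    """Render up to block_size bytes as rows of `width` bytes: offset, hex bytes, printable ASCII."""
    block = data[:block_size]
    rows = []
    for off in range(0, len(block), width):
        chunk = block[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"0x{base + off:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return rows
```

Doubling the block size from 0x100 to 0x200 doubles the rows printed, which is what the `$x` alias gets you in one shot.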

Sort and Search Functions By Size, XREFS & Other Criteria

In radare2, afl and afll are the go-to commands for viewing function information, but we sometimes want to tailor the output for specific items of interest. Here are a few I use to help narrow down bits of code that might be of interest.

The first two have a dependency on another alias, $fcol, which simply prints out the column headings for the subsequent output from afll:

$fcol='afll\~:0'

Top twenty largest functions in the binary:

$top20='clear; $fcol; afll \| sort -k 3 -nr \| head -n 20'

Top twenty functions with the largest number of XREFS:

$topX='clear; $fcol; afll \| sort -k 14 -nr \| head -n 20'
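Both aliases rely on the shell's `sort -k N -nr | head -n 20`, which orders rows numerically on the Nth whitespace-separated field, largest first, and keeps the top twenty. A minimal Python equivalent, assuming whitespace-separated rows such as afll output, looks like this; the function name is hypothetical, and unlike `sort -n` it pushes rows with a non-numeric value in that column (such as a header) to the bottom.

```python
def top_n_by_column(rows: list, column: int, n: int = 20) -> list:
    """Mimic `sort -k COLUMN -nr | head -n N` on whitespace-separated rows (column is 1-based)."""
    def key(row: str) -> float:
        fields = row.split()
        try:
            return float(fields[column - 1])
        except (IndexError, ValueError):
            return float("-inf")  # non-numeric or missing field: sink to the bottom
    return sorted(rows, key=key, reverse=True)[:n]
```

Which column holds size or XREF counts depends on your r2 version's afll layout, so check the header printed by `$fcol` before choosing N.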

Functions related to swizzling in Objective-C binaries (shout out to LaurieWired’s recent talk for this idea):

$swiz='afl\~exchangeImplement; afl\~getInstanceMethod; afl\~getClassMethod; afl\~setImplementation'

Print out the functions of interest in a Go binary, ignoring the boilerplate imports:

(gafl; afl | grep -v vendor_golang.org | grep -v runtime | grep -e main -e github | sort -k 4 -nr) 

This time we used a macro rather than an alias. Either will work. Note that with the macro, you don’t need to escape special characters like the pipe or tilde symbols.
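The grep stages of the Go macro are simple substring tests, which makes it easy to reproduce them when post-processing saved afl output outside r2. The sketch below mirrors only the filtering (drop anything mentioning vendor_golang.org or runtime, then keep lines mentioning main or github) and leaves sorting aside; the function name is hypothetical.

```python
def go_functions_of_interest(afl_lines: list) -> list:
    """Mirror `grep -v vendor_golang.org | grep -v runtime | grep -e main -e github`."""
    return [
        line for line in afl_lines
        if "vendor_golang.org" not in line
        and "runtime" not in line
        and ("main" in line or "github" in line)
    ]
```

Like grep, this matches substrings anywhere on the line, so a name merely containing "main" (e.g. "domain") would also be kept.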

Print Calls to and From the Current Function

Understanding the relationships between functions is crucial to discovering malicious behaviour and homing in on parts of a binary we want to use for hunting and detection.

To view all the calls to a current function, the r2 command axg will give a nice graphical view all the way back to main. To view the calls a function makes, use pifc.

If we find these obtuse r2 commands difficult to remember, then of course aliases are our friends:

$callee='axg'
$calls='pifc'

However, exploring the nuances of ax and pi through ? and our .(help) macro will pay dividends.
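The question axg answers is "which chains of callers lead from main to here?". The toy model below, built on a hypothetical caller map rather than real r2 output, shows the idea: a breadth-first walk over caller edges that records every cycle-free chain from a function back to main. It is an illustration of the concept, not a description of how axg is implemented.

```python
from collections import deque

def paths_to_root(callers: dict, start: str, root: str = "main") -> list:
    """Return every acyclic caller chain from `start` back to `root`, shortest first."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        fn = path[-1]
        if fn == root:
            paths.append(path)
            continue
        for caller in callers.get(fn, []):
            if caller not in path:  # skip recursion and call cycles
                queue.append(path + [caller])
    return paths
```

With `{"sym.decrypt": ["sym.init", "sym.loop"], "sym.init": ["main"], "sym.loop": ["sym.init", "main"]}`, asking for `sym.decrypt` yields three chains, the shortest being `sym.decrypt -> sym.init -> main`.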

We can gain a better understanding of the overall structure of a function with the following macro, which prints out a useful summary of information.

(metaf ;  afiq; echo XREFS:; axg; echo INSTR:; afist; pds)

Edit and Test Yara Rules Within radare2

If you have a local YARA file, you can edit it from within r2 from the command line like so:

!vi <path to yara file>

From here, add or adjust existing rules, save and quit out of the text editor, then call it on the currently loaded binary to test the file against the rules:

!yara -fs <path to yara file> `o.`

The r2 command o. serves as a reference to the currently loaded binary and is useful in a wide variety of aliases and macros.

Let’s define an alias and a macro for the above.

$rules=!vi <path to your yara rules file>
(yara x;  !yara -$0w <path to your yara rules> `o.`)

After restarting r2, we can now edit our YARA rules from within r2 with the $rules command. We can call our rules on the currently loaded file with .(yara f).

Try .(yara m) and .(yara s) and note the differences.

Running YARA rules against the loaded sample

Query VirusTotal about the Current Sample

Once you realize how easy it is to call external command line utilities from within an r2 session, multiple possibilities for faster and easier workflows open up.

Perhaps one of the most oft-used tools for malware analysts is VirusTotal. If you have VirusTotal’s vt command-line tool installed and in your PATH, it’s very easy to integrate it with r2. Again, a simple addition to our config file is all that’s needed:

$vt=!vt file `o.` --include=meaningful_name,tags,popular_threat_classification,first_submission_date,last_submission_date

You can modify what to include to suit your preferences per the VT documentation.

Get results from VirusTotal within r2 session
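As we understand the vt tool, `vt file` identifies samples by hash rather than by local path, so the alias works as-is when samples are saved under their hash names. Otherwise, a small helper can compute the SHA-256 first. This sketch hashes in chunks so even very large binaries are never read into memory at once; the function names are hypothetical, and the default fields are the ones used in the alias.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays flat regardless of file size."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def vt_command(path: str, fields=("meaningful_name", "tags", "popular_threat_classification",
                                  "first_submission_date", "last_submission_date")) -> list:
    """Build a `vt file <sha256> --include=...` argument list for subprocess.run."""
    return ["vt", "file", sha256_of(path), "--include=" + ",".join(fields)]
```

Pass the result to `subprocess.run(vt_command(path))` to query VirusTotal for the sample.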

Check Code Signature of Current Sample

One final tip for anyone who struggles to remember all the various ways to check whether a sample has a valid code signature, whether it’s notarized and whether it’s been revoked by Apple: put it all in an alias and run it from within r2!

$codesign='izz~Developer ID; !codesign -dvvv -r - `o.`; !spctl -vvvv -a -t execute `o.`'
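The `izz~Developer ID` part works because a signed Mach-O embeds its signing certificates, whose subject names (e.g. "Developer ID Application: ...") appear as plain strings. A rough Python approximation scans for printable ASCII runs that mention "Developer ID". Unlike izz it ignores wide strings, and finding the string says nothing about whether the signature is valid, which is what codesign and spctl are for. The function name and sample bytes are hypothetical.

```python
import re

def developer_id_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs of at least min_len bytes that contain 'Developer ID'."""
    runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    return [r.decode("ascii") for r in runs if b"Developer ID" in r]
```

Usage: `developer_id_strings(open(path, "rb").read())` lists candidate signer names for a quick first look before running the full checks.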

Conclusion

Working with r2 can be daunting at first, but the platform is built on simplicity. Thanks to its integration with the command line, a few customizations can quickly turn radare2 into a powerful platform for malware analysts. There are also many plugins that extend radare2 with external decompilers (including Ghidra), integrate it with frameworks like Frida and, of course, connect it to AI chatbots.

If you enjoyed this post and haven’t yet checked out the ebook, A Security Practitioner’s Guide to Reversing macOS Malware with Radare2, you can find it here. This free PDF resource covers lots of recent macOS malware and walks through example cases of common reversing tasks, all in radare2.
