Using benchmarks to speed up Echidna

8 May 2024 at 13:30

By Ben Siraphob

During my time as a Trail of Bits associate last summer, I worked on optimizing the performance of Echidna, Trail of Bits’ open-source smart contract fuzzer, written in Haskell. Through extensive use of profilers and other tools, I was able to pinpoint and debug a massive space leak in one of Echidna’s dependencies, hevm. Now that this problem has been fixed, Echidna and hevm can both expect to use several gigabytes less memory on some test cases compared to before.

In this blog post, I’ll show how I used profiling to identify this deep performance issue in hevm and how we fixed it, improving Echidna’s performance.

Overview of Echidna

Suppose we are keeping track of a fixed supply pool. Users can transfer tokens among themselves or burn tokens as needed. A desirable property of this pool might be that supply never grows; it only stays the same or decreases as tokens are transferred or burned. How might we go about ensuring this property holds? We can try to write up some test scenarios or try to prove it by hand… or we can fuzz the code with Echidna!

How Echidna works

Echidna takes in smart contracts and assertions about their behavior that should always be true, both written in Solidity. Then, using information extracted from the contracts themselves, such as method names and constants, Echidna starts generating random transaction sequences and replaying them over the contracts. It keeps generating new and longer sequences from old ones, for example by splitting them at random points or changing the parameters in the method calls.

How do we know that these randomly generated sequences cover enough of the code to eventually find a bug? Echidna uses coverage-guided fuzzing—that is, it keeps track of how much of the smart contract's code is actually executed and prioritizes sequences that reach more code when creating new ones. Once it finds a transaction sequence that violates our desired property, Echidna shrinks it to produce a minimal reproducing sequence, then dumps all the information into a file for further inspection.
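
To make that loop concrete, here is a heavily simplified sketch of a coverage-guided fuzzing loop in Haskell. All names and types are hypothetical and do not reflect Echidna's actual internals; it only illustrates the corpus and coverage bookkeeping described above.

-- A heavily simplified sketch of a coverage-guided fuzzing loop.
-- All names and types here are hypothetical, not Echidna's internals.
import qualified Data.Set as Set
import System.Random (randomRIO)

type Coverage = Set.Set Int                 -- e.g., program counters reached
data Result = Result { covered :: Coverage, violated :: Bool }

fuzzLoop
  :: ([tx] -> IO Result)                    -- execute a transaction sequence
  -> ([tx] -> IO [tx])                      -- mutate/extend a sequence
  -> [[tx]]                                 -- corpus of interesting sequences
  -> Coverage                               -- coverage seen so far
  -> IO [tx]                                -- a property-violating sequence
fuzzLoop execute mutate corpus seen = do
  idx <- randomRIO (0, length corpus - 1)
  candidate <- mutate (corpus !! idx)
  result <- execute candidate
  if violated result
    then pure candidate                     -- found a violation: shrink it next
    else do
      let seen' = Set.union seen (covered result)
          -- keep the candidate only if it reached code we hadn't seen before
          corpus'
            | covered result `Set.isSubsetOf` seen = corpus
            | otherwise = candidate : corpus
      fuzzLoop execute mutate corpus' seen'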

Overview of profiling

The Glasgow Haskell Compiler (GHC) provides various tools and flags that programmers can use to understand performance at various levels of granularity. Here are two:

  • Compiling with profiling: This modifies the compilation process to add a profiling system that attributes costs to cost centers. Cost centers are annotations around expressions; all costs incurred by an annotated expression (time spent and memory allocated) are attributed to its enclosing cost center. Usually, we are interested in top-level declarations: essentially, the functions and values exported from a module.
  • Collecting runtime statistics: Passing +RTS -s to a Haskell program makes it print runtime statistics. This is coarser than profiling, showing only aggregate figures about the program, such as total bytes allocated on the heap or bytes copied during garbage collection. The -hT option additionally breaks heap usage down by closure type, and it works even without compiling for profiling.

Both of these options can produce human- and machine-readable output for further inspection. For instance, when we compile a program with profiling, we can output JSON that can be displayed in a flamegraph viewer like speedscope. This makes it easy to browse around the data and zoom in to relevant time slices. For runtime statistics, we can use eventlog2html to visualize the heap profile.
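
For concreteness, here is a minimal standalone sketch (not code from Echidna or hevm) showing a manually annotated cost center via GHC's SCC pragma; compiling with -prof -fprof-auto additionally adds automatic cost centers to top-level declarations.

-- A minimal sketch (not Echidna/hevm code) of a manual cost center.
-- Build with: ghc -prof -fprof-auto Main.hs
-- Run with:   ./Main +RTS -p -s -RTS   (time profile in Main.prof,
--                                       aggregate runtime stats on stderr)
module Main (main) where

expensive :: Int -> Int
expensive n = {-# SCC "expensive_sum" #-} sum [1 .. n]

main :: IO ()
main = print (expensive 10000000)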

Looking at the flamegraph below and others like it led me to conclude that at least from an initial survey, Echidna wasn’t terribly bloated in terms of its memory usage. Indeed, various changes over time have targeted performance directly. (In fact, a Trail of Bits wintern from 2022 found performance issues with its coverage, which were then fixed.) However, notice the large blue regions? That’s hevm, which Echidna uses to evaluate the candidate sequences. Given that Echidna spends the vast majority of its fuzzing time on this task, it makes sense that hevm would take up a lot of computational power. That’s when I turned my attention to looking into performance issues with hevm.

The time use of functions and call stacks in Echidna

Profilers can sometimes be misleading

Profiling is useful, and it helped me find a bug in hevm whose fix led to improved performance in Echidna (which we get to in the next section), but you should also know that it can be misleading.

For example, while profiling hevm, I noticed something unusual: various optics-related operators (getters and setters) were dominating CPU time and allocations. How could this be? The reason was that the optics library was not properly inlining some of its operators. As a result, if you ran such code with profiling enabled, you would see the % operator taking up the vast majority of allocations and time instead of the increment function that actually does the computation (a minimal sketch of this effect follows below). This isn’t observed when running an optimized binary, though, since GHC must have decided to inline the operator anyway. I wrote up this issue in detail, and it helped the optics library developers close an issue that was opened last year! This little aside made me realize that I should compile programs both with and without profiling enabled going forward, to ensure that profiling stays faithful to real-world usage.
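
Here is a minimal, hypothetical sketch of the kind of code involved (not the actual code from hevm or the original write-up): a tiny increment function built from composed optics. Compiled with profiling and without the usual inlining, the profile attributes most of the time and allocations to the % composition operator rather than to the (+ 1) doing the work.

-- A minimal, hypothetical sketch (not hevm's code): an optics-based
-- increment. Compiled with profiling, the (%) composition can show up
-- as the dominant cost center instead of the (+ 1) doing the work.
import Data.List (foldl')
import Optics.Core (over, (%), _1, _2)

increment :: ((Int, Int), Int) -> ((Int, Int), Int)
increment = over (_1 % _2) (+ 1)

main :: IO ()
main = print (foldl' (\acc _ -> increment acc) ((0, 0), 0) [1 .. 1000000 :: Int])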

Finding my first huge memory leak in hevm

Consider the following program. It repeatedly hashes a number, starting with 0, and writes the hashes somewhere in memory (up to address m). It does this n times.

contract A {
  mapping (uint256 => uint256) public map;
  function myFunction(uint256 n, uint256 m) public {
    uint256 h = 0;
    // hash n times, storing each hash at an index below m
    for (uint i = 0; i < n; i++) {
      h = uint256(keccak256(abi.encodePacked(h)));
      map[h % m] = h;
    }
  }
}

What should we expect the program to do as we vary the value of n and m? If we hold m fixed and continue increasing the value of n, the memory block up to m should be completely filled. So we should expect that no more memory would be used. This is visualized below:

Holding m fixed and increasing n should eventually fill up m.

Surprisingly, this is not what I observed. The memory used by hevm went up linearly as a function of n and m. For some reason, hevm continued to allocate memory even though it should have been reusing it. In fact, this program could end up consuming hundreds of gigabytes of RAM. I wrote up the issue here.

A graph showing allocations growing rapidly

I figured that if this memory issue affects hevm, it would surely affect Echidna as well.

Don't just measure once, measure N times!

Profiling gives you data about time and space for a single run, but that alone isn’t enough to understand how a program behaves as its inputs grow. For example, if you profiled an insertion sort function only on arrays with fewer than 20 elements, you might conclude that it’s faster than quicksort, even though we know that asymptotically that’s not the case.
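
To illustrate measuring at several input sizes rather than one, here is a small benchmarking sketch using the criterion library and a naive insertion sort. It is purely illustrative and unrelated to Echidna's code.

-- Benchmark insertion sort against the standard sort at several sizes.
-- At tiny sizes insertion sort can look competitive; the larger sizes
-- reveal its quadratic behavior. (Illustrative only.)
import Criterion.Main (bench, bgroup, defaultMain, nf)
import Data.List (insert, sort)

insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []

main :: IO ()
main = defaultMain
  [ bgroup ("n=" ++ show n)
      [ bench "insertionSort" (nf insertionSort xs)
      , bench "sort"          (nf sort xs)
      ]
  | n <- [16, 256, 4096]
  , let xs = [n, n - 1 .. 1] :: [Int]
  ]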

Similarly, I had some intuition about how "expensive" (from hevm's viewpoint) different Ethereum programs would be, but I didn’t know for sure until I measured the performance of smart contracts running on the EVM. Here's a brief overview of what smart contracts can do and how they interact with the EVM.

  • The EVM consists of a stack, memory, and storage. The stack is limited to 1,024 items. Memory and storage are both zero-initialized and are indexed by unsigned 256-bit integers.
  • Memory is transient and its lifetime is limited to the scope of a transaction, whereas storage persists across transactions.
  • Contracts can write data to either memory or storage. While writing to storage (persistent blockchain data) is significantly more expensive gas-wise than writing to memory (transient data scoped to a transaction), when we’re running locally we shouldn’t expect any performance difference between the two.

I wrote eight simple smart contracts to stress these various components. What they all have in common is that they are parameterized by a number (n) and are expected to have a runtime linear in that number, so any nonlinear growth in runtime would stand out as an outlier. These are the contracts and what they do:

  • simple_loop: Looping and adding numbers
  • primes: Calculation and storage of prime numbers
  • hashes: Repeated hashing
  • hashmem: Repeated hashing and storage
  • balanceTransfer: Repeated transferring of 1 wei to an address
  • funcCall: Repeated function calls
  • contractCreation: Repeated contract creations
  • contractCreationMem: Repeated contract creations and memory

You can find their full source code in this file.

I profiled these contracts to collect information on how they perform with a wide range of n values. I increased n by powers of 2 so that the effects would be more noticeable early on. Here's what I saw:

I immediately noticed that something was definitely going on with the hashes and hashmem test cases. If the contracts’ runtimes increased linearly with n, the hashes and hashmem lines wouldn't have crossed the others. How might we confirm that? Since each runtime should roughly double from one point to the next (ignoring a constant term), we can simply plot the ratios of successive runtimes and draw a line indicating what we should expect.
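
Computing those successive ratios is a one-liner; as a small illustrative sketch (not the actual analysis script):

-- Successive runtime ratios: for a linear-time contract with n doubling
-- at every step, these should hover around 2.
ratios :: [Double] -> [Double]
ratios xs = zipWith (/) (drop 1 xs) xs

-- e.g. ratios [0.5, 1.0, 2.1, 4.1]  ~>  [2.0, 2.1, ~1.95]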

Bingo. hashes and hashmem were clearly off the baseline. I then directed my efforts toward profiling those specific examples and the code they depend on. After additional profiling, it appeared that repeatedly splicing and re-splicing immutable bytearrays (to simulate how writes work in a contract) caused the memory devoted to bytearrays to explode in size. In essence, hevm was not properly discarding the old versions of the memory.

ST to the rescue!

The fix was conceptually simple and, fortunately, had already been proposed months earlier by my manager, Artur Cygan. First, we changed how hevm handles state in EVM computations:

- type EVM a = State VM a
+ type EVM s a = StateT (VM s) (ST s) a

Then we went through all the places where hevm deals with EVM memory and implemented a mutable vector that can be modified in place(!). How does this work? In Haskell, computations that manipulate state are typically expressed with a State monad, but there is no guarantee that only a single copy of that state exists in memory during execution. Using the ST monad instead allowed us to ensure that the internal state used by the computation is inaccessible to the rest of the program. That way, hevm can get away with destructively updating the state while the program as a whole remains purely functional.
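
To make the idea concrete, here is a minimal sketch of the StateT-over-ST pattern with hypothetical types that only mimic the shape of hevm's (the real VM state is far richer): the VM's memory is a mutable vector living in the ST s heap, so writes mutate it in place, and runST keeps that mutation invisible from the outside.

{-# LANGUAGE RankNTypes #-}
-- A minimal sketch of the StateT-over-ST pattern; the types are
-- hypothetical stand-ins, not hevm's actual definitions.
import Control.Monad.ST (ST, runST)
import Control.Monad.State.Strict (StateT, evalStateT, get)
import Control.Monad.Trans.Class (lift)
import Data.Word (Word8)
import qualified Data.Vector.Unboxed.Mutable as MV

data VM s = VM { vmMemory :: MV.MVector s Word8, vmPC :: Int }
type EVM s a = StateT (VM s) (ST s) a

-- An MSTORE-like write: destructive, but confined to the ST s heap.
writeByte :: Int -> Word8 -> EVM s ()
writeByte addr byte = do
  vm <- get
  lift (MV.write (vmMemory vm) addr byte)

-- runST guarantees the mutable state cannot escape, so callers see a
-- pure function even though memory is updated in place internally.
runEVM :: Int -> (forall s. EVM s a) -> a
runEVM memSize action = runST $ do
  mem <- MV.replicate memSize 0
  evalStateT action (VM mem 0)

With this setup, the memory buffer backing an execution can be allocated once and mutated in place, rather than a fresh immutable copy being produced on every write.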

Here’s what the graphs look like after the PR. The slowdown in the last test case is now around 3 instead of 5.5, and in terms of actual runtime, the linearity is much more apparent. Nice!

Epilogue: Concrete or symbolic?

In the last few weeks of my associate program, I ran more detailed profiles with provenance information. Now we truly get X-ray vision into exactly where memory is being allocated in the program:

A detailed heap profile showing which data constructors use the most memory

What’s with all the Prop terms being generated? hevm has support for symbolic execution, which allows for various forms of static analysis. However, Echidna only ever uses the fully concrete execution. As a result, we never touch the constraints that hevm is generating. This is left for future work, which will hopefully lead to a solution in which hevm can support a more optimized concrete-only mode without compromising on its symbolic aspects.

Final thoughts

In a software project like Echidna, whose effectiveness is proportional to how quickly it can perform its fuzzing, we’re always looking for ways to make it faster without making the code needlessly complex. Doing performance engineering in a setting like Haskell reveals some interesting problems and definitely requires one to be ready to drop down and reason about the behavior of the compilation process and language semantics. It is an art as old as computer science itself.

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

— Donald Knuth

LABScon23 Replay | macOS Components Used in North Korean Crypto-Heists

By: LABScon
8 May 2024 at 10:00

In this unique talk, Proofpoint’s Greg Lesnewich takes us on a tour of recent North Korean APTs targeting macOS devices and offers researchers new techniques for hunting this increasingly active cluster through similarity analysis of Mach-O binaries and linked dynamic libraries.

While many state-aligned threats have dipped their toes into macOS malware, North Korea has invested serious time and effort into compromising Apple’s desktop operating system. Its operations in macOS environments span both espionage and financial gain. macOS malware analysis is an exciting space, but most blogs on the subject deal with functionality and capability rather than with how to find more similar samples. Analysts are forced to rely on string searching based on disassembler output or a strings dump; in contrast, Windows executables have “easy” pivots such as import hashing or rich headers that help analysts find additional samples without much effort.

This talk introduces some of those easy pivots for Mach-O files, using North Korean samples as an initial case study. Along the way, Greg surveys the North Korean clusters that use Mach-O samples, how those clusters intersect, and how their families relate to one another, and shows how some simple pivots can link a group’s families together.

About the Presenter

Greg Lesnewich is a senior threat researcher at Proofpoint, where he tracks malicious activity linked to the DPRK (North Korea). Greg has a background in threat intelligence, incident response, and managed detection, and previously built a threat intelligence program for a Fortune 50 financial organization.

About LABScon 2023

This presentation was featured live at LABScon 2023, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

Keep up with all the latest on LABScon 2024 here.


Ghidra nanoMIPS ISA module

7 May 2024 at 17:11

Introduction

In late 2023 and early 2024, the NCC Group Hardware and Embedded Systems practice undertook an engagement to reverse engineer baseband firmware on several smartphones. This included MediaTek 5G baseband firmware based on the nanoMIPS architecture. While we were aware of some nanoMIPS modules for Ghidra having been developed in private, there was no publicly available reliable option for us to use at the time, which led us to develop our own nanoMIPS disassembler and decompiler module for Ghidra.

In the interest of time, we focused on implementing the features and instructions that we encountered on actual baseband firmware, and left complex P-Code instruction emulation unimplemented where it was not yet needed. Though the module is a work in progress, it still decompiles the majority of the baseband firmware we’ve analyzed. Combined with debug symbol information included with some MediaTek firmware, it has been very helpful in the reverse engineering process.

Here we will demonstrate how to load a MediaTek baseband firmware into Ghidra for analysis with our nanoMIPS ISA module.

Target firmware

For an example firmware to analyze, we looked up phones likely to include a MediaTek SoC with 5G support. Some relatively recent Motorola models were good candidates. (These devices were not part of our client engagement.)

We found many Android firmware images on https://mirrors.lolinet.com/firmware/lenomola/, including an image for the Motorola Moto Edge 2022, codename Tesla: https://mirrors.lolinet.com/firmware/lenomola/tesla/official/. This model is based on a MediaTek Dimensity 1050 (MT6879) SoC.

There are some carrier-specific variations of the firmware. We’ll randomly choose XT2205-1_TESLA_TMO_12_S2ST32.71-118-4-2-6_subsidy-TMO_UNI_RSU_QCOM_regulatory-DEFAULT_cid50_R1_CFC.xml.zip.

Extracting nanoMIPS firmware

The actual nanoMIPS firmware is in the md1img.img file from the Zip package.

To extract the contents of the md1img file, we wrote some Kaitai structure definitions along with simple Python wrapper scripts that run the structure parsing and write the different sections out to individual files. The ksy Kaitai definitions can also be used to interactively explore these files with the Kaitai IDE.

Running md1_extract.py with an --outdir option will extract the files contained within md1img.img:

$ ./md1_extract.py ../XT2205-1_TESLA_TMO_12_S2STS32.71-118-4-2-6-3_subsidy-TMO_UNI_RSU_QCOM_regulatory-DEFAULT_cid50_CFC/md1img.img --outdir ./md1img_out/
extracting files to: ./md1img_out
md1rom: addr=0x00000000, size=43084864
        extracted to 000_md1rom
cert1md: addr=0x12345678, size=1781
        extracted to 001_cert1md
cert2: addr=0x12345678, size=988
        extracted to 002_cert2
md1drdi: addr=0x00000000, size=12289536
        extracted to 003_md1drdi
cert1md: addr=0x12345678, size=1781
        extracted to 004_cert1md
cert2: addr=0x12345678, size=988
        extracted to 005_cert2
md1dsp: addr=0x00000000, size=6776460
        extracted to 006_md1dsp
cert1md: addr=0x12345678, size=1781
        extracted to 007_cert1md
cert2: addr=0x12345678, size=988
        extracted to 008_cert2
md1_filter: addr=0xffffffff, size=300
        extracted to 009_md1_filter
md1_filter_PLS_PS_ONLY: addr=0xffffffff, size=300
        extracted to 010_md1_filter_PLS_PS_ONLY
md1_filter_1_Moderate: addr=0xffffffff, size=300
        extracted to 011_md1_filter_1_Moderate
md1_filter_2_Standard: addr=0xffffffff, size=300
        extracted to 012_md1_filter_2_Standard
md1_filter_3_Slim: addr=0xffffffff, size=300
        extracted to 013_md1_filter_3_Slim
md1_filter_4_UltraSlim: addr=0xffffffff, size=300
        extracted to 014_md1_filter_4_UltraSlim
md1_filter_LowPowerMonitor: addr=0xffffffff, size=300
        extracted to 015_md1_filter_LowPowerMonitor
md1_emfilter: addr=0xffffffff, size=2252
        extracted to 016_md1_emfilter
md1_dbginfodsp: addr=0xffffffff, size=1635062
        extracted to 017_md1_dbginfodsp
md1_dbginfo: addr=0xffffffff, size=1332720
        extracted to 018_md1_dbginfo
md1_mddbmeta: addr=0xffffffff, size=899538
        extracted to 019_md1_mddbmeta
md1_mddbmetaodb: addr=0xffffffff, size=562654
        extracted to 020_md1_mddbmetaodb
md1_mddb: addr=0xffffffff, size=12280622
        extracted to 021_md1_mddb
md1_mdmlayout: addr=0xffffffff, size=8341403
        extracted to 022_md1_mdmlayout
md1_file_map: addr=0xffffffff, size=889
        extracted to 023_md1_file_map

The most relevant files are:

  • md1rom is the nanoMIPS firmware image
  • md1_file_map provides slightly more context on the md1_dbginfo file: its original filename is DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
  • md1_dbginfo is an XZ compressed binary file containing debug information for md1rom, including symbols

Extracting debug symbols

md1_dbginfo is another binary file containing symbols and filenames with associated addresses. We’ll rename it and decompress it based on the filename from md1_file_map:

$ cp 018_md1_dbginfo DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
$ unxz DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31.xz
$ hexdump DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31 | head
00000000  43 41 54 49 43 54 4e 52  01 00 00 00 98 34 56 00  |CATICTNR.....4V.|
00000010  43 41 54 49 01 00 00 00  00 00 00 00 4e 52 31 36  |CATI........NR16|
00000020  2e 52 32 2e 4d 54 36 38  37 39 2e 54 43 32 2e 50  |.R2.MT6879.TC2.P|
00000030  52 31 2e 53 50 00 4d 54  36 38 37 39 5f 53 30 30  |R1.SP.MT6879_S00|
00000040  00 4d 54 36 38 37 39 5f  4e 52 31 36 2e 54 43 32  |.MT6879_NR16.TC2|
00000050  2e 50 52 31 2e 53 50 2e  56 31 37 2e 50 33 38 2e  |.PR1.SP.V17.P38.|
00000060  30 33 2e 32 34 2e 30 33  52 00 32 30 32 33 2f 30  |03.24.03R.2023/0|
00000070  35 2f 31 39 20 32 32 3a  33 31 00 73 00 00 00 2b  |5/19 22:31.s...+|
00000080  ed 53 00 49 4e 54 5f 56  65 63 74 6f 72 73 00 4c  |.S.INT_Vectors.L|
00000090  08 00 00 54 08 00 00 62  72 6f 6d 5f 65 78 74 5f  |...T...brom_ext_|

To extract information from the debug info file, we made another Kaitai definition and wrapper script that extracts symbols and outputs them in a text format compatible with Ghidra’s ImportSymbolsScript.py script:

$ ./mtk_dbg_extract.py md1img_out/DbgInfo_NR16.R2.MT6879.TC2.PR1.SP_LENOVO_S0MP1_K6879V1_64_MT6879_NR16_TC2_PR1_SP_V17_P38_03_24_03R_2023_05_19_22_31 | tee dbg_symbols.txt
INT_Vectors 0x0000084c l
brom_ext_main 0x00000860 l
INT_SetPLL_Gen98 0x00000866 l
PLL_Set_CLK_To_26M 0x000009a2 l
PLL_MD_Pll_Init 0x000009da l
INT_SetPLL 0x000009dc l
INT_Initialize_Phase1 0x027b5c80 l
INT_Initialize_Phase2 0x027b617c l
init_cm 0x027b6384 l
init_cm_wt 0x027b641e l
...

(Currently the script outputs only label definitions rather than function definitions, since it was unclear whether all of the symbols correspond to functions.)

Loading nanoMIPS firmware into Ghidra

Install the extension

First, we’ll have to install the nanoMIPS module for Ghidra. In the main Ghidra window, go to “File > Install Extensions”, click the “Add Extension” plus button, and select the module Zip file (e.g., ghidra_11.0.3_PUBLIC_20240424_nanomips.zip). Then restart Ghidra.

Initial loading

Load md1rom as a raw binary image. Select 000_md1rom from the md1img.img extract directory and keep “Raw Binary” as the format. For Language, click the “Browse” ellipsis and find the little endian 32-bit nanoMIPS option (nanomips:LE:32:default) using the filter, then click OK.

We’ll load the image at offset 0 so no further options are necessary. Click OK again to load the raw binary.

When Ghidra asks if you want to do an initial auto-analysis, select No. We have to set up a mirrored memory address space at 0x90000000 first.

Memory mapping

Open the “Memory Map” window and click plus for “Add Memory Block”.

We’ll name the new block “mirror”, set the starting address to ram:90000000, the length to match the length of the base image “ram” block (0x2916c40), permissions to read and execute, and the “Block Type” to “Byte Mapped” with a source address of 0 and mapping ratio of 1:1.

Also change the permissions for the original “ram” block to just read and execute. Save the memory map changes and close the “Memory Map” window.

Note that this memory map is incomplete; it’s just the minimal setup required to get disassembly working.

Debug symbols

Next, we’ll load up the debug symbols. Open the Script Manager window and search for ImportSymbolsScript.py. Run the script and select the text file generated by mtk_dbg_extract.py earlier (dbg_symbols.txt). This will create a bunch of labels, most of them in the mirrored address space.

Disassembly

Now we can begin disassembly. There is a jump instruction at address 0 that will get us started, so just select the byte at address 0 and press “d” or right-click and choose “Disassemble”. Thanks to the debug symbols, you may notice this instruction jumps to the INT_Initialize_Phase1 function.

Flow-based disassembly will now start to discover a bunch of code. The initial disassembly can take several minutes to complete.

Then we can run the normal auto-analysis with “Analysis > Auto Analyze…”. This should also discover more code and spend several minutes in disassembly and decompilation. We’ve found that the “Non-Returning Functions” analyzer creates many false positives with the default configuration in these firmware images, which disrupts the code flow, so we recommend disabling it for initial analysis.

The one-shot “Decompiler Parameter ID” analyzer is a good option to run next for better detection of function input types.

Conclusion

Although the module is still a work in progress, the results are already quite usable for analysis and allowed us to reverse engineer some critical features in baseband processors.

The nanoMIPS Ghidra module and MediaTek binary file unpackers can be found on our GitHub at:
